The first sentence of this BBC story about Intel's profits took me slightly aback:
"For the three months to 27 March, the Californian-based company made a profit of $1.7bn, almost double the $915m recorded for the same period in 2003."
Shouldn't that be "California-based"? I thought to myself.
Checking relative frequencies on the web, I got an answer: 416,000 Google hits ("ghits") for "California-based" vs. 2,580 for "Californian-based". 99.4% of the web agrees with my judgment -- and also with grammar and logic, it seems to me, since "X-based" should be a compositional compound, meaning "based in (or on) X", where X is a noun. Nobody would say "based in Californian." QED. The feeble 0.6% are just confused, I thought smugly. Perhaps they are attracted by the irrelevant analogy of other adjective-noun sequences. So much for the Beeb, how the mighty have fallen, etc.
But wait a minute, said the still small voice of conscience. How about "European-based"? Doesn't that sound just as good as "Europe-based", or maybe even better? Checking the web, I found 42,600 ghits for "Europe-based" vs. 60,700 for "European-based": 41% for the noun, 59% for the adjective. Even-steven from a grammatical point of view (though an adjectival landslide in electoral terms!)
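The percentages quoted so far are just each form's share of the combined hit count. A minimal sketch of that arithmetic, using the counts from the text; the helper name `share` is made up for illustration:

```python
def share(noun_hits, adj_hits):
    """Return (noun %, adjective %) of the combined hit count."""
    total = noun_hits + adj_hits
    return 100 * noun_hits / total, 100 * adj_hits / total


# Hit counts quoted in the text above.
california = share(416_000, 2_580)
europe = share(42_600, 60_700)

print(f"California: {california[0]:.1f}% noun, {california[1]:.1f}% adjective")
print(f"Europe:     {europe[0]:.0f}% noun, {europe[1]:.0f}% adjective")
```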
And looking at the next few examples of relevant noun/adjective pairs that occurred to me makes the picture even murkier. "Boston-based" is 80,000 times commoner than "Bostonian-based", but "Canada-based" is about 25% less common than "Canadian-based", and so on:
| Place name (noun/adjective) + "based" | noun ghits | adjective ghits | noun/adjective ratio |
| --- | --- | --- | --- |
| Athens/Athenian based | 6,460 | 11 | 587 |
| Boston/Bostonian based | 240,000 | 3 | 80,000 |
| California/Californian based | 416,000 | 2,580 | 161 |
| Canada/Canadian based | 70,300 | 94,400 | 0.745 |
| China/Chinese based | 39,000 | 7,450 | 5.23 |
| Egypt/Egyptian based | 4,920 | 4,520 | 1.09 |
| Europe/European based | 42,600 | 60,700 | 0.702 |
| France/French based | 24,800 | 29,100 | 0.852 |
| Germany/German based | 44,800 | 44,300 | 1.01 |
| Greece/Greek based | 3,320 | 2,970 | 1.12 |
| Ireland/Irish based | 34,400 | 16,300 | 2.11 |
| Israel/Israeli based | 20,100 | 6,750 | 2.98 |
| Japan/Japanese based | 43,800 | 8,940 | 4.90 |
| Korea/Korean based | 14,900 | 5,680 | 2.62 |
| Latvia/Latvian based | 558 | 250 | 2.23 |
| Nigeria/Nigerian based | 2,070 | 853 | 2.43 |
| Norway/Norwegian based | 8,400 | 3,800 | 2.21 |
| Paris/Parisian based | 91,000 | 297 | 306 |
| Pennsylvania/Pennsylvanian based | 45,100 | 38 | 1,187 |
| Russia/Russian based | 10,400 | 6,600 | 1.58 |
| Scotland/Scottish based | 31,100 | 28,900 | 1.08 |
| Tunisia/Tunisian based | 336 | 130 | 2.58 |
| Turkey/Turkish based | 4,840 | 1,570 | 3.08 |
| Vienna/Viennese based | 21,000 | 32 | 656 |
(Some of these should probably be removed from consideration, at least pending reanalysis, because the "adjective" forms are really nouns much of the time, as in "Greek-based" meaning "based on the Greek language". I don't think this will change the overall picture much. It's possible that a more careful accounting for other sense differences and other details of semantic relationships would clear things up, but I doubt it.)
Adding it all up, it's about 79% for the nouns, 21% for the adjectives. A victory for logical grammar, but hardly a resounding one. There are several pockets of stalwart adjectival resistance (or craven concession to adjectival irrationality?): Europe, France, Canada, at .70, .85, .75 noun/adjective ratios respectively. Germany is on the edge at 1.01.
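For anyone who wants to check the sums, here is a rough sketch that recomputes the per-place ratios and the pooled noun/adjective split from the counts in the table. The names `COUNTS` and `format_ratio` are invented for the example, and the counts themselves are a 2004 snapshot that a search engine would not reproduce today.

```python
# (noun form, adjective form, ghits for "<noun>-based", ghits for "<adjective>-based")
COUNTS = [
    ("Athens", "Athenian", 6_460, 11),
    ("Boston", "Bostonian", 240_000, 3),
    ("California", "Californian", 416_000, 2_580),
    ("Canada", "Canadian", 70_300, 94_400),
    ("China", "Chinese", 39_000, 7_450),
    ("Egypt", "Egyptian", 4_920, 4_520),
    ("Europe", "European", 42_600, 60_700),
    ("France", "French", 24_800, 29_100),
    ("Germany", "German", 44_800, 44_300),
    ("Greece", "Greek", 3_320, 2_970),
    ("Ireland", "Irish", 34_400, 16_300),
    ("Israel", "Israeli", 20_100, 6_750),
    ("Japan", "Japanese", 43_800, 8_940),
    ("Korea", "Korean", 14_900, 5_680),
    ("Latvia", "Latvian", 558, 250),
    ("Nigeria", "Nigerian", 2_070, 853),
    ("Norway", "Norwegian", 8_400, 3_800),
    ("Paris", "Parisian", 91_000, 297),
    ("Pennsylvania", "Pennsylvanian", 45_100, 38),
    ("Russia", "Russian", 10_400, 6_600),
    ("Scotland", "Scottish", 31_100, 28_900),
    ("Tunisia", "Tunisian", 336, 130),
    ("Turkey", "Turkish", 4_840, 1_570),
    ("Vienna", "Viennese", 21_000, 32),
]


def format_ratio(r):
    """Format a ratio roughly the way the table does (about three significant figures)."""
    if r < 1:
        return f"{r:.3f}"
    if r < 10:
        return f"{r:.2f}"
    return f"{r:,.0f}"


# Pooled totals: every hit counts equally, so high-volume places dominate.
noun_total = sum(n for _, _, n, _ in COUNTS)
adj_total = sum(a for _, _, _, a in COUNTS)
noun_share = 100 * noun_total / (noun_total + adj_total)

# Places where the adjective form out-polls the noun form (ratio below 1).
adjective_favoured = [noun for noun, _, n, a in COUNTS if n < a]

for noun, adj, n, a in COUNTS:
    print(f"{noun}/{adj} based: {format_ratio(n / a)}")
print(f"nouns {noun_share:.0f}%, adjectives {100 - noun_share:.0f}%")
print("adjective form ahead for:", ", ".join(adjective_favoured))
```

Note that pooling raw counts weights each place by its total hits, so California and Boston alone supply more than half of the noun total. An unweighted average of per-place shares would be a different statistic.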
Seriously, it's clear that different place-names are behaving differently here. What's the principle, if any? Word length? Unigram (word) frequency? Longitude? Affix? Country vs. City? Few of my initial hypotheses even hold up, and none of them explains much of the variance.
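As an illustration of how one such hypothesis might be checked, here is a sketch that correlates word length with the noun/adjective ratio. Everything about the setup is an assumption made for the example rather than the analysis behind the remarks above: length is measured in letters of the noun form, the ratio is taken on a log scale because it spans several orders of magnitude, and `pearson` is a hand-rolled helper for the ordinary Pearson coefficient.

```python
import math

# (noun form, ghits for "<noun>-based", ghits for "<adjective>-based"), from the table above
ROWS = [
    ("Athens", 6_460, 11), ("Boston", 240_000, 3),
    ("California", 416_000, 2_580), ("Canada", 70_300, 94_400),
    ("China", 39_000, 7_450), ("Egypt", 4_920, 4_520),
    ("Europe", 42_600, 60_700), ("France", 24_800, 29_100),
    ("Germany", 44_800, 44_300), ("Greece", 3_320, 2_970),
    ("Ireland", 34_400, 16_300), ("Israel", 20_100, 6_750),
    ("Japan", 43_800, 8_940), ("Korea", 14_900, 5_680),
    ("Latvia", 558, 250), ("Nigeria", 2_070, 853),
    ("Norway", 8_400, 3_800), ("Paris", 91_000, 297),
    ("Pennsylvania", 45_100, 38), ("Russia", 10_400, 6_600),
    ("Scotland", 31_100, 28_900), ("Tunisia", 336, 130),
    ("Turkey", 4_840, 1_570), ("Vienna", 21_000, 32),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Predictor: letters in the noun form. Response: log10 of the noun/adjective ratio.
lengths = [len(noun) for noun, _, _ in ROWS]
log_ratios = [math.log10(n / a) for _, n, a in ROWS]

r = pearson(lengths, log_ratios)
print(f"r = {r:.2f}; share of variance in log ratio (r squared) = {r * r:.2f}")
```

The same skeleton would work for the other candidates (word frequency, longitude, country vs. city) by swapping in a different predictor list, with the caveat that 24 hand-picked place names are a very small and unsystematic sample.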
And what if we picked a different head, such as "X-oriented" or "X-bound" or "X-educated"? Would the statistics be similar, or different?
And does all this have anything to do with the compounds that don't involve a de-verbal head at all, but are created by adding "-ed" to a modified noun, as in "red haired"? In other words, is the construction (at least sometimes and for some people) [[Canadian base]+ed] ? If so, does that offer any traction in explaining the enormous variation in usage statistics sketched above? I don't see how, but at least it would provide a grammatically and logically plausible analysis for such phrases.
Then again, maybe I'm just being old-fashioned in expecting a coherent compositional account of how regular phrasal patterns acquire their form and meaning, as opposed to the currently-spreading view that "our interpretive capacities take into account holistic informational characteristics of linguistic constructions and don't simply generate meanings by way of 'bottom up' recursion principles."
Posted by Mark Liberman at May 15, 2004 07:38 AM