February 23, 2006

W's Conundrum

This all started with a bit of presidential mis-morphology:

And I want those who are questioning it to step up and explain why all of a sudden a Middle Eastern company is held to a different standard than a Great British [sic] company.

W came up with the wrong answer to the question "what's the form for Great Britain as a modifier?", but the right answer is by no means obvious.

When a place name modifies company, English sometimes prefers the base form of the name to the corresponding adjective. Thus the string "Pennsylvania company" has 16,078 MSN hits ("mhits"?), while "Pennsylvanian company" has only 30. However, this is emphatically not true in certain cases. Consider "a French company" with 66,807 mhits, compared to "*a France company" with only 132. Likewise we expect "a German company", "a Russian company", "a Chinese company" -- not "a Germany company", "a Russia company", "a China company".

So there seems to be a general rule: when the place name is the name of a country, always use the adjectival form. But where the name of a country doesn't have any adjectival form, we're normally fine with using the name as a modifier: "a UK company" has 97,286 mhits, and there's likewise nothing wrong with "a U.S. company" (80,732 mhits), or "a Cayman Islands company" (2,510 mhits).

But what about Great Britain? There's no directly corresponding adjective, it seems -- "Great British" is (alas for W) a bad joke. And yet the base form of the name is not a plausible modifier: "a Great Britain company" garners 0 mhits, though Google manages to find a paltry 49.

One hypothesis would be that words like UK and US can be zero-derived adjectives as well as nouns, while Great Britain can't, presumably because of blocking by British. But "Great British" is also ruled out, perhaps because British is in fact the (irregular) adjectival form of "Great Britain" as well as "Britain", or perhaps for some other reason -- though Adj+N place names generally accept adjectivization of the N part with good grace: "West Virginian", "Northern Irish", "East Anglian", etc.

An alternative hypothesis would be that this is just another set of quasi-regular facts about English morphology.

On the subject of weird facts, note this asymmetry in mhits:

(with periods)      mhits     (without periods)   mhits     with/without ratio
"a U.K. company"    4,867     "a UK company"      97,286    0.05
"a U.S. company"    80,732    "a US company"      73,898    1.09

In the context "a __ company", U.S. is 22 times more likely to retain its periods than U.K. is. What's up with that?
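For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the two ratios and the "22 times" figure from the counts in the table above. The variable and function names are purely illustrative, and the counts are just the reported mhits, not anything fetched live.

```python
# Hit counts copied from the table above (mhits as reported in this post).
# The dictionary layout and names are illustrative assumptions.
hits = {
    "UK": {"with_periods": 4_867, "without_periods": 97_286},
    "US": {"with_periods": 80_732, "without_periods": 73_898},
}


def period_ratio(counts):
    """Return hits with periods divided by hits without periods."""
    return counts["with_periods"] / counts["without_periods"]


uk_ratio = period_ratio(hits["UK"])
us_ratio = period_ratio(hits["US"])

print(f"U.K. with/without ratio: {uk_ratio:.2f}")
print(f"U.S. with/without ratio: {us_ratio:.2f}")
# How many times more likely U.S. is to keep its periods than U.K.
print(f"U.S. relative to U.K.:   {us_ratio / uk_ratio:.1f}")
```

The last figure comes out a little under 22, which is where the rounded "22 times" comes from.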

It's not a Europe/U.S. difference, or at least not entirely, since American initialisms vary widely in this measure.

(with periods)   mhits        (without periods)   mhits         with/without ratio
U.C.L.A.         100,415      UCLA                3,765,828     0.03
A.C.L.U.         922,808      ACLU                809,865       1.14
N.C.A.A.         22,384       NCAA                8,949,487     0.003
U.S.A.           16,891,438   USA                 209,320,539   0.08

The preference for periods in U.S. might well involve a desire to avoid confusion with a capitalized version of the pronoun us. But again, the difference between ACLU and NCAA must be simply a matter of convention, so it's hard to be sure that the treatment of U.S. isn't purely conventional as well.

And one more strange fact:

phrase               mhits    phrase             mhits
"a U.S. company"     76,860   "A US company"     74,034
"a U.S.A. company"   297      "a USA company"    5,597

Personally, I feel that U.S.A. (with or without periods) is not a possible modifier in this case. And the web agrees with me, 96.2% of the time anyhow. Why in the world should there be anything wrong with "a USA company" as an instantiation of the "a <country name> company" pattern?
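The 96.2% figure can be reproduced from the four counts in the table above, on the assumption that it is the share of all four phrases' hits that use U.S. or US rather than U.S.A. or USA. A rough sketch, with illustrative names:

```python
# Counts copied from the table above; names are illustrative assumptions.
us_hits = 76_860 + 74_034   # "a U.S. company" + "A US company"
usa_hits = 297 + 5_597      # "a U.S.A. company" + "a USA company"

# Share of hits that avoid the three-letter form as a modifier.
share_us = us_hits / (us_hits + usa_hits)

print(f"{share_us:.1%}")
```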

(If you're not already sick of the quirks of toponymic morphosyntax, take a look at the 5/15/2004 post "All your base are belong to which lexical category".)

[To forestall anguished emails, let me say that I'm aware that the correct solution to W's conundrum would have been to use the form British (or maybe to pick another phrasing, like "a company based in Great Britain"). The point is that this particular fact of usage apparently has to be learned by rote -- if you try to solve the puzzle by rule, or by intelligent inference from other examples, you're very likely to get an ambiguous or flat-out wrong answer. Many of W's reported morphological miscues are over-generalizations, apparently substituting intelligent guesswork for rote memorization.]

Posted by Mark Liberman at February 23, 2006 04:01 PM