February 29, 2004

Laura Miller on Nushu at Keywords

Laura Miller, writing as an "invited author" on the excellent weblog Keywords, has a terrific post about the Nushu writing system. She debunks a recent Washington Post article that "is representative of the type of misunderstandings that continues to be perpetuated in reporting on this topic". But she also gives a lucid account of a fascinating subject, where the persistence of misunderstandings is only one of the reasons to be interested. Read the whole thing!

Posted by Mark Liberman at 10:35 PM

Enough said

Today's NYT magazine has a very, very strange article by Jack Hitt about language death, language preservation and attempts at language revival in Patagonia.

The oddest thing about this article is how completely uninterested the author seems to be in Qawasqar, the language whose death he is -- well, sort of languidly observing, more than reporting on. In the whole 4700-word article, the only concrete example of the language is presented in this passage:

In time, I got to hear some actual Kawesqar spoken, and it sounded a lot like Hollywood's generic Apache, but with a few unique and impossible sounds. I learned to say ''Æs ktæl sa Jack, akuókat cáuks ktæl?'' (''My name is Jack, what's yours?'') That second word, ktæl, means ''name'' and is (sort of) pronounced ka-tull. It happens entirely in the back of the mouth, in a really challenging way.

There is no other vocabulary discussed, nothing else about the sound inventory or syllable structure, nothing about the morphology or syntax except for a brief and completely illogical fairy-tale -- without any original-language examples -- about how "the Kawesqar's nomadic past" made the future tense rare, since "given the contingency of moving constantly by canoe, it was all but unnecessary", whereas they have several past tenses, so that "you can mean a few seconds ago, a few days ago, a time so long ago that you were not the original observer ... and, finally, a mythological past". Speaking for myself, if I were compelled to move constantly around by canoe along the coast of Patagonia, I suspect I'd be more interested in discussing where to find our next limpet, or whether we could make it around the point before the storm hit, than in making fine distinctions about more or less remote limpets and sleet-storms past. Hitt attributes this little just-so story to (unnamed) "general linguists", and he seems to accept the explanation while remaining unconvinced that it matters enough to motivate him to figure out how to express the various "tenses".

We seem to be meant to infer that really, there is nothing interesting about this language. At least it's clear that the writer has no real interest in it. Beyond these few fragments of careless linguistic description, we get several paragraphs about the author's tobacco consumption habits, sprinkled in among desultory descriptions of old people not speaking Kawesqar, odd quotations (or misquotations) from various linguists, fragmentary historical references, and musings about what languages might be good for, anyhow. Oh, and at the end, as one more cigarette is lit, we do learn that the word for match is "fire", borrowed from English.

The author's lack of curiosity is not limited to languages. Here's one of his historical asides:

When Charles Darwin first encountered the Kawesqar and the Yaghans, years before he wrote ''The Origin of Species,'' he is said to have realized that man was just another animal cunningly adapting to local environmental conditions. But that contact and the centuries to follow diminished the Kawesqar, in the 20th century, to a few dozen individuals.

What's this "he is said to have realized"? Darwin wrote about Patagonia and Tierra del Fuego at length in chapter 9 and chapter 10 of The Voyage of the Beagle, as Hitt could have discovered in 30 seconds with Google.

Darwin doesn't describe any contact with indigenous peoples of the region he refers to as Patagonia (in chapter 9), but he did meet members of several groups of Tierra del Fuegans, and discusses his experiences, impressions and reflections at length in chapter 10. I think that this is the passage that Hitt's "he is said to have realized" refers to:

Whilst beholding these savages, one asks, whence have they come? What could have tempted, or what change compelled a tribe of men, to leave the fine regions of the north, to travel down the Cordillera or backbone of America, to invent and build canoes, which are not used by the tribes of Chile, Peru, and Brazil, and then to enter on one of the most inhospitable countries within the limits of the globe? Although such reflections must at first seize on the mind, yet we may feel sure that they are partly erroneous. There is no reason to believe that the Fuegians decrease in number; therefore we must suppose that they enjoy a sufficient share of happiness, of whatever kind it may be, to render life worth having. Nature by making habit omnipotent, and its effects hereditary, has fitted the Fuegian to the climate and the productions of his miserable country.

This is one of many hints about selection and adaptation in Darwin's memoir of his voyage, which was indeed written and published quite a while before he wrote the Origin of Species.

Just before the passage quoted above, Darwin writes:

The different tribes have no government or chief; yet each is surrounded by other hostile tribes, speaking different dialects, and separated from each other only by a deserted border or neutral territory: the cause of their warfare appears to be the means of subsistence. Their country is a broken mass of wild rocks, lofty hills, and useless forests: and these are viewed through mists and endless storms. The habitable land is reduced to the stones on the beach; in search of food they are compelled unceasingly to wander from spot to spot, and so steep is the coast, that they can only move about in their wretched canoes. They cannot know the feeling of having a home, and still less that of domestic affection; for the husband is to the wife a brutal master to a laborious slave. Was a more horrid deed ever perpetrated, than that witnessed on the west coast by Byron, who saw a wretched mother pick up her bleeding dying infant-boy, whom her husband had mercilessly dashed on the stones for dropping a basket of sea-eggs! How little can the higher powers of the mind be brought into play: what is there for imagination to picture, for reason to compare, or judgment to decide upon? to knock a limpet from the rock does not require even cunning, that lowest power of the mind. Their skill in some respects may be compared to the instinct of animals; for it is not improved by experience: the canoe, their most ingenious work, poor as it is, has remained the same, as we know from Drake, for the last two hundred and fifty years.

Overall, Darwin was clearly shocked by what he saw and heard about the life and customs of the people of this region. Among other horror stories, he writes that

From the concurrent, but quite independent evidence of the boy taken by Mr. Low, and of Jemmy Button, it is certainly true, that when pressed in winter by hunger, they kill and devour their old women before they kill their dogs: the boy, being asked by Mr. Low why they did this, answered, "Doggies catch otters, old women no." This boy described the manner in which they are killed by being held over smoke and thus choked; he imitated their screams as a joke, and described the parts of their bodies which are considered best to eat. Horrid as such a death by the hands of their friends and relatives must be, the fears of the old women, when hunger begins to press, are more painful to think of; we are told that they then often run away into the mountains, but that they are pursued by the men and brought back to the slaughter-house at their own firesides!

Darwin's discussion of their phonology is similar in scope and content to Hitt's:

The language of these people, according to our notions, scarcely deserves to be called articulate. Captain Cook has compared it to a man clearing his throat, but certainly no European ever cleared his throat with so many hoarse, guttural, and clicking sounds.

Like Hitt, Darwin also provides just one phrase (from the language of one of the groups he encounters):

Young and old, men and children, never ceased repeating the word "yammerschooner," which means "give me." After pointing to almost every object, one after the other, even to the buttons on our coats, and saying their favourite word in as many intonations as possible, they would then use it in a neuter sense, and vacantly repeat "yammerschooner." After yammerschoonering for any article very eagerly, they would by a simple artifice point to their young women or little children, as much as to say, "If you will not give it me, surely you will to such as these."

The editorial practices of the NYT magazine are often puzzling, and rarely more than in this case. Why send someone to remotest Chile to observe language death, rather than to Maine or Arizona? Why send someone who apparently knows nothing about languages and has no interest in learning? And if you're going to send a linguistically-challenged observer to check up on people geographically and culturally similar to the hunter-gatherers who made such a big impression on Charles Darwin, why not suggest that if he's going to discuss Darwin's conclusions, perhaps he should read what Darwin wrote?

Jack Hitt is apparently an accomplished journalist, but in this case, I'm tempted to suggest that the NYT magazine should have saved themselves the air fare, and simply reprinted Darwin's chapter, which is now out of copyright.

Factual note: according to this map, Qawasqar seems to be spoken on the west coast of Chile, at about latitude 50 S. Darwin gives the latitude of Tierra del Fuego as 53 38' S. So there's clearly some physical distance between the speakers of Qawasqar and the people that Darwin describes (who are in any case somewhat diverse physically and culturally from east to west along the straits, according to his account). Darwin does indicate, however, that the climate and terrain are similar for several hundred miles north along the Chilean coast.

A more extensive, if fictional, account of life along that coast can be found in a novel by Patrick O'Brian, The Unknown Shore. It tells the story of the Wager, which in 1740 was separated from a Royal Navy squadron making a trip around the world, and was shipwrecked at latitude 48 on the Chilean coast. The weather, terrain and inhabitants are pictured much as Darwin described them on his visit a hundred years later. The main characters of O'Brian's novel -- midshipman Jack Byron and surgeon's mate Tobias Barrow -- spend a very hard hundred pages getting to Chiloe, mostly in the canoes of not especially friendly indigenes, and their journey gives a compelling fictional impression of how inhospitable the environment was, even some five degrees north of Tierra del Fuego and two degrees north of the island where Qawasqar is still hanging on.

[Update: Semantic Compositions discusses the same story, focusing on some aspects ignored in my comments, including a discussion (which SC aptly describes as "weaselly") of whether languages express a sort of "sense of place" (like wines express their terroir, though the analogy is not explicit); and what SC calls "an irritating morality play that the author tries to set up between a group of linguists who allegedly 'dismiss salvage efforts...as futile exercises' and linguists who 'will tell you that every language has its own unique theology and philosophy buried in its very sinews'."

As you can see, there's plenty in Hitt's article to irritate everyone.]

[Another, very different opinion: Patrick Belton at oxblog says that Hitt's piece "shows there can be good writing, even in the New York Times", and that "the magazine should be congratulated for the highly unusual and innovative step of bringing good writing to the profit-making press."

De gustibus non est disputandum, I guess. Perhaps what Belton liked about the piece was that it left out all the boring language bits in favor of the existential tobacco-consumption moments, one of which he quotes. ]

Posted by Mark Liberman at 07:26 PM

Is creating a dictionary a federal crime?

A while back a posting on Slashdot had this signature:

Diese ist durch das deutsche Urheberrecht geschützt. Die Übersetzung ins Englische verstösst gegen den DMCA.

At the risk of violating the DMCA, here's the translation for those who don't read German:

This is protected by German copyright. Translation into English violates the DMCA.

The DMCA is the Digital Millennium Copyright Act, a law passed in the United States in 1998 at the behest of the entertainment industry in hopes of stopping unauthorized copying of copyrighted materials, such as music CDs. They like to call this "piracy", as if unauthorized copying were the moral equivalent of kidnapping, rape and murder. Some people have no sense of proportion. One reason that many people object to the DMCA is that, since it prevents any copying at all, it has the effect of allowing companies to prevent forms of copying that US copyright law considers legal under the "fair use" doctrine.

Section 1201(a)(1) prohibits the act of circumventing a technological measure used by copyright owners to control access to their work. Sections 1201(a)(2) and 1201(b) outlaw the manufacture, sale, distribution or trafficking of tools and technologies that enable circumvention.

Among the technological measures that the DMCA prohibits the circumvention of is encryption. What the above signature refers to is the fact that presenting something in an unfamiliar language may be considered a form of encryption and therefore a "technological measure". The story of the Navajo Codetalkers during the Second World War makes the equivalence clear. Therefore, translation may be considered a violation of the DMCA.

Putting something into German isn't a very good way of concealing it since so many people understand German. But that doesn't change the legal situation. The DMCA doesn't care whether the copyright owner has done a good job of protecting its material; circumventing the protection is illegal even if it is easy.

In October, as reported in The Register, SunnComm Technologies threatened to sue Princeton computer science graduate student Alex Halderman under the DMCA over his paper Analysis of the MediaMax CD3 Copy-Prevention System. In SunnComm's system, when a music "CD" is inserted into a computer's CD drive, a small program is installed that controls access to the CD and prevents copying. This relies on the "autorun" facility, whereby when a CD is inserted the computer automatically checks to see if it contains a program and runs it if it does. (The reason for the scare quotes around CD is that the term CD is a trademark of the Philips company. Since "CD"s with MediaMax CD3 copy protection do not conform to the specification, calling them CDs violates Philips' trademark.) Halderman pointed out that this program can easily be removed, and that "autorun" is disabled if the user presses the Shift key while the CD is loading. It is also possible to disable "autorun" from the Control Panel; any user with a concern for security will have done this. In any case, this approach only works if the computer is running Microsoft Windows; it has no effect on people like me who don't.

SunnComm eventually backed down (BBC News report); they probably realized that continuing public notice that their product was laughably ineffective would not be good for business. The point is that circumventing a security measure can violate the DMCA even if the measure is no good.

The Slashdot poster may have been joking, but maybe not. The issue is a serious one. Is it now a federal crime in the United States to produce machine translation software, dictionaries, grammars, and other aids to translation? Reasonable people may think that this isn't what the DMCA was intended to prohibit, but you can't count on either greedy corporations or power-seeking government officials to be reasonable. The Electronic Frontier Foundation has lots of information about the DMCA and the problems with it here.

Posted by Bill Poser at 01:14 PM

Aiding and editing the "enemy"

The New York Times reports that the U.S. Treasury Department has dreamt up yet another novel form of censorship:

Treasury Department Is Warning Publishers of the Perils of Criminal Editing of the Enemy. The New York Times, February 28, 2004, National section.

The Treasury Department has decided that it is illegal to edit writing that originates from Iran, and it seems to be gearing up to extend this restriction to other nations that the U.S. government restricts business with.

Posted by Christopher Potts at 10:37 AM

Rolling and unrolling Indian r's

Thomas Friedman's column in today's NYT describes his experiences in an "accent neutralization" class in Bangalore that "teach[es] the would-be Indian call center operators to suppress their native Indian accents and speak with a Canadian one." He characterizes the phonetic content of the class with a passing reference to allophones of /t/ and /r/:

"Watching these incredibly enthusiastic young Indians preparing for their call center jobs — earnestly trying to soften their t's and roll their r's — is an uplifting experience, especially when you hear from their friends already working these jobs how they have transformed their lives."

"Softening their t's" is a plausible way to refer to the flapping and voicing of /t/ when it's not the onset of a stressed syllable. This is the process that makes "latter" and "ladder" homophones for most North Americans, and likewise can make "say 'fat' again" sound like "say 'fad' again".. This is an example of the class of processes that linguists call "lenition", which is basically a fancy word for "weakening" or "softening", though "softening" is not generally used as a term of art in the same sense. As a term of phonetic description, soft is sometimes used for voicing features (though sometimes voiceless sounds are called "soft" and sometimes voiced ones are), and sometimes for other features like palatalization (the "soft g" in George vs. the "hard g" in gorge). I haven't heard the phrase "soft t" used to describe the lenited allophones of /t/ in North American English -- but it wouldn't be a terrible choice of terminology, despite the danger of confusion with palatalization of /t/ in sequences like "what you" -> "whatcha".

But teaching young Indians to "roll their r's" in order to sound like Canadians? This is really puzzling, since there's a standard meaning for the term "rolled r", namely the kind of tongue-tip trill that most Spanish speakers have for word-initial /r/ ("la raza") or for medial /r/ written as a geminate ("perro"). I'm no kind of expert on the English dialects of our neighbor to the north, but I'd be willing to bet a substantial sum that few if any of them have any trilled r's. As far as I know, most Canadians have the same "bunched" r that most Americans do. This is a sound with a very interesting bit of acoustic physics behind it -- but that's another story.

Meanwhile, if the youth of Bangalore are really being taught to perform trilled r's as a way to sound Canadian, their future customers are in for a treat. More likely, though, Mr. Friedman (or his editor) is just confused. The local language in Karnataka state, where Bangalore is located, is Kannada, which has a trilled r. So perhaps the students in the class that Friedman observed were actually learning to un-roll their r's -- though I've visited Bangalore, and my impression is that its residents mostly used tapped r's in their English.

I'd be interested in a better-informed account of the mass-market Henry Higginses of the growing Indian call-center industry. I imagine that they spend more time on vowels and prosody than on consonants, but it'd be nice to know the facts.

[Update: Bill Poser suggests:

I have no information on what the call center training does, but I wonder if part of it doesn't have to do with removing retroflexion. The use of retroflexes in place of standard English apico-alveolars is a salient characteristic of Indian English as well as of the loan phonology of the Indian languages that I know about. The retroflexes are referred to as "hard" in the non-specialist literature, e.g. in discussions of character sets by computer people. So "softening" might be removal of retroflexion.

This is certainly plausible. Or maybe Friedman just registered that the students were learning how to adjust the pronunciation of various sounds; wanted to be more specific because it makes for a more engaging and readable account; picked "t" and "r" at random because they're common letters whose English pronunciations are somewhat regular; and threw in "soften" and "roll" by process of association.

All that I can confidently conclude from what Friedman wrote is that he's not especially interested in phonetic description, which is hardly news. As for what and how Bangalore call-center recruits are really taught about how to imitate various English dialects, those are interesting questions that will have to be answered by someone who knows the facts.]

Posted by Mark Liberman at 09:56 AM

Language Internal Reanalysis

Geoff Pullum has mentioned some nice examples of how English has created new singular stems from words borrowed from other languages, whose original singular form looked like a native English plural. What is more surprising is that this kind of reanalysis sometimes occurs within a single language; an inherited word is reanalyzed as containing an affix that it does not, historically, contain.

A neat example is found in Coptic, the latest form of the ancient Egyptian language. Coptic ceased to be spoken several hundred years ago, but it is still in use as the liturgical language of the Coptic Church. It is written in a variant of the Greek alphabet.

In Coptic (the Sahidic dialect, to be precise), the way you say "the king" is πρ̄ρο [pr̩ra]. (The stroke under the first <r> means that it is syllabic, so this sounds something like "purr-rah".) This may ring a bell, and it should. It is the descendant of the ancient Egyptian word prʕʔ, which is the ultimate source of our word pharaoh. (ʕ is the phonetic symbol for the voiced pharyngeal fricative, the sound represented by the Hebrew letter ע and the Arabic letter ع. ʔ is the phonetic symbol for the glottal stop, the sound represented by the Hebrew letter א and the Arabic letter ء. We generally know only the consonants, since Egyptian writing does not represent the vowels.) This term originally referred to the palace. pr means "house", ʕʔ "great", so together they mean "great house". But by the beginning of the 18th dynasty (1539 B.C.E.) it had taken on the meaning of "king", just as "The White House" now refers to the President of the United States as well as his residence.

The catch is that πρ̄ρο means "the king", not "king". "king" is just ρ̄ρο [r̩ra]. "the kings" is ν̄ρ̄ρωου [nr̩rou], and "queen" is τρ̄ρο [tr̩ra]. To understand this, you need to know a little bit about the grammar of Coptic. In Coptic the masculine singular definite article is a prefix π [p]. The feminine singular definite article is τ [t], and the common plural is ν̄ [n]. For example, "man" is ρωμε [rome]. "the man" is πρωμε [prome]. "the men" is ν̄ρωμε [nrome].

So, the original word for "king" in Coptic must have been πρ̄ρο [pr̩ra]. But because this began with a [p], it looked like it began with the masculine singular definite article, and so at some point "king" was mistaken for "the king", the initial [p] was taken to be the masculine singular definite article and stripped off, and the stem of "king" became -ρ̄ρο [-r̩ra]. In short, speakers of Egyptian misanalyzed their own inherited word for "king".
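The reanalysis can be sketched as a toy string operation. This is only a minimal illustration in Python, not a model of Coptic morphology: the forms are the bracketed transcriptions given above with the syllabic mark on the r dropped (so the word for "the king" is written prra), and the names ARTICLES, add_article and reanalyze are invented for the example.

```python
# Definite-article prefixes as described above (transcriptions, not Coptic script).
ARTICLES = {"p": "masculine singular", "t": "feminine singular", "n": "common plural"}


def add_article(stem, article):
    """Prefix a definite article to a noun stem."""
    if article not in ARTICLES:
        raise ValueError(f"unknown article: {article!r}")
    return article + stem


def reanalyze(word, article="p"):
    """Strip an initial consonant that looks like an article, giving a new stem.

    This mimics the historical misanalysis: the word never contained the
    article, but speakers treated its first consonant as one.
    """
    if word.startswith(article) and len(word) > len(article):
        return word[len(article):]
    return word


# An ordinary noun: "man" and its definite forms.
print(add_article("rome", "p"))  # prome ("the man")
print(add_article("rome", "n"))  # nrome ("the men")

# The inherited word for "king" happened to begin with p.
new_stem = reanalyze("prra")
print(new_stem)                    # rra ("king")
print(add_article(new_stem, "t"))  # trra ("queen")
```

The plural is left out of the "king" example because, as the forms quoted above show, "the kings" has a different stem ending, which simple prefixing would not produce.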

Posted by Bill Poser at 12:37 AM

February 28, 2004

Kudos, cherries and peas

Cursed as I am with the habits of a scholar, this sentence about the prospects for The Lord of the Rings: Return of the King getting Best Picture happened to catch my eye as I was reading film critic Lisa Jensen on the Oscar prospects in the local free newspaper in my home town:

The thrice-nominated trilogy has yet to win the gold, but this year the King will prevail with a cumulative kudo for the sheer enormity of the entire three-part, 10-hour production.

Ah, Lisa has made a very understandable mistake, I thought. Kudos does look like it might be the plural of a word kudo, I thought, but it isn't. It's a Greek singular. Well, I was wrong.

I am, as I say, cursed with the habits of a scholar, so I looked it up even though I thought I knew I was right. And in the wonderful on-line Webster's dictionary I found that I was simply behind the times. Kudo is listed. It occurs as a reasonably well-established back-formation from kudos, and a usage note is appended to its entry:

usage Some commentators hold that since kudos is a singular word it cannot be used as a plural and that the word kudo is impossible. But kudo does exist; it is simply one of the most recent words created by back-formation from another word misunderstood as a plural. Kudos was introduced into English in the 19th century; it was used in contexts where a reader unfamiliar with Greek could not be sure whether it was singular or plural. By the 1920s it began to appear as a plural, and about 25 years later kudo began to appear. It may have begun as a misunderstanding, but then so did cherry and pea.

So it's me that made a mistake. If the word had been introduced last week in the Santa Cruz free newspaper Good Times, it could perhaps be called an error. But in use since the 1920s? That's two generations. No, Lisa Jensen is just using the resources of the English language (she's not a Greek, after all). And I just learned one more thing I didn't know about English. Three things, in fact, because I also looked up cherry (from Old North French cherise, wrongly taken to be a plural) and pea (Middle English pease was taken to be a plural too; it came from Latin pisa, which actually was a plural, but of pisum, not of *pi!).

[Note added later: Lisa also uses enormity to mean "hugeness". Cullen Murphy has ignorantly insisted that this is a mistake, but it is not just a perfectly kosher usage, it is actually older than enormity meaning "horrendousness" or "horrendous thing", as Mark Liberman pointed out here. And her Oscar predictions were stunningly correct: she predicted just about everything exactly right. A big triple kudo to Lisa for the enormity of her knowledge and insight.]

Posted by Geoffrey K. Pullum at 09:03 PM

Gorillas and umbrella women

Michael Shermer's Skeptic column in the March 2004 Scientific American discusses research by Daniel Simons and others on "inattentional blindness." These studies show that "when observers were actively engaged in an unrelated task, they sometimes failed to see ... an unexpected event, or UE". For example, the "unrelated task" might be viewing a video and counting rapid basketball passes made by a group of people wearing white t-shirts, while ignoring the passes made by people wearing black t-shirts; the "unexpected event" might be the appearance in the video of a woman carrying an umbrella or a person wearing a gorilla suit.

Some demos of the videos used in these tasks are here. The gorilla video and the umbrella video are certainly striking. A key paper is Daniel Simons and Christopher Chabris, "Gorillas in our midst: sustained inattentional blindness for dynamic events." Perception (1999), v. 28 pp. 1059-1074. From their abstract:

...we are surprisingly unaware of the details of our environment from one view to the next: we often do not detect large changes to objects and scenes (`change blindness'). Furthermore, without attention, we may not even perceive objects (`inattentional blindness'). Taken together, these findings suggest that we perceive and remember only those objects and details that receive focused attention. ... Our results suggest that the likelihood of noticing an unexpected object depends on the similarity of that object to other objects in the display and on how difficult the primary monitoring task is.

Here's part of the data from their experiment, showing the percentage of subjects noticing the unexpected event in each condition.

[Table: percentage of subjects noticing the unexpected event (umbrella woman row), by task (easy, hard) and monitored team (white t-shirts, black t-shirts); the values are missing.]

The videos (as you can see for yourself above) show two groups of students weaving in and out while passing basketballs around. One group is wearing white t-shirts, while the other group is wearing black t-shirts. A subject may be asked to attend to either the "team" wearing white or the team wearing black. The "easy task" is just to keep a silent mental count of the number of passes made by the monitored team; the "hard task" is to count bounce passes and aerial passes separately.

When subjects are performing no monitoring task, they always notice the umbrella woman and the gorilla (the "unexpected event" or UE). They notice the UE more often when performing the easy task than when performing the hard task. They notice the gorilla more often when they're monitoring the black team, and the umbrella woman (who is wearing pale colors) more often when they're monitoring the white team.

It's important to recognize what these results don't show -- they don't show that we ignore things unless we're looking for them. Instead, they show that when we're performing a cognitively difficult monitoring task, we may fail to notice (very salient) things that are not relevant to the task.

One obvious practical application of these results is in designing jobs and work procedures, so that (for instance) pilots monitoring instruments don't fail to notice unexpected objects on the runway. Another obvious application is in evaluating eyewitness testimony. However, Shermer's Scientific American column draws a more metaphorical moral. He suggests that we can think of science as a cognitively demanding monitoring task, whose practitioners may therefore be blind to all sorts of gorillas and umbrella women that happen to be cognitively distant from the things they're focused on. I'm sure that this is true, and will offer some examples from the linguistic sciences in later posts. However, in my experience, "inattentional blindness" in science is more complex. Everyone actually sees the metaphorical gorilla, but they've collectively decided that it's not interesting or relevant, so that it's not examined with any care, and it's ignored in framing descriptions and crafting explanations.

Shermer closes with an upbeat assessment, which I share:

... the power of science lies in open publication, which, with the rise of the Internet, is no longer constrained by the price of paper. I may be perceptually blind, but not all scientists will be, and out of this fact arises the possibility of new percepts and paradigms. There may be none so blind as those who will not see, but in science there are always those whose vision is not so constrained.


Posted by Mark Liberman at 08:23 AM

February 27, 2004

Do you wish to use Hmoob?

On Wells Fargo Bank ATMs around where I live, the first question up on the screen is about which language you would like to transact business in, and I noticed recently after an upgrade that one of the choices now says "Hmoob". Now that's a language name that doesn't appear in the reference books. But Bill Poser, Language Log's resident Asian languages expert-in-chief, was able to tell me what is going on.

It turns out that this is a spelling of what is more usually written as Hmong. The language of the Hmong people, whose traditional home is in the mountains of Laos and adjacent parts of Vietnam and China, has about a dozen writing systems, so Bill tells me. Of the ones that use roman letters (as Vietnamese does), the most widely used is one that follows somewhat similar principles to the ones used in the romanization of Chinese that was once worked out by Yuen-Ren Chao.

The reason that there is no ng or other indication of the velar nasal "-ng" sound is that this particular alphabet treats that nasal consonant as a feature of the vowel -- not a separate nasal consonant, but a vowel produced with nasalization. The writing system doesn't separate the quality of the vowel from its nasalization. So when you see oo, that means the "ong" vowel sound. So that leaves the question of what the b is doing on the end there.

Well, Hmong is a tone language. Every syllable has an associated tone or pitch -- high, low, medium, falling, rising, or whatever. But the language doesn't have a lot of syllables that crucially have to be written ending in a consonant letter. That means (or so thought the people like William Smalley who analyzed the language) that some consonant letters are surplus to requirements: there is no need for any syllable-final uses of the letters b, d, g, j, s, or v. So occurrences of those letters at the ends of words can be used instead to indicate tones, avoiding the need for having accents. That's just what Chao proposed for Chinese (not that it caught on very widely). The tone that occurs on the word Hmong is the one written with a final b.
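
To make the spelling logic concrete, here is a minimal sketch in Python of how one might take apart a syllable written in this alphabet. It assumes only what is described above: a doubled vowel letter stands for a nasalized vowel, and a final b, d, g, j, s, or v is a tone mark rather than a consonant. The function name and the five-letter vowel set are illustrative simplifications, not a full account of the orthography, and the sketch makes no claim about which pitch each letter stands for.

```python
# Tone letters as listed in the post; their pitch values are not encoded here.
TONE_LETTERS = set("bdgjsv")
# Simplifying assumption: treat only these five letters as vowel letters.
VOWELS = set("aeiou")


def split_syllable(word):
    """Split one romanized syllable into onset, vowel, and final tone letter."""
    w = word.lower()
    # A final letter from the tone set is read as a tone mark, not a consonant.
    tone = w[-1] if w and w[-1] in TONE_LETTERS else ""
    body = w[:-1] if tone else w
    # Walk back over the trailing vowel letters to find where the onset ends.
    i = len(body)
    while i > 0 and body[i - 1] in VOWELS:
        i -= 1
    onset, vowel = body[:i], body[i:]
    # A doubled vowel letter (as in "oo") marks nasalization.
    nasalized = len(vowel) == 2 and vowel[0] == vowel[1]
    return {"onset": onset, "vowel": vowel,
            "nasalized": nasalized, "tone_letter": tone}


print(split_syllable("Hmoob"))
```

Applied to Hmoob, this separates the onset hm, the nasalized vowel oo, and the tone letter b.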

It's a neat trick to have a way to spell words containing both nasalization and crucially important tone without any accents or funny letters. But it comes at the cost of having Hmong look like Hmoob, which to me, I must admit, looks completely wroob. "We travel aloob, singing a soob..."? "Ding doob the witch is dead"? "Can't we all just get aloob?"? To whom does this thoob beloob? It's no use; if I tried all day loob I don't think I could get used to it. My orthographic habits are too stroob to break. But I have no doubt that it's a great comfort to see the word there on the ATM screen if (like tens of thousands of my fellow Californians) you're a Hmoob.

Posted by Geoffrey K. Pullum at 05:29 PM

Online fun with ancient writing systems

On the web site of Penn's Museum of Archaeology and Anthropology, you can write your name in Egyptian Hieroglyphs, or your "monogram" in Babylonian cuneiform. [via Phluzein]

Posted by Mark Liberman at 02:56 PM

The mysterious marthambles

I agree with David Mamet that Patrick O'Brian's Aubrey-Maturin series is a masterpiece. O'Brian's fictional self-presentation is irrelevant to the books, but faintly troubling. Recently I've stumbled on a bit of evidence that he might sometimes have made up stories about his words as well as his life.

An interview with Patrick O'Brian published in The Patrick O'Brian Newsletter (volume 3, issue 1, March 1994) contains this Q & A:

Q. Please explain the meaning of the term "marthambles," the sailors' disease that Dr. Maturin is often concerned with aboard ship. I have looked in many dictionaries and medical texts for such a term.

A. Marthambles is a very fine word that I found in a quack's pamphlet of the late 17th or early 18th century advising a nostrum that would cure not only "the strong fires" and a whole variety of more obvious diseases but the marthambles too. I have never seen it anywhere else and it has escaped the OED.

Certainly the word marthambles is missing from the OED and from other standard lexicographical sources that I've checked. The glossary at the O'Brian fan site Maturin's Medicine has

marthambles (DI 123, RM 164, NC 132, 149, WDS 130, YA 226):

An unspecified illness, "known as the marthambles at sea and griping of the guts by land" [NC]. Patrick O'Brian is said to have seen the word on a pamphlet of the era by the quack doctor, Dr Tufts. It appears to be contagious and deadly to Pacific islanders.

If the given list of citations is correct, then O'Brian's earliest use was the one on p. 123 of Desolation Island, which was published in 1978:

'Of course he'll live,' said his messmates. 'Ain't the doctor pumped him dry, and blown out his gaff with physic?' For it was just as much part of the natural order of things that Dr. Maturin should preserve those who came under his hands; he was a physician, not one of your common surgeons -- had cured Prince Billy of the marthambles, the larynx, the strong fives -- had wormed Admiral Keith and had clapped a stopper over his gout -- would not look at you under a guinea, five guineas, ten guineas, a head, by land.

Let's ignore the question of whether "the strong fives" (Desolation Island) or "the strong fires" (Patrick O'Brian newsletter) is a typo. It's the marthambles that we're after here.

Dorothy Dunnett's historical novel The Ringed Castle (fifth of the "Lymond Chronicles") was published in 1971. The fictional year is 1555, and Francis Crawford of Lymond, working for the Russian Tsar as a mercenary soldier, has travelled to Lampozhnya, near the mouth of the Pechora river, in what is now the Arkhangel'skaya Oblast'. He's accompanied by the English navigator Diccon Chancellor, employed by the infant Muscovy Company. On p. 244 (of the 1997 Vintage edition) we read:

Once, a low drumming made itself heard among the thin sounds spread out under the frozen crust of the stars: the cries and barking and warbling song: the coughing and squealing of livestock; and Chancellor asked what it was.

'The signal for massacre?' Lymond said; and then, relenting: 'The Samoyedes are Shamanists, and worship Ukko as chief of the gods. The tribes are led by the Shamans, and the Shamans practise magic and medicine with the aid of their voices and drums. If you can manage an attack of the Marthambles, we could persuade one to say an incantation over you. You would then be anointed with infallible remedies -- say, live earthworms mashed into alcohol.'

'I shall avoid succumbing to the Marthambles,' Chancellor said. 'Are all their remedies so alluring?'

'Take your pick,' Lymond said. 'For example, cornsilk and hot dough and live ants in warm oil for your joint pains. Celery water and goose fat massage for frost bite. That works, and you might as well make a note of it: the Company will have cases sooner or later. The voice and drum treatment is something again.'

'Faith?' said Richard Grey.

They were about to retire for the night. Lymond rose, as did his captain, a shadow behind him. 'I don't know. The Shaman will not come to me. He must invite me to his tent; and he has not done so yet.'

'Acquire an attack of the Marthambles,' said Chancellor.

'I have them,' said Lymond, 'every time I think of George Killingworth sitting confidently over a wine pot with Viscovatu. Do you still regret that you came?'

He spoke to Chancellor, and Chancellor, after a long moment, answered him truthfully. 'No,' he said.

'No,' Lymond said also. 'Verily, God hath eighteen thousand worlds; and verily, your world is one of them, and this its bright axle-tree.'

The first novel in Dunnett's Lymond Chronicles, The Game of Kings, was published in 1961, and further volumes came out every couple of years until 1975. They're exciting, literate, carefully-researched works, full of accurate historical detail and historically accurate specialized terminology. Dunnett is like O'Brian in using specific archaic vocabulary as a method for establishing characters and setting scenes, as in this passage from the start of The Ringed Castle, where a group of Lymond's former associates get a letter from him:

Lymond had left Turkey, it transpired, for Moscow. And now was inviting the pick of his captains to follow him.

'Well?' said Guthrie. 'He says the prospects for trained men seem excellent.'

The legal mind in the group was affronted. 'Prospects?' said Fergie Hoddim. 'Yon's a sore outlay, traveling to Russia and back for a prospect. They're a coarse, jabbering, ignorant people, and ye canna issue a complaint against wrangeous and inordinate dunts if ye're lying down deid on your baikie. I'll not move a step but a contract.'

They left at the end of the week: eight well-balanced and reasonable mercenaries, who had made up their minds to this exploit before ever finishing that laconic letter. And Fergie Hoddim was one of their number.

It seems likely that O'Brian would have read Dunnett's books, perhaps before he began his own series of historical novels in 1970 or so, and certainly as he went on with them. And her use of marthambles predates his by seven years.

So either Dunnett read the same "quack's pamphlet of the late 17th or early 18th century" that O'Brian allegedly did; or he got the marthambles from her, and made up the pamphlet as he made up being born in Galway and raised a Roman Catholic in genteel circumstances.

[Update: more on the marthambles here and here.]

Posted by Mark Liberman at 07:04 AM

February 26, 2004

Tina Turner to sing in Latin and Sanskrit

Continuing the uptick for classical languages in the popular media, Maria Abraham writes on the Reuters business wire that Tina Turner "will play a Hindu goddess in a spiritual Merchant-Ivory musical for which she will also sing classical numbers in Latin and Sanskrit."

Unless the reporter is confused, Merchant-Ivory will be stretching even further than Mel Gibson did to have Shakti (a.k.a. Kali, a.k.a. Durga) singing in Latin. [via Phluzein]

Posted by Mark Liberman at 05:27 PM

Hayek on Hebb

An earlier post discussed the history of ideas about spontaneous order in neuronal networks, specifically in Friedrich Hayek's 1952 book The Sensory Order and in Donald Hebb's 1949 work The Organization of Behavior. Mark Seidenberg has been to the library, and sends a paragraph quoted from the preface of Hayek 1952:

"It seems as if the problems discussed here were coming back into favour and some recent contributions have come to my knowledge too late to make full use of them. This applies particularly to Professor Donald Hebb's Organization of Behavior, which appeared when the final version of the present book was practically finished. That work contains a theory of sensation which in many respects is similar to the one expounded here; and in view of the much greater technical competence of its author I doubted for a while whether publication of the present book was still justified. In the end I decided that the very fullness with which Professor Hebb has worked out the physiological detail has prevented him from bringing out as clearly as might be wished the general principles of the theory; and as I am concerned more with the general significance of a theory of that kind than with its detail, the two books, I hope, are complementary rather than covering the same ground."

Posted by Mark Liberman at 05:09 PM

Why are negations so easy to fail to miss?

Over the past month or so, a series of posts here have sketched an interesting psycholinguistic problem, and also hinted at a new method for investigating it. The problem is that people often get confused about negation. More exactly, the problem is to define when and how and why people get confused about negation, not only in interpreting sentences but also in creating them. The method is "Google psycholinguistics": the analysis of internet text as a corpus, as a supplement to more traditional methods like picture description, reaction time measurements or eye tracking.

This all started with could care less. It's clear that this phrase has become an idiom, meaning "don't care", even if it's not clear exactly how the not disappeared from the apparent source cliché couldn't care less. In this Language Log post from last month, Chris Potts discusses a range of other examples where the presence or absence of negation seems to leave the meaning (in some sense) unchanged. For example: "That'll teach you (not) to tease the alligators."

Followups in our pages and elsewhere (here, here, here, here, here) discussed many cases of developments of a different kind, where extra negations create an interpretation at odds with what the writer or speaker meant. An antique and canonical example (cited by Kai von Fintel) is "No head injury is too trivial to ignore." The literal meaning is the opposite of what the author wants it to be, but this is not irony or sarcasm -- the author is just confused. The extra negations are sometimes explicit negative words (like not and no) and sometimes implicit parts of words with negative meanings (like refute, fail, avoid and ignore). Generally the result has at least two negatives, and often a scalar limit, conditional, hypothetical, or other irrealis construction as well.

In fact, this description is predictive -- if you think of a construction that meets these conditions, and check with Google or Altavista, you will generally find lots of examples whose literal meaning is clearly the opposite of what the writer intended.
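
As a rough illustration of what "meets these conditions" might mean in practice, here is a minimal Python sketch that counts explicit and implicit negatives in a sentence and flags those with two or more. The explicit words not and no and the implicit verbs refute, fail, avoid and ignore come from the description above; never, cannot, contracted n't and the verb miss are added here as assumptions, and the matching is by surface form only, so it knows nothing about scalar limits, hypotheticals, or what the writer meant.

```python
import re

# Explicit negative words; "never" and "cannot" are assumptions added here.
EXPLICIT = {"not", "no", "never", "cannot"}
# Inflected forms of verbs with a negative meaning ("miss" is an added assumption).
IMPLICIT = re.compile(
    r"^(?:(?:fail|avoid)(?:s|ed|ing)?"
    r"|miss(?:es|ed|ing)?"
    r"|(?:refut|ignor)(?:e|es|ed|ing))$"
)


def find_negatives(sentence):
    """Return (explicit, implicit) lists of negative tokens, in order."""
    text = sentence.lower().replace("\u2019", "'")  # normalize curly apostrophes
    tokens = re.findall(r"[a-z']+", text)
    explicit = [t for t in tokens if t in EXPLICIT or t.endswith("n't")]
    implicit = [t for t in tokens if IMPLICIT.match(t)]
    return explicit, implicit


def is_candidate(sentence):
    """True when the sentence carries at least two negatives of either kind."""
    explicit, implicit = find_negatives(sentence)
    return len(explicit) + len(implicit) >= 2


print(find_negatives("No head injury is too trivial to ignore."))
print(is_candidate("No head injury is too trivial to ignore."))
```

A filter like this only narrows the search. Deciding whether the literal meaning really is the opposite of the intended one still takes a human reader.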

The obvious hypothesis is that it's hard for people to calculate the meaning of phrases with several negatives (perhaps especially in combination with things like scalar limits and hypotheticals). The implicit negation in words like fail and ignore may be especially difficult to untangle. This explains why the errors are not detected and corrected: we accept an interpretation that is a priori the plausible one, even though it's incompatible with the sentence as written or spoken, because it's too hard to work out the semantic details.

However, this may not provide an adequate explanation for why the errors are so commonly made in the first place. The pattern is predictive of errors, but it doesn't predict how common the errors will be, either in themselves or by comparison to "correct" interpretations of the same pattern.

In this post, Geoff Pullum mentions the particular case of "fail to miss" used to mean simply "miss." A little internet search shows that this sequence is moderately common (around 2,400 ghits for "fail/failed/failing to miss", or one per 1.8 million pages), and that when it occurs, it is almost always used in the "wrong" meaning:

Miss Goodhandy doesn't fail to miss an opportunity to humiliate Steve, and gives him a few good swats with the jockstrap's thick elastic waistband.

Although his attendance at school was still very poor, Stanley never failed to miss a movie at the local theaters.

Canceling a few flights here and there seems like a good trade-off because the results of failing to miss a real threat are so severe.

This is sure to be a killer tournament, don't fail to miss it!

It seems to me that there are several different psycholinguistic questions here: why do most people not even notice the problem in sentences like this? why do people stick in the extra "fail to" in the first place, given that the sentences mean what their authors intend if they just leave it out? why are uses of "fail to miss" so often accompanied by an additional negative ("doesn't fail to miss", "never failed to miss", etc.)? and why do people hardly ever use "fail to miss" to mean "fail to miss"?

In fact, almost the only internet examples of "correct" usage of fail to miss are copies of this famous passage:

This is what The Hitchhiker’s Guide to the Galaxy has to say on the subject of flying: There is an art, or, rather, a knack to flying. The knack lies in learning how to throw yourself at the ground and miss. Pick a nice day and try it. All it requires is simply the ability to throw yourself forward with all your weight, and the willingness not to mind that it's going to hurt.

That is, it's going to hurt if you fail to miss the ground. Most people fail to miss the ground, and if they are really trying properly, the likelihood is that they will fail to miss it fairly hard. Clearly, it is the second part, the missing, which presents the difficulties.

Douglas Adams offers us a clue here, I think: you can fail to do something only if you first intended to do it. It's relatively rare for people to intend to miss something, but missing things is generally easy to do, so when you try to miss something, you usually succeed (and you might describe what you did as avoiding rather than missing, anyhow). Therefore, failing to miss things just doesn't come up very often. Perhaps this hole in the semantic paradigm leaves a sort of vacuum that bad fail to miss rushes to fill?

We can test this idea with "fail to ignore", because ignoring things is often both desirable and hard to do, and failing to ignore things is therefore an event that we often may want to comment on. There are certainly plenty of "wrong" interpretations of fail to ignore:

The Judge Institute is a building that no-one in Cambridge can fail to ignore. Much has been written about its jelly-baby hues, its pyramid-like proportions, and its metamorphosis from the husk of Old Addenbrooke's. (context)

Progressive thinkers and activists need to consider the practical implications of these principles. Good people of the world cannot fail to ignore them.

In New York, state Sen. Michael Balboni (R-Mineola) is circulating a proposal based on the original California bill, and plans to introduce the measure in the next 10 days. "Various industries in New York are looking at our legislation," said Balboni legislative assistant Tom Condon. "We have to ask, are all these lawsuits beneficial to our economy? And we can't fail to ignore possible negligent conduct from these manufacturers. It's a difficult issue."

but there are plenty of "correct" interpretations as well:

[T]he chapter points out the pitfalls that are likely when making decisions: ignoring opportunity costs, failing to ignore sunk costs, and focusing only on some of the relevant costs.

He managed somehow to answer their questions, trying and failing to ignore the addictive joy of a kindred spirit touching his.

The story of a black lawyer who tried and failed to ignore his race.

And "fail to ignore" is also less common (in both right and wrong interpretations) than "fail to miss" (about 1200 ghits to 2400 -- though the verb miss is also about twice as common as the verb ignore). In any case, the counts are large enough (tens of millions for the basic words such as fail, miss and ignore, and thousands for phrases such as fail to miss, fail to ignore) that one could imagine fitting some simple statistical models for the generation process that would permit testing different answers to some of the questions asked above.
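
Before fitting any model, the first step would be to normalize the raw counts. The Python sketch below shows the arithmetic under stated assumptions: the phrase counts are the rough figures quoted above, the index size is simply what "one per 1.8 million pages" implies for 2,400 hits, and the verb counts are hypothetical placeholders chosen only to respect "tens of millions" and the roughly two-to-one ratio of miss to ignore. None of these are measured values.

```python
# Rough phrase counts quoted in the post.
PHRASE_GHITS = {"fail to miss": 2400, "fail to ignore": 1200}

# HYPOTHETICAL verb counts: placeholders in the "tens of millions" range,
# with "miss" set to twice "ignore" as the post describes.
VERB_GHITS = {"miss": 40_000_000, "ignore": 20_000_000}

# Index size implied by 2,400 hits at one hit per 1.8 million pages.
IMPLIED_INDEX_PAGES = 2400 * 1_800_000


def pages_per_hit(hits, index_pages=IMPLIED_INDEX_PAGES):
    """How many indexed pages there are for each hit."""
    return index_pages / hits


def per_million(phrase_hits, verb_hits):
    """Phrase hits per million occurrences of the bare verb."""
    return phrase_hits * 1_000_000 / verb_hits


for phrase, verb in [("fail to miss", "miss"), ("fail to ignore", "ignore")]:
    rate = per_million(PHRASE_GHITS[phrase], VERB_GHITS[verb])
    print(phrase, pages_per_hit(PHRASE_GHITS[phrase]), rate)
```

With placeholder verb counts in a two-to-one ratio, the two phrases come out at the same rate per occurrence of the verb, which is the point of the parenthetical remark above: the raw difference between 2,400 and 1,200 may reflect nothing more than the frequency of the verbs themselves.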

As another example, consider the counts in the table below

to underestimate
to overestimate

Nearly all the "to underestimate" cases are logically mistaken substitutes for "to overestimate":

It is impossible to underestimate the long-term impact of Phoebe Muzzy’s ’74 longstanding role as an Annual Fund volunteer.

It is almost impossible to underestimate the importance of rugby to the South African nation in terms of its self-esteem on the world stage.

It's impossible to underestimate Lucille Ball's importance to the new communications medium.

It's impossible to underestimate the value of early diagnosis of breast cancer. (BBC)

Why are these mistakes so common? Why are correctly-interpreted uses of "impossible/hard/difficult to underestimate" so rare -- except in discussions of the mistaken ones? Is there a connection between these two facts?

Google psycholinguistics may point the way to the answers, despite its obvious and severe practical and theoretical difficulties as a methodology.

[Update: Fernando Pereira observes that "[f]or those of us skiers who spend a considerable time in the trees, the chance of 'failing to miss' is why we wear helmets". However, the single result of searching for |"fail to miss" wilderness ski| failed to produce any other correct uses:

The clay like soil of the Adirondacks makes it difficult for water to run off and creates these mud holes that can cause you to sink up over your knees if you fail to miss a rock or log when crossing.

This may be sampling error, but apparently it's not enough to do something where missing things can be both difficult and desirable. Seriously, I think in this situation people are more likely to use the word avoid. I couldn't find any examples involving skiers and trees, but there are plenty of cases in a slightly generalized frame, e.g.

Low-hangng [sic] branches and limbs can be a problem for boaters who fail to avoid getting caught in them.

Finally, while checking all this out, I found this amusing piece (entitled "Do not fail to avoid neglecting this post") about the difficulties of calculating the polarity of summaries of SCOTUS decisions. ]

Posted by Mark Liberman at 09:18 AM

Offsite backup for world's languages

At 7:36am GMT (2:36am EST) on Friday, the Rosetta Disk is scheduled to be launched on board an Ariane-5 rocket from the European Spaceport in Kourou, French Guiana. The mission's target is the comet Churyumov-Gerasimenko, which will be reached in 2014 after a "billiard ball" journey through the Solar System lasting more than ten years. This will be the first mission to orbit and land on a comet.
[European Space Agency Rosetta Mission]

The Rosetta Disk is a modern version of the Rosetta Stone. The 2-inch nickel disk is micro-etched with 30,000 pages of information covering over 1,000 languages. For each language there is a simple dictionary, a guide to pronunciation and counting, and a traditional story with translation. Additionally, to help language decipherment in remote futures, a translation of a common text (the first three chapters of the book of Genesis) is provided in all languages. The disk can be read with the aid of an optical microscope.

Posted by Steven Bird at 01:27 AM

February 25, 2004

Avoiding rape and adverbs

Elmore Leonard offers some advice about writing. [New link here -- 3/15/2006.] For his kind of story-telling, his rules make sense to me. At least, they're a fairly accurate description of how he writes, and I like the results. He doesn't mention adjectives, but his fourth rule does suggest avoiding adverbs in one particular context:

3. Never use a verb other than ''said'' to carry dialogue.

The line of dialogue belongs to the character; the verb is the writer sticking his nose in. But said is far less intrusive than grumbled, gasped, cautioned, lied. I once noticed Mary McCarthy ending a line of dialogue with ''she asseverated,'' and had to stop reading to get the dictionary.

4. Never use an adverb to modify the verb ''said'' . . .

. . . he admonished gravely. To use an adverb this way (or almost any way) is a mortal sin. The writer is now exposing himself in earnest, using a word that distracts and can interrupt the rhythm of the exchange. I have a character in one of my books tell how she used to write historical romances ''full of rape and adverbs.''

I wanted to check how Leonard stacks up on the contentious adjective dimension. So I pulled a couple of titles at random from my (complete?) collection of his works. A quick, rough count puts the first few paragraphs of Cat Chaser at about 15% adjectives (" ...long two-tone hair thinning fast, what was left of a blond pompadour receding from a sunburned peeling forehead..."), and the first few paragraphs of Mr. Majestyk at 10% ("...worn-out looking men in dirty, worn-out clothes that had once been their own or someone else's good clothes..."), both high relative to Doug Biber's norms.

Honesty compels me to point out that Leonard uses lots of adverbs. The first few paragraphs of Cat Chaser have fast, neatly, freshly, once, half, already, almost, directly, there, and today, not counting the large number of adverbial PPs.
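
For anyone who wants to repeat this kind of quick, rough count, here is a minimal Python sketch of the bookkeeping. The part-of-speech tags are hand-assigned for the Cat Chaser fragment quoted above and are an assumption of this sketch (other analysts might tag thinning, receding or peeling differently); it does not reproduce the counts reported in the post or Biber's norms.

```python
from collections import Counter

# Hand-tagged tokens from the quoted Cat Chaser fragment (tags are assumptions).
TAGGED_FRAGMENT = [
    ("long", "ADJ"), ("two-tone", "ADJ"), ("hair", "NOUN"),
    ("thinning", "VERB"), ("fast", "ADV"), ("what", "PRON"),
    ("was", "VERB"), ("left", "VERB"), ("of", "ADP"), ("a", "DET"),
    ("blond", "ADJ"), ("pompadour", "NOUN"), ("receding", "VERB"),
    ("from", "ADP"), ("a", "DET"), ("sunburned", "ADJ"),
    ("peeling", "ADJ"), ("forehead", "NOUN"),
]


def tag_shares(tagged):
    """Return each tag's share of all tokens, as a fraction between 0 and 1."""
    counts = Counter(tag for _, tag in tagged)
    total = len(tagged)
    return {tag: n / total for tag, n in counts.items()}


shares = tag_shares(TAGGED_FRAGMENT)
print(f"adjectives: {shares['ADJ']:.1%}, adverbs: {shares['ADV']:.1%}")
```

A short fragment picked out for its adjectives will naturally score well above the figure for whole paragraphs, so the sketch shows the method, not the result.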

I should also point out that Leonard often uses appositives and adverbial PPs in quotative tags:

...Moran said, just as dry.
...Nolen Tyner said, smiling a little, ...
...the woman said, with an edge but only the hint of an accent.
...Virgil said, spacing the words.
...Mr. Perez said, with his soft accent.
...Ryan said, still wanting to be sure.
...Rafi said, his expression still grave.

"Blah blah, Rafi said, his expression still grave" is stylistically different from "blah blah, Rafi said gravely", but it doesn't seem to me that the writer is intruding any less in these quotative-tag appositives than in quotative-tag adverbs.

Leonard also sometimes uses non-"said" tags like "the girl went on" and "Ryan told his friend."

But still, he uses plain "X said" a lot more than most writers. And this is Elmore Leonard. I'm going to cut him some slack.

Posted by Mark Liberman at 09:06 PM

Those slurry, sleepy southerners

Cindie McLemore emailed about a profile of Lyle Lovett by Alec Wilkinson in the current New Yorker. Wilkinson writes

Lovett's voice is typically a bit raspy, and his diction is slurry. He often sounds as if you'd just woken him up. "Wire" becomes "wi-er." "Threw" becomes "thoo." (p. 72)

Cindie observed

I don't know what he means about "wi-er" ("war" would be consistent with "thoo") but my father says "thoo" for "threw" when he's talking about certain topics (as in "thoo the football"), and not because he's sleepy or slurry.

I don't think I've ever heard Lovett talk -- it's possible that he really does speak imprecisely -- but it seems much more likely that this is an example of sociolinguistic stereotyping. Many people (including some southerners) perceive the accent of southern speakers as an indication of various moral shortcomings associated with laziness, carelessness, sleepiness and so on. Ignorance and stupidity also may come into the picture. I wrote about this a few months ago in connection with a CNN report that speech recognition technology failed in Shreveport LA due to "Southern drawl and what I call lazy mouth", and a Michael Lewis piece in Slate opining that "technology doesn't sound nearly as impressive when it is discussed in a booming hick drawl".

Wilkinson exemplifies Lovett's "slurry diction" and "sound[ing] as if you'd just woken him up" by giving two attempts at phonetic renditions of words that have pronunciations characteristic of the area where Lovett was born and raised (Klein, TX), making it seem all the more likely that the issue is regional accent, not personal carelessness or low levels of physiological arousal.

I do agree that the pseudo-phonetic rendition "wi-er" is puzzling, since the expected pronunciation (I think) would rhyme with the typical northern U.S. pronunciation of "far". Maybe Lovett has learned that his native pronunciation is stigmatized and produces an exaggerated northern form? Or maybe Wilkinson is just confused -- about how to describe someone's pronunciation phonetically, as well as about how (not) to interpret it stylistically, morally and physiologically.

[Update: two Texans have expressed doubt about whether I should call Texans "southerners". This brings up all sorts of issues -- cultural identity as well as dialect geography -- that I'm not competent to survey. Let's say at least that from the point of view of Americans from some other parts of the country, there's a family resemblance in accent that tends to evoke similar stereotypes. And the Atlas of North American English defines the South as including most of Texas.

Robyn Stewart wrote from Vancouver, Canada, to explain that (at least some) Canadians have picked up the same prejudices:

A visitor to where I work had an accent from Alabama. He was highly intelligent, but the stereotype associated with the accent is so strong -- we've only really heard it on Forrest Gump -- that one of my students remarked "I know he's smart, but he talks like a MORON."

I suspect that seeing Forrest Gump is not enough in itself to form these stereotypes -- maybe Beverly Hillbillies, Foghorn Leghorn and a few other items of popular culture played a role -- but in fact, it's a mystery to me how these attitudes are created and maintained. As in the case of learning words, it does seem that individuals need only small amounts of "training" to pick up such stereotypes, at least once they're established in the community. And this insightful post by Geoff Pullum shows how different these stereotypes can be for someone from a different speech community -- he was raised in the U.K., and his associations with Texan speech patterns come from positive adult experiences.

We shouldn't exaggerate the degree of stereotyping in the Wilkinson quote. All he wrote was that Lovett sounds raspy, slurry and sleepy, with a couple of regional pronunciations given as examples. I ignored the "raspy" part, and associated the slurry and sleepy parts with a larger pattern of properties stereotypically associated with speakers from the American south and south midland areas. ]

Posted by Mark Liberman at 05:47 PM

Defining marriage

I've noticed that I twitch a little each time I hear someone talking about how what we've got to do is pass a law, or a constitutional amendment, that defines marriage as being between a man and a woman, as if something lexicographical was at issue. Yesterday we were treated to the most egregious case of this, when our president told us solemnly that he was "troubled by activist judges who are defining marriage," because "Marriage ought to be defined by the people, not by the courts." And I realized why this kind of talk was making me twitch. This issue is being represented as linguistic, relating to a democratic right of the people to stipulate word definitions, when it's nothing of the kind.

As Mark Liberman has repeatedly reminded us, there are dictionaries. To take Webster's, for example, this is the definition we have now for the word at issue:

marriage 1 a (1) : the state of being united to a person of the opposite sex as husband or wife in a consensual and contractual relationship recognized by law (2) : the state of being united to a person of the same sex in a relationship like that of a traditional marriage <same-sex marriage> b : the mutual relation of married persons : WEDLOCK c : the institution whereby individuals are joined in a marriage
2 : an act of marrying or the rite by which the married status is effected; especially : the wedding ceremony and attendant festivities or formalities
3 : an intimate or close union <the marriage of painting and poetry -- J. T. Shawcross>

The definition says there are three main meanings, 1, 2, and 3: one for a state, relation, or institution, one for an act, and one for a more broadly conceived kind of union, respectively. The first of these, 1, divides into three sub-senses, a, b, and c: 1a for the state of being married, 1b for the relation of marriage to someone, and 1c for the institution of marriage. Then 1a is split into two sub-sub-senses, 1a(1) for the man-woman contract, and 1a(2) covering the same-sex equivalent. [Notice, that correctly avoids making it a contradiction when we talk about what Gavin Newsom has been allowing in San Francisco: we can't talk about permitting bachelors and spinsters to be married to each other and still be bachelors and spinsters, because that would be self-contradictory, but talking about "same-sex marriage" is not self-contradictory, it's just a use of meaning 1a(2) rather than 1a(1).]

Then there's sense 2, which denotes the act of marrying, and sense 3, a bit further afield, allows for all sorts of abstract and concrete close relationships and mergers.
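
The nesting of senses just described can be laid out as a small tree. The Python sketch below is only an illustration of that structure: the labels follow the entry quoted above, the glosses are abbreviated from it, and the tuple layout and function name are assumptions of the sketch, not anything Webster's uses.

```python
# Each node is (label, gloss, children). Glosses are abbreviated from the
# quoted entry; grouping nodes that have sub-senses carry no gloss of their own.
MARRIAGE = [
    ("1", None, [
        ("a", None, [
            ("(1)", "state of being united to a person of the opposite sex", []),
            ("(2)", "state of being united to a person of the same sex", []),
        ]),
        ("b", "the mutual relation of married persons", []),
        ("c", "the institution whereby individuals are joined in a marriage", []),
    ]),
    ("2", "an act of marrying or the rite by which the status is effected", []),
    ("3", "an intimate or close union", []),
]


def leaf_senses(nodes, prefix=""):
    """Yield (full label, gloss) for every sense that has no sub-senses."""
    for label, gloss, children in nodes:
        path = prefix + label
        if children:
            yield from leaf_senses(children, path)
        else:
            yield path, gloss


for label, gloss in leaf_senses(MARRIAGE):
    print(label, ":", gloss)
```

Laid out this way, 1a(1) and 1a(2) are plainly sibling senses under the same heading, which is the sense in which "same-sex marriage" involves no contradiction.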

This will all do just fine for all our linguistic purposes. We don't need to revise our language to have this discussion: English is flexible enough to allow us to talk about both the narrower and the broader kinds of marriage: the marriage of Britney Spears or the marriage of true minds.

So I wish people — above all our president — wouldn't put their wedge issue in terms of this nonsense about how what's on the agenda is defining the term "marriage" more accurately and correctly as involving a man and a woman. We don't put definitions of words in the US constitution. (They change too frequently, that's one reason.) What's on the table here is taking away rights from certain couples: allowing what we are talking about when we use sense 1a(1), but disallowing what we are talking about when we use sense 1a(2). The proposal is to deny a specific subset of the people the advantages of a certain kind of contract. [Note: It has of course been customary for centuries to deny them any such right; but that is just the sort of thing that can change as our democracy evolves, and a few judges and at least one mayor have now decided that the change is long overdue.]

Go ahead, make my day (as Dirty Harry used to say): if they want a wedge issue, bring it on. Let them go ahead and try to pass, for the first time in the history of our country, a constitutional amendment aimed at taking rights away from a proper subset of the people. (The prohibition amendment was an ill-advised subtractive social amendment of similar type, but at least it took away the specified rights from all of the people. It was a big mistake, anyway, and soon had to be repealed.) But don't let them try to tell me they are revising a definition. It's nothing to do with defining the word "marriage". Webster's has done that perfectly well. It's about a denial of rights. The idea is that if you fall in love with a lesbian and want to marry her and live with her forever and share your life and property with her and be with her until you sit by her side at the hospital when she dies, that's O.K., but your rights will be subject to a limitation: you will be permitted all this under the sanction of the institution of marriage if you are male, but denied such permission if you are female. To add an insistence on that point in the constitution would be an act of discrimination, not of definition, so let's call things the way they are.

Posted by Geoffrey K. Pullum at 03:54 PM

Relating to the cougar in your life

Our occasional contributor Arnold Zwicky (currently recuperating after a brush with a rogue strain of bacteria that reached quorum and tried to kill him in December) sent me a quote from the regular Saturday column of pets miscellanea in the Palo Alto Daily News for February 21, 2004, responding to a letter about mountain lions (cougars) in Santa Clara County:

Unlike your instinct to curl into a ball if attacked by a dog, the best way to survive a mountain lion attack is to fight the animal off... Anything that will assist you in battling the animal off if it comes to that, will be helpful to you. Most importantly, never hike alone. Hike as a pair or a group and always keep children close. Did you spend Valentines Day alone this year? If so, a feline companion might be what you need in your life!

Unh? I don't think so. Feline companions in the back country may not be what you need. In fact if they're hungry, you might look like something that they need (which, we were recently reminded, might be true for domestic cats as well).

Anyway, the moral here is: don't ever say "Yadda yadda yadda" or "Whatever" when your English teacher or some grammarian on Language Log is trying to explain to you that decisions about where to put your paragraph breaks are really important stuff.

Posted by Geoffrey K. Pullum at 02:39 PM

It hardly goes without saying

Listening with half an ear to NPR, I heard a reporter say "it hardly goes without saying that ..." He meant either "it hardly needs to be said" or "it goes without saying", but what came out was a blend of the two.

He's not the first to do this. Internet search turns up several dozen examples like these:

It hardly goes without saying that countries who have adopted this mode of thinking generally are some of the most disadvantaged in the world.

It hardly goes without saying implementing agencies must be signed up to the plan.

It hardly goes without saying that the resources used in producing this latest outrage could have been spent on much better things.

It hardly goes without saying that you do not have to be an hereditary Peer to have these skills.

It hardly goes without saying what the main topic of conversation has been over the past week, as people have learned of the tragic death of Princess Diana.

Here's another piece of evidence that verbal habit trumps logic when multiple negations are in the picture. At least in this case, the valid phrases are much commoner than the logically incoherent ones.

Posted by Mark Liberman at 12:50 PM

What bacteria want to say

Bill Poser has just pointed us to information about what cats have to say: "Love you? Heck no. I'd eat you if you weren't so damn big," said Marchessa Yolanda Principessa des Astres, a Siamese cat from New Jersey, to her owner and handler, Fran Dershowitz. "Well, I'd bat you around first."

The February Scientific American has an article on signals for quorum sensing in bacteria. Quorum sensing was discovered about 40 years ago: some bacteria are bioluminescent when their population density is high enough, but turn off the lights when they're alone; others use quorum sensing as a switch for biofilm production, toxin generation and so forth. The article attributes to J. Woodland Hastings and Kenneth H. Nealson, the discoverers of the effect, the view that bacteria "[cry] out, like Horton the elephant's dust speck in the Dr. Seuss book, 'We are here! We are here! We are here! We are here!'"

The new news is work by Bonnie Bassler suggesting that in addition to the autoinducer and sensor proper to quorum sensing among their own kind, many bacteria have a second kind of signal, AI-2 ("autoinducer 2"), that communicates quorum information across many diverse bacterial types. She refers to it as "bacterial Esperanto".

My favorite part of the article, though, is the mildly peevish quote from Stephan Winans at Cornell: "Do bacteria want to communicate with each other, or is it just by accident? This idea has taken hold that these bacteria want to communicate with each other. It may be just too good to be true." I feel that this is a profound insight, with important applications in human relations as well.

Posted by Mark Liberman at 08:58 AM

February 24, 2004

Cat Talk

Previous discussion of animal communication on Language Log has focused on overblown and misleading claims. An angle that we haven't considered is what animals might have to say. According to this report in the Watley Review, cat fanciers were so upset with what they heard from their cats that they lynched the inventor of a device that provides real-time translation between Cat and English when he demonstrated the device at a cat show.

Posted by Bill Poser at 11:28 PM

A gentle reminder

Hey, everybody, there are dictionaries! If you have a question about words, you can look it up in dictionaries and often find useful information. In particular, for the history of English words, the Oxford English Dictionary is often helpful. Unfortunately, most bars don't have a paper copy or a web subscription, but nearly all university libraries have both, and the web version is likely to be available to everyone on campus and perhaps remotely to those who can authenticate themselves electronically.

I've mentioned this before in connection with journalists who carry on about words without doing elementary fact checking. I bring it up now because Allan Hazlett's weblog has a sort of illustrated bar-room discussion of the verb "to spoon", either in the intransitive sense "To lie close together, to fit into each other, in the manner of spoons", or in the transitive sense "To lie with (a person) spoon-fashion".

I refer to the bar because Allan tells us that "We discussed this at length last night at the bar". He cites two questions, "First question: is spooning essentially sexual?" and "Second question: Is 'to spoon' transitive?", and he tells us that "there was little consensus." He gives the results of some research using Google, which is certainly a sign of wisdom. However, dictionaries are good for this kind of thing as well, and I'm pretty sure that the OED must be available at Brown!

The glosses above required no independent lexicography because I just cut and pasted them from the OED, which cites these examples of the intransitive:

1887 Harper's Mag. Apr. 781/2 Two persons in each bunk, the sleepers ‘spooning’ together, packed like sardines. 1894 Outing XXIV. 343/2 The precision with which we could ‘spoon’ that sad night was truly beautiful to behold.

and this for the transitive:

1887 Harper's Mag. Dec. 49/2 ‘Now spoon me.’ Sterling stretched himself out on the warm flag-stone, and the boy nestled up against him.

I believe that this is enough to settle both questions, at least for the usage of the late 19th century: Harper's magazine in 1887 was not an outlet for explicit descriptions of sexual encounters, but it was representative enough of the "common elite" to count as evidence for the existence of a transitive verb.

Posted by Mark Liberman at 04:20 PM

Burbling as a good thing

David Brooks, writing approvingly of the progressive attitude of Hispanics, used the verb burble in a way that surprised me:

We are bound together because we Americans share a common conception of the future. ... That mentality burbles out of Hispanic neighborhoods, as any visitor can see.

The jargon file has

burble [Lewis Carroll's "Jabberwocky"] Like flame, but connotes that the source is truly clueless and ineffectual (mere flamers can be competent). A term of deep contempt. "There's some guy on the phone burbling about how he got a DISK FULL error and it's all our comm software's fault." This is mainstream slang in some parts of England.

This corresponds pretty closely to my intuition about burble. I would never use it to describe activity that I view positively. At first, I thought that Brooks' usage must have been a malapropism for bubble. But according to the OED, the jargon file is wrong (at least historically) to insist on a negative connotation, and wrong to cite Jabberwocky as the main source:

1. a. intr. To form vesicles or bubbles like boiling water; to rise in bubbles; to flow in or with bubbles, or with bubbling sound.

1303 R. BRUNNE Handl. Synne 10207 As o here yen shulde burble out. c1440 Promp. Parv. 56 Burblon [1499 burbelyn], as ale or oer lykore, bullo. 1470-85 MALORY Arthur X. ii, A fayre welle, with clere water burbelynge. 1530 PALSGR. 459/2 To boyle up or burbyll up as a water dothe in a spring, bouilloner. 1577 W. VALLANS Two Swannes in Leland's Itin. (1759) V. 10 To Whitwell short, whereof doth burbling rise The spring, that makes this little river runne.

The OED does give as a second meaning

2. a. To speak murmurously; to ‘ramble’ on. b. trans. To say (something) murmurously or in a rambling manner. Also transf.

and in this sense, the quotations start with Jabberwocky and are generally more negative in tone:

[1871 ‘LEWIS CARROLL’ Through Looking-Glass i. 22 The Jabberwock..Came whiffling through the tulgey wood, And burbled as it came!] 1891 KIPLING Light that Failed viii, You only burble and call me names. 1906 B. VON HUTTEN What became of Pam III. iv, Miss Wantage..began to burble, and then to roar. 1920 MULFORD J. Nelson vii. 67 ‘Forty feet of rope an' a sycamore tree,’ burbled Smitty. 1921 Blackw. Mag. July 31/2 A sleepy dinner it was. We burbled a few plans for next day, and fell asleep by the fire. 1934 T. E. LAWRENCE Let. 6 Aug. (1938) 813 You send me a sensible working-man of a letter..and I burble back in this unconscionable way. 1965 Parade 15 May, ‘I think they just called our flight number,’ burbled Carter.

However, looking for uses on the web, I find plenty that are like the OED's "rise in bubbles" meaning, which seems to be what Brooks had in mind:

A small icy fountain burbles gently in the centre of the room.

Yet at cruising speeds, the big engine burbles along in serene style.

But my experience is that intellectual capital works differently––if legislation is influenced, it is only because that knowledge burbles up.

Brahms’ magnificent Violin Concerto sings with Romantic flair and passion, and Beethoven’s Pastoral Symphony burbles with the sounds of brooks and birds.

Despite the obvious risks, he positively burbles with enthusiasm, calling his new array "incredible," and the "simplest device I've ever had to manage."

The bass line burbles along amiably and the whole character is of greatest contrast to the stormy grandeur of the chorus.

Live and learn.

Posted by Mark Liberman at 12:17 PM

The world is upside down

There's nothing really new here, but I'm still sometimes brought up short by the difference between stereotypes and realities.

Here's Samuel Huntington, a "life-long Democrat" and old-fashioned cold war liberal, with a Foreign Policy article The Hispanic Challenge (from a forthcoming book entitled "Who Are We"):

...the single most immediate and most serious challenge to America's traditional identity comes from the immense and continuing immigration from Latin America, especially from Mexico, and the fertility rates of these immigrants ... This reality poses a fundamental question: Will the United States remain a country with a single national language and a core Anglo-Protestant culture? By ignoring this question, Americans acquiesce to their eventual transformation into two peoples with two cultures (Anglo and Hispanic) and two languages (English and Spanish).

The impact of Mexican immigration on the United States becomes evident when one imagines what would happen if Mexican immigration abruptly stopped. ... most important of all, the possibility of a de facto split between a predominantly Spanish-speaking United States and an English-speaking United States would disappear, and with it, a major potential threat to the country's cultural and political integrity.

And here's David Brooks, senior editor at the Weekly Standard and the New York Times' house conservative, with a NYT op-ed piece "The Americano Dream":

Frankly, something's a little off in Huntington's use of the term "Anglo-Protestant" to describe American culture. There is no question that we have all been shaped by the legacies of Jonathan Edwards and Benjamin Franklin. But the mentality that binds us is not well described by the words "Anglo" or "Protestant."

We are bound together because we Americans share a common conception of the future. History is not cyclical for us. Progress does not come incrementally, but can be achieved in daring leaps. That mentality burbles out of Hispanic neighborhoods, as any visitor can see.

Huntington is right that Mexican-Americans lag at school. But that's in part because we've failed them. Our integration machinery is broken. But if we close our borders to new immigration, you can kiss goodbye the new energy, new tastes and new strivers who want to lunge into the future.

That's the real threat to the American creed.

And by the way, David Brooks puts a hyperlink to "Foreign Policy" in the on-line version of his NYT op-ed piece. I haven't seen that before -- is it a first lunge into the future for the Gray Lady?

Posted by Mark Liberman at 08:20 AM

Counting poles

There's been some commentary recently about William Safire's column on multipolar, mostly focusing on the oxymoronic phrase "common elitist usage". I was struck by a different part of Safire's piece, his discussion of the etymology of pole:

A pole, from the Greek polos, ''axis,'' is ''one of two ends of an axis going through a sphere.''

from which he concludes that multipolar is incoherent, because

An axis has two ends -- no more, no fewer -- and so polar can refer to one end and bipolar to two ends, like a magnet or a couple of superpowers. But since the prefix multi- means ''more than two,'' a multiple prefix was tacked on the stem word that didn't deserve that treatment.

My first reaction was "Sez who?"

I'm no expert in the history of English, but the OED gives two different nouns spelled "pole", and for pole1 says that the etymology is from OE. pál via ME. pôl, with the meaning

1. a. In early use, A stake, without reference to length or thickness; now, a long, slender, and more or less cylindrical and tapering piece of wood (rarely metal), as the straight stem of a slender tree stripped of its branches; used as a support for a tent, hops or other climbing plants, telegraph or telephone wires, etc., for scaffolding, and for other purposes.

The root is the same as for palisade, which certainly covers a multitude of poles. It's pretty rare for an application to rely on a single stake; and most tents and all hops fields have more than one pole. According to the American Heritage Dictionary's Appendix I, the Indo-European root is *pag-, which meant "To fasten", with other derivatives including "fang, peace, pact, palisade, and travel".

The OED does trace pole2 back through Latin polus to Greek polos, with the meaning

1. Each of the two points in the celestial sphere (north pole and south pole) about which as fixed points the stars appear to revolve; being the points at which the earth's axis produced meets the celestial sphere.

Liddell & Scott give a variety of meanings to the Greek source word, in addition to "axis of the celestial sphere": "centre of the circular threshing-floor", "dowel", "windlass, capstan" among others. Since the Indo-European root is *kwel- meaning "To revolve, move around, sojourn, dwell", with derivatives including "colony, cult, wheel, cyclone, pulley, and bucolic", it seems clear that the "axis of a turning sphere" sense is a specialization of a much less constrained meaning, hardly limited to one thing with two ends.

So we seem to have Safire coming and going, so to speak. However, he is dealing not with pole but with polar, and the -ar suffix in question is (I guess) the one that comes from Latin -aris, and occurs in words like lunar, globular, scholar. It would be unexpected to see this suffix added to a Germanic word like the OED's pole1 -- so Safire's etymology is correct. And so is his metaphorical analysis, I have to grant, since pole2 has clearly gone through a long bipolar phase since its early bronze age days as a word for "wandering around".

This affixal selection -- -ar permitting only pole2 -- is too bad, because pole1 has its share of scientific applications as well. These come via the poles (as opposed to zeros) of a transfer function, namely the complex frequencies for which the overall gain of the system is infinite. Although I don't know the history of this term, I've always assumed that it's related to the shape of a plot of the transfer function gain around the pole, which looks like the fabric of a tent approaching a tent pole. It's normal for systems to have multiple poles -- as many as you like -- without causing any problems as long as they stay out of the right half-plane of the Laplace domain (a pole there would make the system unstable). So there's no reason that "multi-polar" couldn't refer to the multiple poles of a circus tent or a resonant system -- this would be a great source of metaphors -- if only Old English poles could be polar, or resonance poles were Greek poles rather than Anglo-Saxon ones.
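
For readers who have not met transfer-function poles, here is a minimal Python sketch of the two claims above: the gain grows without bound as you approach a pole, and stability depends on keeping every pole out of the right half-plane. The function names `gain` and `is_stable` and the example pole locations are made up for illustration; they are assumptions, not part of any particular library.

```python
def gain(s, poles, zeros=()):
    """Magnitude of H(s) = prod(s - z) / prod(s - p) at complex frequency s."""
    num = 1 + 0j
    for z in zeros:
        num *= s - z
    den = 1 + 0j
    for p in poles:
        den *= s - p
    return abs(num / den)


def is_stable(poles):
    """True if every pole lies strictly in the left half of the complex plane."""
    return all(p.real < 0 for p in poles)


# Three hypothetical poles, all in the left half-plane.
poles = [-1 + 2j, -1 - 2j, -3 + 0j]

# Walk toward the first pole: the gain surface rises like tent fabric
# approaching a tent pole.
for eps in (1e-1, 1e-2, 1e-3):
    print(eps, gain(poles[0] + eps, poles))

print("stable:", is_stable(poles))
print("stable with a right-half-plane pole:", is_stable(poles + [0.5 + 1j]))
```

Nothing here limits the number of poles; the list can be as long as you like.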

But maybe it's just as well. There are already lots of bad electrical-engineering puns about flights from Warsaw with Poles in the right half-plane, and a smaller number of references to "bipolar bears" in the mental health area. If resonance frequencies could be polar, then some really vile (if mercifully obscure) jokes about instabilities of arbitrage schemes would be inevitable.

Anyhow, it's interesting that two plausibly related senses for the same phonemic string -- English pole as "the axis of a turning sphere" or "a long, slender cylindrical rod" -- turn out to come from completely different historical sources, here the IE roots kwel- (originally meaning "wander around") and pag- (originally meaning "fasten").

[Update: Douglas Davidson writes:

There are many related and unrelated uses of "pole" in science and mathematics, from polar coordinates to polarized light to the singularities of analytic functions that you mention. (If they are in fact derived from 'polos', then the polhode and herpolhode of Poinsot would have to be the strangest.) The one that scotches Safire, however, is the standard multipole expansion, whereby functions are analyzed into their monopole, dipole, quadrupole, etc. components according to their angular distribution. (The odd assortment of numerical prefixes is another topic entirely.)

If "polar" could be a form of the Anglo-Saxon "pole", then any field of hops or three-ring circus tent would refute Safire. Given that it can't, Davidson (and others like Language Hat & Semantic Compositions) are right to brandish quadrupoles and the like in his direction.

The question of the "poles" of analytic functions is a sideshow, but I'm still curious about which kind of pole word they are. I guess I don't have much real evidence for my intuition that they are Anglo-Saxon poles rather than Greco-Latin poles. My first impulse is just based on the tent-pole image, which may be personal and irrelevant. I don't think you can talk about "polar bandwidth" or "polar frequency" in this sense, which might be evidence, but there could be other reasons for that. If the term was (for example) originally French, then it must be the Greco-Latin version after all, but I don't know the history. And I have to confess that polhode and herpolhode are new to me...]

Posted by Mark Liberman at 12:29 AM

February 23, 2004

On beyond Google?

The March Technology Review has an article by Wade Roush on the future of search technology. Things discussed include Mooter's techniques for improving results by tracking users' responses, Teoma's work on identifying the experts within topic-specific clusters of web documents, Dipsie's efforts to mine the "deep web", and Microsoft activities including AskMSR's NLP question-answering techniques.

Posted by Mark Liberman at 10:40 PM

Common Elitist Usage

Our colleague LanguageHat has been reading William Safire, who in his most recent column says that he is guided by "common elitist usage". LanguageHat wonders:

What on earth does this mean? I can only think that, trapped between his automatic deference to prescriptive ukases and a cloudy realization that if everybody is using words in an illogical way usage must trump logic, he squares the circle by means of this oxymoron. I can't decide whether I'm amused or impressed.

Actually, I think that what Safire is saying is perfectly coherent. On the one hand, he is saying that what he considers correct is determined ultimately by usage, not by etymology. On the other hand, he doesn't accept just anybody's usage as authoritative; he defers to "elitist usage".

Deferring to some sort of linguistic authority is what all of us do. Even those of us who feel quite secure in our usage in general recognize certain people or institutions as the authorities on the use of technical terms in areas in which we are not expert, such as names of plants or chemicals. Indeed, one prominent suggestion for how to decide whether two groups of people speak dialects of the same language or different languages is whether they recognize the same sources of linguistic authority.

What is interesting about Safire's statement is the ambiguity of the phrase "elitist usage". The word "elitist" has two meanings:

  • "characteristic of the elite"
  • "characterized by the belief that certain people deserve special treatment by virtue of their supposed superiority"

For me, at least, the latter is the usual if not only meaning, so I lean toward the interpretation that Safire is interested in the usage of elitists. Dictionaries do give the former meaning as well, though, so perhaps he means to say that he follows the usage of the elite.

Of course, this raises the question of who the elite are, and why their usage should be authoritative. There are many types of elites, but usually, in the absence of any other context, the term refers to the elite in wealth, power, and fame, people like George Bush, Michael Jackson, and Martha Stewart. Except for those plant names and chemicals and so forth, I think I'll follow my own usage.

Posted by Bill Poser at 05:57 PM

Colorless green advocates sleep furiously

Penn sponsors an annual lecture on issues in the cognitive sciences, endowed in memory of Benjamin and Anne Pinkel. Last Friday, Ray Jackendoff gave the 2004 Pinkel lecture, on the topic Towards a Cognitive Science of Culture and Society.

Ray's presentation explored an analogy between language and social interaction, which he laid out at the start of his handout in a sort of a table:

Unlimited number of understandable sentences | Unlimited number of understandable social situations
Requires combinatorial rule system in mind of language user | Requires combinatorial rule system in mind of social agent
Rule system not available to consciousness | Rule system only partly available to consciousness
Rule system must be acquired by child with only imperfect evidence in environment, virtually no teaching | Rule system must be acquired by child with only imperfect evidence, only partially taught
Learning thus requires inner unlearned resources, perhaps partly specific to language | Learning thus requires inner unlearned resources, perhaps partly specific to social cognition
Inner resources must be determined by genome interacting with processes of biological development | Inner resources must be determined by genome interacting with processes of biological development

Ray went on to discuss other aspects of his proposed research program, without dwelling further on the analogy between sentences and "understandable social situations". However, this analogy reminded me of an interesting undergraduate term project from a course I taught last fall with Lyle Ungar, "Introduction to Cognitive Science".

A senior engineering student named Chris Osborn wanted to explore a class of statistical sequence models called "Aggregate Markov Models" (AMMs), which Fernando Pereira used a couple of years ago to show that Noam Chomsky was wrong in 1957 about the statistical status of "Colorless green ideas sleep furiously". Chris decided to try fitting an AMM not to the sequence of words in texts, but to the sequence of speaker names in a discussion. As a source of data, he chose the transcripts of three oral arguments from the 2001 term of the U.S. Supreme Court.

Chris found that a two-class AMM accurately distinguishes the justices from other participants (court officials and lawyers), even when trained on a single transcript of about 250 turns. This is analogous (on a smaller scale) to Fernando's success in inducing word classes that distinguish the probabilities of the different word orders in Chomsky's example by a factor of 200,000. The evaluation is different -- Chris was interested in finding induced classes that make sense, while Fernando wanted lifelike probability estimates for very improbable sequences -- but both applications show unsupervised learning of implicit structure from sequence data.
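
To make the idea concrete, here is a rough Python sketch of a two-class aggregate Markov model fitted by EM to a sequence of speaker labels. It assumes the usual formulation, in which P(next | current) is the sum over hidden classes c of P(c | current) P(next | c). This is not Chris's or Fernando's code: the function `train_amm`, the toy turn sequence, and the labels J1..J3 and L1..L2 are all invented for illustration.

```python
import math
import random
from collections import Counter


def train_amm(sequence, n_classes=2, n_iter=30, seed=0):
    """Fit P(b | a) = sum_c P(c | a) * P(b | c) to adjacent pairs by EM."""
    rng = random.Random(seed)
    bigrams = Counter(zip(sequence, sequence[1:]))
    firsts = sorted({a for a, _ in bigrams})
    seconds = sorted({b for _, b in bigrams})

    def random_dist(n):
        weights = [rng.random() + 0.1 for _ in range(n)]
        total = sum(weights)
        return [w / total for w in weights]

    # p_class[a][c] = P(c | a); p_next[c][b] = P(b | c)
    p_class = {a: random_dist(n_classes) for a in firsts}
    p_next = [dict(zip(seconds, random_dist(len(seconds))))
              for _ in range(n_classes)]
    history = []  # log-likelihood before each parameter update

    for _ in range(n_iter):
        class_counts = {a: [0.0] * n_classes for a in firsts}
        next_counts = [dict.fromkeys(seconds, 0.0) for _ in range(n_classes)]
        loglik = 0.0
        # E-step: share each observed pair among the hidden classes.
        for (a, b), n in bigrams.items():
            joint = [p_class[a][c] * p_next[c][b] for c in range(n_classes)]
            total = sum(joint)
            loglik += n * math.log(total)
            for c in range(n_classes):
                resp = n * joint[c] / total
                class_counts[a][c] += resp
                next_counts[c][b] += resp
        history.append(loglik)
        # M-step: renormalize the expected counts.
        for a in firsts:
            z = sum(class_counts[a]) or 1.0
            p_class[a] = [x / z for x in class_counts[a]]
        for c in range(n_classes):
            z = sum(next_counts[c].values()) or 1.0
            p_next[c] = {b: x / z for b, x in next_counts[c].items()}
    return p_class, p_next, history


# Hypothetical turn sequence: J* are justices, L* are lawyers.
turns = ("L1 J1 L1 J2 L1 J3 L1 J1 L1 J2 J3 L1 J1 "
         "L2 J2 L2 J3 L2 J1 L2 J2 J1 L2 J3 L2").split()
p_class, p_next, history = train_amm(turns, n_classes=2, n_iter=30)
for speaker in sorted(p_class):
    print(speaker, [round(p, 2) for p in p_class[speaker]])
```

Whether the two induced classes line up with justices versus lawyers depends on the data and on the random starting point, since EM only finds a local optimum.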

Chris was not trying to suggest that humans normally analyze turn-taking independent of the content of the turns or other aspects of the context, and I'm not suggesting this either. But if you think about it, an orthographic transcript is a highly abstracted characterization of the actual communicative interaction -- it leaves out everything except the sequence of word identities, more or less -- and the sequence of speaker names is another such abstracted characterization, in which there is also often quite a bit of structure. We know that humans are exquisitely sensitive to the statistical properties of communicatively-relevant behavioral sequences, and there is no reason to suppose that this sensitivity ends at the edges of speaker turns.

Posted by Mark Liberman at 10:11 AM

February 22, 2004

The perils of a dry wit

Doc Searls reports on the case of a credulous business reporter at the Chicago Tribune who picked up and printed -- as if it were true -- a satirical piece on denounce.com about a new FOAF service from amazon.com allowing customers to inspect and manipulate their friends' shopping baskets. A couple of months ago, people curious about the history of Harvard University were being sent by Google to this Language Log post. On the web, caveat lector.

Posted by Mark Liberman at 07:12 PM

Exclusive: Doonesbury rips off Language Log

Language Log readers who are also Doonesbury fans were shocked, shocked, to wake up and find that the Sunday morning Doonesbury strip for today was a one-joke piece entirely devoted to making fun with a curious observation that you read first here on Language Log on January 3rd, namely that communications between Pat Robertson and God appear to establish that God uses like as a hedge, as teenagers do, rather than its more grownup and formal near-equivalent if you will.

As Mark has shown, like it's not a perfect equivalent of if you will in all respects, but it's like, close. The point is that God could easily have said unto Pat, "It's going to be a blowout election, if you will, in 2004." But what is implied in Robertson's report is that God saith unto him, "It's going to be like a blowout election in 2004". The Doonesbury strip has Mark Slackmeyer discovering this fascinating fact on his radio show in an interview with Robertson -- over seven weeks late.

Language Log has spent the morning consulting with its attorneys at the distinguished Boston law firm of Dewey, Cheatham and Howe, and has considered its options for bringing a suit over this flagrant rip-off of our stuff, but after careful reflection on the advice received from counsel we have decided on this occasion not to bring the force of law to bear on Garry Trudeau for this shameless piece of borrowing, and will content ourselves with wagging the finger of warning. (Well, to be honest with you, the advice of counsel was, "Oh, give me a break; you chicken-shit bloggers don't have a prayer. Trudeau is big-league. You guys are the little people. Don't abuse the privilege of having my cell number by wasting my valuable time on this kind of footling nonsense. I'm trying to have breakfast here. I'm billing you $750 for this under the ‘or any part thereof’ clause. [Click.]")

We will not be going to court, then. However, Trudeau now owes us one. Language Log feels morally entitled to rip off one good piece of linguistic material from Trudeau's strip without being criticized for it, and will be looking for opportunities to do that over the next few months.

Posted by Geoffrey K. Pullum at 12:56 PM

February 21, 2004

Speech Recognition Recognizes Intonation?

The other day Bill Poser posted a comment on linguistics sessions at last week-end's annual meeting of the American Association for the Advancement of Science. Not only were there six sessions on language at the meeting, but we had our own theme track, for the first time ever: the AAAS called it "Language, Origins and Development".

But there was also linguistics-related material in some other sessions. The one that surprised me was Mari Ostendorf's talk on "Overview of Speech Recognition" in a mostly-Microsoft session called "Scientific Problems Facing Speech Recognition Today".

Ostendorf, a professor in Electrical Engineering at the University of Washington, gave an interesting survey of what's going on in this field and then turned to her own research. She said that she's currently exploring a new aspect of speech that is especially useful and exciting: prosody! Her main example illustrating this exciting discovery was a natural speech segment recording consisting of two sentences, with clearly audible declarative sentence-final intonation at the end of each sentence; the accompanying written text on the screen lacked punctuation entirely. She pointed out that this short segment was problematic until one considered prosody, at which point the text suddenly made excellent sense. Of course, it was only a problem as long as one omitted all standard punctuation from the written version, a point she didn't mention.

Taking prosody into account, she said, makes her automatic speech recognition system vastly more efficient. Surprise, surprise.

I don't have any trouble understanding why many people who work on speech recognition and machine translation find much of linguistic theory unhelpful, because only some of what linguists know is likely to be useful for these purposes. But announcing the discovery of prosody (she was actually talking mainly about sentence intonation, not other prosodic features) seems a bit much.

It seems especially odd because intonation has been a feature of at least one rather low-tech automatic voice system for a long time. In the wonderful film "American Tongues", a celebration of American English dialects, one segment features the woman whose voice is (or was?) the source of the telephone numbers you get automatically when (for instance) you dial a number that has been changed. She explains on screen that she recorded each numeral from 1 to 9, plus 0, in several different pronunciations, differing by intonation (I don't remember whether she uses that word), so that the entire number will sound fairly natural to the listener. So in a telephone number 228-2228, the first 8 will have clause-final-but-not-sentence-final intonation, and the second 8 will have falling-pitch sentence-final intonation.

So AT&T knew a long time ago what some speech recognition experts are apparently just finding out. In fairness to Ostendorf, I should add that her talk was aimed at a nonspecialist audience, so it's quite possible that she knows about the vast amount of work on intonation in linguistics, including work in computational linguistics. But she certainly didn't mention any.

Posted by Sally Thomason at 05:32 PM

UNESCO International Mother Language Day

"Half of the world's 6000 to 7000 languages are in danger of extinction. International Mother Language Day, celebrated annually on February 21, aims to promote the recognition and practice of the world's mother tongues, particularly minority ones."
(UNESCO Director General's message, Press release).

There are several initiatives to document the state of the world's endangered languages or to promote their development, e.g.: Ethnologue, FEL, ELF, Terralingua, although their resources are tiny in comparison to the scale of the task.

The majority of the world's population speak two or more languages, and "mother language" is the one they learnt first. There are plenty of even more curious descriptors for such neglected languages which I've been collecting over the past few years: exotic language, indigenous language, less-commonly taught languages (LCTL), little-known language, local language, lesser-used language, minority language, non-indigenous minority languages (NIML), languages other than English (LOTE), obscure language, strange language, interesting language, low density language, low diffusion languages, non-major languages, under-represented languages, rare languages, vanishing language, tribal language, non-commercial language, ignored language, under-resourced language, non-industrial language.

My favorite, coined by my esteemed colleague B at Carnegie Mellon University, is crazy little language, as in "OK, suppose we take some crazy little language...".

Posted by Steven Bird at 04:32 PM

No Navajo in Arizona Schools

Proponents of the English Only movement have at various times assured us that they aren't targeting Native American languages, only what they see as pandering to immigrant languages. It isn't true. According to this article in the Navajo Times, the State of Arizona is now taking the position that Arizona's English Only law, Proposition 203, forbids the use of Navajo as the language of instruction in state schools. The Findings and Declarations are all about immigrants, but the actual wording of the law makes no distinction between immigrant languages and native languages. The state's new position conflicts with the opinion issued by Arizona attorney general Janet Napolitano in 2001, which held that Proposition 203 barred instruction in languages other than English outside the reservation, but did not do so within the reservation. Funding for Navajo language immersion schools is now threatened.

Posted by Bill Poser at 03:45 PM

The Latin of The Passion

I commented a little while ago on the fact that the use of Latin in Mel Gibson's forthcoming film The Passion is not authentic because Latin was not widely used in Israel. It turns out that isn't the only inauthenticity. According to correspondents who have seen previews of The Passion, in the film Latin is spoken in the Italianate pronunciation, that is, in the pronunciation generally used in the Roman Catholic Church in Italy and the United States. That is not how Latin was pronounced 2,000 years ago.

Now, you might wonder how we know. After all, the Romans didn't bequeath to us any recordings of their speech. And indeed, sometimes, when a language is known only from written records, we may not have a very good idea of what it sounded like. But there are ways of learning what languages sounded like in the past, and when we are lucky it is possible to learn quite a bit. Sometimes we have descriptions by contemporary authors. Sometimes we have indirect evidence, such as the way in which the words of the language in question were written by speakers of other languages, or the way in which loans from foreign languages were written. Some information can be gleaned from variation in spelling. Sometimes we can make inferences from developments in the daughter languages, or from the application of phonological rules.

In the case of Latin, we know quite a lot. The best summary of our knowledge of the pronunciation of Latin is a slim book entitled Vox Latina: A Guide to the Pronunciation of Classical Latin by W. Sidney Allen. It has a companion entitled Vox Graeca that summarizes our knowledge of the pronunciation of ancient Greek. The most obvious difference between the Italianate pronunciation and the classical pronunciation is the use before [e] and [i] of the palatal affricates [ʧ] (as in cheese) and [ʤ] (as in judge) in place of the velar stops [k] (usually written <c>) and [g] respectively. According to Allen, that pronunciation didn't arise until at least the fifth century C.E.

By the way, although it has pride of place due to the role of the Italian clergy in the church, the Italianate pronunciation is not the only one used in the Catholic church. There are various "national" pronunciations, which at times have been defended vociferously against reform. According to Allen (p. 104):

The reforms were, however, opposed by the Chancellor of the University [Cambridge - WJP], Stephen Gardiner, Bishop of Winchester, who in 1542 published an edict specifically forbidding the new pronunciation of either language. As penalties for infringement, M.A.s were to be expelled from the Senate, candidates were to be excluded from degrees, scholars to forfeit all privileges, and ordinary undergraduates to be chastised.

Gardiner's edict was only repealed in 1558 on the accession of Elizabeth I.

Posted by Bill Poser at 02:46 PM

On not avoiding negatives

Recent posts by Geoff Pullum and Mark Liberman have reminded me of one of my favorite topics from a course on Language and the Law that I taught some years ago. A penchant for sentences with multiple negatives is one of the things that make jury instructions notoriously hard to understand. (On the general topic, see the classic article by Robert P. Charrow & Veda A. Charrow, `Making legal language understandable: a psycholinguistic study of jury instructions', Columbia Law Review 79/7:1306-1374, 1979.)

Here are two examples from California State jury instructions, the model for jurisdictions around the country. The first uses avoid, which, as Geoff and Mark have observed, is highly problematic for current English speakers; the second has three negatives, one not and two negative prefixes, mis- and un-.

  1. ...If such a result from certain conduct would be foreseeable by a person of ordinary prudence with like knowledge and in like situation, and if the conduct reasonably could be avoided, then not to avoid it would be negligence.

  2. Failure of recollection is a common experience and innocent misrecollection is not uncommon.

And here's an example of how confusing negatives can be in actual discourse, at least in a hostile ``conversation'' when Grice's Cooperative Principle is not operating. Below is a bit of the transcript of a trial (Pittsburgh, PA, March 1984) in which the prosecutor is cross-examining the defendant:


Now I ask you, is it not true that you weren't at home the night of the robbery?




Were you not, in fact, at the bar?


I already told you I was at home!

But my all-time favorite example of confusing multiple negatives is in the 1974 impeachment debate in the House Judiciary Committee. I don't know how many of the Congressmen in this debate are lawyers; there's some amusing research that indicates that not even lawyers understand legalese, though they all think they do.

Below is the House Judiciary Committee debate on unless in the House of Representatives, Committee on the Judiciary, Washington, DC; it's from the Debate on Articles of Impeachment of President Richard M. Nixon, and it took place on Friday, July 26, 1974. (I got this from Nuel Belnap in 1983. I've omitted Nuel's logical "if...then's" here -- you can find them on the web by googling a chunk of the debate -- but note that Latta's first two reformulations, Mann's formulation, and Dennis's second reformulation are in direct contradiction to McClory's motion. At least, I think I've got that right....)

The committee met, pursuant to notice at 11:55 a.m., ... Rep. Peter W. Rodino, Jr. (Chairman) presiding.

Mr. McCLORY: I have a motion at the clerk's desk which I have distributed among the members, Mr. Chairman.

CHAIRMAN: The clerk will read the motion.

CLERK (reading): Mr. McClory moves to postpone for 10 days further consideration of whether sufficient grounds exist for the House of Representatives to exercise constitutional power of impeachment unless by 12 noon, eastern daylight time, on Saturday, July 27, 1974, the President fails to give his unequivocal assurance to produce forthwith all taped conversations subpoenaed by the committee which are to be made available to the district court pursuant to court order in United States v. Mitchell ...

Mr. LATTA: ...I just want to call (McClory's) attention before we vote, to the wording of his motion. You move to postpone for 10 days unless the President fails to give his assurance to produce the tapes. So, if he fails tomorrow, we get 10 days. If he complies, we do not. The way you have it drafted I would suggest that you correct your motion to say that you get 10 days providing the President gives his unequivocal assurance to produce the tapes by tomorrow noon.

Mr. McCLORY: I think the motion is correctly worded, it has been thoughtfully drafted.

Mr. LATTA: I would suggest you rethink it. ...

Mr. MANN: Mr. Chairman, I think it is important that the committee vote on a resolution that properly expresses the intent of the gentleman from Illinois (McClory) and if he will examine his motion he will find that the words `fail to' need to be stricken and ...

Mr. McCLORY: If the gentleman will yield, the motion is correctly worded. It provides for a postponement for 10 days unless the President fails tomorrow to give his assurance, so there is no postponement for 10 days if the President fails to give the assurance, just 1 day. I think it is correctly drafted. I have had it drafted by counsel and I was misled originally, too, but it is correctly drafted. There is a 10-day postponement unless the President fails to give assurance. If he fails to give it, there is only a 24-hour or there is only a 23 and a half hour day (sic).

Mr. RANGEL: Mr. Chairman?

Mr. McCLORY: I think the members understand what they are voting on.

Mr. DENNIS: Will the gentleman yield to me?

Mr. RANGEL: Mr. Chairman --

Mr. DENNIS: The gentleman yielded to me, Mr. Rangel. Excuse me. I know you did not realize that fact.

Mr. RANGEL: No, I did not.

Mr. DENNIS: He did not. I realize that. What Mr. Mann says and what Mr. Latta says is true. In my opinion. It would be much better drafted if you said `provided that' or `unless he does not', or something, but I think nevertheless, the gentleman from Illinois is correct, that although this is a very backhanded way of stating it, it does in fact state it because it says he gets 10 days if he does not -- well, it is a backhanded way of stating what the gentleman is trying to state. It could be improved but what he is doing is nevertheless there.

Mr. MANN: I guess we can settle for it as long as we all understand it, Mr. Chairman.

CHAIRMAN: Will the gentleman yield?

Mr. RANGEL: Mr. Rangel, I think this motion itself has provided sufficient delay and I move the question.

CHAIRMAN: The question is on the motion of the gentleman from Illinois. ...

CLERK: Mr. Chairman, 11 members have voted aye, 27 members have voted no.

CHAIRMAN: And the motion is not agreed to. ...
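
The disagreement over the motion's wording can be checked mechanically. What follows is only a minimal sketch: it assumes that "P unless Q" is read as "P exactly when Q is false", and the function names are invented for illustration rather than taken from the transcript.

```python
def postpone_under_motion(gives_assurance: bool) -> bool:
    """McClory's wording: postpone unless the President FAILS to give assurance."""
    president_fails = not gives_assurance
    # Assumed reading of "P unless Q": P holds exactly when Q is false.
    return not president_fails


def postpone_under_latta_reading(gives_assurance: bool) -> bool:
    """Latta's paraphrase: 'if he fails tomorrow, we get 10 days. If he complies, we do not.'"""
    return not gives_assurance


def postpone_with_fail_to_stricken(gives_assurance: bool) -> bool:
    """Mann's proposed edit, striking 'fail to': postpone unless the President GIVES assurance."""
    # Same assumed reading of "unless" as above.
    return not gives_assurance


if __name__ == "__main__":
    for gives in (True, False):
        print(
            f"gives assurance={gives}: "
            f"motion={postpone_under_motion(gives)}, "
            f"Latta reading={postpone_under_latta_reading(gives)}, "
            f"'fail to' stricken={postpone_with_fail_to_stricken(gives)}"
        )
```

On this reading the motion as drafted postpones only if the assurance is given, while Latta's paraphrase and Mann's amendment postpone only if it is not, which is the contradiction noted above.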

Posted by Sally Thomason at 11:49 AM

Who is to be master?

Following up on Geoff Pullum's post about "too complex to avoid judgment", I checked Google for patterns such as "no * is too * to avoid" and "no * is too * to ignore". If I've analyzed the results correctly, it appears that the "incorrect" interpretations of phrases instantiating these templates far outnumber the "correct" interpretations. I've put "correct" and "incorrect" in scare quotes because if this were a matter of word meaning, we good descriptive linguists would say that the speech community had simply changed its mind about what the word means, thereby changing the word meaning and making the "incorrect" usage ipso facto "correct".

It's not clear to me what the right analysis is here. One story says that this is a matter of logic, which can't be changed by voting, and that the predominance of wrong usages is to be explained by psychological arguments. Another story says that the language is changing, whether syntactically (starting to revert to negative concord, which the vernacular has never abandoned?) or semantically (does no in the determiner of a subject NP have some unexplored interpretive possibilities?) or by developing a syntactic/semantic "construction" with a non-compositional meaning. As a simple phonetician, I'll leave these questions to the professionals, but the first sort of explanation seems more plausible to me, though the second would be more fun.

Here are some details. The pattern "no * is too * to avoid" gets 54 ghits. Most but not all the examples seem to count the negatives wrong:

[N]o executive is too prominant (sic) to avoid the long arm of the law
No one is too young to avoid being tempted
No business is too small to avoid or ignore protecting itself from another business using its name, product, service or invention.
No sacrifice is too great to avoid total destruction in Gehenna.

The seven "sacrifice" cases all seem to be variants of the same religious document, which asserts that "No sacrifice is too great to avoid total destruction in Gehenna." I read this as a "correct" calculation of the meaning: "to avoid total destruction, no sacrifice is too great." Note that in this case, unlike in most of the "wrong" ones, the sacrifice is neither doing the avoiding nor being avoided. There are a few other "correct" usages, but the "wrong" ones far outnumber them.

The pattern "no * is too * to ignore" has 50 ghits, mostly details, errors, issues, advantages etc. that are too small to ignore:

Five Star Events believes no detail is too small to ignore
Kelly... said that in the playoffs no advantage is too small to ignore
No error is too small to ignore - I want to make the second edition perfect!

One writer seems to have gotten his wires more seriously crossed, misplacing "seem" as well as losing track of his negatives:

Everything I seem to have done I have done well in, and no detail is too small to ignore.

Eliminating a few things that don't belong in the output of Michael Leuchtenburg's snowclone_google.pl program, we get as "wrong" examples:

no detail is too small to ignore: 7
no error is too small to ignore: 3
no conflict is too distant to ignore: 2
no issue is too small to ignore: 2
no advantage is too small to ignore: 2
no point is too small to ignore: 1
no profit is too small to ignore: 1
no contribution is too small to ignore: 1
no skill is too small to ignore: 1
no mission is too small to ignore: 1
no amount is too small to ignore: 1
no detail is too minor to ignore: 1
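
A tally of this kind can be approximated with a regular expression over saved snippets. The sketch below is not the snowclone_google.pl program mentioned above; the SNIPPETS list is a small hypothetical sample drawn from examples quoted in this post, and the regex is stricter than Google's * wildcard because it allows only a single word in each slot.

```python
import re
from collections import Counter

# Hypothetical sample of result snippets (illustrative only, not the real search output).
SNIPPETS = [
    "Five Star Events believes no detail is too small to ignore",
    "Everything I seem to have done I have done well in, and no detail is too small to ignore.",
    "No error is too small to ignore - I want to make the second edition perfect!",
    "the problem of no-shows is too costly to ignore",
]

# One word in each slot of "no X is too Y to avoid/ignore".
PATTERN = re.compile(r"\bno (\w+) is too (\w+) to (avoid|ignore)\b", re.IGNORECASE)


def tally(snippets):
    """Count each lower-cased instance of the template found in the snippets."""
    counts = Counter()
    for text in snippets:
        for match in PATTERN.finditer(text):
            counts[match.group(0).lower()] += 1
    return counts


if __name__ == "__main__":
    for phrase, n in tally(SNIPPETS).most_common():
        print(f"{phrase}: {n}")
```

Because the pattern requires a space after "no", the hyphenated "no-shows" snippet is never matched. Deciding whether each match is a "wrong" or a "correct" use still has to be done by reading the context.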

I've eliminated a differently-parsed example that is not relevant: "the problem of no-shows is too costly to ignore". There are only two cases where the literal meaning is (I think!) the intended meaning. However, these examples seem to be sarcastic, which makes me worry about whether I've analyzed them correctly:

No blemish is too hard to hide,. No problem is too big to ignore,. As long as you don’t hear complaints,. Why should you care?
Of course, the muted outrage and lack of debate over these lies and prevarications merely adds to the sense that no lie is too big to ignore.

It occurs to me that the mistakes (if that's what they are) may be caused by a sort of constructional resonance. What the writers really want to say seems to be something like "no X is so Y that we (normatively or habitually) Z it", or "no X is so Y that it (normatively or habitually) Zs". However, they can't quite figure out how to frame that in an idiomatic way with the pieces that come to hand, and as their mental generation process is fiddling with the fragments, everything kind of slips into the familiar and similar frame of phrases like "the box is too heavy to lift" or "Kim is too drunk to drive", in the form "no X is too Y to Z". In this solution, the interaction of negation, scalar direction and infinitival control doesn't work out right, but it's hard to calculate these things, and so the result passes muster.

Another possible source of confusion is the difference between asserting that someone or something is not past a certain limit, and asserting that no such limit exists. If I say "This flaw is not small enough to ignore", or "This man is not important enough to avoid prosecution", it's natural to understand this by reference to a threshold of size or importance that the flaw and the man don't reach. But the intent of statements like "No flaw is too small to ignore" and "No man is too important to avoid prosecution" is precisely that no such threshold exists, and this may be why the obvious alternative phrasings "No flaw is small enough to ignore" and "No man is powerful enough to avoid prosecution" are not chosen.

[Update: there are many similar patterns that reliably produce examples that mean the opposite of what their writers clearly intended:

However, despite all this and her duties as Party cell secretary, she is never too busy to ignore the needs of children.
However, one should not be too complacent to ignore the uncertainties that are prevailing in the domain of the global economic waters nowadays.
Of course, I hope my clients will not be too stubborn to ignore this advice.
No risk is too great to prevent the necessary job from getting done.
A certain situation may appear dangerous to most people, but to a journalist, no situation is too dramatic to prevent their story from happening.
The store is known for its personal and friendly service; no request is too wacky to refuse.

A reader suggests that

I wonder if what is going on here is that "so" has merged with "too"? All of these examples become good if "too" is replaced with "so", e.g.:

"No head injury is so trivial as to be ignored."

"No executive is so prominent as to avoid the long arm of the law."

My impression is that this use of "so" has essentially dropped out of common use and that "too" has taken over its role.

I don't think this can be true in general. "Kim is so upset as to scream" is not very idiomatic, but "Kim is too upset to scream" has not taken over its meaning. If there's been a change of this sort, it must be limited to certain phrasal templates, which I guess might count as what some people these days call "constructions". However, it's suspicious if the "constructions" in this case turn out to be exactly those sentences in which the meaning is hard to compute for independent reasons. That's why I suggested above that people start out wanting to say something like "no X is so Y as to Z" but wind up using "no X is too Y to Z" -- not because of a general meaning change, but because the "right" outcome is problematic and the "wrong" one is both similar enough to be activated instead, and also too complex semantically for its opposite meaning to be obvious.]

Posted by Mark Liberman at 09:05 AM

Too complex to avoid judgment?

Deputy Attorney General James Comey, speaking after the indictment of Enron ex-CEO Jeffrey Skilling, got himself into one of those curious tangles where the combination of implicit and explicit negations in the sentence outstrips the logic centers of the brain and you say the exact opposite of what you meant:

"The Skilling indictment demonstrates in no uncertain terms that no executive is too prominent or too powerful and that no scheme to defraud is too complex or too fancy to avoid the long arm of the law."

I read that in the San Francisco Chronicle; you can read it on MSNBC; it was on NPR's voices-in-the-news feature on the Sunday Weekend Edition on February 27. But Comey meant the exact opposite of what he said.

To say that no scheme is too complex to avoid the law is to say that avoiding the law (getting away with it) cannot be prevented by excess complexity. But Comey clearly meant that failing to avoid the law (falling into the clutches of the prosecutors) will never be prevented by excess complexity. So he should have said that no scheme is so complex that it can avoid the law; or (equivalently) that no scheme is too complex for it to be subject to legal investigation and prosecution.

Yet virtually no one will have spotted the error. That's a very curious fact, for which I have nothing that would count as a serious explanation. It is perhaps worth pointing out, though, that there are three waves of negation in what Comey said. One wave comes in the multiple no determiners ("no scheme is F" means "it is not the case that some scheme is F"); another is implicit in the multiple too modifiers ("too tired to rock" means "so tired that one is not able to rock"); and a third is implicit in the verb avoid ("avoid doing X" means "manage to not do X"). Human brains don't function well in the face of N negations for N > 2. And it may be worse when two out of three are implicit.
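
The three negations can be laid out in a rough formalization. This is only a sketch under loudly stated assumptions: the predicates S, C and L are introduced here for illustration, "too complex to VP" is flattened to "complex and not VP" (which drops the causal "so ... that"), and "avoid the law" is rendered as "not caught by the law".

```latex
% Assumed predicates (not in the original post):
%   S(x): x is a scheme;  C(x): x is excessively complex;  L(x): the law catches x
% "avoid the law" is rendered as \neg L(x); "too complex to VP" as C(x) \wedge \neg VP(x)
\begin{align*}
\text{said:}  \quad & \neg\exists x\,[\,S(x) \wedge C(x) \wedge \neg(\neg L(x))\,]
                \;\equiv\; \neg\exists x\,[\,S(x) \wedge C(x) \wedge L(x)\,] \\
\text{meant:} \quad & \neg\exists x\,[\,S(x) \wedge C(x) \wedge \neg L(x)\,]
\end{align*}
```

On this crude rendering, the sentence as spoken denies that any excessively complex scheme gets caught, whereas the intended claim denies that any such scheme escapes.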

A case I've personally observed that is puzzling in a vaguely similar way is the phrase "filling a much-needed gap", which I actually saw as a headline in a Salvation Army newsletter -- devoted to a much-needed program that was filling a gap (it was not the gap that was much-needed). More closely similar (because it involves too) is a case that Kai von Fintel has briefly discussed, the putative hospital emergency room sign "No head injury is too trivial to ignore". No example of this sort is too fascinating not to avoid notice on Language Log.

[Note added later: No one is immune to occasional trouble with implicit negatives. Just last night I was involved in (and on the losing end of) a philosophical discussion with Barbara Scholz, who is extraordinarily acute of thought and careful of speech, and she told me sternly that there was an important point that I was failing to miss. This was not, of course, what she wished to say to me.]

Posted by Geoffrey K. Pullum at 01:23 AM

February 20, 2004

The pleasures of linguistic foolishness

There is surely more nonsense written about language than about any other topic. A right-thinking person is daily puzzled, annoyed or even infuriated by ignorant assertions about the proper use of words, incoherent prescriptions for good writing, overblown or misleading interpretations of animals' abilities, stunningly silly historical theories, bizarre clinical hypotheses, and an inexhaustible variety of other confident but false claims.

As an alternative to tranquilizers, a dose of humility can help us bear the load: everybody makes mistakes. Or we can go the other way, and take the attitude of H.L. Mencken's essay On Being an American, which ends:

... here, more than anywhere else that I know of or have heard of, the daily panorama of human existence, of private and communal folly--the unending procession of governmental extortions and chicaneries, of commercial brigandages, and throat-slittings, of theological buffooneries, of aesthetic ribaldries, of legal swindles and harlotries, of miscellaneous rogueries, villainies, imbecilities, grotesqueries, and extravagances--is so inordinately gross and preposterous, so perfectly brought up to the highest conceivable amperage, so steadily enriched with an almost fabulous daring and originality, that only the man who was born with a petrified diaphragm can fail to laugh himself to sleep every night, and to awake every morning with all the eager, unflagging expectation of a Sunday-school superintendent touring the Paris peep-shows.

By the way, the quoted passage is typical of Mencken in containing 16% adjectives, twice the norm for academic prose. I'd like to have heard Mencken discuss this issue with his contemporary Strunk.

Posted by Mark Liberman at 08:14 AM

February 19, 2004

The bride wore black

In today's NYT, Craig Smith describes a marriage ceremony in Paris between a female police officer and a dead man:

"I had what you can call a perfect wedding," Ms. Demichel said the next day, chain-smoking beside her new mother-in-law in a Paris café.

I don't know if the connection between this story and the current American concern about non-traditional marriage pairings is irony or chance. But I'm sure that singer/songwriter Teddy Goldstein didn't know about the French practice when he wrote his comic song Widow:

... the thought of a wife
while I'm still alive
makes me skip a breath,
why it's enough to scare me to death.
Still I care for you, more than I often care to show,
So I started thinking about how

Some day, babe, when they lay me in the grave,
I hope you'll be there, wearing a black veil.
I said some day, babe, when they lay me in the grave,
I pray that they'll be calling you by my last name, saying "so sorry, are you OK?"
I guess what I'm trying to say, why I'm down here on bended knee,
What I want to know,
is, will you be my widow?

So when I die, and the preacher asks "did he have a wife",
Baby well that's your cue, to stand up and say "I do".
Because I'll let it now be said
I will marry you when I'm dead.
So you should be thinking about how


I hasten to point out that the French practice is a sentimental remedy for the death of a fiancé(e), and has nothing to do with the difficulties of marriage to a live spouse.

Posted by Mark Liberman at 09:51 PM

Three is a trend

The fad of using camera cell phones to take up-the-skirt photos of women in public places (often on escalators or from below staircases) is known as upskirting. Washington state tried to make it illegal, but was shocked recently to find that its State Supreme Court had ruled it (though "disgusting and reprehensible") perfectly legal under the state constitution if done in a public place.

The emergence of this technologically-assisted voyeurism should not surprise us (show me a technology that hasn't found new and unintended uses in the hands of male jerks). But the word upskirting is more interesting. It has a mildly surprising structure: It's a gerund-participle formed on a compound verb base where the compound consists of a preposition and a noun, interpreted as preposition + object. These are quite rare.

In general they are not productively available. That is, people don't just casually refer to jogging with weights as "withweighting", or to looking things up in books as "inbooking", or to flying small planes under suspension bridges as "underbridging", and expect to be understood without explanation.

Lest anyone suggest that downsizing is a relevant example, let me point out that it isn't: it does not derive its meaning from a preposition phrase "down the size".

Yet one by one, quite slowly, a small number of P + N compounds are being added to the English vocabulary. An example that does have the structure I'm talking about is overlanding, meaning travelling overland (usually in an all-terrain vehicle). Another (referring to making descents of cliffs or office block walls on ropes) is downwalling. (There is a film called Downwalling, directed by Lubomir Slavik and Jaromir Zid, produced by LUJA Studio in the Czech Republic. It was shown at the Banff Mountain Film and Book Festival.)

I'd predict that more such compounds are to come. One is an exception, two are a couple of anomalies, but three is a trend.

Posted by Geoffrey K. Pullum at 04:56 PM

February 18, 2004

Desires, beliefs, conversations

Rebecca Saxe has an essay in the current Boston Review entitled "Reading Your Mind: How our brains help us understand other people":

Children's early understanding of what makes people do the things they do appears to develop in two stages. In the first stage, children understand that people act in order to get the things they want: that human beings are agents whose actions are directed to goals. ...

Children in the first stage are missing something very specific: the notion of belief. Until sometime between their third and fourth birthdays, young children seem not to understand that the relationship between a person's goals and her actions depends on the person's beliefs about the current state of the world. ...

An impressive conceptual change occurs in the three- or four-year-old child.... a ... transition from the first stage of reasoning about human behavior, based mainly on goals or desires, to the richer second stage, based on both desires and beliefs.

This cartoon (by Dan Zettwoch, based on a story by Jason Shiga) shows how difficult -- and important -- such reasoning can be. As I wrote a few months ago, the cartoon has a special force for anyone who has ever tried to work these calculations out in explicit, logical terms.

Once you get past the stereotyped formulae, to participate effectively in a conversation requires making and re-making plans and predictions about how your actions will affect your interlocutors' states of mind. Even tracking the conversations of others involves lots of inferences about other people's desires and beliefs, and the half-conversations that cell phone users impose on us may be obnoxious because we have to "read minds" with only half the normal evidence.

It seems likely that this ability to "read minds" -- and to plan to change minds -- is a critical piece of the evolutionary history of language, and perhaps a rather recent one. A 1998 review article by C.M. Heyes entitled "Theory of Mind in Non-Human Primates" concludes that

A survey of empirical studies of imitation, self-recognition, social relationships, deception, role-taking and perspective-taking suggests that in every case where nonhuman primate behavior has been interpreted as a sign of theory of mind, it could instead have occurred by chance or as a product of nonmentalistic processes such as associative learning or inferences based on nonmental categories.

Saxe describes some recent experiments by Brian Hare, which suggest that chimps can reason about other chimps' mental states "in a competitive setting where some natural benefit follows from knowing what the other chimpanzee believes", though she also says that "[t]here is another way for the subordinate chimp to solve the competitive problem, one that depends only on certain behavioral associations, and not on ideas about beliefs at all."

Neither Saxe nor Heyes discusses parrots.

Posted by Mark Liberman at 07:46 PM

A poem about adjectives

Elizabeth Akers Allen (1886) wrote:

Where would the force of language be
Without the adjective?
How could the critic wing his shaft?
How could the poet live?

How could the novelist portray
The creatures of his brain,
The beauty of his heroine,
The transport of his swain?

No more his tide of eloquence
The orator could pour,
No more the man of science fill
His treasuries of lore.

The lover's tongue could never tell
His passion and despair;
Deprived of its superlatives
Who would for flattery care?

Where would the sting of satire be?
The edge and point of wit?
How could the stab of censure wound,
The dart of sarcasm hit?

Biographers would cease to prowl,
Historians drop the pen,
Paralysis would chill and numb
The tongues and minds of men,---

The press would lose its voice of might,
The pulpit all its power,
The sage could not describe a star,
The botanist a flower,---

So rarely is a period penned,
A line or sentence made,
Or thought set down, O adjective,
Which does not claim thy aid!

Yet I for once defy thy might,
For mark me, as I live,
No stanza of the nine here writ
Contains an adjective!

Posted by Mark Liberman at 07:30 PM

Those who take the adjectives from the table

It is not entirely easy to tell when Ben Yagoda's pieces in The Chronicle of Higher Education are evidencing his dry wit and when they are being serious. But I hope he is not at all serious in his apparent partial agreement with the experts on writing who insist that adjectives are bad. (Most of his article is organized around examples of adjective use that he clearly loves.) I really don't know how any of these people managed to reach the stage of being thought expert. How could "one of the few points on which the sages of writing agree" possibly be that "it is good to avoid them" when to utter the very thought you need the adjective good? How could William Zinsser possibly be serious in saying that most adjectives are "unnecessary" when he couldn't finish his sentence without the adjective unnecessary? How could Yagoda himself suggest that writers mainly use adjectives because "they either haven't, or are afraid they haven't, provided sufficient data", while using the adjectives afraid and sufficient in order to say it? Was he afraid of having insufficient data when he wrote his sentence? Or is he above the rest of us?

He is right, of course, that the so-called experts condemn the adjective. If you want to see what the very worst of the usage and style recommenders say, it is always a good idea to turn to Strunk and White's The Elements of Style first. Sure enough, on page 71 of the 4th edition, they say: "Write with nouns and verbs, not with adjectives and adverbs." As usual, moronic advice, and impossible to follow. And in the very next sentence they use adjectives themselves, of course. (An indecisive disjunction of adjectives, in fact: "weak or inaccurate". Well which is it? Be clear, they would say to you if you wrote that.)

What do these writing experts think they are doing trying to take something as subtle as how to write well and boil it down to maxims as simple as the avoidance of one particular grammatical category? Are they... Well, I'm really going to need an adjective to say this... Are they insane?

Look, you don't get good at writing by deleting adjectives. Writing is difficult and demanding; you can learn to get moderately good at it through decades of practice writing millions of words and critiquing what you've written or having others critique it. About 6% of those words will be adjectives, whether you write novels or news stories, whether they're good or bad.

The exception is that if you belong to the academic chattering classes --- the literary experts who tell other people to avoid adjectives --- the frequency goes up to over 8% in your academic prose. As in so many other domains, the very people who tell you not to are doing it more than you are. As Bertolt Brecht put it:

Those who take the meat from the table
Teach contentment.
Those for whom the taxes are destined
Demand sacrifice.
Those who eat their fill speak to the hungry
Of wonderful times to come.
Those who lead the country into the abyss
Call ruling too difficult
For ordinary men.

Those who lard their prose with juicy, slobbering, adjectival modifiers, he might have added, write stupid little books like The Elements of Style that tell you not to. The second word in Roger Angell's Foreword to the 4th edition of Strunk and White is an attributive adjective. In E. B. White's introduction to the book, the 6th word is an attributive adjective and there is another in the 4th line and so it goes on. The first two chapters of the main part of the book both have titles that begin with an attributive adjective. There is one in the first line of the text of the first chapter. I won't go on. Just take your copy of that vile little work with its absurd advice ("Use the active voice"; "Omit needless words"; "Be clear" --- all of them, notice, phrased with adjectives) and drop it in the wastebin.

Bibliographical credits

Quantitative data source: Douglas Biber et al., The Longman Grammar of Spoken and Written English (London: Longman, 2002), p. 506.

Crappy usage advice: William Strunk and E. B. White, The Elements of Style, 4th edition (New York: Allyn and Bacon, 2000), with a foreword by Roger Angell.

Article by Ben Yagoda: The Chronicle of Higher Education, February 20, 2004.

Adjectives by Webster's Third New International Dictionary. All adjectives driven by professionals on a closed course; do not try this at home.

Posted by Geoffrey K. Pullum at 02:07 PM

Endangered Languages and Programming Languages

A recent article on Slashdot makes a connection between endangered languages and programming languages:

An article from NewScientist.com reports that half of all human languages will have disappeared by the end of the century, as smaller societies are assimilated into national and global cultures. This may be great news if one is looking at a common standard for communication, but it dosen't help those designing the next generation of programming languages. For example, there's an extremely strong link between Panini's Grammar and computer science (PDF link), and with every language lost, there is a possibility that we may have missed an opportunity at improving the underlying heuristics.

As a student of endangered languages and a proponent of linguistic diversity, I can't help but be sympathetic, but I'm afraid that this misses the point. The interest of Pāṇini's grammar of Sanskrit, the अष्टाध्यायी (aṣṭādhyāyī), for computer science lies in the devices it uses for grammatical description, not in the details of the Sanskrit language that it describes. Pāṇini's approach to grammatical description was intended to be applicable in principle to any language, and indeed it was used to give elegant descriptions to a number of other languages. One of these was the தொல்காப்பியம் (tolkaappiyam), a grammar of Tamil, a Dravidian language unrelated to Sanskrit. The connection between the Indian grammatical tradition and formal language theory reflects the deep similarities among human languages, not their differences. There are many reasons to cherish the diversity of natural languages, but their potential contribution to programming languages probably isn't one of them.

Posted by Bill Poser at 12:59 AM

February 17, 2004

Latin and Science

I spent a four-day weekend at the Annual Meeting of the American Association for the Advancement of Science in Seattle, where, in spite of linguists' alleged taste for secrecy, a number of symposia were devoted to linguistic topics:

There was also a topical lecture entitled Language Acquisition and Creolization: How Children Shape Languages by Elissa Newport.

At one symposium, a member of the audience made a comment about the importance of Latin in the development of science, allegedly due to Latin being particularly precise. The alleged precision of Latin is also one of the reasons traditionally given for the study of Latin as a school subject. Now, I have got nothing against Latin. I have some acquaintance with it myself. But this idea about the superiority of Latin is bizarre. I've certainly never seen any evidence that Latin is any more precise than any other language, but it is the idea that Latin is responsible for the development of science that really takes the cake. Science as we know it was created by speakers of Greek, preserved and transmitted primarily by speakers of Arabic, and developed in its modern form by speakers of Italian, French, German, English and various other languages. For a time most of them wrote in Latin, but they surely did their thinking in the vernacular. The development of science did not slow down a whit when scientists ceased to publish in Latin. Where do people get this idea?

Posted by Bill Poser at 07:46 PM

Adjectives: the Rodney Dangerfield of grammatical categories

Ben Yagoda writes about adjectives in the Chronicle of Higher Education. The lead:

As far as not getting respect goes, adjectives leave Rodney Dangerfield in the dust. They rank right up there with Osama bin Laden, Geraldo Rivera, and the customer-service policies of cable-TV companies. That it is good to avoid them is one of the few points on which the sages of writing agree. Thus Voltaire: "The adjective is the enemy of the noun, though it agrees with it in number and gender." Thus Twain: "When you catch an adjective, kill it."

Let's add Fowler's complaint about Kipling's "remorseless and scientific efficiency in the choice of epithets", recently noted here.

Posted by Mark Liberman at 01:14 PM


We've heard that China is moving in the direction of a market economy, but this item from today's English edition of the People's Daily is an example of capitolism gone wild:

The Chinese government began standardizing the written Chinese language in 1952, and today, the ancient, complex characters are Beijing used mainly in Hong Kong, Macao and Taiwan.

[Update: This article is also off base in suggesting that the use of abbreviated Chinese characters in the Ming dynasty is news. Many of the simplified characters made official in Mainland China are variants that have been widely used for centuries in handwriting and in informal contexts, such as signs in shops and markets. This comment by Wolfgang Behr points out that the short form of "10,000" goes all the way back to the Oracle Bone inscriptions!]

Posted by Bill Poser at 12:52 PM

Another mad cow word

Continuing our series on mad cow terminology (here and here), we can add a new one: bovine amyloidotic spongiform encephalopathy, or BASE, cited in a new PNAS study (apparently dated Monday but not yet on the PNAS Early Edition site) and discussed in this NYT story today. As of this moment, even Google doesn't have it.

The reported results are distressing news because (quoting the NYT) "the discovery of new forms suggests that many cases of 'sporadic' human disease — by far the most common kind, responsible for about 300 deaths a year in the United States — are not spontaneous at all, but come from eating animals."

Posted by Mark Liberman at 12:30 AM

February 16, 2004

Hung like a hero

I remain puzzled about the history of "a(n) hero". Being home sick, I couldn't get to the library, but had a little time for recreational investigation on the internet. It seems that there was a period centered around 1800 when "an hero" was common, as suggested by this histogram of the death dates of the 60-odd authors that lion.chadwyck.com finds for the search string "an hero". By 1900, "a hero" is all that is found; and the pre-1700 citations also seem to be mostly of that form, though there are not many of them.

Interestingly, there are scattered instances of "a hero" through the 18th and early 19th centuries as well, perhaps as common as the "an" version during that period, or even commoner. I've reproduced one of these in its full context below, because I hadn't realized that Jonathan Swift wrote gangsta ballads.

Clever Tom Clinch going to be hanged. Written in the Year 1726.

As clever Tom Clinch, while the Rabble was bawling,
Rode stately through Holbourn, to die in his Calling;
He stopt at the George for a Bottle of Sack,
And promis'd to pay for it when he'd come back.
His Waistcoat and Stockings, and Breeches were white,
His Cap had a new Cherry Ribbon to ty't.
The Maids to the Doors and the Balconies ran,
And said, lack-a-day! he's a proper young Man.
But, as from the Windows the Ladies he spy'd,
Like a Beau in the Box, he bow'd low on each Side;
And when his last Speech the loud Hawkers did cry,
He swore from his Cart, it was all a damn'd Lye.
The Hangman for Pardon fell down on his Knee;
Tom gave him a Kick in the Guts for his Fee.
Then said, I must speak to the People a little,
But I'll see you all damn'd before I will whittle.
My honest Friend Wild, may he long hold his Place,
He lengthen'd my Life with a whole Year of Grace.
Take Courage, dear Comrades, and be not afraid,
Nor slip this Occasion to follow your Trade.
My Conscience is clear, and my Spirits are calm,
And thus I go off without Pray'r-Book or Psalm.
Then follow the Practice of clever Tom Clinch,
Who hung like a Hero, and never would flinch.

[According to the OED, whittle (found in line 16) is a variant form of the cant word whiddle "to peach".]

Posted by Mark Liberman at 11:04 PM

Stupid junk mail envelope blather

I already voiced a mild complaint, in an earlier post, that my mortgage company had sent me a special glossy insert sheet containing execrable poetry, as if doggerel would make me love them more. Well, on Saturday I received an envelope from AT&T that had on the cover, in addition to "Urgent Disbursement Notification" (yeah, right), the following legend:

POSTMASTER: If undeliverable, please process following applicable Postal Regulations

Am I really supposed to believe there are some pieces of mail that the postmaster does not process following applicable Postal Regulations? The aforesaid Postal Regulations, with their Needless Capital Letters, only apply to some pieces of mail, and not to all?

Do they really think I am so stupid that if they put this meaningless, redundant blather on the envelope I will be more likely to be interested in their junk mail, and less likely to guess that they want to bribe me with a fake $30 check to switch to their long distance service?

Has somebody out there been telling them that I am a total moron? Is that it? Eh? Does AT&T think I'm a dope, dunderhead, fool, addlepate, moron, simpleton, or absolute drooling, pea-brained, knuckle-dragging halfwit?

Posted by Geoffrey K. Pullum at 09:43 PM

An hero ain't nothing but a hypercorrection

Geoff Pullum emailed a correction to my earlier post on "a(n) hero":

Nice work; but take out the first of the two 18th century quotes; it's irrelevant. "Heroic[k]" is a totally different case, because of the stress contour. There is a surviving affected dialect that has

*an history book          an historical novel
*an heretic               an heretical opinion
*an habit of stealing     an habitual thief
*an hysterectomy          an hysterical outburst
*an hero                  an heroic poem

Also "an hotel" for the really snooty. See THE CAMBRIDGE GRAMMAR, pp.1618-1619 (I wrote that bit; did I labor for nothing?? USE the book Mark, take it down from the shelf and LOOK in it).

Your point about "hero" is good: NO naturally spoken dialect, not even those of the people who stay in an hotel, says "an hero", nor ever has in two hundred years.

He's right -- I checked the book. But Geoff, it's not on line!

Glen Whitman wrote in with a similar observation (though he spared me the page numbers), and added that

many years ago my father was advised not to aspirate his H's and to use 'an' before nouns beginning in H, as a means of minimizing his southern accent in public speaking.

I cut the Addison "an heroick" quote from the earlier post. Perhaps Adams' "an hero" was a hypercorrection by a colonial with aspirations above his station -- I don't know the history -- but "an hero" does seem to be found pretty regularly around the time of Adams' letter, e.g.

But thou complying with thy princely wrath,
Hast shamed an Hero whom themselves the Gods
Delight to honour ... [Cowper's translation of Homer, about 1790]

Then Erjun, to the base a rod,
An Hero favour'd by a God [William "Oriental" Jones, "The Enchanted Fruit", about 1790]

along with a few (earlier?) instances of "a hero":

A Hero, whose bright Fame may gild thy Bays,
And more thy Name, than thou his Glory raise. [Edmund Arwaker, An Epistle to Monsieur Boileau (1694)]

Fancy that can to Clouds of Smoke give Light,
And trace a Hero through the dusky Fight. [Nahum Tate, A POEM ON THE PROMOTION OF SEVERAL Eminent Persons IN CHURCH and STATE (1699)]

What's up with the NYT remains a mystery. However, Robyn Stewart emailed two suggestions:

Any chance the cited "an hero" is a further dig at the quebecois? It wouldn't be unusual to hear a quebecois speaker say that, as a lot of people need to make a conscious effort to pronounce the h, but have internalized the practice of using an before a vowel sound.

I'd vote for plain old typo, though. Perhaps an intervening adjective was removed, and the an left. That's how "an consonant-noun" and "a initial-vowel-noun" get into my writing.

Posted by Mark Liberman at 06:56 PM

Stuck inside of Fowler with the Memphis blues again

Tracking the prescriptivist revulsion against transpire back through time, I recently came to the section on Americanisms in the 1908 second edition of H.W. Fowler's The King's English:

There are certain American verbs that remind Englishmen of the barbaric taste illustrated by such town names as Memphis ... A very firm stand ought to be made against placate, transpire, and antagonize...

I found myself completely unable to understand this. Why should placate remind an Englishman of Memphis? And what is barbaric about Memphis as a town name? Fowler footnotes transpire in order to explain that he opposes it "[e]ven in the legitimate sense (see Malaprops), originally a happy metaphor for mysterious leaking out, but now vulgarized and 'dead'." Does he really mean that we should remove all dead metaphors from the lexicon? Few words would be left.

Fowler continues:

Mr. Rudyard Kipling is a very great writer, and a patriotic; his influence is probably the strongest that there is at present in the land; but he and his school are americanizing us. His style exhibits a sort of remorseless and scientific efficiency in the choice of epithets and other words that suggests the application of coloured photography to description; the camera is superseding the human hand. We quote two sentences from the first page of a story, and remark that in pre-Kipling days none of the words we italicize would have been likely; now, they may be matched on nearly every page of an 'up-to-date' novelist:

Between the snow-white cutter and the flat-topped, honey-coloured rocks on the beach the green water was troubled with shrimp-pink prisoners-of-war bathing.—Kipling.

Far out, a three-funnelled Atlantic transport with turtle bow and stern waddled in from the deep sea.—Kipling.

Trying to grasp the "remorseless and scientific efficiency" of honey-coloured and waddling, I had an insight. I was trying to make sense of this chapter as a work of scholarship, but I should have been reading it as a modernist prose poem, a sort of cubist collage of the classic themes of European anti-Americanism. Trying to make literal sense of Fowler's objections to Memphis and Kipling is just as pointless as asking Bob Dylan exactly what it means to have the Memphis blues in Mobile:

Grandpa died last week
And now he's buried in the rocks,
But everybody still talks about
How badly they were shocked.
But me, I expected it to happen,
I knew he'd lost control
When he built a fire on Main Street
And shot it full of holes.
Oh, Mama, can this really be the end,
To be stuck inside of Mobile
With the Memphis blues again?

As Fowler's contemporary Mallarmé wrote: "Nommer un objet, c'est supprimer les trois quarts de la jouissance du poème qui est faite du bonheur de deviner peu à peu ; le suggérer, voilà le rêve." Well, for some people, anyhow.

After this insight, Fowler's musings go down like sips of vintage port and bites of aged stilton: "foreign ... barbaric taste ... real danger ... remorseless and scientific efficiency ... less desirable character ... anxious ... curious bizarre style ... insinuate itself ... vulgarized ... brief and startling exhaustiveness ..."

It was in 1909, the year after Fowler's work was published, that W.C. Handy moved his band to Memphis and composed the song that came to be known as The Memphis Blues:

Mister Crump don't 'low no easy riders here,
Crump don't 'low no easy riders here.
We don't care what Mister Crump don't 'low,
We gonna bar'lhouse any how,
Mister Crump don't 'low no easy riders here.

Mister Crump don't 'low it, ain't goin' have it here,
Crump don't 'low it, ain't goin' have it here,
We don't care what Mister Crump don't 'low,
We gonna bar'lhouse any how,
Mister Crump can go and catch hisself some air.

By comparison, that's a cold beer on a hot night. I'm with Handy: Mr. Fowler can go and catch hisself some air.

[Update 2/20/2004: H.L. Mencken wrote about the prejudices of anti-americanisms in The American Language (1921), and specifically about some of the words that Fowler objects to:

It is curious, reading the fulminations of American purists of the last generation, to note how many of the Americanisms they denounced have not only got into perfectly good usage at home but even broken down all guards across the ocean. To placate and to antagonize are examples. The Concise Oxford and Cassell distinguish between the English and American meanings of the latter: in England a man may antagonize only another man, in America he may antagonize a mere idea or thing. But, as the brothers Fowler show, even the English meaning is of American origin, and no doubt a few more years will see the verb completely naturalized in Britain. To placate, attacked vigorously by all native grammarians down to (but excepting) White, now has the authority of the Spectator, and is accepted by Cassell. To donate is still under the ban, but to transpire has been used by the London Times.]


Posted by Mark Liberman at 02:01 PM

An hero at the NYT

This morning's NYT has a story by Clifford Krauss about insults to French-speaking Canadians on an episode of Late Night with Conan O'Brien that was taped in Toronto. One phrase took me aback:

One reason nerves became so frayed was that the Triumph the Insult Comic Dog routine came just three weeks after Don Cherry, the CBC hockey commentator and an hero among a certain class of rough-and-tumble Anglophone Canadians, poked fun at French Canadian and European players for wearing protective visors on their helmets.

The string "an hero" does have 1,970 ghits, to 2,010,000 for "a hero." The "an hero" collection (leaving aside the German and so on) is a mixture of illiterate or apparently non-native English ("His contributions as an Hero, character artist, comedy roles can never be performed by any other artist." "He makes himself to be an hero when 336 american soldiers lost their lifes in Iraq so far" "Any body can be an hero My hero are The people who tought us what we should know and what we shouldn't.") and antique writings, like this passage from a letter of John Adams, written in September 1776:

Pray tell me, Colonel Knox, does every Man to the Southward of Hudsons River, behave like an Hero, and every Man to the Northward of it, like a Poltroon, or not? The Rumours, Reports and Letters which come here upon every Occasion, represent the New England Troops, as Cowards, running away perpetually, and the Southern Troops as standing bravely. I wish I could know, whether it is true.

The NYT itself over the past 30 days has 23 instances of "a hero" to just this one instance of "an hero".
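For a sense of scale, the counts above can be turned into shares. This is only a minimal sketch using the figures quoted in this post; the variable and function names are mine, and the web count is if anything an overestimate, since it includes the German and other irrelevant hits.

```python
# Counts quoted above. Google hit counts are rough estimates at best.
web_an_hero = 1_970
web_a_hero = 2_010_000
nyt_an_hero = 1
nyt_a_hero = 23


def share(rare, common):
    """Fraction of all instances that use the rarer form."""
    return rare / (rare + common)


web_share = share(web_an_hero, web_a_hero)
nyt_share = share(nyt_an_hero, nyt_a_hero)

print(f"web: {web_share:.3%} of instances are 'an hero'")
print(f"NYT: {nyt_share:.1%} of instances are 'an hero'")
```

On these figures "an hero" is roughly one instance in a thousand on the web, while the single NYT instance is one in 24 for the month.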

So what's with Clifford Krauss? Was this just an editorial slip, or the start of an underground journalistic movement to bring back 18th-century linguistic norms?

Posted by Mark Liberman at 09:29 AM

February 15, 2004

Camel words: I'm convinced

Well, Mark's list of words for camel has completely convinced me. I used to be a skeptic, but now I've seen the light. I am now prepared to accept... I am now fully convinced that... umm...

Oh, dear. I forgot what these long lists of far-off peoples' finely differentiated words for particular types of thing are supposed to convince us of. I know it's something terribly important about language and cognition and world view, because almost every introduction to language or culture says so; I just forgot what. How embarrassing.

Posted by Geoffrey K. Pullum at 06:42 PM

46 Somali words for camel

A number of readers have emailed to ask about the "several dozen words for 'camel' in Somali" that I mentioned in an earlier post. After wrestling with my conscience, I've dug out a list of 46 such words that I compiled a few years ago when we studied Somali in a field methods course that I sometimes teach, and I'll share it with you in the hopes of taking some of the heat off the much-abused Eskimos and their words for snow. So now, if you want to observe that things of category X are important to members of group Y, you can go ahead and write "If the Somalis have some 46 different words for camel, then Y must have more than 50 words for X". Geoff Pullum will be able to criticize you for hackneyed rhetoric and banality of thought, as well as for the unmotivated assumption that cultural interest always translates instantly into multiplication of vocabulary. However, he won't also castigate you for incredibly sloppy scholarship on the exotic-language word-count issue. Unless, of course, you come at the end of a new chain of serial exaggerations, and write "If the Somalis have more than 460 words for camel, then ..."

My list is somewhat more reliable than the unchecked serial exaggeration of Eskimo snow vocabulary originally documented by Laura Martin, and later popularized and extended by Geoff. At least it's an actual list of alleged words. However, no one should take it as gospel truth. Most of the glosses have been taken from Zorc and Osman's Somali-English Dictionary, and some have been checked with one or two Somali language consultants, with possible scribal errors by me. There are doubtless other words I've omitted -- a brief internet search turned up at least three candidates, cited below -- and perhaps some of the items on my list should be removed.

However, scholarly quibbles aside, there can be no doubt that the Somali have lots of words for (different kinds of) camels -- and even more words for pieces of camel herding, packing and riding gear, camel diseases, things camels do, things you can do with or to camels, camel body parts (whether integrated into live camels or removed for other uses), things made from bits of camel, things that look like bits of camel, and so forth. That's because camels and the material culture of camel husbandry are a big part of Somali life. I do share the general prejudice that it's normal for people to develop lots of words for animals (as well as other things) that are important to them -- consider the set of English horse words that I (not at all a horse person) can think of: arabian, barb, bay, buckskin, cayuse, clydesdale, cob, colt, courser, dobbin, filly, foal, gelding, horse, lippizaner, mare, mount, mustang, nag, palomino, percheron, pinto, pony, przewalski, quarter horse, roan, shetland, sorrel, stallion, steed, stud, thoroughbred, trotter.

A brief internet search turned up a scholarly presentation The Camel in Somalia that claims a total of 8,741,978 camels in Somalia, as of some curiously exact but surely out-of-date census, and asserts that this is "around 50 percent of Africa's total camel population". (By comparison, their slide 10 claims 170,000 camels in Egypt and 165,000 in Saudi Arabia.) Note that slide 9, which presents a table from M.A. Hussein's "Conceptual classification of Somali camel types", characterizes "Southern Somali Camel Types" as hoor, siifdaar and eyddimo. None of these terms are in my list below, nor could I find any likely entries for them in the dictionaries I have at hand.

The CIA World Factbook for Somalia gives the human population as 8,025,190, also as of a long-out-of-date census; but in any case there are apparently roughly as many camels as people in Somalia.
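The "roughly as many camels as people" claim is easy to check from the two figures just quoted. A minimal sketch (the variable names are mine, and both inputs are, as noted, from out-of-date counts):

```python
camels = 8_741_978  # figure from the "Camel in Somalia" presentation
people = 8_025_190  # CIA World Factbook figure quoted above

# Camels per person, taking both numbers at face value.
ratio = camels / people
print(f"{ratio:.2f} camels per person")
```

Taken at face value, that comes to a little over one camel per person.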

I've tried to leave out camel-applicable words whose primary meaning is more general, such as

horweyn "group of livestock or camels herded separately (usually those who are not giving milk)"
malloolli "sterile and fat male (of man, ram, camel, etc.)"

It would be easy to add another couple dozen of those.

OK, enough preliminaries. Here's the list:

aaran "young camels who are no longer sucklings"
abeer or ameer "female camel that has not given birth"
afkuxuuble "miscarried camel fetus"
awr "male pack camel"
awradhale "camel that always gives birth to he-camels; stud-camel that always breeds male camels"
baarfuran "female camel that is not used as a pack camel"
baarqab "stud camel"
baatir "mature female camel that has had no offspring"
baloolley "she-camel without calf that will or will not give milk depending on her mood"
buub "young unbroken male camel"
caddaysimo "unloaded pack camel; unpoisoned arrow"
caggabbaruur "lion cub; young camel"
cashatab "female camel that has stopped giving milk or failed to conceive when it was supposed to"
cayuun "camel sp."
daandheer "strong camel of the herd"
duq "old female camel; old woman"
dhaan "camel loaded with water vessels"
dhoocil "bull camel; naughty boy/girl"
farruud or qarruud "mature male camels; elders"
garruud "old male camels; old people"
geel "camel"
gool "fat camel"
guubis or guumis "first-born male camel"
gulaal "male camel unable to project the gland in his mouth; person with hesitant or stammering speech"
guran "herd of camels no longer giving milk that are kept far from dwelling areas"
gurgurshaa "calm, docile pack-camel which can be loaded with delicate items" [from gurgur "to carry things one by one"]
hal "female camel"
hayin "tame pack camel; [atr.] docile; [ext.] simple, uncomplicated"
irmaan "dairy camels"
kareeb "mother camel kept apart from her young"
koron "gelded camel"
labakurusle "two humped camel [lit. two-camelhump-er]"
luqmalliigle "young camel"
mandhoorey "lead ~ best camel in the herd"
nirig "camel foal"
rati "male camel"
qaalin "young camel"
qaan "young camel ~ camels"
qawaar "old she camel"
qoorqab "uncastrated male (camel etc.)"
qurbac "young male camel"
rakuub "riding camel (from Arabic)"
ramag or ramad "she camel who has recently given birth"
sidig "one of two female camels suckling the same infant"
tulud "one's one and only camel"
xagjir "milk-producing camel that is partially milked (two udders for human consumption; two for its calf)"

From among the many other camel-related words in Somali, here are a few of my personal favorites:

golqaniinyo "bite given on a camel's flank to render her docile during milking"
gulguuluc "low bellow of a camel when it is sick or thirsty; poem recited in a low voice"
fur "to unload a camel; to open, disclose, set free, decipher, untie"
guree "to make room for a person to sit on a loaded camel; to make space for s.o. in a loaded car or truck"
haneed "left side of cow camel where one stands when milking; good form, nice style"
u maqaarsaar "to put the skin of a dead calf or baby camel on top of a living one in order to induce (cow, camel) to still give milk; [fig.] deceive, mislead or trick s.o."
uusmiiro "to extract water for oneself from the stomach of a camel to drink during a period of drought"
booli "looted camels"

[In case you're puzzled about the gloss for gulaal "male camel unable to project the gland in his mouth; person with hesitant or stammering speech", it's all explained here.]

Posted by Mark Liberman at 10:59 AM

Shoes, torches, mothers

This morning's NYT reports the choice of Michael McKean to replace Harvey Fierstein as Edna Turnblad in the Broadway show Hairspray. The story features two separate generalized clichés involving bras, as well as a touching reprise of Oscar Wilde's witticism about men, women and mothers.

One of the generalized clichés is in the headline: "Passing the bra: the search for a new Edna". The other is in a quote from McKean: "Harvey has left some big cups to fill. Seeing him do the part for the first time, I just wanted to shoot myself."

I think that we're meant to understand "passing the bra" by analogy to "passing the torch", rather than by reference to passing "the buck", "the bar exam", "the hat", "the time", "the stone" or "the point of no return." The fact that we figure this out effortlessly, without even noticing, is an interesting (and I think unstudied) instance of psycholinguistic ambiguity resolution. Note that we don't do it by simple-minded semantic analogy -- a bra is surely more like a hat than like a torch -- nor by simple frequency counting, since "pass(ing) the buck" gets 63,800 ghits to 32,800 for "pass(ing) the torch". The 2,131 ghits for "pass(ing) the flame" and the 17,500 for "pass(ing) the baton" don't make up the difference, especially if we were to factor in the alternatives that yield a meaning similar to "hat", such as "collection plate".
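For anyone who wants to check the arithmetic behind the frequency argument, here is a minimal sketch. The counts are just the ghit figures quoted above; the variable names and the grouping of "torch", "flame" and "baton" into one family are my own illustrative assumptions.

```python
# Google hit counts ("ghits") exactly as quoted in the paragraph above.
ghits = {
    "pass(ing) the buck": 63_800,
    "pass(ing) the torch": 32_800,
    "pass(ing) the flame": 2_131,
    "pass(ing) the baton": 17_500,
}

# Assumption for illustration: treat torch, flame and baton as one
# "hand on the responsibility" family competing against "the buck".
torch_family = [
    "pass(ing) the torch",
    "pass(ing) the flame",
    "pass(ing) the baton",
]

torch_total = sum(ghits[phrase] for phrase in torch_family)
buck_total = ghits["pass(ing) the buck"]
shortfall = buck_total - torch_total

print(f"torch family: {torch_total:,}")
print(f"buck:         {buck_total:,}")
print(f"shortfall:    {shortfall:,}")
```

Even with the flame and the baton thrown in, the torch family stays about eleven thousand hits short of the buck, so raw frequency alone can't be what steers us to the right reading.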

The "big X to fill" pattern is not so idiomatically protean. Of course there are compositional uses like "The venue is a big hall to fill", but the only idiomatic cases seem to be generalizations of the cliché "big shoes to fill", which has 25,300 ghits.

"Big boots to fill" is apparently a regional or perhaps idiolectal variant, with 1,010 ghits:

Fellow World Cup winner Richard Hill added: "Whoever takes his place will have very big boots to fill...."
Yvonne is such a great secretary. She's going to leave some big boots to fill.
The C-2500L then as the direct replacement for the D-620L has big boots to fill..

And then there is a long, long tail of self-consciously cute examples involving other kinds of footwear:

The Rink Rats want to thank Steve for his hard work during his tenure as Manager and we know he has left some big skates to fill.
First-team selections Jessica Mendoza and Sarah Beeson, and second-team selection Robin Walker, left big cleats to fill on the mound, at first base and shortstop, respectively.
When King David died, Solomon, who had a sinking feeling that he had some mighty big sandals to fill, succeeded him.
"Those are some big sandals to fill," laughs Sebastian Bach -- better known to some as the wild-tressed former frontman for Skid Row -- as he prepares for a national tour of the rock musical, Jesus Christ Superstar.
With talented backstrokers, medley swimmers, and divers rounding out their roster, the Tigers' only glaring hole going into this season is at distance free. School record holder Kevin Volz '02, who came in third at Easterns, has left some big flip-flops to fill.
Considering the past 15 years, the new crop of hoopsters has some mighty big sneakers to fill.
The new LS Director is Chris Walker who realizes that he has very big hiking boots to fill!
Replacing last years running back Tim Deasey was also the coaches mind as he has big spikes to fill.
Although they have big running shoes to fill now that three-time All-America Mary Proulx has graduated, the Owls believe this could be the year they finally gain the elusive national berth.
Hank Williams has some mighty big cowboy boots to fill.
The 1997 Westfield High School girls Gymnastic team has some big slippers to fill.
Big stilettoes to fill (from a headline about the musical Cabaret)
With Rosie and Sally Jessy packing it in and Oprah set to retire in 2005, there are some big pumps to fill.
Maffei was just about as beloved on the Street as CFOs can be; Connors will have some big galoshes to fill in rainy Redmond.

By associative transfer, with even less semantic coherence than usual, there are some examples involving "feet" and "paws":

Fleury said on Friday that he had no idea which team will draft him. He pondered the possibility of following recently retired Patrick Roy in Colorado. "That would be big feet to fill there," Fleury said.
Butchie retiring has left some big 'paws' to fill, but this youngster shows great potential of being 'the one'. (from the North Wapiti Husky Kennels Iditarod "Sponsor-a-Dog" program)

A few non-foot-associated articles of clothing also appear, such as "Big wings to fill" as a headline about casting the play Angels in America, or "One thing's for sure, Spider-Man 2 will have some very big tights to fill". And the teasers for Amy Reiter's "audio dish" of 10/18/2000 show anticipatory plagiarism of the NYT headline writers' "big cups":

Boy George is all over Eminem, Marilyn Manson hates bad f***ing grammarians, Shirley Jones may have some big cups to fill and Russell Crowe bares all.

There's something curious about this set of expressions, if you think about it: did anyone ever actually inherit the footwear of their predecessor in a job or role? Certainly no one does so now, but I'm skeptical that anyone ever did. So how did the phrase get started, and why does it have such an enduring appeal?

Finally, for those who are still with me, a reward.

According to the NYT article that we started with, Michael McKean was originally scheduled to fly in from LA for an audition on January 19, but on January 14, "[his] mother, who lived on Long Island in the house where she had raised her family, had a major stroke." Someone else took his spot on January 19, and his mother died on January 21.

"It's a little lesson on how life goes sometimes," Mr. McKean mused. "The folks from `Hairspray' called and said, `If you want to blow it off, we understand.' But it really was a very welcome diversion, because, while I'm juggling all the funeral stuff and this house full of things my mother never threw away, I said, `Let me go meet with them anyway.' "

McKean's tryout was "spectacular", and he got the part.

The real challenge for Mr. McKean will be to give the jokes their full due by finding the womanliness in his maleness. It's a job that seems timely to him. "I keep thinking about that lovely quote from Wilde," he said. " `All women become like their mothers. That is their tragedy. No man does. That's his.' So this gig is for my mom. She would have loved it."

I think that's touching, I really do.

[Update: John Bell emails

Your recent blog entry led me to find:


It begins: "Out-of-towner Mark Salyer has some big wigs to fill as the lead in the Actor's Express production of Hedwig and the Angry Inch . . ."]


Posted by Mark Liberman at 09:00 AM

February 14, 2004

WebSense -- Not!

I guess there's some satisfaction to be taken from Geoff Pullum's discovery that the WebSense filter's block of Language Log as a sex site wasn't the result of an overzealous reading of the site's content ("copular sentences," anyone?), but merely of the filter's having blocked the IP of Language Log's host machine on the basis of what turned out to be a misclassification of another site that was hosted there.

But I'd demur from Geoff's description of the WebSense error as the result of a kind of "typo" -- it's a lot more ominous than that.

In fact this sort of misclassification is extremely common, as I learned when I served as an expert witness for the American Library Association in its challenge to the Children's Internet Protection Act, which mandates the use of filtering software in all libraries receiving certain federal subsidies. (The law was overturned by a three-judge federal panel in June of 2002, but was ultimately held constitutional by the Supreme Court.)

All the filtering companies routinely block the IP's associated with any site their software flags as objectionable, even if the machine in question hosts dozens or even hundreds of innocuous sites. The rationale for this procedure is that porn sites frequently change their url's, so that IP blocking is a necessary back-up. This IP blocking accounts for a large proportion of the misclassifications of sites by filters. In records that N2H2 (makers of the "Bess" filter) produced for the ALA trial, it turned out that more than half of the overblocks for which the filter received unblocking requests over one seven-week period several years ago involved virtual hosting -- or 583 sites for that period alone. Note that these were merely those sites whose owners had discovered that they were being blocked and had taken the trouble to write to the filtering company -- the actual number of innocuous sites that are blocked by this procedure was surely orders of magnitude greater than that. And the proportion of sites that are improperly blocked by this procedure is doubtless a lot higher today, owing to the sharp increase in the number of blogs and other sites that are virtually hosted.
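To see why IP blocking overblocks so badly under virtual hosting, here is a minimal sketch. It is not how any real filter is implemented; the hostnames, the addresses (drawn from the reserved documentation range) and the function names are all hypothetical, chosen only to illustrate the mechanism.

```python
# Hypothetical virtual-hosting table: several hostnames share one IP address.
HOSTING = {
    "blog.example.org": "192.0.2.10",
    "dollhouse.example.com": "192.0.2.10",
    "flagged.example.net": "192.0.2.10",
    "elsewhere.example.edu": "192.0.2.99",
}


def blocked_hosts(flagged, hosting):
    """Return every hostname cut off when the IP of each flagged site is blocked."""
    blocked_ips = {hosting[host] for host in flagged}
    return sorted(host for host, ip in hosting.items() if ip in blocked_ips)


def overblocked(flagged, hosting):
    """Return the innocuous hostnames blocked only because they share an IP."""
    return sorted(set(blocked_hosts(flagged, hosting)) - set(flagged))


if __name__ == "__main__":
    flagged = {"flagged.example.net"}
    print("blocked:    ", blocked_hosts(flagged, HOSTING))
    print("overblocked:", overblocked(flagged, HOSTING))
```

One flagged site takes its two neighbors down with it, and nothing in the procedure ever looks at what those neighbors actually contain. Scale the table up to the dozens or hundreds of sites on a real hosting machine and the numbers in the N2H2 records stop being surprising.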

In fact while critics of filters have noted chiefly the overblocking of sex- and health-information sites and the like, IP blocking is responsible for restricting access to a huge amount of utterly irrelevant protected speech, and the burden is entirely on the site owners to discover the errors and report them to the filtering companies. (For more on this, you can look at the pieces I've done on filters in The American Prospect and The New York Times.) In our investigations in connection with the ALA case, we discovered that filters were blocking sites devoted to dollhouse furniture, obituaries, wrestling, Latin music, celebrity autographs, and the computer society of Lulea University of Technology, most of the overblocks probably due to IP blocking, so Language Log is in good, if depressingly abundant, company.

Note by the way that the filters also block all translation sites, Google cache and image pages, anonymizer sites, and other sites that return a url different from the one that was requested -- but that's another issue.

Posted by Geoff Nunberg at 04:33 PM

Darwin on talking parrots

In the course of scanning through Charles Darwin's 1871 book The Descent of Man to locate his Valentine-appropriate speculations on the origins of language in love songs, I was reminded of his brief remarks on talking parrots. This is a topic that came up recently due to the BBC wildlife piece on N'kisi (see here and here for discussion), and I should have thought to cite Darwin at the time. It's a bit depressing that so little has been learned about this over the past 133 years. In particular, it's surprising that we still know almost nothing about the role that complex vocalizations, imitative or otherwise, play in the normal life of parrots in the wild.

Here's what Darwin wrote, with a bit of the context from his Chapter III ("Comparison of the Mental Powers of Man and the Lower Animals"):

That which distinguishes man from the lower animals is not the understanding of articulate sounds, for, as every one knows, dogs understand many words and sentences. In this respect they are at the same stage of development as infants, between the ages of ten and twelve months, who understand many words and short sentences, but cannot yet utter a single word. It is not the mere articulation which is our distinguishing character, for parrots and other birds possess this power. Nor is it the mere capacity of connecting definite sounds with definite ideas; for it is certain that some parrots, which have been taught to speak, connect unerringly words with things, and persons with events.*(2) The lower animals differ from man solely in his almost infinitely larger power of associating together the most diversified sounds and ideas; and this obviously depends on the high development of his mental powers.

*(2) I have received several detailed accounts to this effect. Admiral Sir B. J. Sulivan, whom I know to be a careful observer, assures me that an African parrot, long kept in his father's house, invariably called certain persons of the household, as well as visitors, by their names. He said "good morning" to every one at breakfast, and "good night" to each as they left the room at night, and never reversed these salutations. To Sir B. J. Sulivan's father, he used to add to the "good morning" a short sentence, which was never once repeated after his father's death. He scolded violently a strange dog which came into the room through the open window; and he scolded another parrot (saying "you naughty polly") which had got out of its cage, and was eating apples on the kitchen table. See also, to the same effect, Houzeau on parrots, Facultes Mentales, tom. ii., p. 309. Dr. A. Moschkau informs me that he knew a starling which never made a mistake in saying in German "good morning" to persons arriving, and "good bye, old fellow," to those departing. I could add several other such cases.

Posted by Mark Liberman at 02:05 PM

Darwin and Deacon on love and language

This being St. Valentine's day, I thought I'd bring out some of the theories proposed over the years about the relationship of courtship and mating to the evolution of language. Charles Darwin thought that language evolved out of love songs, at least partly. More recently, Terence Deacon has argued that language evolved so as to permit marriage contracts to be expressed, negotiated and socially accepted. (There are many other interesting stories about language evolution, but this is February 14, after all.)

In The Descent of Man (Chapter 3) Darwin suggested that

"primeval man, or rather some early progenitor of man, probably first used his voice in producing true musical cadences, that is in singing, as do some of the gibbon-apes at the present day; and we may conclude from a widely-spread analogy, that this power would have been especially exerted during the courtship of the sexes,- would have expressed various emotions, such as love, jealousy, triumph,- and would have served as a challenge to rivals." (Descent of Man, Chapter 3).

He went on to say (Chapter 19, "Secondary Sexual Characters of Man") that

"We must suppose that the rhythms and cadences of oratory are derived from previously developed musical powers. We can thus understand how it is that music, dancing, song, and poetry are such very ancient arts. We may go even further than this, and, as remarked in a former chapter, believe that musical sounds afforded one of the bases for the development of language.

... Mr. Spencer comes to an exactly opposite conclusion to that at which I have arrived. He concludes, as did Diderot formerly, that the cadences used in emotional speech afford the foundation from which music has been developed; whilst I conclude that musical notes and rhythm were first acquired by the male or female progenitors of mankind for the sake of charming the opposite sex. ...

... [I]t appears probable that the progenitors of man, either the males or females or both sexes, before acquiring the power of expressing their mutual love in articulate language, endeavoured to charm each other with musical notes and rhythm. ... [W]e have no means of judging whether the habit of singing was first acquired by our male or female ancestors. Women are generally thought to possess sweeter voices than men, and as far as this serves as any guide, we may infer that they first acquired musical powers in order to attract the other sex. But if so, this must have occurred long ago, before our ancestors had become sufficiently human to treat and value their women merely as useful slaves."

More recently, Terence Deacon argued in his 1997 book The Symbolic Species that human language evolved not as a mechanism of courtship but rather as a means for establishing socially-recognized promises of sexual exclusivity. [What follows is a summary reproduced from lecture notes I wrote a few years ago for an introductory linguistics course].

Deacon's argument is a complex one, depending on a number of results from ethology and other allied fields.

He argues that the key point is a shift to a symbolic mode of communication, in which new linguistic tokens (i.e. words) can be created with an arbitrary relation to their meanings. He stresses that the first steps in developing symbolic communication look very difficult for a non-linguistic species, helping us to understand why no non-human species has gone very far down that road:

Even a small, inefficient, and inflexible symbol system is very difficult to acquire, depends on significant external social support in order to be learned, and forces one to employ very counterintuitive learning strategies that may interfere with most nonsymbolic learning processes. The first symbol systems were also likely fragile modes of communication: difficult to learn, inefficient, slow, inflexible, and probably applied to a very limited communicative domain. . . . Neurologically and semiotically, symbolic abilities do not necessarily represent more efficient communication, but instead represent a radical shift in communicative strategy. It is this shift, not any improvements, that we eventually need to explain.

As a rule, he argues, significant changes in communicative systems in other species occur "in the context of intense sexual selection."

It is at the point in the life cycle where choice of mate takes place that evolutionary theory predicts we should find the greatest elaboration of communicative behaviors and psychological mechanisms in both pair-bonding species and polygynous species, though the communicators and the messages may differ significantly in these two extremes. Between these extremes there are many more complex mixtures of reproductive social arrangements that add new possibilities and uncertainties, and thus further intensify selection on the production and assessment of signals.

Deacon then points out that human mating arrangements, though diverse across societies, share some characteristics that make our species nearly unique: "cooperative, mixed-sex social groups, with significant male care and provisioning of offspring, and relatively stable patterns of reproductive exclusion, mostly in the form of monogamous relationships." According to Deacon, "reproductive pairing is not found in exactly this pattern in any other species." The reason this pattern is not found, he argues, is that it's a recipe for sociosexual disaster: "the combination of provisioning and social cooperation produces a highly volatile social structure that is highly susceptible to disintegration."

In evolutionary terms, a male who tends to invest significant time and energy in caring for and providing food for an infant must have a high probability of being its father, otherwise his expenditure of time and energy will benefit the genes of another male. As a result, indiscriminate protection and provisioning of infants will not persist in a social group when there are other reproducing males around who do not provision, but instead direct all their efforts towards copulation.

These tensions get worse if males and females spend a lot of their time apart, as necessarily happens if males are out hunting and scavenging while females are gathering plants with children in tow. "Hunting and provisioning go together, but they produce an inevitable evolutionary tension that is inherently unstable, especially in the context of group living. Besides ourselves, only social carnivores seem to live this way."

Carnivores that engage in cooperative group hunting include wild dogs, wolves, hyenas, lions and meerkats. All such creatures exhibit particular ecological and reproductive patterns that defuse the resulting evolutionary tension. Among lions, provisioning takes place among a "pride" of closely-related females (sisters, aunts etc.). One, two or rarely three male lions take over a pride and guard it against other males -- who will try to kill the cubs to bring the females into estrus -- but do not provide food. Among wild dogs and wolves, the cooperative hunting pack includes both males and females, and they provision both pups and a nursing mother. However, in a given pack there is usually only one reproducing female, who is typically the mother of many of the hunters. Other females are kept from becoming sexually receptive by social pressures and perhaps pheromones. There is usually also only one reproductively active male in a pack.

The typical human pattern -- with many reproductively active males and females living in a group while maintaining patterns of sexual exclusivity, and with male provisioning of children although mated males and females spend considerable time apart -- is never found among the social carnivores.

Deacon suggests that this background helps to explain why the evolution of systematic hunting as a major food source for our hominid ancestors posed a difficult problem in social engineering.

The acquisition and provisioning of meat clearly would be a better strategy for surviving seasonal shortages of more typical foods than shifting to nutrient-poor diets of pith, bark, and poor-quality leaves, as do modern chimpanzees. But this is only possible if there is a way to overcome the sexual competition associated with paternity uncertainty. The dilemma can be summarized as follows: males must hunt cooperatively to be successful hunters; females cannot hunt because of their ongoing reproductive burdens; and yet hunted meat must get to those females least able to gain access to it directly (those with young), if it is to be critical subsistence food. It must come from males, but it will not be provided in any reliable way unless there is significant assurance that the provisioning is likely to be of reproductive value to the provider. Females must have some guarantee of access to meat for their offspring. For this to evolve, males must maintain constant pair-bonded relationships, and yet for this to evolve, males must have some guarantee that they are provisioning their own progeny. So the socio-ecological problem posed by the transition to a meat-supplemented subsistence strategy is that it cannot be utilized without a social structure which guarantees unambiguous and exclusive mating and is sufficiently egalitarian to sustain cooperation via shared or parallel reproductive interests.

Deacon argues that this problem required -- or at least invited -- a solution mediated by symbols.

[C]ertain things cannot be represented without symbols. . . . Although there is a vast universe of objects and relationships susceptible to nonsymbolic representation, indeed, anything that can be present to the senses, this does not include abstract or otherwise intangible objects of reference. This categorical limitation is the link between the anomalous form of communication that evolved in humans and the anomalous context of human social behavior.

For hunting and provisioning to co-exist in large groups of reproductively active hominids, Deacon argues, it was necessary to establish a certain sort of social contract. If this contract can be established and maintained, then everyone is better off. However, it will not work until nearly everyone observes the terms and also enforces observance among others.

Essentially, each individual has to give up potential access to most possible mates so that others may have access to them, for a similar sacrifice in return.

Accomplishing this requires two things. First, you have to establish a shared understanding of who is bonded with whom. According to Deacon, "this information can only be given expression symbolically", because it "is a prescription for future behaviors," not just a memory or an index of past behavior, or an indication of current social status or reproductive state, or even a prediction of probable future behavior.

The pair-bonding relationship in the human lineage is essentially a . . . set of promises that must be made public. These . . . implicitly determine which future behaviors are allowed and not allowed; that is, which are defined as cheating and may result in retaliation.

Second, you have to get everyone else that might be involved to agree not to cheat, and to help protect against cheating.

For a male to determine he has . . . paternity certainty, requires that other males also provide some assurance of their future sexual conduct. Similarly, for a female to be able to give up soliciting provisioning from multiple males, she needs to be sure that she can rely on at least one individual male who is not obligated to other females to the extent that he cannot provide her with sufficient resources.

A marriage contract is a social contract, not just an agreement between the bonded pair. It is typical in human societies for the social group as a whole to play an active part in maintaining sexual exclusivity between individuals; this is something that happens in no other species. Deacon argues that it happens among humans because all members of the group "are party to the social arrangement, and have something to lose if one individual takes advantage of an uncondoned sexual opportunity."

Deacon is less clear about the first steps in the process of establishing such social contracts. He suggests that the ability of apes to acquire limited symbolic abilities in laboratory settings gives us an indication of what our species' symbolic beginnings might have been.

In a word, the answer is ritual. Indeed, ritual is still a central component of symbolic "education" in modern human societies, though we are seldom aware of its modern role because of the subtle way it is woven into the fabric of society. The problem for symbol discovery is to shift attention from the concrete to the abstract; from separate indexical links between signs and objects to an organized set of relations between signs. In order to bring the logic of token-token relationships to the fore, a high degree of redundancy is important. This was demonstrated in the experiments with the chimpanzees Sherman and Austin. It was found that getting them to repeat by rote a large number of errorless trials in combining lexigrams enabled them to make the transition from explicit and concrete sign-object associations to implicit sign-sign associations. Repetition of the same set of actions with the same set of objects over and over again in a ritual performance is often used for a similar purpose in modern human societies. Repetition can render the individual details for some performance automatic and minimally conscious, while at the same time the emotional intensity induced by group participation can help focus attention on other aspects of the objects and actions involved. In a ritual frenzy, one can be induced to see everyday activities and objects in a very different light.

To sum up: Deacon thinks that early hominids developed symbolic communication as a way to establish social contracts permitting stable family and group structures, which otherwise would not have permitted hunting and scavenging for meat as a systematic source of supplemental food during times of drought. This set the stage for nearly two million years of evolutionary adaptation for improved symbolic communication, probably due to sexual selection (crudely, females preferred males who could make more convincing promises).

Posted by Mark Liberman at 01:57 PM

Web sense: Language Log is clean

One must be so careful not to jump to conclusions, always to ask what the evidence is, always to double-check. Otherwise one will believe all sorts of nonsense. I did some corresponding with the Websense corporation after a reader of Language Log reported that the Websense filters were blocking access to our site on grounds that it was a sex site. We were indeed being blocked, and we immediately jumped to the conclusion that content must be to blame. After all, searches on the string "sex pro" were at one point turning up, as the top-ranked hit, a saucy piece by Mark (that animal!) on the views of Sapir and Whorf concerning the influence of language on thought -- he had headlined it "Sapir/Whorf: sex (pro) and space (anti)".

But things were not so entertainingly loopy, it turns out. No robot had scanned our stuff for pornographic or titillating content and placed us on the banned list because of what was found.

An employee on the Websense database staff, signing the initials PL, has informed me that we had never been tagged as a porn site:

Just to let you know, Language Log was not classified under an adult category at any time in our master database. The issue was caused by an erroneous classification for another site that was hosted on the same IP as www.languagelog.com, and it was rectified when brought to our attention.

So we have no exotic anecdote to offer you about crazy web censoring by fgrep searches for dirty words. It was just a typo in a sort of phone book of IP addresses concerning the sites hosted by the virtual hosting company that rents us our URLs languagelog.com, languagelog.net, and languagelog.org (they all point to the same machine, a little under-used Linux box sitting unnoticed on a table behind a filing cabinet somewhere in the Institute for Research in Cognitive Science at the University of Pennsylvania). Websense seems to have acted courteously and extremely swiftly when informed of the mistaken classification of our harmless little educational site, which hardly ever mentions such topics as massage or adult situations or fetishism or anything, not that there's anything wrong with those topics.

Language Log is back in business, and we can say whatever we f*cking jolly well please, even on the topic (should it ever come up) of s.e.x.

Posted by Geoffrey K. Pullum at 12:55 PM

St. Valentine among the pigeons

It seems that we can learn nothing about this day from the life of the saint or saints whose name it bears. According to the New Advent Catholic encyclopedia:

At least three different Saint Valentines, all of them martyrs, are mentioned in the early martyrologies under date of 14 February. One is described as a priest at Rome, another as bishop of Interamna (modern Terni), and these two seem both to have suffered in the second half of the third century and to have been buried on the Flaminian Way, but at different distances from the city. ... Of both these St. Valentines some sort of Acta are preserved but they are of relatively late date and of no historical value. Of the third Saint Valentine, who suffered in Africa with a number of companions, nothing further is known.

The article goes on to explain that

The popular customs associated with Saint Valentine's Day undoubtedly had their origin in a conventional belief generally received in England and France during the Middle Ages, that on 14 February, i.e. half way through the second month of the year, the birds began to pair. Thus in Chaucer's Parliament of Foules we read:

For this was sent on Seynt Valentyne's day
Whan every foul cometh ther to choose his mate.

For this reason the day was looked upon as specially consecrated to lovers and as a proper occasion for writing love letters and sending lovers' tokens. Both the French and English literatures of the fourteenth and fifteenth centuries contain allusions to the practice. Perhaps the earliest to be found is in the 34th and 35th Ballades of the bilingual poet, John Gower, written in French; but Lydgate and Clauvowe supply other examples.

Here in the Quad, the commonest birds are pigeons. The ornithologists at Cornell's Project PigeonWatch have a page on courtship behavior, and ask all of us to "[record] the colors of pairs of pigeons involved in courtship displays" so as to "help Lab of Ornithology scientists to determine what colors of males females choose for mates". Either the scientists are not interested in the colors of females that males choose, or perhaps they believe that male pigeons are not choosy. Given that "once a pair has formed a 'bond' they will stay together for life", it seems evolutionarily unwise for male pigeon love to be blind, doesn't it? I'm sure that the evolutionary psychologists have an explanation, though I suspect that if male pigeons turn out not to be so pathetically accepting after all, there will be an explanation for that as well.

The PigeonWatch scientists have missed the chance to use St. Valentine's Day as a recruiting opportunity, perhaps because they believe that the medieval French were wrong about the timing of bird courtship as about so much else: "feral pigeons display courtship behaviors throughout the year although they are more likely to mate in late winter and spring".

Well, the pigeons in the quad are certainly nothing if not feral. Even the squirrels fear them. Still, I'll keep an eye out on this St. Valentine's Day for billing and cooing (also bowing, tail-dragging, driving and clapping), and if I see any, I'll note the color morphs of both participants, just for the record.

Posted by Mark Liberman at 07:54 AM

February 13, 2004

Grammar critics are, like, annoyed really weird

According to Andrea Petersen on the WSJ wire, "grammar critics are annoyed" at the continued and even growing use of like. The prize for critical annoyance goes to "Katie Schwartz, a speech pathologist in Chattanooga, Tenn.", whose "'Sense Cues' kit trains speakers to associate the smell of something they don't like with remembering to delete superfluous 'likes' from their conversation." So says the article, anyhow. I don't know how it is in Chattanooga, but a speech pathologist who tried to get Philly kids to change their speech habits by spraying them with patchouli would soon be picking bits of atomizer out of her dentures.

Googling "sense cues" reveals that Schwartz' kit is part of a program for "corporate speech pathology" that "helps determine which sense will most aptly cue a client to work on a specific behavior by testing all five. Once the dominant sense is established, the client can focus on using the cues every day until cues are no longer necessary." A full-spectrum approach, apparently: unpleasant noises, ugly ties, bad food and clammy handshakes as well as the bad smells. I've spent time in workplaces like that, though it didn't occur to me that the experience would improve my behavior.

Although Petersen features like use by a sixth-grade teacher in New York and 78-year-old Scottish women, she missed the most famous like user of the year: God Himself. Somehow I don't think Pat Robertson whipped out his "Sense Cues" kit to find the best sensory modality for reminding the Almighty to work on his diction.

Posted by Mark Liberman at 06:12 PM

February 12, 2004

No free speech for spam rage

USA Today reports that when Charles Booher got mad at a Canadian company that spammed him, and threatened to shoot or torture them and send them anthrax unless they removed him from an e-mail list, Federal agents turned up at his home and arrested him. He's looking at five years in jail and a fine of a quarter of a million dollars.

There but for the grace of God. I did something similar to an organization (EDUCAUSE) that had put the faculty list at my university on its spamming list. "Take me off this list," I wrote, "or I will hunt you down and kill you."

It was a literary allusion. The line is from a classic Saturday Night Live sketch about an imaginary high-tuition college that offered students a chance to take the majority of their parents' tuition payment back as a cash payment, go on vacation for four years, and receive a degree without taking any exams, provided they didn't tell their parents that the college had no faculty and no classes. "If you tell anyone," the students were told at orientation, "we will hunt you down and kill you."

I misjudged the people at EDUCAUSE. They're not SNL fans, apparently, because they were furious and said they would report me for threatening them, unless I was joking. "Of COURSE I'm joking, you morons," I told them quickly; and that's why I'm not in a Federal jail right now.

So: the spammers can spam us, and can send us anything they like --- about porn, gambling, dangerous surgical procedures, dubious dietary supplements, utterly illegal financial rip-offs --- but if we repay them in kind by sending them our free speech over the email channel, we go to the Federal penitentiary? It's a cruel world. There is nothing so cruel that I wouldn't use it against spammers if I could: if there was a button on my keyboard that sent a signal that made the sender of the current email message burst into flames at his keyboard, I'd be tapping that button several times a day, and bit by bit the spam problem would be solved. But we don't have that option. And apparently we don't have the option of sending them flaming, over-the-top, threatening messages, either.

Charles Booher didn't mean it, by the way. It was just a way to express his rage. We should all get together and start a defense fund for him. We could get a hundred million or so from a Nigerian guy who wrote me just the other day wanting to use my bank account for a fund transfer scheme...

Posted by Geoffrey K. Pullum at 11:18 PM

Pronouncing "wronger": where's the evidence?

Can anyone point me to a few ordinary textual citations for the word wronger that do not appear in special fixed phrases?* Yes, there are a few Google hits for the word, but they are completely dominated by parodies (like a song Wronger modeled on Britney Spears' Stronger), and allusions (like the widely repeated phrase wrong and wronger, alluding to the movie title Dumb and Dumber). I'm having a lot of trouble finding clean, ordinary, non-special uses.

What I'm interested in is how the comparative adjective form wronger is pronounced. Does it rhyme with longer and stronger, where you can hear an extra [g] which isn't there in long and strong? Or does it rhyme with agent nouns like gonger, meaning "person who plays the gong"? (If you don't see what I mean, compare finger with singer: they do not rhyme, unless you are from the north of England, because in most dialects you hear a [g] in finger that you don't hear in singer.)

The parodies and allusions fight against us here. The parody of "Stronger" encourages us to assume the [g]; the allusion to "Dumb and Dumber" encourages the non-[g] pronunciation (there is no [b] heard in dumb or dumber). Such analogies might mess up people's judgments on how they would say it (people are by no means always to be trusted on questions about how they speak).

I would turn to an authoritative reference grammar, of course, but I have a bit of a problem there: I'm the co-author of one. The inflection chapter (chapter 18) of The Cambridge Grammar of the English Language was written by Frank Palmer and Rodney Huddleston with some contributions by me. So it would be a bit like asking myself. And personally I'm not sure I have any relevant evidence. We listed (on page 1583) a few words that we claimed were just exceptions to the claim that monosyllabic adjectives inflect, and we included wrong on that list. Now, the forms *liker, *loather, and *worther still seem cast-iron ungrammatical to me, but I've come to think that we probably shouldn't have included cross, ill, and real; you can occasionally find crosser, iller, realer, and the corresponding superlative forms, and it's important not to give the impression that there is something incorrect about any of those -- though they seem to be relatively rare.

The signs of extreme rarity of certain presumed comparative and superlative forms are puzzling. One example is provided by fake. Why is the comparative form faker so rare? It does occur, but almost all the occurrences of it on the web are references to the title of Mick Foley's book Foley is Good: And the Real World is Faker Than Wrestling. There's enough fake stuff going on in this world, and surely some instances are more fake than others. One would expect the word to be more common.

Why righter and wronger are so rare is the biggest puzzle. Dictionary entries for wronger invariably refer to the noun meaning "one who wrongs somebody", which is not relevant here. Where's the comparative-grade adjective? The Oxford English Dictionary contains not a single attestation of it. Why has the inflected comparative of one of the commonest adjectives in English been so rare down the centuries? Damned if I know. Nobody seems to know. A sci.lang discussion thread back in 1998 (here, for example) revealed only that nobody really knew.

Yet it's actually relevant to something. The other three one-syllable adjectives ending in the -ng sound of song are long, strong, and young, but they are all irregular: that [g] in the comparative and superlative forms would not be expected from simple addition of -er.

But the only adjective we can use as a test of the claim that the regular comparative and superlative inflection does not have that extra [g] -- since what we need is a monosyllabic adjective ending in the ng sound (the velar nasal) -- is wrong. I suspect it is regular. The wonderful Webster's Third New International Dictionary of the English Language, Unabridged (Merriam-Webster, 1961; last reprinted, with addenda, 2002) says it is -- they show the pronunciation of wronger as not having that extra [g]. But even they do not cite an example of wronger or wrongest used in context. And the word is just way too rare to ask for people's intuitions about it.

This is the sort of thing on which you might build an argument from poverty of the stimulus (see this earlier post for discussion of another case of such an argument). If people intuitively know that wronger doesn't have that extra [g], and they're correct about that, yet they hardly ever hear wronger (or even read it), and most might well never have heard it at all, how on earth did they learn what they know?

*I'm shy about putting my email address on the web, naturally, because there are hordes of roving spam robots out there, harvesting addresses so they can send me advertisements for body-part enlargement services and dubious Nigerian money transfer deals. But my login name is my last name, and my institution is UCSC, and the domain is EDU, so it's easy to guess, if you're not a brainless spam robot.

Posted by Geoffrey K. Pullum at 10:39 PM

February 11, 2004

Blogging time

In the course of a post about the factual standards of linguistic journalism, Semantic Compositions remarks:

This morning, Semantic Compositions featured an article from the Boston Globe, which launched him off about the prescriptive/descriptive debate. Mark Liberman, however, sat down with a copy of the Oxford English Dictionary, and just demolished the prescriptive case being made. SC envies Prof. Liberman the time necessary to compile such a post, but feels obliged to answer his rhetorical question: "Can't anybody use a dictionary anymore?"

Though grateful for the plug, I'd like to correct two related misapprehensions: first, that an actual copy of the OED was involved, and second, that researching the post took a significant amount of time.

Here's how it was.

1. Over breakfast early last Monday morning, I read the new links on A.L.D., noted Powers' screed, snarfed the offending section, opened an html file, and pasted it in. Blogging time: about 20 seconds. (I'm not counting the time I spent reading his piece or other things, as I would have done that anyhow.) The rest of the morning I spent answering email and participating in a weekly DARPA project conference call.

2. On most Mondays, the computational linguistics students at Penn hold a lunchtime talk series called Clunch. The Clunch Wallahs not only arrange a presentation but also provide a buffet lunch. Last Monday, the speaker was Julia Hockenmaier, with a fascinating presentation about using CKY parsing techniques to predict protein folding from amino acid sequences, and the food was from Tandoor India. At noon, I sat down with my rice, saag paneer and naan, unpacked my laptop, logged into Penn's OED subscription via wifi, looked up Powers' three words, snarfed the relevant senses and historical citations, and pasted the results wholesale into the html page that I'd opened earlier. By the time Julia started her talk at 12:20 or so, I'd long since finished both my lunch and the OED interaction, checked my email, and spent several minutes talking with my neighbors. Blogging time: maybe 2 minutes.

3. Julia finished up her talk around 1:30. I had a couple of errands to run, and then I walked home, arriving about 2:30. I did the dishes, checked my email, and then sat down to write the post. I'd thought it through a bit while doing the dishes, and I already had all the quotes in place, so I just had to type it out and then load it into MovableType's somewhat clunky posting interface. I'm a reasonably fast typist, and I didn't do much editing beyond reading it through for obvious idiocies and fiddling with a couple of formatting issues. The post went in shortly after 3:00 p.m., and I turned my attention to an afternoon of actual work. Blogging time: about 20 minutes.

Total time to create the post -- about 23 minutes. Portion of that time devoted to research: about 2 minutes.

If I had needed to consult a paper dictionary and copy the citations by hand, the research would have taken a lot longer, probably half an hour or more. As a result, I probably wouldn't have done it. I can rarely afford to spend more than 10 or 20 minutes composing a post, generally using an odd little shard of time between one appointment or chore and another, when I can't hope to do anything that requires intellectual set-up time.

If there seems to be some real research in one of my posts, it's either something I did as part of one of my day jobs -- some stuff I compiled for a course lecture, or some examples for a paper, or the like -- or it's something that can be done in a couple of minutes, through the miracle of networked computing.

Using online dictionaries is now so easy that I'm reluctant to excuse people like Murphy and Powers for not bothering to check before drawing a line in the sand on word usage -- though I'm a tolerant person.

Posted by Mark Liberman at 11:10 PM

Language Log: banned by Websense

Semantic Compositions has recently posted his discovery that Language Log is banned by Websense. He learned this when he found that he can't read us at his workplace.

I suppose we owe this to the same earnest literal-mindedness that brings us internet pilgrims googling their way to this post for the phrase "sex pro". But looking over the Websense web site, I don't see any way to request a recount.

I wonder how many other nannyware systems block us? I'd hate to think that we're cut off from the public library users of America, not to speak of the millions looking for a quick shot of linguistics on their lunch break.

Here are some relevant web sites, if you're interested in issues of censorware (and I'm now a bit more interested than I was before): the EFF's censorware page, the censorware project, whitehat's websense page, The Irving Independent School District's websense site (which includes a page to use to suggest a change -- if you're a websense customer).

Posted by Mark Liberman at 09:34 PM

Word counts without lexical facts

I doubt that I'm alone in finding Mark Liberman's ruminations on camel spit genuinely fascinating (vastly more so than I would have predicted had someone asked me yesterday to say whether I wanted to hear something about this topic; one really must try not to prejudge).

And I hope that no one will have missed the key difference between what he does and what I was grumbling about in the previous post. It's the difference between the qualitative and the quantitative. Mark cites specific qualitative facts about the meanings and etymologies of particular Somali words, and speculates on what they mean for the view of the world you get through Somali lexicon and metaphoric imagery. And he has studied this language for a semester, and he has a dictionary of it, and unlike some people, he has learned to use it. What I grouse about is people who reduce the wonder to bald quantitative assertions concerning ethnic groups they know nothing about (tribes with 50 words for this or 92 words for that), having no actual quantitative data to back it up, and having not even asked if there are any such data. Comparative lexical census-taking without actually counting; statistics without the numbers. That's what gets my goat about the people who prattle on about the "abundance of words" this or that tribe has for shoes or ships or sealing wax or camel spit or kings.

Posted by Geoffrey K. Pullum at 01:30 PM

Somali words for camel spit

Geoff Pullum is right to point out that Joann Loviglio's AP story on the documentation of Middle Chulym doesn't offer any evidence that their "abundance of words related to hunting and fishing, plants and flowers, weather and family relations" is different from our own, either in number or in kind. And Geoff is right to wonder why "people yearn so desperately to believe that there is some kind of incredible profusion" of revelatory terminology in exotic languages. I'll offer a partial explanation in the form of a qualified defense. The lexicon of other languages is full of thought-provoking concepts and connections. These are not more numerous or more insightful than the analogous patterns in English, but they're definitely different, and thus sometimes striking. I'll give a couple of examples from a language that is well documented and in no current danger: Somali.

I need to start with a bit of sex education, quoted from L. Skidmore, "Reproductive Physiology in Male and Female Camels", in Skidmore and Adams (eds), Recent Advances in Camelid Reproduction, International Veterinary Information Service, Ithaca NY, (2000):

Sexual behaviour [of male camels] is also characterized by exteriorisation of the soft palate...The protrusion of the soft palate occurs all day long at intervals of 15 - 30 minutes and is accompanied by loud gurgling and roaring sounds. The protrusions become more frequent with increased excitement such as the presence of other males and females. During copulation the soft palate ejection may be replaced by grinding of the teeth and frothing at the mouth. This frothing is generally attributed to increased secretion of the salivary glands and the frequent exteriorisation of the soft palate.

The Somalis traditionally spend a lot of time with camels, so of course they've noticed this phenomenon, and named it. (And yes, there are several dozen words for "camel" in Somali, roughly like the range of English words for bos species, but that's not the point here.) According to Zorc and Osman's Somali-English Dictionary, one of several Somali words for the male camel's rutting-froth is doobbo. From this is derived the verb doobbadillaacso "to reach sexual maturity (of a camel)"; dillaacso means "to tear, crack open for oneself", so that doobbadillaacso might be etymologically glossed "to uncork one's rutting-froth", or in a more contemporary idiom "to bust a froth". Another meaning given for doobbadillaacso (and I suppose the main one for urban or diaspora Somalis) is (of humans) "to reach intellectual maturity; be capable of speaking in public".

Public oratory is a big thing in Somali culture, so to make a lexical connection between camels' rutting-froth and verbal facility is interesting. This is not an isolated lexical metaphor: according to Zorc and Osman, gulaal means "male camel unable to project the gland in his mouth; person with hesitant or stammering speech".

I'm not claiming that this lexical metaphor expresses a unique or unprecedented insight. In a way, its very familiarity ("debate == male sexual competition") is the reason it's interesting to find it in this cultural context. However, there are other Somali camel-word connections whose interest does derive from an apparently irreducible exoticism. There is a word foolbaxsi that Zorc and Osman gloss as "agitated circling movements of a pregnant camel prior to giving birth". It's clearly derived from fool "labor pains" and bax "to come out, to go out". The exotic bit comes from the verbal form foolbaxso, glossed as "to rub the oil of fried coffee beans onto one's face and body (when eating breakfast)". The mind does boggle, you have to admit.

[Warning: my knowledge of Somali is derived from teaching a one-term field methods course on it quite a few years ago, and from perusing Zorc and Osman's dictionary. I'd strongly recommend that before acting on the alleged facts presented above, the reader should check them with someone who actually knows the language. For example, maybe Zorc and Osman got a couple of index cards mixed up, and Somalis don't really think that rubbing coffee oil onto your skin at breakfast is reminiscent of the agitated circling of a camel in labor. If so, then this little fragment of lexical poetry should be credited to the goddess of chance rather than the genius of the Somali.]

[Update: a bit more poking around in Zorc and Osman reveals another fool, meaning "face; brow, forehead; front tooth, incisor". No doubt that's the fool involved in the coffee-lotion word, so that this is another example of the ubiquitous (and thus unremarkable) polysemy that John McWhorter wrote about here. The process is revealing, though: an adult learning a new language encounters all sorts of lexical relationships, both real (like "bust a froth" = "start public speaking") and accidental (like "rubbing in coffee-oil lotion" = "agitated pregnant camel"), that attract conscious attention in a way that the familiar patterns of one's native language normally don't.]

Posted by Mark Liberman at 10:12 AM

February 10, 2004

Words for life, the universe, and everything

Joann Loviglio does indeed tell the Chulym story very nicely and accurately. But even she feels impelled to include a lexical profusion remark: "The Middle Chulym language echoes their way of life, with an abundance of words related to hunting and fishing, plants and flowers, weather and family relations."

Think about that. Of what human language, exactly, could one conceivably not say that it had an abundance of words related to hunting and fishing, plants and flowers, weather and family relations? I know I could write down a thousand or two. Think of all the words you know: all the plant and flower-related terms, the entire weather vocabulary, every word for family relations, even (though you may not actually spend much time subsisting in the backwoods) hunting and fishing words you've encountered. Would it not be an abundance? Then what's the point?

Why do people yearn so desperately to believe that there is some kind of incredible profusion of words for such things among hunter-gatherer peoples, when they have never been shown a single scintilla of quantitative evidence? Suppose I said that German echoes the German way of life, with an abundance of words for beer, sausage, trains, freeways, and high-end automobile engineering. Would you take this seriously, given that I have absolutely no evidence that the number of words for these things in German makes it significantly different from English? Then why do people keep on repeating it about far-away tribesmen they know so much less about?

Posted by Geoffrey K. Pullum at 07:47 PM

The Passion

Actor Mel Gibson has produced a controversial new movie, The Passion, about the last 12 hours of the life of Jesus Christ, which will appear in theatres in two weeks. It is controversial for several reasons. Some people believe that it stimulates anti-Semitism by blaming the death of Jesus on the Jews. The Jewish community is divided on this issue. Some Protestants object to what they see as the mariolatry of the film. The Passion is being heavily promoted, and is seen by Gibson and others as an evangelical tool. It has been endorsed by Billy Graham and the Vatican.

The promotional web site is currently in 18 languages, including Aramaic and Latin. The Latin, by the way, begins with a little error: the title, intended to be "The Story", is L'Histoire, which is French; in Latin it would be Historia. It is worth checking out for novelty's sake; you'll learn, for example, that the ubiquitous FAQ ("Frequently Asked Questions") is Saepe Interrogata ("those things which are often asked").

From a linguistic point of view, the most interesting thing about the film (which I haven't seen) is the fact that the dialogue is in what Gibson believes to be the languages spoken in Israel 2000 years ago: Aramaic, Hebrew, and Latin. It isn't often that one gets to hear Aramaic spoken. Observant Jews still read Aramaic routinely as it is the language of the Books of Daniel and Ezra and of the Talmud, and a few prayers are recited in Aramaic. One of these is the Kaddish, the prayer of mourning. But as a spoken language Aramaic is on its last legs. Varieties of modern Aramaic are spoken by small groups, mostly Jews and Christians, in Georgia, Iran, Iraq, Syria, Turkey, and Israel. In most of these Aramaic is moribund. (Details can be found in the Ethnologue.)

The use of Aramaic as the main language of the film is authentic. Hebrew had passed out of daily use in most of Israel several centuries earlier. Although it continued to be used for religious purposes and continued to be spoken in a few areas, most people in Israel at this time spoke Aramaic. A man from Nazareth would have had Aramaic as his first language and Hebrew as a second language used largely for religious purposes.

What is not authentic is the use of Latin. By this time, Latin, originally the language of the area surrounding Rome, was spoken throughout Italy, though even there it had not entirely replaced the indigenous languages. In Pompeii, destroyed in 79 C.E., many of the street signs and graffiti are in Oscan, a language related to Latin but quite distinct from it. Latin had also spread to some areas colonized by Rome. However, in the Eastern Roman Empire, then as later, the lingua franca was not Latin but Greek. Greek was widely used throughout the Mediterranean area before Rome rose to power, and was extensively used in Rome itself. The upper classes were educated by Greek-speaking slaves and often spoke Greek among themselves. Other people often knew Greek through trade or by virtue of their contact with slaves, many of whom were Greek-speaking. As Rome spread eastward, the Roman army enlisted soldiers for whom Greek was the lingua franca. Thus, Greek was widely known in Israel and the surrounding area and was also the dominant language of the occupying Roman army. It is not an accident that most of the New Testament was written in Greek and that for the first few centuries most Christians read the Hebrew Bible in Greek. A high-ranking Roman like Pontius Pilate, educated in Rome, undoubtedly spoke both Latin and Greek, but very few of the local people spoke Latin.

It is a bit of a mystery why Gibson chose to use Latin rather than Greek in a film that otherwise goes to considerable lengths to be authentic. As far as I can tell, he hasn't offered an explanation. A guess is that it reflects the fact that he is a conservative Catholic, one who rejects the reforms of Vatican II and is reported to attend a Tridentine Mass, in Latin. The use of Latin may reflect his personal attachment to Latin as the traditional language of the Roman Catholic Church. It is true that Latin became the language of the Church, but the origins of that church are in Israel, not Rome, and at the time the dominant languages were Aramaic and Greek, not Latin.

[Update 2004/03/22: The Archaeological Institute of America has an excellent commentary on the historical accuracy of The Passion here. It concurs that the use of Latin is not authentic.]

Posted by Bill Poser at 11:37 AM

Joann Loviglio gets it right

Since I've sometimes complained when news stories about linguistic research are misleading or false, it's a pleasure to cite one that is really well done. Joann Loviglio has written an AP newswire piece about David Harrison's work on Chulym, here as published by USA Today, that gets everything right, as far as I can tell. David Harrison (who pointed the story out to me) is happy with it, which is the best indication of its accuracy.

Posted by Mark Liberman at 08:38 AM

February 09, 2004

They can spam

With great struggle, the lawmakers who crafted the name of the Controlling the Assault of Non-Solicited Pornography And Marketing Act managed to get its acronym to be CAN SPAM.

You are meant to read can spam as a verb and its direct object: can is a regular transitive verb (forms: can, cans, canned, canning), to be read here in the 3rd sense of the 3rd entry given by Webster's, "to put a stop or end to"; and spam is meant to be a noun meaning "unsolicited mass-mailed electronic mail messages".

What a pity there is another parse of can spam available: can also exists as an irregular and defective modal auxiliary verb (forms: can, could), with deontic (permission), epistemic (possibility), and dynamic (ability) senses, and spam can be a (transitive or intransitive) verb meaning "send [someone] unsolicited mass-mailed electronic mail messages".

When I originally read about "the Can Spam Act" I actually thought for a moment it was using the modal sense of can -- that it was an act intended to ensure that corporations can spam us, and the name just came right out and said so. A stupid mistake of mine. After the "Clean Air Act" (a law that liberalizes air pollution regulations) and the "No Child Left Behind" act (which forces public schools to spend time on morale-destroying administration of tests instead of education), and so on, I should have realized that no law designed to permit bad stuff would have an honest name. Laws always get named in a way that suggests they are doing good stuff, and there is nothing good about spam.

Posted by Geoffrey K. Pullum at 07:20 PM

Word up, word down

Today's A.L.D. has a link to John Powers' lexicographically-challenged rant about the "growing imprecision of usage" among contemporary Americans, who allegedly "may mean what they say, but they can't always summon the language to say what they mean", and right next to it, a link to an SF Chronicle article by Rona Marech, explaining that "[w]ith the universe of gender and sexual identities expanding, a gay youth culture emerging, acceptance of gays rising and label loyalty falling, the gay lexicon has exploded with scores of new words and blended phrases that delineate every conceivable stop on the identity spectrum." Her article mentions and briefly defines a couple of dozen of these new terms.

Just as Powers' article is typical of the genre that sees lexical and conceptual distinctions dissolving in the acid bath of ill-bred ignorance, Marech's article is also a common type, reveling in the exuberant fruiting of new concepts and words in the fairy rings of fertile subcultures. Both kinds of articles can be factually problematic: new "mistakes" and "creations" alike are sometimes not so new after all (though I have no basis for doubting Marech's discussion). Even if complaints and enthusiasms are sometimes misinformed, words and word senses really do die out, and new ones really do emerge. But common sense suggests that the forces of destruction and creation must be in balance, over the long term, or the language of our everyday life would turn into something very different from what it's always been, and probably always will be.

Posted by Mark Liberman at 06:02 PM

Igry and ghits

Trevor at Kaleboel observes that igry "already gets slightly more ghits (ca 200 to 175, once you've sifted out the Russians) than its Catalan equivalent, vergonya aliena, although it's still way behind plaatsvervangende schaamte and vergüenza ajena (not to mention verguenza ajena)."

All the same, he says that "I don't think it'll stick for two reasons: firstly, it gets you tied up in the back of your throat in a way English speakers don't like; secondly, it sounds too much like the Antipodean rendering of angry." Maybe so, though avoidance of Australian sound-alikes is a hitherto-undocumented force in linguistic history -:).

For me, though, the most important thing in his post is the neologism ghits. Now there's a word that fills a need! I don't know if this is Trevor's coinage, but it seems to be pretty new: "ghits" has 2380 ghits, at the moment, but all the 50 or so that I checked were programming language variable names, words in languages other than English, alternative spellings of "gits", or jokes like "ghits and siggles". Anyhow, I'm in Trevor's debt for the tip, and if he's the author, he deserves immortal renown.

One question, though: does one have ghits or get ghits? Trevor uses get. My unthinking reflex is have, as above, but on reflection, get is more in tune with the ephemeral and process-dependent nature of ghits.

Sic transit googlia mundi, and all.

[Update: it seems that Trevor is the responsible party. He's posted that

I am having 1,500 cards printed with "ghit = google hit 2004 followthebaldie.com" and am going to flog them down the Ramblas this lunchtime. I am unsure as to whether this constitutes a business plan.

I'm not sure about the law in Catalunya, but in the U.S. I don't think you can copyright a word. Luckily. You could register it as a trademark or service mark, but in this case, for what?]

Posted by Mark Liberman at 05:11 PM

At a loss for lexicons

Can't anybody use a dictionary anymore? I enjoy a good curmudgeonly rant about how English is going to the dogs these days, I really do. But why can't the journalists who crank out such screeds check their lexical prejudices against a good dictionary or two?

A couple of months ago, I complained about Cullen Murphy, who "drew the line" in the Atlantic on the meaning of three words: 'Notoriety does not denote "famousness," enormity does not denote "bigness," and religiosity does not denote "religiousness."' As I pointed out, a quick peek at the OED reveals that the three senses that bother him are earlier (even original) ones, sanctioned by centuries of use, and only recently falling out of favor. I don't recommend that everyone start using those senses -- I agree with his judgments that they're no longer quite the thing -- but to see this as holding off the forces of cultural degeneration is like "holding the line" against gingham bonnets and beaver hats.

Yesterday, John Powers weighed in ("A Loss for Words", Boston Globe 2/8/2004) with another triadic tirade:

We say "transpire" when we mean "happen." We say "momentarily" when we mean "soon." We say "livid" when we mean "angry." This growing imprecision of usage may not be what fictional professor Henry Higgins declared "the cold-blooded murder of the English tongue." But it does matter if you don't know what you're saying. If you don't, how will I?

Let me be clear -- I'm not asking anybody to document that there is actually an overall "growing imprecision of usage". That's one of the assumptions of the genre of curmudgeonly rants. It's surely false, if only because the genre appears to date all the way back to Sumerian times, so that if the hypothesis were true, human communication would by now have been reduced to occasional grunts and growls, soon to be grunts alone. But all the best literary forms require that we suspend disbelief and grant the author certain essential premises, and I'm fine with that. "Growing imprecision of usage," check. Let's get to the good stuff, the clever skewering of boundary-blurring innovation!

"Transpire" does not mean "happen." It means "to leak out." "Momentarily" does not mean "soon." It means "for a moment," or "from moment to moment." "Livid" does not mean "angry." It means "black-and-blue," the color of a bruise.

Transpire. He sets, he throws; strike one, a long foul down the right field line.

The (free) American Heritage explains that

Transpire has been used since the mid-18th century in the sense “leak out, become publicly known,” as in Despite efforts to hush the matter up, it soon transpired that the colonels had met with the rebel leaders. This usage has long been standard. The more common use of transpire to mean “occur” or “happen” has had a more troubled history. Though it dates at least to the beginning of the 19th century, language critics have condemned it for more than 100 years as both pretentious and unetymological. There is some sign that resistance to this sense of transpire is abating, however. In a 1969 survey the usage was acceptable to only 38 percent of the Usage Panel; nearly 20 years later, 58 percent accepted it in the sentence All of these events transpired after last week's announcement. Still, many Panelists who accepted the usage also remarked that it was pretentious or pompous.

From the OED (not free, but surely the Globe can afford it), we can learn that some pretty famous people have been on the wrong side of Powers' screed -- Abigail Adams, Noah Webster, Charles Dickens, Nathaniel Hawthorne:

1775 A. ADAMS Let. 31 July in J. & A. Adams Familiar Lett. Revolution (1876) 91 There is nothing new transpired since I wrote you last. 1804 Age of Inquiry (Hartford, Conn.) 46 When..the reformation transpired in England..almost the whole nation rejoiced. 1810 F. DUDLEY Amoroso I. 14 Could short-sighted mortality..foresee events that are about to transpire. 1828 WEBSTER, Transpire..3. To happen or come to pass. 1841 W. L. GARRISON in Life (1889) III. 16 An event..which we believe transpired eighteen hundred years ago. 1848 DICKENS Dombey xxxii, Few changes -- hardly any -- have transpired among his ship's company. 1858 HAWTHORNE Fr. & It. Note-bks. I. 225 Accurate information on whatever subject transpired. 1883 L. OLIPHANT Altiora Peto I. 277 His account of what transpired was so utterly unlike what I expected.

Momentarily. He sets, he throws; the batter fouls it back to the screen. Strike two.

The usage note in the American Heritage says

Momentarily is widely used in speech to mean “in a moment,” as in The manager is on another line, but she'll be with you momentarily. This usage rarely leads to ambiguity since the intended sense can usually be determined on the basis of the tense of the verb and the context. Nonetheless, many critics hold that the adverb should be reserved for the senses “for a moment,” and the extended usage is unacceptable to 59 percent of the Usage Panel.

The OED's examples show that momentarily has been used to mean something like "instantly" or "quickly" since the 18th century:

1739 H. BAKER & J. MILLER Psyche ii. 207 Apply thy Thoughts to nothing but to endeavour momentarily to sacrifice a Victim to my injur'd Honour. 1799 R. SICKELMORE Agnes & Leonora I. 8 This was momentarily agreed to. 1801 E. HELME St. Margaret's Cave II. iii. 60 The friar groaned, but almost momentarily recovered his emotion. 1899 W. J. LOCKE White Dove (1900) iii. 43 Sylvester.., having done all that was momentarily possible, was at last able to reflect. 1984 ‘TIRESIAS’ Notes from Overground 7 He travels on dozens of different trains which he can momentarily distinguish by certain details.

So a deprecated sense, but not exactly evidence for "growing imprecision of usage".

Livid: he sets, he throws. Powers swings from the heels and misses everything. Strike three -- back to the dugout!

The American Heritage tells us that livid means

1. Discolored, as from a bruise; black-and-blue. 2. Ashen or pallid: a face livid with shock. 3. Extremely angry; furious.

with no usage note. The OED gives the same third sense: "Furiously angry, as if pale with rage," and cites examples from 1912 onwards:

1912 Collier's 9 Mar. 21/1 He sprang to his feet, livid. ‘That's a lie,’ and he stopped suddenly, startled by his own violence. 1918 C. MACKENZIE Early Life Sylvia Scarlett II. ii. 292 He was livid with fury. He asked if I thought he was made of money. 1936 M. KENNEDY Together & Apart II. 151 Betsy is livid. She says now she will fight to the last ditch to get complete custody of the children. 1949 R. CHANDLER Little Sister ii. 10 Orrin would be absolutely livid. Mother would be furious too. 1959 J. VERNEY Friday's Tunnel xxiv. 214 Friday's livid because he thinks you've punctured his bike. 1973 ‘D. SHANNON’ No Holiday for Crime (1974) x. 162 Mr. MacFarlane would be livid to have it [sc. whisky] impounded as evidence.

Mr. Powers, meet Raymond Chandler, "America's foremost detective novelist." I think you were just explaining something about his misuse of words. Oh, and there are some folks named Adams, Webster, Dickens and Hawthorne who wanted to talk with you. Something about "growing imprecision of usage", I think it was?

As I asked at the start of this discussion, can't anybody use a dictionary anymore?

Posted by Mark Liberman at 03:09 PM

"sex pro" (not)

A Language Log piece on recent Sapir-Whorf research is (at this moment) the top-ranked Yahoo search result for "sex pro." This underlines something that we all know: as useful as information-retrieval systems are, they still have a problem with precision (which is what IR researchers call the proportion of things found that are relevant). Either that, or I need to work on my headline writing.
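For anyone who wants the definition spelled out, here is a minimal sketch of how precision is computed. The result names and relevance judgments are invented for illustration; they are not taken from Yahoo's results or from our server log.

```python
def precision(retrieved, relevant):
    """Proportion of retrieved items that are relevant."""
    if not retrieved:
        return 0.0
    hits = sum(1 for doc in retrieved if doc in relevant)
    return hits / len(retrieved)

# Hypothetical results for one query: four pages returned,
# three of which a human judge marked as relevant.
retrieved = ["sapir-whorf-post", "page-a", "page-b", "page-c"]
relevant = {"page-a", "page-b", "page-c"}

print(precision(retrieved, relevant))
```

In this made-up case one of the four results is a miss, so the precision is three quarters. Note that the measure says nothing about relevant pages the engine failed to return.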

Just for the record, I know about Yahoo's current results for "sex pro" because some internet pilgrim found our site by following the link from Yahoo, and this was duly noted in our server log, which I check from time to time. These days, I'm happy to say that most of the (roughly 400) folks a day that come to our site via search engines probably find what they're looking for. Last night's 27 pilgrims interested in talking parrots, the 30 who wanted to know something about emo, and even the 22 who asked about "wedding vowels", all probably went away with something useful to them. Well, I'm not sure about the substantial group who asked about "emo girls" and similar things, but I'll give us the benefit of the doubt there.

The logs of a site like ours are not a very good measure of the precision of the search engines, since mostly people should be able to tell from the short context presented with each suggestion whether it's a hit or a miss (and often it's pretty hard to figure out from the query text what the searcher was really looking for). But there were certainly several other crystal-clear false positives in last night's crop, such as the folks who found our site as a result of queries like "how to say things about sex in other languages". We haven't provided any relevant information here, but judging from the level of interest, there's a market for a book by that title.

Posted by Mark Liberman at 09:07 AM

SCO, Linux and Freedom

As a long-time user of Linux (I've done almost all of my work on Linux since 1995) and a member of the free (as in "free speech", not "free beer"; click here for a discussion of the distinction) software movement, I'd like to expand on Geoff Pullum's comments on SCO's attack on Linux. First, a bit of background. Linux is a Unix-like operating system kernel, originally written by Linus Torvalds and subsequently developed under his direction by a large group of volunteers. It is usually used together with software produced by the GNU Project, the founding and leading light of the free software movement, a combination referred to by some as GNU/Linux. Unix itself was originally written at AT&T Bell Laboratories. The Unix trademark currently belongs to The Open Group. The copyright to a version of the Unix code that originally belonged to AT&T was transferred via a chain of transactions to the Novell Corporation. SCO purchased certain rights from Novell, including, according to SCO, the Unix copyright.

There are actually two things at issue. First, SCO has sued IBM, claiming that IBM contributed to Linux intellectual property belonging to SCO. Originally, this took the form of a trade secret claim, in which SCO claimed that IBM had put SCO trade secrets into Linux. Judging from SCO's filings with the court, they appear to have abandoned this claim and to have shifted to a copyright claim. This claim does not involve the old Unix code. Rather, SCO's claim is that under their contract with IBM, IBM's own work on its version of Unix constitutes a derivative work of the original Unix code and that therefore IBM has infringed SCO's copyrights by contributing its own code to Linux. One of IBM's defenses is that SCO is just plain wrong about the contract and that SCO has no claim to IBM's code.

Second, SCO has been claiming that large amounts of Unix code to which it owns the copyright have been copied into Linux and that it is therefore entitled to demand license fees from Linux users. SCO hasn't yet tried to sue any Linux user; in fact, it hasn't sent out any invoices. As a result of SCO's public claims, Red Hat, a major Linux distributor, has sued SCO for damaging its business.

One major issue here is of course whether there is any infringing code in Linux. For the most part, SCO has refused to identify any. They originally claimed that the infringing code had been identified by a team of MIT mathematicians. When pressed to identify them, SCO first retreated to the claim that its team had formerly been associated with MIT, then just became silent. It looks like they just made them up. Anyhow, SCO eventually presented a couple of snippets of allegedly infringing code at a trade show. Even then, they obscured the code by using Greek letters. This pathetic attempt at encryption was broken without delay, and the code was shown not to be infringing. You can read Bruce Perens' analysis of the allegedly infringing code here.

More recently, SCO identified some more allegedly infringing code. This turns out to consist entirely of header files that define what is called the Application Binary Interface. Everybody else believes that these files are in the public domain and not copyrightable.

The other major issue is whether SCO in fact owns the copyrights on the UNIX source code that it claims has been incorporated into Linux. Novell claims that it did not transfer these copyrights to SCO. SCO has sued Novell for "slander of title" but the case has yet to come to trial. On my reading of the contract, Novell is right.

SCO also has a problem because until very recently SCO programmers made contributions to Linux and SCO itself distributed Linux. That puts them in a very poor position to complain. Furthermore, by demanding a license fee for Linux, SCO has violated the GNU General Public License (GPL) under which Linux is distributed, which automatically terminates SCO's right to distribute Linux. In response, SCO has alleged that the GPL is invalid. All of the legal opinions that I have seen are that SCO is spouting nonsense. See, for example, this article, entitled SCO: GPL Unconstitutional. Lawyers Scratch Heads. If the GPL is invalid, then SCO has no right to distribute Linux at all.

SCO would like everyone to think that it is angry Linux advocates who are responsible for the MyDoom worm, but this is almost certainly not the case. (Here's a good summary of the evidence.) To begin with, analysis of the worm's code shows that only 1/4 of the machines infected by MyDoom will launch distributed denial of service (DDOS) attacks on SCO. After attacking SCO, MyDoom will attack users of the Kazaa peer-to-peer file sharing network, which has nothing whatever to do with either SCO or Linux. MyDoom creates "backdoors" in all of the machines it infects, allowing them to be controlled from the outside. It also installs a keylogger, allowing crackers to see everything the user types, such as credit card numbers and passwords. Another piece of evidence is the fact that the volume of email containing MyDoom was so large from the outset that only spamming techniques could have created it. The evidence is that MyDoom was created by professional criminals for the purposes of spamming and identity theft. The DDOS attack on SCO is simply a diversion.

[Update 2004/02/10: According to this Eweek story, the latest version of MyDoom is now circulating. This version attacks Microsoft's main web site. It does not attack SCO.]

There are some excellent sources of information on the SCO affair. The Free Software Foundation has issued a statement and the Open Source Initiative has issued a position paper. The single most important source is Groklaw, which includes frequent news updates, extensive commentary and analysis, and an archive of documents. Groklaw is an amazing and inspiring phenomenon in its own right. It started out as a blog by Pamela Jones, a paralegal and Linux user. Her thorough research and penetrating analysis attracted a large readership, and the site turned into a collective, volunteer research effort. Groklaw members report leads, scan and format legal documents, and send in eyewitness reports from sessions of the court. Groklaw is an outstanding example of the role that the network can play in the functioning of a free and open society.

What does this have to do with language and linguistics? Well, in part I'm going on about it just because it is important, even if it doesn't have to do with language. SCO's attack on Linux, and more generally on the free software movement, is an attack on freedom. But there is a more specific connection to language, namely the role that free software is beginning to play in the maintenance of minority languages.

The companies that produce proprietary software are out to make money, and they generally aren't terribly interested in small markets. At best, they get around to it when they feel like it. The result is that proprietary software tends to be available only for major languages. It is much easier to adapt free software to smaller languages, where by "small" I mean both languages with small numbers of speakers and languages whose speakers, though comparatively numerous, are too poor or otherwise disadvantaged to be well served by proprietary software. In some cases, proprietary software vendors have moved to adapt their software to small languages only after free software did so. This article describes how Microsoft released Welsh versions of its software only in response to the creation of Welsh versions of Linux and other free software. Here are web sites where you can read about how Linux is being adapted to the languages of South Africa and India.

Only free software was used in the generation of this post.

Posted by Bill Poser at 12:52 AM

Cotton-top tamarins: on the road to phonology as well as syntax?

Fitch and Hauser's recent investigation of monkeys' perception of sound patterns featured cotton-top tamarins, whose own vocalizations are interesting. Specifically, they seem to exhibit what Charles Hockett called "duality of patterning": larger patterns made up of well-defined combinations of recurrent, well-defined smaller units. Hockett's model was phonology: words are not arbitrary equivalence-classes of vocal noises, but instead spell out their claims on sound in sequences of phonemes, whose phonetic interpretation is independent of the identity of the words involved. (A more modern story would represent the sound of words with phonological features arranged in syllables and similar structures -- but the principle is the same).

This idea was generalized by Michael Studdert-Kennedy and Louis Goldstein in their paper “Launching language: The gestural origin of discrete infinity” (In Language Evolution, edited by Morten H. Christiansen and Simon Kirby,  New York: Oxford University Press, 2003):

"...the only route to unbounded diversity of form and function is through a combinatorial hierarchy in which discrete elements, drawn from a finite set, are repeatedly permuted and combined to yield larger units higher in the hierarchy and more diverse in structure and function than their constituents. The particulate units in physical chemistry include atoms, ions, and molecules, in biological inheritance, chemical radicals, genes and proteins, in language, gestures (as will be argued below), segments, syllables, words and phrases."

Another way to think about this is that such systems are essentially digital:  a (relatively) small number of qualitatively distinct things can be combined in a limited number of ways to create a much larger number of distinct things. This is a useful property for a communications system, if you want to be able to send messages reliably over noisy channels. Many descriptions of animal vocalizations lack this property: instead, we find a limited number of qualitatively distinct vocal displays (such as alarm calls or isolation calls), with no further (digital) substructure, and no combinatoric properties other than what follows from being emitted at various different times as individually appropriate.
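To make the arithmetic of "particulate" systems concrete, here is a small sketch that counts how many distinct signals can be built from a handful of discrete units. It assumes, unrealistically, that every sequence up to a given length is allowed; real systems permit only some combinations, so this is an upper bound. The unit names "A" and "B" are placeholders, not actual call elements.

```python
from itertools import product

def count_sequences(n_units, max_len):
    """Number of distinct sequences of length 1..max_len over n_units units."""
    return sum(n_units ** k for k in range(1, max_len + 1))

# Two hypothetical basic units, combined into sequences of up to three.
units = ["A", "B"]
calls = ["".join(p) for k in range(1, 4) for p in product(units, repeat=k)]

print(calls)
print(len(calls), count_sequences(2, 3))
```

Even two units yield fourteen possible sequences of length three or less, and the count grows geometrically with each extra position. That growth is the payoff of combinatorial structure over a fixed list of holistic calls.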

But cotton-top tamarins are said to have at least a two-level hierarchy. This page claims that "Cotton-top tamarins have an extensive vocal repertoire which is derived from the variation of two basic elements and the sequential combination of those elements (Cleveland and Snowdon 1982)." Looking at the spectrograms and listening to the recordings, I can't figure out how to cash in this generalization as a model of the specific calls shown, but I'll keep at it.

There are quite a few animals whose communicative displays have at least the beginnings of this sort of digital structure. Bird song is a well-known and well-studied case, where some terrific science is being done these days. It's less well known that cephalopods have complex communications systems that exhibit clear duality of patterning. Their systems involve body patterns (of color, reflectance and skin texture) and posture (especially of tentacles). Here's a picture of the "chromatic components in Loligo vulgaris reynaudii that are used to build up body patterns" (from Hanlon et al. 1994):

Sepia officinalis (the cuttlefish) has even more "chromatic components" (35), and apparently cephalopods combine such components in many different ways, sequentially as well as synchronously. The displays to which researchers assign meanings are combinations of these components, for instance the "lateral display" of Loligo plei involves (among other things) "arms compressed dorsoventrally, tentacles extended/dark, mid-ventral ridge extended, lateral flame markings, dark eye ring, arm spots, dark stitchwork fins, iridescent arms." As the costume designer William Ivey Long said about another kind of visual display, "those are just words. The effect is, of course, insane." At least if you're down with the semiotics of loliginid courtship displays.

I'm sure that monkeys have more to say to one another than squid do, even if they don't come up to Geoff Pullum's standards for conversational partners. But still, it's interesting that among non-human primates, some 400 million years after cephalopods evolved, the complexity of communication systems does not seem to have increased at all, at least as measured by the size of the set of basic communicative "atoms" and the richness of their modes of combination. In fact, I don't know of any non-human primates that are said to have as many qualitatively distinct elements of communicative displays as squid and cuttlefish do.

The mechanical substrate for language seems to have been lying around, ready for use, for hundreds of millions of years. Why didn't evolution pick up on the possibilities in a serious way until so very recently? There are several ideas -- new and old -- for an answer to this question, which I'll survey in a later post. One theory is provided by von Humboldt: "The articulated sound, the foundation and essence of all speech, is extorted by man from his physical organs through an impulse of his soul; and the animal would be able to do likewise, if it were animated by the same urge." But there are other stories about this as well, all interesting, none yet shown to be true.

Posted by Mark Liberman at 12:30 AM

February 08, 2004

An Orthographic Eggcorn?

Twice in the last week I've come across the spelling lynchpin for what the Oxford English Dictionary defines as `a pin passed through the end of an axle-tree to keep the wheel in its place'. Both the OED and Webster's Collegiate Dictionary (9th edition) give the standard spelling of the word as linchpin. It's from an Old English word lynis, meaning `linchpin', to which a redundant pin was later added (redundant in much the same way as "the" is redundant in "the hoi polloi"). But the vowel spelled y in Old English later merged with Old English i and is normally spelled i in Modern English, as in kiss from Old English kyss-an, so the Old English vowel sound can't account for the odd Modern English spelling lynchpin. Webster's does give an alternate spelling lynchpin, but it's clearly indicated as the less favored spelling. The OED doesn't have a modern spelling with y.

So why did two different writers use the spelling lynchpin? The context suggests a clue: although in both cases this word was used in its usual modern sense of a crucial linking element tying two concepts together, the surrounding passages concerned minority-group members. I suspect that the spelling was a folk etymology, an eggcorn, that replaced the unfamiliar element linch with the familiar word lynch -- all too familiar a word when the topic is minority groups in the United States. If I'm right about this, it's only the spelling that signals the eggcorn, because lynchpin of course sounds just like linchpin.

The word lynch, by the way, has no etymological connection with linch(pin). The OED says that lynch is derived from lynch law, a modern American phrase whose origin is obscure. One claim, according to the OED, is that it comes from a Mr. Charles Lynch, a justice of the peace who was given to imprisoning people without bothering about any formal legal proceedings; but Mr. Lynch has his defenders, who assert that he never did any such thing. And it seems that the original penalties suffered under lynch law were unpleasant but not fatal, involving things like whipping or tarring & feathering. Times changed.

Posted by Sally Thomason at 11:24 PM

The Bird Clapper: A New Tool in Semiconductor Fabrication

My cousin David recently had occasion to read a Japanese patent on the fabrication of light-emitting diodes. Japanese patents are, naturally, written in Japanese, but for the benefit of foreigners the Japan Patent Office provides machine translation. David asked me about this one because it mentions a "bird clapper", an item not usually used in semiconductor fabrication.

Here is the machine translation of the first claim in the patent. Before you read it, it is only fair that you should read the disclaimer that the Japan Patent Office attached to it:

This document has been translated by computer. So the translation may not reflect the original precisely.

I'm afraid that this is a bit of an understatement. I've highlighted bird clapper for you lest you get lost en route:

Have the following and the laminating of the aforementioned joint object is carried out to multi-stage on an insulating transparent substrate through an electrode. It is the organic thin film light emitting device to which a latter joint object, on the whole, covers the joint object of the preceding paragraph on the occasion of the laminating, and the aforementioned joint object is characterized by the bird clapper by connecting the same polar electrode mutually electrically on an insulating transparent substrate while, as for an electrode, a positive electrode and a negative electrode are arranged by turns through a joint object. Insulating transparent substrate Electrode The joint object of a charge pouring layer / luminous layer

Even for those who can't read Japanese, I'm not sure that this is much more enlightening than the original Japanese text:

Japanese text of patent claim

Here is my own translation, which I've tried to keep fairly close to the structure of the Japanese:

An organic thin film light emitting device such that, having in hand a transparent insulating substrate, electrodes, and the bonding substance for the electric charge injection/light-emitting layer, the aforementioned bonding substance is laminated on top of the transparent insulating substrate in multiple stages by means of the electrodes, and as for the electrodes, the electrodes of the same polarity are connected to each other electrically by means of the transparent insulating substrate at the same time as the positive and negative electrodes are positioned alternately by means of the bonding substance, and as for the aforementioned bonding substance, at the time of lamination, has the distinctive property that the bonding substance of the latter stage comes entirely to cover the bonding substance of the former stage.
[What I have translated as "bonding substance" is apparently a technical term in the fabrication of organic light-emitting diodes, an area in which I am not expert. It could be "bonding body" or something else. If any reader knows the correct translation for this term, I would appreciate it.]

So, where did the "bird clapper" come in? Well, toward the end of the third clause within this tripartite relative clause we find the following sequence of words:


The translation program misparsed this and grouped the verb naru and the first syllable of the noun koto together to form the word naruko なるこ, usually written 鳴子. A naruko is a kind of noisemaker or clapper, made of a small board to which lengths of bamboo are affixed. It is also called a "bird rattle" because it was originally used to scare birds away from farmers' fields. It is now used as a percussion instrument in some types of dancing. There aren't any bird clappers in the Japanese text - the machine translation program invented them.

Grouping together parts of different words into a single word isn't as stupid as it might seem since Japanese does not separate words in writing, but both this and the garbled syntax of the rest of the claim indicate that this MT system isn't doing too well. A guess at one problem is that many MT programs assume that the text consists of sentences, but this one doesn't: each claim is a Noun Phrase.
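I don't know how the Japan Patent Office's system actually segments text, but the following sketch shows how one common strategy, greedy longest-match lookup against a dictionary, could produce exactly this kind of error. Everything here is an assumption for illustration: the romanized input, the four-entry toy lexicon (which includes the particle to so that the leftover syllable has an analysis), and the function name.

```python
def greedy_segment(text, lexicon):
    """Split text left to right, always taking the longest lexicon entry that matches."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then progressively shorter ones.
        for j in range(len(text), i, -1):
            if text[i:j] in lexicon:
                words.append(text[i:j])
                i = j
                break
        else:
            # No entry matches here: emit one character and move on.
            words.append(text[i])
            i += 1
    return words

# Toy lexicon (hypothetical): the verb, the noun, the noisemaker, a particle.
lexicon = {"naru", "koto", "naruko", "to"}

print(greedy_segment("narukoto", lexicon))
print(greedy_segment("narukoto", lexicon - {"naruko"}))
```

With naruko in the lexicon, the greedy strategy grabs it and strands to; without it, the same procedure recovers naru followed by koto. A real system would need context, such as the syntactic parse, to choose between the two.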

Posted by Bill Poser at 11:06 PM

The Lotus 1-2-3 position

These three legal cases might not appear to have much in common except the connection to copyright and California:

  • Lotus sues Borland over a Borland product, Quattro Pro, that was designed to look and feel like Lotus 1-2-3 so users could readily switch. (Case finally won by Borland in 1996 when the US Supreme Court deadlocked on it -- Justice Stevens recused himself -- and thus let an earlier judgment of the 1st US Circuit Court of Appeals stand.)

  • SCO sues IBM alleging that Unix-derived code to which SCO owns the rights is present in Linux code. (Case filed but not yet heard.)

  • Yoga guru Bikram Choudhury throws cease-and-desist letters around at San Francisco Bay area yoga schools who are (he alleges) teaching his conception of yoga.

But they are actually all about drawing the line between languages and the things uttered in them -- how to tell the dancer from the dance.

Borland, a relatively small Santa Cruz County software company (rah!) was sued by the big Lotus corporation (boo! hiss!) for using in the Quattro Pro spreadsheet product a set of menus with the same commands in the same order as Lotus 1-2-3 menus. Lotus was claiming in effect that they could copyright a (partially visual) language of screen boxes and command lists and command names etc. (See this essay for a nice discussion of some issues involved in lawsuits about copyrighting of software.) Copyright a program or a script or a package of macros, sure, but not window borders or color schemes or the notion of having Quit on the Quit button.

The SCO case is different in that SCO sued IBM for something that would be a valid cause for complaint if it were true. They claim ownership of key bits of code that turn up in the open-source Linux operating system (installed on some IBM machines). SCO purchased the rights to the proprietary Unix code developed decades ago by AT&T. Here the claim is in principle a reasonable one: use the same programming language that I write in, by all means, but if I write a clever function that does some key trick in the operating system, I own the piece of code expressing it and you can't plagiarize it or give it away to your friends. Use the same programming language to devise your own way of doing exactly the same task, sure, but don't steal my code.

So what's got everyone so mad at SCO (mad enough that some think it might have been renegade Linux militants who released the W32/Mydoom@MM virus with its payload of SCO-targeted service-denial anthrax) is not that copying code isn't a copyright infringement, but that the Linuxites suspect the SCO suit is just harassment -- SCO can't prove its complaint, the Linuxites believe, but can cause trouble, and is hoping to be bought off, or purchased outright. Linux programmers are well aware that parts of Unix are proprietary. They don't normally have access to those parts anyway, but where the content of any of the source code does become known, Linux programmers always rewrite it in a different way. What they're after is the same input/output behavior as classic Unix -- the file copying program is called cp, it does interactive prompting if you call it with the -i option, it takes either two file names or an arbitrary list of file names and a directory name, etc. -- but that isn't copyrightable (that would take us back to the Lotus case). So Linux enthusiasts think SCO is just bluffing, and have dared the company to publicly reveal which pieces of code it thinks have been plagiarized.

Finally, Bikram Choudhury says he has taken a select 26 postures from the classical 84 of traditional Indian yoga and knitted them, with accompanying script, into a carefully designed exercise session that constitutes something he can claim rights to (San Francisco Chronicle, Thursday, February 5, 2004, front page and page 10, or online in Salon magazine's story by Nora Isaacs). He claims that no one else should be allowed to teach that series of 26 positions in just the way he does, and has his lawyers fire cease-and-desist letters at anyone who gets too close to doing his thing.

As I read the law, he could well find the courts ruling in his favor. What he has done could readily be represented as a lot like composing a poem or choreographing a ballet or writing a program. It's not look-and-feel -- anyone is entitled to teach yoga in their own way according to the ancient Indian tradition. And it hasn't got the uncheckability of the SCO case. SCO's claim that ripped-off proprietary Unix secrets that they won't name are buried in plain sight in openly available Linux code, but they won't tell you what or where, seems utterly inscrutable. But it should be straightforwardly checkable whether a yoga studio is doing the Choudhury program according to the Choudhury script. He's not taking the Lotus position, because he's not trying to copyright the lotus position.

[Thanks to Ari Kahan (UCSC linguistics B.A. and UCLA law graduate, now an intellectual property lawyer) for advice and corrections.]

Posted by Geoffrey K. Pullum at 06:22 PM

Reanalysis -- and not

Last week, the always-interesting Rosanne posted over at the X-Bar:

Here's a tidbit from my friend's son -- she refers to him as a linguistic adventurer, which I think is a delightful way to describe anyone acquiring language:

He objects to wearing long underwear beneath his clothing once he’s inside the house. He lets me know about this objection by peeling himself out of his topmost layer of clothing and bellowing: "Too both! Too both!"

Analysis withheld until I've had a good night's sleep. It is what it is, just too both.

As adult English speakers, we experience "too both" as a small collision between grammatical matter and anti-matter that vanishes in a flash of amusement. But in a more analytic mode, I suspect a prosaic back-story. For a kid just learning to talk, "too X" might as well mean "I don't like X" or "I'm uncomfortable about X" or "I can't deal with X" -- "too hot", "too cold", "too big", "too heavy". And "both" might as well mean "two layers of clothing" -- "let's see, should you wear your t-shirt? or your sweatshirt? or both?" So "too both" might alternatively be glossed "I-can't-deal-with two-layers-of-clothing", though this completely lacks the poetic concision of the original.

Traces of such infant reanalyses are everywhere in the speech of small children. Similar things sometimes happen in the larger history of languages, but not often in comparison. One little kid can't shift the whole cultural weight of the adult world, at least not very often. If a reanalysis takes hold, it must usually be because it spontaneously happens over and over, not because one infant's idea spreads to the speech community as a whole.

Some apparently plausible reanalyses are conspicuous by their absence. For example, the sequence "the uh" occurs about 600 times per million words of English conversation. This is roughly as frequent as words like look, ever, else, try, why, away, again, few, type, give, made, once or old. Sequences of filled pause ("uh") following other common determiners (such as this, that, my, your, his) are also common. But I've never heard of a "linguistic adventurer" developing new words like "the-a", "this-a", "his-a", though it would be reasonable to guess that these might be (say) distal versions of the core determiners: "my-a X" = "that X of mine that we haven't just been talking about".
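For readers who want to check a figure like "600 per million" against their own transcripts, here is a minimal sketch of the arithmetic. The function name `per_million` and the toy transcript are invented for illustration; nothing here reproduces the corpus or the counts behind the number quoted above, and real conversational transcripts would need proper tokenization first.

```python
from collections import Counter


def per_million(tokens, bigram):
    """Occurrences of `bigram` (a pair of tokens) per million tokens."""
    if not tokens:
        return 0.0
    # Count every adjacent pair of tokens in the transcript.
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs[bigram] * 1_000_000 / len(tokens)


# Toy transcript, made up for this example (not real corpus data).
toy = "put the uh book on the uh the table and then i uh left".split()
rate = per_million(toy, ("the", "uh"))
print(f"'the uh' rate in the toy transcript: {rate:.0f} per million tokens")
```

On a fourteen-word toy the rate is of course absurdly high; the point is only that the normalization is a count of adjacent pairs divided by the number of tokens, scaled to a million.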

Is this because kids recognize that filled pauses are not exactly morphemes? That seems plausible, but there's quite a bit of research suggesting that "disfluencies are often really communicative choices rather than system failures." In that case, why don't filled pauses get lexicalized, alone or in combination with common adjacent words?

Is it because the communicative connotations of filled pauses are not salient enough, or at least not available to learners until later in development, when their basic grammar and lexicon of functional categories is already established? Maybe.

Or do filled pauses in fact get lexicalized (in language learning and/or in language history), so that I'm just wrong about the facts? In that case, some reader will probably set me straight.

[Update: Maryellen MacDonald emailed:

I read your language log post on child reanalysis (too both, etc.) and wanted to comment on a couple of points. You were speculating on why children don't reanalyze the-uh despite its extremely high frequency, and in connection with this you noted that

there's quite a bit of research suggesting that "disfluencies are often really communicative choices rather than system failures."

There's room to interpret much of the data here somewhat differently, I think. That is, there is good evidence that comprehenders make very good use of disfluencies to help comprehension (and so can language models, as Stolcke and Shriberg show), but it does not follow from this result that the disfluencies are provided for the comprehender's benefit. Disfluencies appear to be uttered (perhaps among other reasons) to buy the speaker more time to plan upcoming material, and thus they have a very non-random distribution--they appear in advance of more difficult material--lower frequency words, longer phrases, more complex construction, etc. The presence of a disfluency therefore has a great deal of predictive value for the comprehender concerning the upcoming material. Thus uttering "uh" is certainly a choice on the part of the speaker, and it also has communicative consequences, but it need not be a choice with the comprehender's needs in mind. Comprehenders may just be very good at taking advantage of choices that are being made to meet the speaker's needs alone.

Now as for why a child wouldn't reanalyze the-uh as a variant of the, etc. I lean toward your first hypothesis, that kids realize that filled pauses aren't exactly morphemes, and that they do this because their distributional pattern is unlike that of real morphemes. Filled pauses show up in a variety of structures and follow a variety of words beyond just determiners, of course. A major commonality is that they follow easy (short, high frequency) material and precede longer, lower frequency words. Also their acoustic duration probably varies more than for other syllables that are bound morphemes. And this isn't like the distribution of real morphemes.

I mostly agree, though I'm not at all sure that the "acoustic duration [of filled pauses] probably varies more than for other syllables that are bound morphemes". Since pauses (especially unfilled ones) induce lengthening of final syllables, I bet that things like "-ing" are pretty variable in duration. And uhs are often pretty stereotyped in their performance. Checking this out would be a nice term project for a student in a phonetics course.

In a different vein, Nicholas Widdows emailed that

Very interesting point about the possibility of lexicalization of fillers. My first reaction was that the filler isn't analysable as having any particular syntactic function, since it can occur almost anywhere. There's not enough purchase for the child to analyse it as a determiner affix.

Then I looked at the earlier statistics you quoted and wondered what entropy could actually do to syntax learning. If [@] has some predictive value at some point, how could that get grammaticalized? In DP = [D N] it's phonologically attached to D but pragmatically marks N as unfamiliar, as opposed to 'the cat', familiar, no [@]. That is, if the child begins by perceiving it as a distinct element in [D X N], the semantic relationship is [D [X N]] and you need some kind of reanalysis to create syntactic [[D-X] N].

Presumably the fillers are more likely just before the high-entropic parts, N or V, and we can pre-construct the functional category scaffolding for what we plan to say before realizing we need to dredge up the lexical word. So we get 'and then I, er...' and 'put the, er...', but that doesn't necessarily mean that the parser has a slot at that point available for syntactic content.

What I'm trying to say with all these points is that an analysis of the filler as material actually attached to a preceding functional category won't be made because the syntactic structure doesn't license it there.

I'm not sure that I buy this. It's not true that determiners are always atomic, certainly -- it's normal for them to show gender and number agreement, of course, and there are some other sorts of morphologically complex determiners as well.

It seems like all you'd need would be an accessible meaning dimension to associate, and there are some plausible ones. For example, some languages have a systematic proximal/distal distinction in determiners. So the learner could hypothesize that "uh" is a distal morpheme, since it tends to occur before items that are being newly introduced, are less familiar etc. Some languages also encode new/old information differences, so the same patterning might lead a learner to think that "uh" marks new information.

Though I guess I have to admit that I've never heard of the theme/rheme distinction being marked on determiners. And I might well be missing some of Nicholas' point, not being a syntactician.

In any case, I'm inclined to agree with Maryellen that learners catch on quickly to "uh" being some kind of out-of-band signal, even if they also learn to recognize its information content. The best argument would be that filled pauses never seem to get lexicalized in any context. I cited the case of "the uh" because it's the most common, but you could have "nice uh" or "mean uh" or "boy uh" as lexicalization candidates just as well.

Of course, another problem is that if a child went through a stage of using "uh" in some novel way, it might be pretty hard to notice it. The meaning is likely to be somewhat subtle, and there aren't any contexts in which "uh" would be strikingly out of place. We notice things like "too both" because they're ungrammatical in the adult language; we notice things like "let's fight with our chudders" (generalized with re-analysis from "each other") or "whobody's there?" because they involve recognizably non-standard lexical items; but would we even notice "the-a" and "my-a" if a three-year-old started using them as rhematic determiners? Heck, would we notice if 10% of the population of North America was doing it, day in and day out? ]

Posted by Mark Liberman at 10:29 AM

February 07, 2004

The giant logical spider web of grammar

Sir Winston Churchill is reported to have said once about his electoral opponent, Labour party leader Clement Attlee, "He's a modest man, with much to be modest about."

Friday afternoon's colloquium in my department at the University of California, Santa Cruz, was by a modest man whose modesty is completely unjustified. Dick Oehrle doesn't do anything much in the way of self promotion. He just produces wonderfully rich research papers, mildly unorthodox but not intellectually isolated, often very technical -- quite a bit of logic and mathematics. His fascinating talk on Friday was built from a detailed examination of the extraordinary surprises of the English tense and modality system and a proposal about the precise description of the meanings of the tenses. "Fasten your seatbelts," said Dick quietly just before launching into the main part of the talk.

An encouraging event. Don't beam me up yet, Scotty; there is intelligent life in my subject down here.

Since you weren't there, I'll give you just one example, out of dozens, of the sort of strange factual stuff to be described. It comes in four steps.

1. The form knew is the preterite (or "past") tense form of know. So if you think you know but you don't, the way I will describe the situation tomorrow, when it's in the past, is to say that you thought you knew but you didn't.

2. In the same way, the form could is the preterite of can. So if you think you can but you can't, the way I will describe the situation tomorrow, when it's in the past, is to say that you thought you could but you couldn't.

3. You can't generally use a preterite with an explicit indication of future time; it sounds completely nuts:

??I knew the answer tomorrow.

There's really no way to make that sound sensible without a fairly strained invocation of time travel or dreaming something. It does not sound normal.

4. So now look at this sentence, with the preterite form could:

I could do the job tomorrow.

No time travel there. It's a perfectly normal thing to say. So why on earth would that be? Do we have to postulate two different words with the spelling could, suspiciously similar in their connection to ability but differing just in their compatibility with future time reference?

The point here is not to come up with some particular way of brushing aside this as merely a case of this, or merely necessitating that, and easily handled in some ad hoc way. The point is, rather, that the grammarian confronts dozens, scores, hundreds of facts of similarly unexpected character, all apparently related in some vast abstract web of grammatical and semantic connectedness whose nature is not known, so that any ad hoc descriptive move you make is likely to be undone by its deleterious consequences for the solution of one of the other puzzles (either grammatical or semantic).

Plenty is known about the relevant generalizations. There is a careful description of all of this domain of English, due to Rodney Huddleston, in The Cambridge Grammar of the English Language (see chapter 3), which Dick's talk referred to several times. It fits together proposed solutions to most of the descriptive puzzles (though possibly not quite the right solutions; you can never quite get all the data to fit in perfectly). Dick thinks all previous accounts place too much reliance on syntactic and morphological classification and have not devoted enough effort to work on the semantics of tense.

What I liked was not any particular technical proposal in the talk, but the way Dick sees what the project is. What we're doing here is like exploring a giant logical spider web of abstract relationships, in many dimensions, in darkness, with no access to evidence about it other than to tally the reports of people who have run into tiny bits of it. That's what makes it so interesting (and such a long-term prospect) to investigate the structure of even a well-studied language like English. That's what Dick Oehrle understands about the work that grammarians do.

Posted by Geoffrey K. Pullum at 02:55 PM

February 06, 2004

Weblog marketing opportunities

Yesterday's USA Today had a front-page article on high Google rankings as a marketing tool. The (nonexistent) Language Log marketing department is (alas not) working on how to leverage our current high search engine rankings for "incall" (#1), "the difference between right and wrong" (#2), "emo girls" (#7), "wedding vowels" (#1), "high jinx" (#5), "something I need to know" (#1), "captive bolt stunner" (#6), "communication tricks" (#1) and "talking parrot bbc" (#1), all taken from searches that reached our site in the past hour, according to the referrer logs.

[Update: from a sample taken a couple of hours later, I can add "he she sex adult" (#2), along with less lucrative placements such as "autodidacts", "welcome new overlords", and "nuclear pronunciation".]

Posted by Mark Liberman at 05:28 PM

Excitement at the Guardian about language research

In the Guardian of 2/3/2004, under the headline "Brain scan sheds light on secrets of speech", Ian Sample surveys a potpourri of recent language-related research in the UK. He sketches fMRI studies on lateralization of intonation perception and functional localization of speech vs. nonspeech sounds, and other work on aphasia, on stylistic correlates of personality types, on textual information extraction for data mining, and on a new screening test for children's language-related disorders.

It would be easy to find something to complain about in every paragraph. The commonest mistake is presenting a recent link in a long chain of research as if it were the first work ever. There are also cases where the thumbnail sketch of a research result is misleading or even outright wrong as stated. For example, the work of Jon Oberlander and Alastair Gill at Edinburgh, which Oberlander's home page describes as "modelling personality-based differences in discourse generation", is said by the Guardian piece to have "found tell-tale signs that reveal how extrovert or neurotic you are". As a shy person, I object to the implied opposition between "extrovert" and "neurotic"; and in fact the Edinburgh work (e.g. here) uses conventional personality scales, in which extravert/introvert and neuroticism are separate dimensions.

But really, it's wrong to complain. Or at least, those of us who care about the study of language should celebrate first, and try to straighten out some of the mistakes later on.

The main point here is that the popular press is excited about linguistic research. It might be technically false to talk about how we've "discovered for the first time" things we've known for a while, or to exclaim that the "results are startling" when they're more or less what most researchers have thought for decades -- but it's poetically true. There really are discoveries here, even if the press is sometimes a little confused about just who discovered what when, and the research really is startling and exciting, on a slightly larger time scale. The article is full of words and phrases like "amazing", "big guns", "unprecedented insight", "research thrust", "revolutionise information gathering", "acute interest", "direct impact". So let's just relax for a while and enjoy the flow of positively-associated vocabulary items.

The key to the article is this quote (about half-way through):

"Language is at the very heart of what makes us human," said Geoffrey Crossick, chief executive of the Arts and Humanities Research Board. "It is about how we think, understand the world and communicate with each other. If we are to understand these activities, let alone to harness technology to help us carry them out, it is essential that we understand language and how it works."

I'll raise a glass to that.

[Guardian link via Zoe Toft]

Posted by Mark Liberman at 10:01 AM

"Snowclone" as a chart R&B song

J-P Stacey is annoyed by "clichéd journalistic laziness", but is even more annoyed by Glen Whitman's coinage "snowclone":

Redolent of William Gibson and other “writors”, it sits on my screen like a chart R&B song in my CD collection. And although it was coined in mid-January (they’re able to date it to the second because it was sent by e-mail: how terribly clever of them) it nonetheless sounds like something from either the mid-90s or early 80s. Retro kitsch or merely a flailing motion made by the thinly read?

It’s ironic that the name itself has its origins in a fit of laziness, grasping for the nearest thing to hand: the apocryphal, middlebrow saw about Inuit words for snow. An irony that, in a burst of meta-irony, would be best appreciated by those involved: those least likely to admit to having been slapdash. Even that meta-irony is—I could go on, but that would make me one of them.

I'm afraid that I lost the thread of this complaint somewhere in the second level of meta-irony, because I was distracted by the scare-quoted "writors" in the first clause. Is Stacey suggesting that Gibson has a spelling problem, or has adopted a non-standard spelling for this word as a sort of badge of identity, or is commonly referred to in this way by some other group for some reason? If so, I've missed it. Or is this just an obscurely snarky way of saying that Stacey thinks Gibson is a bad writer? And why are the words of writers who can't (or won't) spell writers right "like a chart R&B song"? Probably it's just that Stacey feels socially and culturally superior to both, but perhaps there's a more substantive connection.

I'm just a bystander here, but if I were Glen Whitman, I'd be happy to be classed with William Gibson and chart R&B songs. I'm tempted to try to annoy Mr Stacey enough to get him to call me some names too. Let's see, could I be 'redolent of Stravinsky and other "muzishuns", sitting on the screen like an Iain Banks novel in my collection of Victorian masterpieces.' No, I'm no good at this, I'll leave it to him.

[Update: John Kozak emails:

I can elucidate "writor" (I think).

The UK political/cultural TV satire programme "Spitting Image" had as a stock character a Leonard Nimoy puppet, who would start with the declamation: "I am not Spock! I am LeoNARD Nimoy, the actOR", do a chunk of Shakespeare, then spasm into Trekisms (e.g. "to be or to ... Beam me up Scotty!").

So, "actOR" is widely used in the UK to denote a certain kind of thesp. I'm guessing "writor" is by analogy with this, though I haven't come across other evidence that this is productive. Interesting inflection, though, if so.

I'm always thankful for clues to linguistic puzzles, but I don't think I get it yet. If John's construal is correct, then J-P Stacey was suggesting not that William Gibson has a spelling disability, but rather that he is -- pretentious? That's at least not completely nonsensical, but it seems too meta-ironic (infra-ironic? ultra-ironic?) to credit.

However, it indeed would be a devilishly subtle inflection: to imply that someone is aspiring above his station, merely by failing to reduce the vowel of the agentive suffix. K3wl! I'd try it out the next time I'm patronized by a "waitor", but I'm afraid that this may turn out to be like Marmite, one of those features of British culture that doesn't cross the Atlantic very well. ]

[Update #2: T Campbell emails another theory:

I suspect it has less to do with Nimoy and more to do with the slangy spellings of the "leet haxors" or "leet haxorz" or "1337 h4x0rz."

That makes sense, but (like many sensible things) it would be a disappointment -- I was getting attached to the theory of an unreduced-agentive morph meaning "inappropriately ambitious". And shouldn't it be "wr17Orz" or "wr173rz" or "wr1tOrz" if it's 1337-speak? Anyhow, it all goes to show that "theory of mind" reasoning is hard. Stacey meant something by misspelling "writors" in scare quotes -- but what?]

[Update #3 (4/10/2004): Stacey has noticed this note, and explained himself (as of 4/8). After reading what he has to say, I still don't know why he called William Gibson a "writor". He did oblige me with a few paragraphs of wordy insults, but none are very memorable. I'm disappointed, since he has a real talent. ]

Posted by Mark Liberman at 08:01 AM

February 05, 2004

Inflections, genes and western Iran

As a gloss on the discussion a while back on Indo-European origins, I have just discovered the neatest thing.

Persian is odd among Indo-European languages in how low on inflection it is. In particular, it is one of the few which (like its fellow oddball English) has no grammatical gender marking. And forget middle marking, separate paradigms of endings for the past and future, etc.

Specialists tend to just assume that this is because word-final unaccented vowels tend to fall away. But this is like claiming that mammals, once they emerged, had to start eating meat. Lots of mammals eat plants -- properly, unstressed vowels MIGHT wear away. But they might not, and there was more going on in Old Persia than historical phonology. I have recently found that obscure genetic mutations might show us the light.

After all, Baltic and Slavic languages have retained an alarming amount of Indo-European inflections in all of their mumbled, word-final splendor. Germanic isn't too shabby, Icelandic being Exhibit A and German itself respectable. Really, the situation in Romance and English is rather anomalous.

But still, could Persian just be an accident? The diachronic and comparative situation suggests not.

In a nutshell, Old Persian and its close sister Avestan, what we have of "Iranian Part One" in any real way, are card-carrying early Indo-European, bristling with inflectional paradigms that barely look acquirable. Then we meet Middle Iranian and things get weird -- namely, an east-west split. In the east, languages like Sogdian look like "sons of Old Persian." Some collapses of case markers here and there, not quite as baroque overall, but still players -- "Germans," so to speak. But in the west, Middle Persian and sister Parthian are suddenly Englishes -- inflections vastly fallen away. Middle Persian is, in essence, today's Persian.

But meanwhile, not a single other Indo-Iranian language -- western or eastern -- is as naked as Persian. None are "Lithuanians" by any means, but all have some case marking left, many retain gender marking, and all have some ergativity in the past, while Persian alone lacks it.

What happened to Persian? Arabic was not the culprit, despite Persian's heavy Arabic lexical overlay. Islam came to Iran long after the lurch from Old Persian to Middle Persian, and heavy lexical borrowing does not entail structural reduction and usually occurs without it (witness Australia).

Lately I have been evaluating a hypothesis that things like this only happen to a grammar because of a spate of what Östen Dahl calls "suboptimal transmission" -- heavy non-native acquisition streamlining a language. Old Persian dropped a stitch under the reign of Darius I, and thus I have wondered whether heavy immigration at that time impacted Persian.

To suppose that Persian got a close shave in being imposed upon peoples across the Persian Empire, à la Latin in Europe, is a dead end -- the Persians did not impose the language abroad, and allowed Aramaic to be the lingua franca. So we must look to what was going on right in Persia itself.

But the historical record this far back only gives us flickers, like surviving fragments of a silent film. Indeed, Darius brought foreigners from all over to Persia to build and work. But that's all we know. The Greeks and Romans to whom we owe most of our information didn't know from "multiculturalism" and have little to say about "networks" and "ethnicity." All they really cared about was battles, court rituals and contempt, and otherwise we have little to work with but some clunky inscriptions from the king and stuff on coins.

Yet something happened to Persian. If it didn't, then there would be at least one other Indo-Iranian language as stripped down. So I started wondering: could genetic evidence perhaps shed some light on what was different in western Persia? One must leave no stone unturned, after all.

Combing the sources by geneticists tracing human migrations through reference to genetic mutations, damned if I haven't found that there is indeed a glitch in the data regarding none other than Western Iran!

A nice link can be found in a 27-author paper on Y-chromosome diversity in the Eurasian heartland by Walter Bodmer and colleagues (available in PDF here). The mutation known as M17 is found in Europe, pours through the Caucasus bypassing Turkey and the Middle East and spouts down into India -- a nice reflection of how Indo-European would have spread (pace Renfrew). But get this -- the mutation is common in east Iran, but strangely low in west Iran!

But why couldn't people have migrated through there? Okay, a desert separates the west from the east, but coming from the north of the desert one could have taken either route. The scientists know this, and only venture that maybe west Iran harbored unusual population density (why?) or already harbored an Indo-European language (but how could this have been before M17 carriers got there?). In other words, they don't know -- and have no reason to care within their bailiwick; to them it's a mere blip in the data.

But I can't help noticing that most of the people history tells us were brought to Persia under Darius were from Turkey and the Middle East, where M17 has no juice. Could it be that west Iran is so low on M17 because of heavy admixture from migrants from these places?

I'm not done investigating this. But what a cool correlation between language and genetics this could be -- if the absence of grammatical gender and ergativity in Persian could tip us off to population mixtures that explain the odd sparseness of M17 in western Iran.

I just had to share.

Posted by John McWhorter at 06:51 PM

schild en vriend

Another example of the military use of a shibboleth cited by Mark Liberman took place 700 years ago in what is now Belgium. On July 11, 1302 the resistance of the people of Flanders to the attempted annexation of their country by the King of France, known as the Flanders War of Liberation, came to a head in the Battle of the Golden Spurs, fought outside the city of Courtrai. The anniversary of this victory is now the national day of the Flemish community. The battle is of significance in military history as the first recorded occasion on which an army of footsoldiers defeated professional cavalry. At this time, those suspected of being Walloons or Frenchmen who could speak Dutch were asked to say schild en vriend "shield and friend", an expression regarded as particularly difficult for those who were not native speakers. Those who did not pronounce it correctly were determined to be the enemy and killed.

[Update 2004/03/01: Alex Baumans writes from Flanders that this story, which I learned from my Belgian father, is a myth. There probably was some sort of password like this, but it couldn't have been schild en vriend. One reason is that schild didn't acquire the fricative [x] that makes it difficult for non-natives to say until much later and indeed still hasn't in some Flemish dialects. Another is that the enemy consisted not only of the French but of native speakers of Flemish who supported the French crown.]

Posted by Bill Poser at 10:50 AM

HMMs at the fords of Ephraim

Anoop Sarkar has an interesting weblog Special Circumstances, which mingles (computational) linguistics with reviews of science fiction, movies and other notes. In the "other" category, he noted last week that

Australian science-fiction author, Greg Egan, has taken time off from his fiction writing to investigate the procedure of immigration detention in his country.

His essay on the topic is called The Razor Wire Looking Glass.

One sentence in this essay was particularly intriguing:

There are institutionalised flaws in the system, such as the language tests routinely used for validating people's nationality that have been discredited by professional linguists.

I wonder what kind of language test can prove that one is from a particular country. Kafka (if he used speech reco) might imagine the following scenario. Perhaps they ask people to talk into ViaVoice and measure the word error rate: "Edit distance of 24? You must be from Bhutan."

This is not entirely a joke, as Anoop doubtless knows, though probability with respect to a model is a more likely measure than edit distance -- here's a link to some relevant research. Of course, no responsible person would suggest assigning any legal weight to the results of such automated diagnosis. And I imagine that the Australian test cited by Egan is some paper-and-pencil thing, anyhow, though he doesn't give any details at all.
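For readers who haven't met the measure in Anoop's joke, here is a minimal sketch of word error rate computed from edit distance. The function names and the example sentences are my own illustration, not anything taken from ViaVoice or from the research linked above, and the sketch assumes the simplest scoring: insertions, deletions and substitutions each cost 1.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Insertions, deletions and substitutions each cost 1 (an assumption
    made for this sketch; real scoring tools may weight them differently).
    """
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)
```

Calling word_error_rate("say shibboleth now", "say sibboleth now") counts one substituted word out of three. A number like that says how badly a recognizer trained on one population fits a given speaker, which is a long way from saying where the speaker comes from.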

A more traditional (and fatal) example of language as gatekeeper is given in Judges 12:


Jephthah then called together the men of Gilead and fought against Ephraim. The Gileadites struck them down because the Ephraimites had said, "You Gileadites are renegades from Ephraim and Manasseh."


The Gileadites captured the fords of the Jordan leading to Ephraim, and whenever a survivor of Ephraim said, "Let me cross over," the men of Gilead asked him, "Are you an Ephraimite?" If he replied, "No,"


they said, "All right, say `Shibboleth.'" If he said, "Sibboleth," because he could not pronounce the word correctly, they seized him and killed him at the fords of the Jordan. Forty-two thousand Ephraimites were killed at that time.

A 20th-century parallel to the Biblical shibboleth story took place in the Dominican Republic in 1937, when tens of thousands of Haitians were massacred on the basis of whether or not they could roll the /r/ in the Spanish word for "parsley" (the cited page is some on-line background reading for an introductory linguistics course).

[Update: the author of cannylinguist emails cannily that

I suppose it doesn't detract from the injustice, but the /r/ in "perejil" is flapped, not rolled.

I'm no expert in the phonetics of Spanish dialects, but I guess that's right -- at least it's consistent with what I've heard and seen in other varieties of Spanish, where (as I recall) word-initial /r/ is trilled, and word-medial /r/ is written "rr" when trilled (as in perro), but is written "r" when tapped (as in pero or, I suppose, perejil).

I just copied the story uncritically from Wucker's account and from Dove's poem, and of course neither of them is trained in phonetic vocabulary or its application to speech. I'll check among folks who know something about Dominican Spanish and Haitian Kreyol, and post an update if I learn anything new about what the difference in /r/ pronunciation in perejil would have been.]

Posted by Mark Liberman at 07:51 AM

February 04, 2004

Attributional abduction strikes again

In this post, Language Hat mildly rebukes linguist David Harrison for the recent press coverage of David's work on Chulym, saying that David "claims to have discovered a new language in Siberia," and that "if it's Turkic, it's not some amazing new language with a unique worldview".

I asked David about this, and he responded:

Thanks for this, I will respond to the author.

In his 'blog', he has misquoted both me and the press release: The words "new" and "discover" do not appear anywhere in the press release, nor would I ever make such a claim. Unfortunately several of the news reporters who interviewed me (or their editors) could not resist putting in those particular words in the headline, or even attributing them to me in the body of the article. It's unfortunate, but the alternative of not talking to the press at all is also not so great.

Middle Chulym (native name = ös) is definitely Turkic, and most Turkic languages are fairly closely related. It was previously wrongly lumped together (both in Russian bureaucracy and in Soviet era ethnography) with Shor, and later with Xakas, two neighboring but quite distinct Turkic languages. The Middle Chulym were even dropped from the census as a distinct ethnic group for over 40 years. They recently regained their ethnic status and registered as a 'tribe' with 426 members (35 people still speak the language fluently).

The Middle Chulym language is unique and distinct enough from Lower Chulym (the next closest language) to warrant its own Ethnologue entry. I will be communicating with the SIL folks shortly to make the case for this and to send them exact statistics on the number of speakers and the advanced moribund state of the language. I'm also going to publish the first book ever in the language later this year, a collection of hunting stories told by Middle Chulym elders and illustrated by their grandchildren.

Thanks again for noticing this. The full press release (and links to stories) may be found at http://www.swarthmore.edu/news/releases/04/harrison.html/

In further correspondence, David observed that

I've really been struggling with the press on this issue. I've been so careful in what I say, and yet they consistently produce headlines like "linguist discovers new language".

Sometimes I get the reporter to agree in advance of an interview not to use such words, and then the editor comes along and tacks on a headline, which the reporter has no control over.

It's discouraging, but all in all I think it's more important to get the word out there about language endangerment.

I agree entirely with David's belief that "the alternative of not talking with the press is also not so great", and that "all in all it's more important to get the word out there." When our colleagues take their misrepresentational lumps for getting linguistic research out in the popular press, we all need to apply a bit of charity, and withhold judgment for a while about the responsibility for implausible or exaggerated headlines, paraphrases and even quotations.

In earlier posts I've referred to this as the problem of "attributional abduction" -- reasoning to the most likely explanation for some piece of reportage that doesn't make sense. As I've written before, we can't tell: was the journalist or news release writer misled by the source? did the journalist misremember, misunderstand or invent something independently? was the piece subverted by an editor, accidentally in the course of hasty re-writing, or on purpose due to conceptual confusion or some independent agenda? Some earlier remarks on cases of this kind are here, here, here, here and here. And as I've also written before about such cases, in my experience it's a good rule of thumb to blame the journalist -- or the journalistic process, including the editor(s) and the headline writer -- before blaming the scientist. Though Lord knows, scientists are not always blameless.

Posted by Mark Liberman at 11:02 PM

Postrel and Pinker push Hayek -- a bit too far

In an article entitled "Friedrich the Great" (in the Boston Globe on 1/11/2004), Virginia Postrel writes:

Hayek's 1952 book, "The Sensory Order," often considered his most difficult work, foreshadowed theories of cognitive science developed decades later. "Hayek posited spontaneous order in the brain arising out of distributed networks of simple units (neurons) exchanging local signals," says Harvard psychologist Steven Pinker. "Hayek was way ahead of his time in pushing this idea. It became popular in cognitive science, beginning in the mid-1980s, under the names `connectionism' and `parallel distributed processing.' Remarkably, Hayek is never cited."

This paragraph makes one assertion which is wildly exaggerated (that "Hayek is never cited"), and an implication that is false (that Hayek's 1952 book suggested the basic idea of connectionism 30 years ahead of anyone else).

Asking Google about "hayek cognitive science" turns up this page as the highest ranked result, which provides dozens of citations for books and articles that (are said to) discuss Hayek's contributions to cognitive science, including works by Edelman, Minsky, and Rosenblatt. Looking around a bit on the hayekcenter.org website, or asking Google about "quote hayek neuroscience" or "quote hayek cognitive science", turns up this page, which includes many actual quotes, for instance:

"[Hayek] made a quite fruitful suggestion, made contemporaneously by the psychologist Donald Hebb, that whatever kind of encounter the sensory system has with the world, a corresponding event between a particular cell in the brain and some other cell carrying the information from the outside world must result in reinforcement of the connection between those cells. These days, this is known as a Hebbian synapse, but von Hayek quite independently came upon the idea. I think the essence of his analysis still remains with us . . ". (Gerald Edelman, Neural Darwinism, 1987, p. 25).

"Most theoretical work since the proposals of Hebb (1949) and Hayek (1952) has relied upon particular forms of dependent synaptic rules in which either pre- or postsynaptic change is contingent upon closely occurring events in both neurons taking part in the synapse." (Gerald Edelman, Neural Darwinism, 1987, p. 181).

"The first proponent of cortical memory networks on a major scale was neither a neuroscientist nor a computer scientist but .. a Viennese economist: Friedrich von Hayek (1899-1992). A man of exceptionally broad knowledge and profound insight into the operation of complex systems, Hayek applied such insight with remarkable success to economics (Nobel Prize, 1974), sociology, political science, jurisprudence, evolutionary theory, psychology, and brain science (Hayek, 1952)." (Joaquin Fuster, Memory in the Cerebral Cortex: An Empirical Approach to Neural Networks in the Human and Nonhuman Primate. Cambridge: MIT Press, 1995, p. 87)

These quotes simultaneously suggest that Hayek's 1952 work has not been completely ignored by neuroscientists and cognitive scientists until today, and also that it had many of the same themes as the (much better known) work of Donald Hebb, most famously published in 1949.

So what's the truth of the matter? Well, it seems to be that Hayek's important 1952 work is not nearly as widely known or as widely cited as it should be -- for instance, Pinker's own widely-read popular books on cognitive science, like his 1997 How the Mind Works, may fail to cite Hayek. (I'm traveling and don't have a copy at hand, so I can't be sure, but I don't remember any discussion of Hayek there. [Mark is absolutely right: there is no reference in the bibliography to anything by Hayek and no appearance of his name in the index. --Geoff Pullum, 02/04/04, 2:10pm EST.]) And it's clear that Hayek came independently to some very important basic ideas about emergent organization in collections of neurons, shortly after psychologist Donald Hebb and others came up with similar notions.

Summing it up: Hayek deserves plenty more cogsci kudos than Pinker gave him in 1997, but not as much as Postrel quotes Pinker as giving him in 2004.

This is another example of attributional abduction -- should we really hold Steven Pinker responsible for the irresponsibly misleading content of the quote as deployed in the paragraph cited above? or was it Virginia Postrel's fault? or perhaps an emergent property of their conversation? an artefact of some subsequent editing process?

By the way, Donald Hebb also deserves to be better remembered, even though "Hebb's rule" and "Hebbian synapses" are commonplace terms. So here is an interesting "personal recollection" by Stevan Harnad.

[Postrel link via Mark Seidenberg]

[Update: Cosma Shalizi emailed to say

I think the back-story to the "Hayek invented connectionism" meme is that he claimed to have come up with the basic idea in the 1920s, after reading Ernst Mach, but didn't publish (at least in English?) until _The Sensory Order_; this at least is the story he tells in the preface to that book. My memory of this is a bit hazy, because my copy was destroyed by the post office; I'll have to track this down in the library.

Also, Mark Seidenberg emailed the comment that "Hayek gets no citations in MITECS (there is one for Jurgen Habermas, however) or in Talking Nets, the very interesting collection of oral histories of the founders of modern neural network research." ]

Posted by Mark Liberman at 07:04 AM


As Sally Thomason mentioned in her post on William and Mary Morris, an important theme of some language pundits as well as other people who like to criticize other people's language and spelling is that deviation from the norm, as they perceive it, is a sign of "illiteracy". It's pretty clear that they don't just mean that these people can't read; they mean this as a negative judgement of their value as human beings. Often they aren't talking about the ability to read at all; "illiterate" is simply shorthand for "the wrong kind of person": poor, rural, uneducated, Black, Jewish, Asian, or whatever. Sometimes they really are referring to literacy, or at least to limited education. Whether or not illiteracy is used as a euphemism for something else or is really what is meant, the idea that literacy is a measure of a person's value is mistaken and offensive.

The first illiterate person I met, or at least, knew that I had met, was in China. I didn't realize that she was illiterate until one day we were talking and, my Mandarin being poor, I encountered something I didn't know how to say. Since I knew how to write it, as is customary in countries where Chinese characters are used, I began to draw the character with my finger. I was taken aback when she cut me off, saying "I can't read." Her inability to read had nothing to do with any failing on her part. It wasn't because she was lazy or unwilling to learn. She was a peasant, and female, and as such would not in the best of circumstances have had much access to education. Her childhood was a time of constant warfare: depredations by warlords and bandits, civil war, and the invasion by Japan. By the time the fighting was over, she was grown and had to go to work. She couldn't read because she had never had the chance to learn.

Some of the other illiterate people I know are Indians in British Columbia. They can't read because the only chance they had to learn was in boarding schools run by the various churches on behalf of the Government of Canada. These schools were designed for the express purpose of destroying their culture and making them into second class white people. The children were forbidden to speak their own language and taught that their culture was backward and evil. The schools were deliberately built apart from the native communities so as to minimize contact. The children's families were allowed to visit only on Sundays and only briefly. The children were poorly fed. The schools were physically brutal, and in many of them, there was extensive sexual abuse. The people I know who are illiterate were hidden by their families so that they wouldn't be taken to school. Their families didn't want to be separated from them for months at a time, didn't want them to suffer, and didn't want them to lose their culture and fail to learn their traditional livelihood.

Throughout the world, the overwhelming majority of illiterate people are illiterate because they haven't had the opportunity to learn to read. They haven't had that opportunity because they are poor or female or the wrong ethnic group. People who talk about illiteracy as if it were a moral failing or character flaw demonstrate ignorance and arrogance, not sophistication.

Posted by Bill Poser at 01:58 AM

February 03, 2004

White House or white house?

Does anyone but me remember William & Mary Morris (later just William Morris), who wrote a syndicated newspaper column on language for many years? I just came across an old clipping, while I was looking for something else (of course), that reminded me why they were my favorite source of misguided punditry on Standard English.

A reader had written to ask why sports announcers continually refer to "time outs remaining", when it ought to be "times out remaining": `the term "time out" is composed of two words. It is NOT a compound word,' wrote N.H. of Wauwatosa, Wisc., indignantly.

William Morris (in his column of 10/24/87) answered, `Sorry, but the preferred spelling of this expression is now as a single word, "timeout". So "three timeouts remaining" is perfectly correct.' (My 1983 edition of Webster's Collegiate Dictionary, 9th edition, has the spelling as time-out, but never mind.)

So in this pundit's view, the spelling determines compound status and thus the plural. But that can't be right. If I pronounce time out with stress on both syllables, then it's a two-word phrase regardless of how I spell it; and if I pronounce it with stress on just the first syllable, then it's a single compound word no matter how I spell it. It's like the White House in Washington vs. the white house on the corner: both are written as if they were two-word phrases, but the president's residence is a compound while the lower-case white house is a phrase.

Now if only I could find the Morris column that pronounced authoritatively that would've is really would of and therefore a sign of illiteracy, or at least of terminal nonstandardness --- a classic example of `eye dialect' being used to sneer at people who actually talk just like Morris.

Posted by Sally Thomason at 08:24 PM

Fight fiercely, Democrats

The right-hand column of the San Francisco Chronicle on Sunday bore the headline

Fierce battle to stop Kerry

I suppose only someone as politically naive as, say, a grammarian (one such as I) would even imagine that the Democratic presidential hopefuls might find ways of behaving that would get the headline writers clutching for phrases like "Constructive discussion homes in on choice of Kerry". But it is not to be.

The Republicans' plan for November: put up an anointed leader, unchallenged in any primary, and win. The Democrats' plan: now that Dean has been humiliated (guilty of a linguistic crime: hoarse inspirational shouting and hollering while not being an African American), divide time between two key tasks: (1) working out how to discredit Kerry, and (2) planning the destruction of nice-guy Edwards should he manage to surge ahead after South Carolina ("Bitter struggle to tear Edwards to pieces before Michigan", the headline will read). A wonderful bon mot sprang to mind as I looked at the headline, one that I have seen attributed to Will Rogers. "I belong to no organized political party", he is reputed to have said. "I'm a Democrat."

Posted by Geoffrey K. Pullum at 03:27 PM

Attributional abduction again

Cosma Shalizi, always interesting, discusses a fascinating PNAS paper on emergent distributed computation in plants. (If you're a sociolinguist, or can play that role, you might read Cosma's summary and ask yourself whether there are "domains" and "particles" in patterns of linguistic variation and change, and if so, what computations they might be performing...) At the end of his piece, Cosma says

For another take, see this news piece in Nature ..., in which the usually-reliable Philip Ball (or his copy editor?) manages (1) to confuse the emergent particles with the basic cells of a CA, and (2) to say that Wolfram was the first to show cellular automata can "mimic computers", something established by John von Neumann and Stanislaw Ulam before Wolfram was born.

This is an all-too-familiar example of the problem of attributional abduction. A piece of traditional journalism says something very implausible, obviously wrong, or completely nonsensical. Is it there because the journalist was misled by a source? because the journalist misunderstood or misremembered something independently? or because the piece was subverted by an editor, accidentally in the course of hasty re-writing, or on purpose due to conceptual confusion or some independent agenda? In this case, Cosma has enough experience with the particular journalist to suspect an editor -- though the "copy editor" is probably not the most likely culprit in the editorial chain at Nature, whatever sins copy editors may sometimes be guilty of.

This case is also a good example of how useful weblogs as a form can sometimes be, in helping those outside a narrow subdiscipline sort out new scientific research. This is simultaneously despite and because of the lack of an editorial process. I certainly don't trust Cosma to get everything right all the time -- he'd be the first to tell me not to -- but I put a lot of credence in what he has to say on a wide variety of topics where experience has taught me that he knows a lot and has an interesting perspective. In this case, his blog gives me a clearer and more reliable summary of the Peak et al. PNAS paper than I can get from Nature. (Though I should say that the Nature piece seems pretty good overall, even if it sacrifices accuracy to simplicity in the two respects that he mentions.) If he gets something badly wrong, or even if someone else thinks he has, he will most likely post an update or a link to the outside discussion; or at worst, if I care enough, I'll find out about other perspectives by looking at the pages that link to him.

We've made a few analogous contributions of our own, for example Bill Poser's evaluation of the Gray and Atkinson paper on dating Indo-European, or my discussion of the Fitch and Hauser work on monkey "grammar learning." As a result, some people have started looking to our site for reactions to newly-reported results in the sciences of language, just as I look to Cosma Shalizi (among others) for help in understanding new work in what one might call "natural distributed computation."

There's certainly a lot of garbage out there on the web. But a surprising amount of it is in the digital pages of reputable publications. And the low barriers to informal publication and re-publication, together with (up to now) trustworthy information about authorship of such material and the still-emerging mechanisms for establishing and navigating cross-links, combine to produce (the beginnings of) a dynamic, distributed information source that can be more reliable than the major outlets of science journalism are.

Posted by Mark Liberman at 07:20 AM

February 02, 2004

Talking birds vs. singing birds

Among the echoes of the BBC's telepathic talking parrot story, there has been a certain amount of renewed debate about animal communication in general, for instance in the comments to this post on Language Hat's site. I'll have a few things to say on this topic here later on, starting with a description of some wonderful work on cephalopod morphophonemics. However, there's a general tension in the air whenever this kind of question comes up, so the first thing I want to do is to get some of the feelings out in the open, by considering an analogous case that people discuss very differently.

Forget about language for a moment. Homo sapiens may be the only genuinely musical species.

At least, no other species seems to have songs in which musical intervals (small whole-number ratios between pitches) play a role. No other species seems to create rhythmic patterns by subdivisions of a regular beat, or by repetitions of sequences of rhythmic cells composed of units with small-integer time ratios.

There might even be some related evolutionary specialization of the human auditory system -- at least, I've read that the just noticeable difference in pitch perception is about an order of magnitude smaller for humans than for other mammals (though a recent attempt to find the citation came up empty). There may also be evolutionary specializations of the motor system, for example to allow stronger voluntary control of the vocal apparatus. (Darwin thought that sexual selection for performance of courtship duets was a crucial step in the evolution of human language).

Then again, maybe this is going too far. What about the songs of whales or songbirds? Don't they have real melodies? Don't apes sometimes beat out rhythms?

Frankly, I don't think so, but the point is, no one seems to get worked up about this question. Books and articles are not written about whether the songs of whales or sparrows are Truly Song, or whether chimps' tree-root-pounding displays are Truly Rhythm. There is plenty of study, scientific and otherwise, of the vocal displays of whales and songbirds, and some work on the tree-pounding of chimps. However, the interest is in what these animal acoustic displays are like -- what they sound like, how they are made, how they are learned, what they are used for -- and not whether they are basically the same as, or basically different from, human music.

By comparison, there is enormous and continuing controversy about whether various sorts of animal behavior, both natural and taught by human trainers, are Truly Language.


For some reason, people find it more interesting -- either more attractive or more threatening -- to think of animals as perhaps having command of language, than to think of them as having command of music. The debate often seems to take on a quasi-religious tone, as if the issue were whether animals have (or can acquire) souls rather than whether they have (or can acquire) languages.

As with many such controversies, it's hard to stay out of the fight. However, I'd like to suggest that strong quasi-religious convictions on this subject are a bad idea, in both directions, because they make it hard to investigate the phenomena as they are. Many (though not all) researchers and commentators come into the arena to look for evidence to support their beliefs, rather than to investigate the facts.

That doesn't mean, by the way, that I disagree with anything that Geoff Pullum wrote here. I do think, though, that it's sometimes helpful to view animal behavior from the perspective of what one might call "the communicative stance", by analogy to Daniel Dennett's "intentional stance". This amounts to treating an animal as if it has (or recognizes and responds to) communicative intentions, irrespective of whether it actually does. I recognize that this is dangerous at best, and expect to get a certain amount of flak for it. But that's for later.

[Note: the above is a slightly expanded version of the start of a lecture that I wrote a few years ago for an undergraduate course].

[Update: Bill Poser emailed this:

My impression is that one major aspect of the division between linguists and others on animal language is that there were different ideas in different fields as to what it means to have language and also different expectations. As of, say, 1950, biologists and psychologists generally thought of "language" as the use of symbols and thought that only humans could use symbols. When they discovered that other primates could use symbols, this was a big deal, and for them, it constituted having language. Linguists, on the other hand, seem not to have had any strong beliefs about only humans being able to use symbols, so the discovery that other primates did wasn't really news to them. Furthermore, for linguists language consists of much more than symbol use, so in the absence of non-trivial syntax, they weren't interested. Although there are other issues, such as the problems of data collection and interpretation with Penny Patterson's work with Koko, it seems to me that this has had long term effects. Linguists tend to dismiss anything short of full human-like language as uninteresting, while non-linguists often don't quite understand what the difference is, and even if they do, feel miffed by the linguists' reaction.

I agree, basically, though I think there is still a lot of disagreement (and confusion) about what it means to "use" a symbol, specifically in the "theory of mind" aspects. However, basically the same thing could be said about the many differences between musicologists and others in understanding what "music" is, and it doesn't lead to the same sort of dynamic between experts and outsiders about (say) whether birds "sing" or not.]

Posted by Mark Liberman at 02:36 PM

If, then, and

[Note: Due to blogging with half my mind while doing other things at the same time, my original post on this topic contained an elementary logical error. I hope I've now fixed it -- the conclusions and examples haven't changed, just a piece of the discussion. I've also added another post with some pointers to background information that may make the discussion more accessible to outsiders. As usual, we stand behind our offer to refund your subscription fees in full in case of less than full satisfaction.]

Allan Hazlett has further thoughts about odd ifs, following up on my post here. The money quote from Hazlett: "If we can say the same thing with a conjunction of the two clauses, it's not a conditional, but a masked conjunction ..." He's talking about examples like

(1) If you're a good citizen, then I'm Donald Duck.
(2) You're a good citizen? And I'm Donald Duck.
(3) Yeah, you're a good citizen. And I'm Donald Duck.

I don't think that this is right. For one thing, the argument runs at least as well in the other direction. The interpretation of and in (2)-(3) requires some extra pragmatic work, while the interpretation of if in (1) is rather closer to its putative logical meaning. So we might with at least equal justice say "if we can say the same thing with a conditional expression, then it's not a conjunction, it's a masked conditional..." We really have to look at what and how the particular uses of if and and (and then) mean. My own (provisional) opinion is that each of these examples is itself, not a masked version of one of the others, even if the interpretations sometimes overlap in meaning.

Another problem is that the if in (1) is probably not the same case as the if in the original NYT quote, which (edited for brevity and clarity) was a sentence like

(4) But if this dramatic move was necessary, it was (nevertheless) risky.

Both (1) and (4) are certainly non-vanilla-flavored uses of if, but the extra flavor seems to be different in the two cases. We can make a Gricean argument for the extra flavor being concocted on the spot out of thin pragmatic air, so to speak, but the recipes must at least be different.

(4) is an example of the OED's sense 4.a. for if, glossed "Even if, even though; though; granted that". We can render (4) as "but even though this was necessary, it was nevertheless risky", with roughly the same meaning as the original. (In my earlier post, I sketched a neo-Gricean story about how this might develop). But we can't render (the intended sense of) (1) as "even though you're a good citizen, I'm nevertheless Donald Duck." And turning it around, we (or at least I) can't use then in (4), in the original NYT example, or in the OED's examples for sense 4.a., such as

If Mozart was a life-long admirer of J. C. Bach, (*then) his views on Clementi were disparaging, to put it mildly.

Nor does and have the same force as the "concessive" or "excessive" uses of if:

 If Mozart was a life-long admirer of J.C. Bach, his views on Clementi were disparaging, to put it mildly.
≠Mozart was a life-long admirer of J.C. Bach, and his views on Clementi were disparaging, to put it mildly.

 Virtual colon dissection is promising, if flawed.
≠Virtual colon dissection is promising, and flawed.

 Dr. Lee's behavior was curious, if not criminal.
≠Dr. Lee's behavior was curious, and not criminal.

Hazlett's (2)-(3) are examples of the rhetorical device most famously illustrated by Dorothy Parker's little verse

Oh, life is a glorious cycle of song,
A medley of extemporanea;
And love is a thing that can never go wrong;
And I am Marie of Roumania.

Such examples seem to be ordinary coordinations, interpreted with respect to a Gricean version of the (logically fallacious) pragmatic maxim falsum in uno, falsum in omnibus ("(if) false in one thing, (then) false in all things"). That is, the speaker asserts "A and B", but B is obviously false, so the speaker must mean to cast doubt on A as well.

What about Hazlett's case (1)? The speaker says "if A (then) B". B is obviously false. The truth table for the (standard, "material implication" logical version of) "if A then B" is

A       B       if A then B
true    true    true
true    false   false
false   true    true
false   false   true

so if the whole sentence is true and B is false, then A must also be false. As usual, there is more to be said about what the speaker communicates by choosing this (roundabout) form of expression over other simpler and more direct ones, and also about what rhetorical niches it's adapted to. For example, intuition suggests that one common context is for K to assert "A", and L to respond with "if A then (nonsensical) B", implying that K has not thought enough about the elementary implications of A. (Alas, there is no easy way to query internet search engines for rhetorical patterns).
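
For readers who like to check such things mechanically, here is a minimal sketch in Python of the inference just described. It assumes the standard material-implication reading of "if A then B"; the function name `implies` and the variable names are mine, chosen for illustration. It enumerates the four combinations of truth values and keeps only those where the whole conditional is true and B is false.

```python
from itertools import product

def implies(a: bool, b: bool) -> bool:
    """Material implication: false only when a is true and b is false."""
    return (not a) or b

# Enumerate the four rows of the truth table for "if A then B".
rows = [(a, b, implies(a, b)) for a, b in product([True, False], repeat=2)]
for a, b, result in rows:
    print(f"A={a!s:<5} B={b!s:<5} if A then B = {result}")

# Keep the value of A in every row where the conditional holds and B is false.
survivors = [a for a, b, result in rows if result and not b]
print(survivors)  # [False]
```

Only one row survives, and in it A is false. This covers only the truth-conditional skeleton; it says nothing about the pragmatic question of why a speaker would choose so roundabout a way of denying A.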

A more general comment... Similar flavored interpretations of if and and were the occasion of Grice's original work on how speaker's meanings can be created out of sentence meanings. This work has been deservedly influential, and among all its intellectual progeny there may well be a whole body of writings on the cases we've been discussing. As a simple phonetician, I don't keep up with such things very systematically. But it seems to me that it might be worthwhile to revisit the original problem of explaining the diverse uses of if and and and or.

Among the things I (somewhat naively) wonder about: To what extent is relevance logic relevant? Are these uses always just the same in different languages, as one would expect on the Gricean account? Do common cases become conventionalized as new senses for the words involved? If so, do these senses then gather additional (non-Gricean) moss? Are such cases subject to priming effects, either in terms of speed of interpretation or in terms of frequency of use? Does conventionalization make a difference in this respect?

And what about then, anyhow?

I look forward to learning about this from various friends and colleagues who actually know something about it.

Posted by Mark Liberman at 12:33 PM

A note to the reader

There's a bit of linguistic "inside baseball" in this post and this one, and some of those that they reference. For those of you who think that the issues seem interesting, but are puzzled by some of the discussion, here are a few pointers.

The general background includes the logical interpretation of words like "if", "and" and "or" in terms of truth tables -- e.g. "A and B" is true if both A and B are true, and false otherwise -- and the various apparently non-logical interpretations of such connectives, such as the interpretation of "and" in terms of time sequence ("(first) A and (then) B") or causal consequence ("you touch me again and you're toast"). In an influential essay entitled "Logic and Conversation", the philosopher H.P. Grice proposed a general theory about the relationship between meaning as something sentences have ("sentence meaning") and meaning as something that people do ("speaker meaning"), and he used some examples of extended senses of such logical words as cases in point. He argued that the observed uses could be explained as inferences from the context, the logical meanings of the words involved, and some very general considerations about the nature of communication, in a process he called "conversational implicature."

The question at issue in the discussion between Allan Hazlett and me (and indirectly Geoff Pullum) is what to make of a particular use of "if" in the New York Times. Allan (and indirectly Geoff) suggested that it was really a sort of stealth version of "and". I connected it to a sense for "if" recorded in the OED, as well as some other classes of examples, and suggested that maybe all of them could be analyzed in the style of Grice. Or maybe I should say, "waved my hand in the direction of a suggestion that ..." Aside from any interest that these particular examples may have, there are thus some large general issues looming vaguely in the background.

Posted by Mark Liberman at 11:53 AM

February 01, 2004

Non-English Super Bowl broadcasts

China Daily writes about non-English Super Bowl broadcasts:

Sunday's game will be beamed to a potential audience estimated by the NFL at 1 billion in 229 countries and regions. It will be broadcast in 21 languages, including Arabic, Mandarin, Icelandic, Russian, Serbian and Thai.

Fourteen television and radio stations from 10 countries will broadcast the game on-site including, for the first time, a crew from China. Philadelphia Eagles tight end Chad Lewis, who speaks fluent Mandarin, will be the colour analyst.

"The event is one of the greatest sporting occasions in the world," Shi Zhigang, producer of China's CCTV broadcast, said in a statement released by the NFL, "and we are looking forward to capturing the drama for our viewers."

Chad Lewis' NFL page doesn't mention his Mandarin proficiency. However, this Philadelphia Daily News article indicates that he is a "former Mormon missionary to Taiwan." According to the article:

Right now, Lewis is more than a little worried. He's honored to have been chosen as one of the 47 international broadcasters for Super Bowl XXXVIII, beaming the game to 229 countries in 21 languages, but Lewis has never before done color commentary, and now he's being asked to do it in a language that is not his native tongue. There is fluent, and then there is fluent. Being able to hail a cab is a little different from going on TV and pithily explaining how the receiver found a seam in the Cover 2.

"I've never even broadcast a game in English, not even a junior-high game," Lewis said. "So to broadcast the Super Bowl in Mandarin Chinese is really something. I'm studying as much vocabulary as I can, listening to tapes, trying to get the accent right. We have a production meeting all day Friday, and I plan to go over everything really thoroughly."

We can categorically deny rumors that N'kisi the parrot will be providing Super Bowl color commentary for the BBC, who were reportedly persuaded by Geoff Pullum's arguments that parrots' inability to voice opinions would be a problem. The BBC then tried to hire N'kisi for the play-by-play, on the grounds that 950 words and the ability to name newly-presented objects and events should be enough, especially in the U.K. market, where N'kisi's demo tested well ("it's-a-blitz! he-sets-he-throws! can-I-give-you-a-hug? boom! aawk!"). However, it turns out that Fox has his rights wrapped up anyhow. Geoff's offer to do a BBC World Service Super Bowl broadcast in Parrot is still under consideration as we go to press.

Posted by Mark Liberman at 03:05 PM

Stupid Dead People Communication Tricks

Geoff Pullum's recent discussion of stupid fake pet communication tricks reminded Bill Poser of research, I mean "research", on "languages" of "reincarnation" (hereafter I'll dispense with the shudder quotes; just imagine them). Bill thought that languagelog ought to take note of this research, which has also, now and then but as far as I know not at the moment, been reported with insufficient skepticism by the popular press. He passed the buck to me because I'm probably the only linguist who has ever published an article called "Do you remember your previous life's language in your present incarnation?". The idea is that you get hypnotized and age-regressed not just to childhood but beyond, back to one or several earlier lives, and then you provide evidence of the super-successful age regression by speaking the languages of those earlier lives, languages that you have had no opportunity to learn in your nonhypnotic life. This is xenoglossy, which is defined by its most prominent proponent, Professor Ian Stevenson, as "speaking a real language entirely unknown [to the speaker] in his ordinary state" ("Xenoglossy: A review and report of a case", in the Proceedings of the American Society for Psychical Research, 1974, p. 1). Stevenson holds (or held?) a chair in Psychiatry at the University of Virginia, which helps make him a voice to be heeded.

The BBC might fall for Stevenson's claims and "evidence" (oops, sorry, there I go again), and at least one philosopher finds them appealing, but no linguist is likely to. Stevenson seems quite sincere -- he goes to great lengths, for instance, to rule out the possibility that his subjects were lying when they said they hadn't ever studied their earlier-lives' languages in their current lifetimes -- but he doesn't know much of anything about language. He succeeds in showing that his two main American subjects had no systematic exposure to German or Swedish in their unhypnotized state; where he fails is in his efforts to demonstrate that they speak the languages at all. He emphasizes what he calls "responsive xenoglossy", an ability to carry on a conversation.

To test his subjects' conversational abilities, he sets up sessions in which they answer questions posed to them by native speakers of their purported earlier lives' languages. As experiments, these sessions are eyebrow-raising. Not only are the native-speaker testers believers in reincarnation, but they often repeat their questions in English when the subject doesn't respond immediately to the foreign-language question. Many of the questions are yes/no questions, and of course there's no way to know whether the answers are correct, since the questions are about the subjects' own earlier lives and nobody else can be presumed to know any details about those lives. An example: "Hast Du eine Puppe?" "Ja." ["Do you have a doll?" "Yes."] (Stevenson does not dispute the claim that the subjects had minimal present-life exposure to German and Swedish, enough to have learned a few words.) With content questions, the subjects often give bizarre answers, like "my wife" in response to a question about what the subject would pay for an article at the market.

Stevenson surmises that the previously German subject's German conversational abilities suffered from the linguistic deficits to be expected from an illegitimate, illiterate servant girl, as if education, legitimacy, and high social position are requisites for fluency in any language. All this is entertaining (except when the press takes it seriously), but it raises a few questions of potential linguistic interest. For instance: just how much can one understand of a language one doesn't know, after minimal exposure? More than you might think, it turns out. And there are real-world consequences of successful guesswork about what someone just said to you in a language you don't know -- especially in the horror stories about American judges who, in order to determine whether a defendant requires an interpreter, bark questions like "What is your name?" and "Do you understand me?" Said judges have been known to conclude from appropriate answers to such obviously guessable questions that the unfortunate defendant understands spoken English just fine and therefore needs no interpreter. So although Stevenson's brand of linguistic ignorance may just be amusing, it's a common brand -- and it can kill, literally.

Posted by Sally Thomason at 09:03 AM


Mark Liberman and Geoff Pullum's recent posts on talking parrots draw attention to the gullibility and ignorance of some journalists in matters linguistic. Sometimes it seems like there is no end to the linguistic claptrap that we must endure. Another example is the book 1421: The Year China Discovered America by Gavin Menzies.

The thesis of 1421 is that in the years 1421-1423 a Chinese fleet commanded by admiral Zheng He circumnavigated the globe, along the way visiting the Americas and Australia. That this expedition took place is a matter of record, well known to historians. You can read about it in Louise Levathes' book When China Ruled the Seas: The Treasure Fleet of the Dragon Throne 1405-1433. It is undisputed that the Chinese reached as far as East Africa. What is new and controversial is whether they reached West Africa, the Americas, Antarctica, and Australia. Either way, it would make a great movie.

Reviews have been mixed. The New York Times was critical, as were The Asian Review of Books and Publishers Weekly, but other publications, such as the Salt Lake Tribune, Science News, The Christian Science Monitor, and The Asian Reporter have been positive. Nonetheless, 1421 has been a major commercial success. Published in January, 2003 in hardcover by HarperCollins, a major publisher, it immediately reached the New York Times best-seller list. The paperback edition is currently number 23 in the paperback non-fiction category. It has been translated into a dozen languages. Menzies spoke at the National Press Club, the Asia Society, and Stanford University. A documentary is reported to be forthcoming. According to Library Journal, nearly 50 companies bid for the television rights. The book has its own website, with an exotic Tuvalu top-level domain, which is supposed to provide detailed documentation for the claims made in the book as well as additional evidence turned up after publication.

The focus of the book is on pre-Columbian maps that allegedly show places that could not have been known to Europeans at the time. According to Menzies, a retired Royal Navy submarine commander, the information could only have come from Chinese maps. Most historians evidently don't find his argument convincing. In addition, he cites a variety of other sorts of evidence, some of it linguistic, which is what I'll comment on.

The first linguistic point raised in the book (p. 104) concerns an inscription found in the Cape Verde islands off the West coast of Africa, which Menzies attributes to Zheng He. Unable to identify the writing system, he wonders whether it is an Indian writing system and faxes a query to the Bank of India, which informs him that it is Malayalam. Unfamiliar with Malayalam, he asks where it was spoken and whether it was in use in the 15th century. According to Menzies, the Bank of India responded as follows:

Yes, it had been in common use since the ninth century. It has largely ceased to be spoken today, though it is still used in a few outlying coastal districts on the Malabar coast.

In fact, Malayalam is spoken by over 35 million people. It doesn't seem likely that the Bank of India was unaware of the principal language of Kerala State, one of the national languages specified in Schedule Eight of the Constitution of India. Maybe they were pulling Menzies' leg, or maybe he just can't get his facts straight. Whatever the problem may have been, this exemplifies his peculiar approach to research and the failure of his publisher to perform the most elementary fact checking. It's not like this is obscure information known only to specialists, available only at secret annual cabals. If you want basic information about a language, such as where it is spoken and by how many people, all you have to do is check the Ethnologue. If you don't know to do that, a Google search for "Malayalam language" produces 185,000 hits. For those without internet access, Malayalam will be found in any encyclopaedia.

Assuming that there is an inscription in Malayalam in the Cape Verde Islands, what does this tell us about Zheng He's voyage? Is there evidence that it dates to the 1420s? Whenever it was made, isn't the most likely hypothesis that an Indian made it? The content of the inscription might shed light on this, but although much is made of the writing system, we never find out what it says!

Moving on, at p. 226 we read:

Until the late nineteenth century, villagers in a mountain village of Peru spoke Chinese.

Even if this is true, it hardly demonstrates pre-Columbian contact between China and Peru. These Chinese-speakers could be the result of immigration to Peru in the nearly four hundred years since the Spanish conquest.

Menzies continues:

There is also linguistic evidence of Chinese visits to South America. A sailing ship is chamban in Colombia, sampan in China; a raft, balsa in South America and palso in China; a log raft, jangada in Brazil, ziangada in Tamil.

We aren't told which of the 98 languages of Colombia, the 234 languages of Brazil, or the roughly 700 of South America as a whole, these words come from. In any case, isolated similarities like these are meaningless; it is easy to find a few words similar in sound and meaning in any two languages. At least two of the three examples here are wrong. You'd think that a Royal Navy man would know that a sampan is not a sailing ship; it is a small boat usually propelled by two oars. There is no Chinese word palso meaning "raft"; no Chinese syllable ends in /l/. And even if the pair of words for "log raft" are correct and their resemblance is not accidental, how would this prove contact between China and Brazil? Menzies is apparently assuming that the only way a Tamil word could get to Brazil is via Zheng He's fleet, and that it is likely that Brazilians would borrow a word for something with which they were no doubt already familiar from the tiny minority of Tamil speakers who might have accompanied the Chinese fleet.
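
The point that a few lookalike words can be found in any two languages can be made concrete with a back-of-the-envelope calculation. The Python sketch below is purely illustrative and every number in it is hypothetical: it assumes we compare the words for 1,000 meanings in two unrelated languages, and that for any one meaning the two words look "similar" by chance with probability 0.01 under a loose matching criterion. Neither figure comes from Menzies or from any real word list. Under those assumptions the number of chance matches follows a binomial distribution.

```python
from math import comb

def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), computed from the complement."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))

n_meanings = 1000  # hypothetical size of the compared vocabulary
p_chance = 0.01    # hypothetical per-meaning chance of a lookalike pair

expected_matches = n_meanings * p_chance
p_three_or_more = prob_at_least(3, n_meanings, p_chance)

print(f"expected chance matches: {expected_matches:.1f}")
print(f"P(at least 3 chance matches): {p_three_or_more:.4f}")
```

With these made-up figures, about ten lookalikes are expected from chance alone, and finding at least three is all but certain. The odds get even easier if, as here, one is free to pick the words from any of several hundred languages on one side and from either Chinese or Tamil on the other.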

[Update (2004/02/03): Kevin Ryan has pointed out that the Tamil form ziangada is also spurious. It is phonologically impossible since Tamil has no [z] sound and since the retroflex approximant sometimes romanized <z> (Tamil ழ) cannot appear in initial position. When I asked them about this form, Dravidianist Harold Schiffman agreed with Ryan, and Tamil scholar and native speaker Vasu Renganathan said that he knew no such word.]

Menzies gives further evidence of contact between China and the New World on p. 414:

Like the Waldseemüller chart, another map of Vancouver Island, called `colonie chinois' by its Venetian cartographer, Antonio Zatta, was published before Vancouver or Cook `discovered' the island. The Squamish Indians there have more than forty words in common with Chinese, including tsil (wet), also tsil in Chinese; chi (wood), which is chin in Chinese; and tsu (grandmother), which is etsu.

Menzies does not give the other 37 putatively similar words in Chinese and Squamish, nor does he cite sources for the Chinese and Squamish words. The fact that he is wrong about where the Squamish live (their territory is on the mainland of British Columbia, just north of the city of Vancouver, not on Vancouver Island) does not inspire confidence in his data. In any case, the examples that he does provide are dubious. Not one of the three words claimed to be Chinese is identifiable as Chinese.

The additional evidence to be found on the website isn't any better. Here's a doozy:

Linguistic groups - The Chinese, Basque and Navajo languages all belong to the Dene-Caucasian language group. Could this be coincidence, or could the fact that Zheng He's fleets visited all of these areas have resulted in such a linguistic distribution?

To begin with, the hypothesis that Basque, the North Caucasian languages, the Sino-Tibetan languages (which include Chinese), and the Athabaskan-Eyak-Tlingit languages (which include Navajo), form a language family is not generally accepted by historical linguists. The evidence for it is very weak. But supposing they do, what could Menzies be arguing here? Does he think that Navajo and the other AET languages are the descendants of Chinese brought to North America in the 15th century? To anyone familiar with both Chinese and Athabaskan, it is extremely implausible that Chinese could have been so transformed in only a few hundred years, or could have differentiated into more than forty diverse languages ranging from the Southwestern United States to Alaska. And where do Basque and North Caucasian fit in? Does he really think that Basque and the North Caucasian languages only reached their current locations in the 1420s?

Here's one more gem from the web site:

American Indian names which are Chinese (Martin Tai)
Columbus' arrival: met Indians = Yin dian (people from Yin [China])
Pizarro: Inca = Yin ka (people who live in Yin)
Vancouver: Inuit = Yin uit (people originating in Yin)

Here again it takes some effort to work out exactly what argument he intends to make. It seems to go like this:

  • /yin/ is a Chinese word meaning "China"
  • Several native American peoples call themselves by names containing /yin/
  • These peoples would have adopted as their own name the Chinese visitors' name for themselves

To begin with, I am unable to identify /yin/ as a Chinese word meaning "China". The closest I can come is /yen/ 燕, the old name for the Beijing area. But surely people from all over China identified themselves as coming from China, not Beijing. Second, Menzies offers no account of the second part of each word, the residue after removing (y)in. None of the three makes sense as Chinese. Third, it seems highly unlikely that people would adopt as their own ethnonym the name of foreign visitors. Finally, there are problems with each of the individual examples:

  • The word Indian is not the term by which the people first encountered by Columbus, the Taino, called themselves. It is a term that the Spanish applied to the inhabitants of the Americas, which they initially believed to be part of Asia.

  • The Inca did not call themselves Inka. In their language, Quechua, inka means "ruler, person of royal lineage".

  • As for Inuit, this is the plural of inuk. The /k/ is an inherent part of the word. Here is an extract from the entry in the Comparative Eskimo Dictionary With Aleut Cognates by Michael Fortescue, Steven Jacobson, and Larry Kaplan published in 1994 by the Alaska Native Language Center, p. 137:
    PE [proto-Eskimoan] iŋuɣ or inuɣ `human being' ... this base, the original Eskimo ethnonym, is everywhere attested also in the senses `resident spirit', `core of boil' and `chick in egg'; cf. also perhaps Aleut iŋisxi-X `owner', ...

    Menzies' decomposition into /yin/ and /uit/ is incorrect.

The linguistic "evidence" in 1421 is a joke. It's sad that a major publisher obviously didn't do even the most elementary fact-checking or have the manuscript read by people competent to evaluate it, but it is worse that such nonsense has become a best-seller and is soon to be made into a documentary. What I want to know is, are the purveyors of this tripe incompetent? Or do they simply not care about the truth of their "non-fiction"?

[Update (2004/02/03): David Nash has brought two recent news items to my attention. There is a skeptical piece by Ken Ringle in the Washington Post of 12 January 2004 (p. C01). It reports that Menzies defended his work by pointing to the fact that:

...last October, Chinese President Hu Jintao told the Australian Parliament in Canberra that Ming Dynasty explorers had discovered Australia in the 1420s.
[Update (2004/02/06): Courtesy of David Nash, here is the relevant passage from President Hu's address on Friday, 24 October 2003 to the Joint Meeting of the House and Senate of the Australian Parliament, as recorded in the Hansard at page 21,697, available here and here [PDF file].
Though located in different hemispheres and separated by high seas, the people of China and Australia enjoy a friendly exchange that dates back centuries. The Chinese people have all along cherished amicable feelings about the Australian people. Back in the 1420s, the expeditionary fleets of China's Ming dynasty reached Australian shores. For centuries, the Chinese sailed across vast seas and settled down in what was called 'the southern land', or today's Australia.]

A politician's endorsement doesn't carry any weight in my book. Indeed, I think that this is rather disturbing. A Chinese invasion of Australia does not seem imminent, but this exemplifies the sort of real world trouble that claims like this can cause. It's best that they be based on real evidence. It is also worth pointing out that even if Zheng He visited Australia in the 1420s, he couldn't have "discovered" it; Australia had already been inhabited for at least forty thousand years.

The Contra Costa Times of 25 January 2004 reports that Menzies will not be going ahead with a dig to find a purported Chinese junk in Glenn County, California due to the insistence of one of the three landowners involved that he receive all television revenues that may result from the dig. The article contains a skeptical statement by Chico State University archaeologist Greg White.]

Posted by Bill Poser at 01:47 AM

Concessive if: bleached or pregnant?

Allan Hazlett quotes a NYT story (by Jodi Wilgoren and Glen Justice) on the shake-up in Howard Dean's campaign staff:

But if such a dramatic move was necessary to signal understanding that something has gone awry, losing Mr. Trippi — who may be followed by several top loyal aides — is risky, since he has become a sort of cult hero to the legions of Deaniacs at the core of the movement.

and comments

"So if the move was nescessary, then losing Trippi is risky? That hardly makes sense. What makes sense is something like "even if the move was necessary, it's the case that the resultant loss is risky." This should have been a conjunction, not a conditional. I've never seen use of "if" and "then" much like this before."

(In fact Wilgoren & Justice don't use "then" -- it would have been weird if they had).

This use of if was discussed a few months ago here by Geoff Pullum, under the heading of "bleached conditionals", with reference to the archetypal snowclone "If Eskimos have dozens of words for snow, Germans have as many for bureaucracy." Geoff agreed with Allan in calling this "a special kind of conditional that does not appear to have conditional force at all; it is more like a coordination."

It's certainly true here that if has a particular sense, different from the straightforwardly conditional one that it has in "if the wind blows, the cradle will rock". The OED lists this sense of if as 4.a. "Even if, even though; though; granted that" and cites among others these examples:

1965 New Statesman 16 Apr. 598/3 If Mr Stewart is top of the Tory pops, other ministers are also high up in the charts. 1967 Listener 17 Aug. 205/1 If my father's people were mill-workers.., my mother's people were agricultural workers. 1969 Ibid. 24 Apr. 585/1 If Mozart was a life-long admirer of J. C. Bach, his views on Clementi were disparaging, to put it mildly.

Note by the way that this doesn't mean that the syntax is changed. In these examples (as in the original NYT sentence) if seems just to be introducing a subordinate clause, exactly as in more straightforwardly conditional uses like

If you cannot see the Breaking News window on the previous page, you may need to download a newer version of your web browser.
If 2004 goes bad, it will go really bad.

But it's also less clear than it might seem exactly what the difference in meaning is. The OED introduces its fourth set of senses of if as follows:

4. In pregnant senses: a. Even if, even though; though; granted that.

I was not familiar with pregnant as a linguistic term of art, but the OED has

pregnant construction, in Gram. or Rhet., a construction in which more is implied than the words express.

So Geoff and the OED are at odds. Geoff says that these conditionals are bleached, that is, they've had the conditional tint somehow washed out of them. The OED says that these conditionals are pregnant, that is, they're carrying a little something else besides their normal (conditional) meaning.

If the meaning of if in the sense in question should be seen as the basic conditional sense plus something extra, what would the extra stuff be? I guess the extra would be that both clauses are true (which is what Allan means by saying that he expects a "conjunction", and what Geoff means by saying that "this type of conditional is more like a coordination"), and that the second clause is surprising given the first, or goes beyond the first in some salient dimension. Is the conditional sense still there? Well, if the apodosis is assumed to be true, then the conditional relation is truth-conditionally moot. And perhaps the sense of NOT (A and NOT B) is what somehow gives rise to the concessive (and/or "excessive") meaning: "you might have thought that A should be true and B false, but not so..." I think we need some help from a neo-Gricean semanticist here.
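
The two logical claims in the paragraph above are easy to check by brute force. Here is a minimal Python sketch, assuming the material-implication reading of the conditional; the function names are mine and purely illustrative. It confirms that "if A then B" has the same truth table as NOT (A and NOT B), and that once B (the apodosis) is taken to be true, the conditional comes out true whatever the value of A, which is the sense in which it is truth-conditionally moot.

```python
from itertools import product

def implies(a: bool, b: bool) -> bool:
    """Material implication, the textbook reading of 'if A then B'."""
    return (not a) or b

def not_a_and_not_b(a: bool, b: bool) -> bool:
    """The paraphrase NOT (A and NOT B)."""
    return not (a and not b)

# The two formulations agree on all four combinations of truth values.
equivalent = all(
    implies(a, b) == not_a_and_not_b(a, b)
    for a, b in product([True, False], repeat=2)
)

# With B fixed as true, the conditional holds whether A is true or false.
moot_when_b_true = all(implies(a, True) for a in (True, False))

print(equivalent, moot_when_b_true)  # True True
```

None of this touches the extra concessive or "excessive" flavor, which is exactly the part that seems to need a pragmatic explanation.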

FWIW, note that this concessive (or "excessive") meaning of if is especially common in introducing adjective phrases

It's all perfectly normal — if troublesome to varying degrees.
Virtual colon dissection is promising, if flawed.
It was fair and balanced if perhaps a little old.

or noun phrases with similar force

The final episode started with an explanation for the town mystery (if a problematic one), but if you thought this was designed to be a closing episode, guess again.

and the same thing is also often found with prenominal concessive modifiers

Today hashing is a global, if little known, pursuit.
Vegetables and sour cream dip are a good (if common) idea

There is a related "excessive" meaning for A if not B, where the ordering of A and B on a scale is crucial, but the validity of B is not assumed. In fact, the Columbia Journalism Review suggests that this construction is often an underhanded way to insinuate something without stating it (the first two examples below are from CJR). It's also beloved of headline writers, probably for exactly the same reasons.

"...at worst, he bullied his opponents and impugned their integrity, if not their patriotism."
"Off and on for two decades, Dr. Lee's behavior was curious, if not criminal."
Money funds' tires are deflated, if not flat.
Paying property taxes made easier, if not painless.
Tomorrow Never Dies leaves you shaken, if not stirred.
Is local, if not organic, the better consumer choice?

Posted by Mark Liberman at 12:48 AM