December 31, 2007

Absence from the Water Cooler

Eric hasn't quite figured out the way things work here at Language Log. He was wondering why he couldn't find any of the senior staff around the water cooler. The answer is, we only go there to gossip. Unlike the junior staff, who actually use it to get a drink, we have spacious offices equipped with bars. Who needs a water cooler when you've got spring water and single malt scotch?

Posted by Bill Poser at 03:34 PM

The Origin of Speeches: Wrathful Dispersion for real?

Following up on "The comparative theology of linguistic diversity" (12/31/2007), Tracy Walsh has written to draw my attention to Isaac E. Mozeson, "The Origin of Speeches: Intelligent Design in Language". This book is apparently not a joke. The description on's web site reads:

The Origin of Speeches begins by recapping the history of our views about the source of language. It then debunks the errors that infuse your dictionary, like those about how words in "unrelated" languages could only have identical sound and sense by "coincidence." It does so with both quality and quantity of data. The next chapters give anyone the skills to sleuth out the Edenic origin of any human word. One learns about letters that shift in sound and location, and letters that drop in and drop out. We discover how Edenics works much like other natural sciences, such as chemistry and physics. Like-sounding opposite words were certainly programmed, not pragmatically evolved.

According to the review by Richard D. Wilkins, "Examples are provided here in copious detail; many hundreds more English words and foreign cognates can be explored in the companion E-Word CD Dictionary." There's an Edenics website:

Here you will discover that ALL human words contain forms of the Edenic roots within them. These proto-Semitic or early Biblical Hebrew words were programmed into our common ancestors, Adam and Eve, before the language dispersion, or babble at the Tower of Babel -- which kickstarted multi-national human history.

In the Edenics FAQ, Mozeson claims (improbably) that "Since the 1990s M.I.T. Professor Noam Chomsky was behind attacks on The Word in academic journals, and even forced the editor of a chain of Midwest Jewish newspapers, The National Jewish Post & Opinion, to drop an Edenics column. Amazingly, this world-class linguist and anarchist had to write a negative review for where he put absurd words in my mouth."

Mozeson is also the author of "The Word: The Dictionary that Reveals the Hebrew Sources of English".

His theory seems to be that God was a sort of weak cryptographer, who didn't actually create any new languages after Babel, but simply mixed up the old ones ("letters that shift in sound and location, and letters that drop in and out") in ways that Mozeson has figured out how to decrypt.

This strikes me as crank etymology with a religious overlay, rather than a serious attempt at rationalizing the linguistic aspects of Genesis.

[Update -- Rosie Redfield writes:

The biological equivalent of Edenics is Baraminology - phylogenetic analysis of the evolution that's assumed to have happened since Noah's Ark. Wikipedia has the details.

It seems to me that Baraminology is closer to this. ]

[Update #2 -- John Brewer writes:

no discussion of the intersection between historical linguistics and Biblical study should be considered complete without a mention of the legendary 17th century or thereabouts scholar who determined that Eden was at least tri-lingual, with God speaking Swedish, Adam speaking Danish, and the serpent speaking French. Googling seems to attribute this to someone named Andreas Kemke. (I remember the theory, but not the name of the theorist, from my undergraduate days in the pre-Google 80's.) LL seems not to have referred to it previously (assuming I'm working the search feature correctly), but Sally Thomason discussed the Kemke Hypothesis in a 1994 post to something called Darwin-L.

I am particularly intrigued by the Wrathful Dispersion Theory because the scriptural account of Babel seems to be pretty much the only thing in Genesis prior to the birth of Abraham which does not appear to be in irreconcilable tension with the consensus of modern secular science (at least if you don't get hung up on the dating of the event, which of course may vary depending on which textual tradition for Genesis you think is authoritative). Not that the plain-of-Shinar account has been confirmed; rather the ultimate relationship, if any, between Proto-Indo-European and, say, Proto-Na-Dene remains at least as unexplained as the origin of language in the first place. It's not clear whether this is because historical linguistics hasn't achieved very much compared to geology, biology, astronomy etc etc etc or because its practitioners are more modest and thus more willing to 'fess up that they don't have sufficient data to answer certain very interesting questions and may never get to the point of being able to answer them.

By the way, there is a certain Christian tradition of understanding the gift of tongues recorded in Acts as a specific reversal of the unfortunate results of Wrathful Dispersion, which can be seen in some of the Eastern Orthodox hymns used on Pentecost Sunday, such as the kontakion which begins (in one English translation) "When the Most High came down and confused the tongues, He divided the nations, but when He distributed the tongues of fire, He called all to unity."

Happy New Year!

And Faith Jones writes:

Isaac Mozeson reminds me of Izzy Cohen. This guy shows up on all kinds of listservs promoting the idea that idioms are based on mis-hearings of the Hebrew Bible, a string of Hebrew words, or in a pinch Aramaic ones. Who exactly is supposed to have mis-heard someone speaking Biblical Hebrew and assimilated these expressions into modern English--presumably a time-traveller of some kind--is not explained.

Here are some places where he expounds his theory. Google "Izzy Cohen idioms" to get lots more. Very good for a laugh.

I worked for a number of years as a Judaica librarian, and it was my observation that many people felt the need to promote Hebrew and/or Aramaic as a source for pretty much any language. This mania was certainly found among the highly religious, but many moderately observant Jews latched on to the language issue as well. It seemed to me at times the religious fervour they couldn't work up for the theology they invested instead in linguistics. This perhaps made them feel that the basis of their belief was scientific rather than superstitious. In the event it seems to amount to the same thing.


Posted by Mark Liberman at 03:18 PM

Another Perspective on English-Only

Over the past fifteen years there has been a large influx of Chinese immigrants into the Vancouver area, so large that the impact is quite noticeable. People comment that half the students at the University of British Columbia seem to be oriental. The most common family name in the Vancouver area is now Lee 李 (also Korean), followed by Wong 王, with Smith in third place. Chan 陳 is fourth. Chinese are now the largest ethnic group in a number of areas, and in many areas most of the stores have signs in Chinese. One such area is Richmond, the suburb near the airport, where over 30% of the population is Chinese-speaking. In Richmond, all of the top ten family names are Chinese.

Not surprisingly this has caused grousing by some of the Euro-Canadian population, especially Anglo-Canadians. The biggest complaint seems to be signage in Chinese, which they can't read and which gives the area a saliently East Asian character.

A while back I was visiting a Native (that is "American Indian") friend and we went to a computer and electronics shop in Richmond. Like other stores in the area, most of the signs were in Chinese and the staff were Chinese. We got to talking about the increased Chinese presence and the unhappiness it had caused in some quarters. He said:

I don't know who these white people are to be complaining. As far as I'm concerned, they're all illegal immigrants. But if I have to choose, I prefer the Chinese.

Posted by Bill Poser at 03:10 PM

Camp language

A few days ago, I described some of the linguistic and ethnic background of the current situation in Pakistan ("Language in Pakistan", 12/28/2007). The key point: the national language of Pakistan, Urdu, is now the native language of only about 7% of the population. It was imposed at the time of the 1947 partition of British India, by the leaders of the All-India Muslim League who established the new country of Pakistan. At the time, Urdu was mainly spoken by Muslims living in what is now India, several million of whom moved to Pakistan during the upheavals that displaced some 20 million people in both directions across the new borders. The decision to impose Urdu led directly to the secession of East Pakistan (as Bangladesh) in 1971, and caused almost as much unrest in Sindh.

This post gives some historical background on the development of Urdu during the previous four centuries.

The OED on Urdu:

A. n. Formerly, = HINDUSTANI n. 2; in recent use distinguished from Hindustani (the lingua franca) and designated as the official language of Pakistan.

1813 J. SHAKESPEAR Gram. Hindustani Lang. 1 The dialect most generally used in India, especially among the Muhammadan inhabitants, called Urdū (camp) or Urdū zabān (camp-language). 1847 W. YATES Hindustani Dict. Pref., The Hindustaní or Urdú is peculiarly the language of the Muhammadan population of Hindústán. 1872 BEAMES Comp. Gram. Aryan Lang. I. 39 By a curious caprice, Hindi, when it uses Arabic words, is assumed to become a new language, and is called by a new name -- Urdu.

The entry for Hindustani:

2. The language of the Muslim conquerors of Hindustan, being a form of Hindi with a large admixture of Arabic, Persian, and other foreign elements; also called Urdū , i.e. zabān-i-urdū language of the camp, sc. of the Mogul conquerors. It later became a kind of lingua franca over all India, varying greatly in its vocabulary according to the locality and local language.
Formerly called Indostan, Indostans (cf. Scots). By earlier writers sometimes applied to Hindi itself.

Under whatever name, Urdu came into existence in the 16th and 17th centuries, as a lingua franca spoken in the polyglot armies, courts and administrations of the Mughal conquerors, who (though nominally derived from earlier Mongol invaders) were basically persianized turks from Central Asia. The name Urdu comes from a turkic word meaning "tent" or "army", which is also the source of English horde. The OED's etymology for horde:

[Ultimately ad. Turkī ordā, also ordī, ordū, urdū camp (see URDU), whence Russ. ordá horde, clan, crowd, troop, Pol. horda, Ger., Da. horde, Sw. hord, It. orda, Sp., Pr. horda, F. horde (1559 in Hatz.-Darm.). The initial h appears in Polish, and thence in the Western European languages. The various forms horda, horde, hord were due to the various channels through which the word came into Eng.]

And for Urdu:

[a. Hindustani (Pers.) urdū camp (ad. Turkī ordu, etc.: see HORDE n.), ellipt. for zabān-i-urdū ‘language of the camp’.]

The official language of the Mughal court and administration was Persian, with Arabic as the language of the conquerors' religion. However, the native language of the region was a dialect whose standard forms came to be known as Urdu or Hindi, depending on whether it was written with characters derived from Arabic or Sanskritic models. As Bob King puts it ("The poisonous potency of script: Hindi and Urdu", International Journal of the Sociology of Language 150:43-49, 2001):

Sanskrit diversified regionally into the languages known as the Prakrits, from which the major languages of northern and central India derive: Bengali, Marathi, Gujarati, and Oriya, as well as Hindi and Urdu. Both Hindi and Urdu evolved from Khari Boli, a branch of Western Hindi (Madhyadeshi) spoken in the region of northern India known as Haryana, which includes the present-day capital of India, Delhi.

King describes the early historical and social context of Urdu:

Arab traders began traveling to India as early as the seventh century C.E. The riches and stories they returned with whetted the appetites of the more adventurous of their coreligionists, and by the tenth century Turko-Afghan Muslims were regularly invading northwest India in search of plunder and converts. By the thirteenth century Muslims were in control of most of northern India, and three centuries later the (Muslim) Mughal dynasty ruled with an iron hand all of north India down well into the Deccan (the Deccan plateau cuts a swath from west to east through central India). The impact of Mughal rule, its greatest emperor Akbar (1542-1605) in particular, on the course of Indian history and on all aspects of Indian culture was enormous -- and fateful.

Part of the historic legacy of that impact was linguistic and graphemic. Urdu arose as the everyday language of the Mughal Empire, whose official and administrative language was Persian. Urdu was the language of the princely courts such as Delhi and Lucknow. The designation "Urdu" is not found until 1752, when the poet Mir gave it the name Urdu-e-Mu'alla 'courtly language' (Dittmer 1972: 48). The word "Urdu" is of Turkish origin (ordu) and originally meant 'camp'. Thus Urdu arose essentially as "the language of the (army) camp." Because of its Mughal and therefore Islamic provenance Urdu had by 1600 C.E. diverged from its Hindi origins through extensive absorption of Persian and Arabic linguistic material: loan words, syntactic turns of phrase, a handful of phonemes borrowed from Persian, a certain precious "courtly style," a Persian cast to poetry and song. The ghazal, for example, a genre of song much admired in India by Hindus and Muslims alike, is Persian in origin.

It must be remembered that from roughly 1400 C.E. onward Persia was to those countries in its cultural orbit what France was to Europe during the reign of Louis XIV, and for a long time afterward. Persian ways set the tone in the ruling courts of Turkey, Afghanistan, and northern India. Persian culture and cuisine were highly valued. The Persian language was the language of diplomacy, of treaties, of art and beauty, of song, of love. Even after the British had assumed control of most of India, moving into a political vacuum created by the shrinking of the Mughal dynasty, they continued the use of Persian as the language of administrative records until the 1830s, when English became the official language of the Raj. The Persian language maintained a "high" function in Indian Muslim life long after it had ceased to be anybody's first spoken language there. A favorite diversion -- opium was another -- of the last Nizam of Hyderabad, a Muslim-ruled enclave in the Deccan, whose reign was ended in 1948, was composing quatrains in the Persian language. His native language was of course Urdu.

In this context, an important Urdu literary tradition, based on Persian and Arabic models, developed in the 18th and 19th centuries. This took place at the same time as the British take-over of the subcontinent, and there came to be a significant connection between the British administration and the cultural development and spread of Urdu. King explains that

The British had introduced Urdu in the Perso-Arabic script as the language of the courts and administration in the Northwestern Provinces after the Sepoy Mutiny in 1857, and Urdu was mandated as the language of the Indian army in 1864. British officials were in agreement that Urdu or, as they had begun to call it, Hindustani should become the lingua franca of all India, at least of north India.

This seems to have led to a paradoxical situation, in which the Muslim elites were simultaneously more pro-British and more anti-English (language) than their Hindu counterparts. According to R. Powell, "Language Planning and the British Empire":

In the early 19th century the Hindu elites, who had been something of an under-class in the Mughal Empire, favoured English more than the Muslims, but wanted Hindi recognised to the extent that Urdu was. While Shah Abdul Aziz was describing English education as something 'abhorrent' and 'improper' for Muslims, the Calcutta bourgeoisie were organising the Hindu College (opened 1816) so that their own elite could acquire English language and literature and European science.

Later in the 19th century, Syed Ahmed Khan founded the Muhammedan Anglo-Oriental College, which became Aligarh Muslim University. According to his Wikipedia entry,

... Sir Syed was suspicious of the Indian independence movement and called upon Muslims to loyally serve the British Raj. He denounced nationalist organisations such as the Indian National Congress, instead forming organisations to promote Muslim unity and pro-British attitudes and activities. Sir Syed promoted the adoption of Urdu as the lingua franca of all Indian Muslims, and mentored a rising generation of Muslim politicians and intellectuals.

One of his protégés was Maulvi Abdul Haq, known as Baba-i-Urdu ("father of Urdu"). From Wikipedia:

Following the establishment of the Osmania University by the Nizam Osman Ali Khan, Asif Jah VII of the Hyderabad State in 1917, Haq moved to Hyderabad to teach and help build the university. All subjects at the university were taught in Urdu, and under Haq's influence the institution became a patron of Urdu and Persian literature and linguistic heritage. ... in 1930 Haq led the group in protest against a campaign by Indian nationalists to promote the use of Hindi as the national language of India. Haq became a fierce critic of Indian leader Mahatma Gandhi and the Indian National Congress, the largest political party in the nation. Suspicious and averse to the Congress and the Indian independence movement, in which Hindus composed a majority of leaders and participants, Haq joined the All India Muslim League led by Muhammad Ali Jinnah.

After partition in 1947, Haq moved to Karachi as one of the millions of Urdu-speaking muhajirs ("settlers"), and

... re-organised the Anjuman Taraqqi-e-Urdu ..., launching journals, establishing libraries and schools, publishing a large number of books and promoting Urdu education and linguistic research. ... Haq also used his organisation for political activism, promoting the adoption of Urdu as the lingua franca and sole official language of Pakistan. He criticised the popular movement that had arisen in East Pakistan (now Bangladesh) to demand the recognition of Bengali, stressing his belief that only Urdu represented Muslim heritage and should be promoted exclusively in national life. Condemning the 1952 Language movement agitations in East Pakistan, Haq was infuriated by the decision of the Constituent Assembly of Pakistan to make Bengali a second official language.

More later, on the differences between Hindi and Urdu, and on the problems that Urdu apparently poses for literacy in Pakistan.

[For an interesting analysis of the political background (before Benazir Bhutto's assassination) from the perspective of one of "the Urdu-speaking descendents of immigrants from India", see Salim Chauhan, "With or Without Musharraf -- A Mohajir's Perspective", 4/11/2007. His bitter comments about the "sons of the soil" are (I guess) directed at the feudal landowners who are still a dominant power in Pakistan -- as William Dalrymple wrote recently ("Pakistan's flawed and feudal princess", 12/30/2007):

Real democracy has never thrived in Pakistan, in part because landowning remains the principal social base from which politicians emerge.

The educated middle class is in Pakistan still largely excluded from the political process. As a result, in many of the more backward parts of Pakistan, the feudal landowner expects his people to vote for his chosen candidate. As writer Ahmed Rashid put it: 'In some constituencies, if the feudals put up their dog as a candidate, that dog would get elected with 99 per cent of the vote.'

And Chauhan's mention of "[t]he Urdu-speaking collaborators, left at the mercy of the victorious Mukhti Bahini, ... 'stranded' in a hostile country and shamelessly abandoned by the very country which they supported" is a reference to the 600,000 or so Urdu-speaking ethnic Biharis who have been stranded as officially stateless refugees in Bangladesh since 1971, one of the many deeply depressing aspects of this region's recent history.]

Posted by Mark Liberman at 02:32 PM

Bab-El in the Book of Mormon

To follow up on Mark's discussion of the story of the tower of Bab-El, I had a look at the Book of Mormon, which turns out to mention the story in passing in several places.

Omni 1:22

It also spake a few words concerning his fathers. And his first parents came out from the tower, at the time the Lord confounded the language of the people; and the severity of the Lord fell upon them according to his judgments, which are just; and their bones lay scattered in the land northward.

Mosiah 28:17

Now after Mosiah had finished translating these records, behold, it gave an account of the people who were destroyed, from the time that they were destroyed back to the building of the great tower, at the time the Lord confounded the language of the people and they were scattered abroad upon the face of all the earth, yea, and even from that time back until the creation of Adam.

Perhaps the most interesting part is the bit in the Book of Ether which mentions that the language of certain people was not "confounded", which presumably means that these people were allowed to retain the language that they had hitherto spoken.

Ether 1:33-37

Which Jared came forth with his brother and their families, with some others and their families, from the great tower, at the time the Lord confounded the language of the people, and swore in his wrath that they should be scattered upon all the face of the earth; and according to the word of the Lord the people were scattered.
And the brother of Jared being a large and mighty man, and a man highly favored of the Lord, Jared, his brother, said unto him: Cry unto the Lord, that he will not confound us that we may not understand our words.
And it came to pass that the brother of Jared did cry unto the Lord, and the Lord had compassion upon Jared; therefore he did not confound the language of Jared; and Jared and his brother were not confounded.
Then Jared said unto his brother: Cry again unto the Lord, and it may be that he will turn away his anger from them who are our friends, that he confound not their language.
And it came to pass that the brother of Jared did cry unto the Lord, and the Lord had compassion upon their friends and their families also, that they were not confounded.

There is also a reference to a language having become "corrupted", which presumably describes language change:

Omni 1:17

And at the time that Mosiah discovered them, they had become exceedingly numerous. Nevertheless, they had had many wars and serious contentions, and had fallen by the sword from time to time; and their language had become corrupted; and they had brought no records with them; and they denied the being of their Creator; and Mosiah, nor the people of Mosiah, could understand them.

Posted by Bill Poser at 02:15 PM

Proxy debates

Speaking (a tad belatedly) of English-only insanity (of the non-parodic kind), y'all might want to take a look at the mass of comments on my "Speak xkcd or die" from earlier this month (Dec. 6). I believe this is the most commentary I've ever gotten on a post (35 so far, several in just the past week or so). Almost certainly, many of these comments are the longest ever for one of my posts. And no doubt, several of these comments contain some very virulent vitriol (sorry, couldn't resist the alliteration). People really seem to care about this national language business, and folks like Fred Thompson are speaking right to them.

I haven't censored any of the comments, and so far I have also resisted any temptation to jump in and respond to any of them. Like the comic that was the subject of the post, I think each comment speaks for itself. But I will add here one overall response, a quote from Sally Johnson's excellent 2001 Journal of Sociolinguistics article ("Who's misunderstanding whom? Sociolinguistics, public debate and the media"), which I found via American English: Dialects and Variation (by Walt Wolfram and Natalie Schilling-Estes; see p. 212).

"It is not language per se, but its power to function as a 'proxy' for wider social issues which fans the flames of public disputes over language." (Johnson 2001, p. 599)

Posted by Eric Bakovic at 11:30 AM

Mailbag: the comparative theology of linguistic diversity

Following up on yesterday's post "The science and theology of global language change", Cosma Shalizi writes:

I cannot, to my shame, recall whether the Qur'an includes a version of the Babel story, but there is a famous passage where it seems to look favorably on this sort of diversity (49:13, Pickthall trans.):

O mankind! Lo! We have created you male and female, and have made you nations and tribes that ye may know one another. Lo! the noblest of you, in the sight of Allah, is the best in conduct. Lo! Allah is Knower, Aware.

Similarly 30:22,

And of His signs is the creation of the heavens and the earth, and the difference of your languages and colours. Lo! herein indeed are portents for men of knowledge.

Arguably 11:118 is in a similar vein,

And if thy Lord had willed, He verily would have made mankind one nation, yet they cease not differing.

(Pickthall was an English convert, and self-consciously tried to make his translation sound like the King James Bible, which I find dubious but it's public domain now.)
See more generally.

I wonder, have these verses ever been used as the basis for language documentation, preservation or revival?

With respect to the Babel story, a bit of web searching turns up two koranic references to an unnamed impious tower, involving Pharaoh and Haman but not linguistic diversity. Sticking with the Pickthall translation:

28:38 And Pharaoh said: O chiefs! I know not that ye have a god other than me, so kindle for me (a fire), O Haman, to bake the mud; and set up for me a lofty tower in order that I may survey the god of Moses; and lo! I deem him of the liars.

40:36 And Pharaoh said: O Haman! Build for me a tower that haply I may reach the roads,
40:37: The roads of the heavens, and may look upon the god of Moses, though verily I think him a liar. Thus was the evil that he did made fairseeming unto Pharaoh, and he was debarred from the (right) way. The plot of Pharaoh ended but in ruin.

Marc van Oostendorp writes:

Obviously, the most famous story about linguistic diversity in the Bible is Genesis 11. But there was linguistic diversity even before they built the tower. Genesis 10 lists the children of Noah after the flood, and where they went to live. Gen 10:5 then says that they went there 'each with their own tongue, according to their clans in their nations'. I once read about a theological debate about this apparent inconsistency, but I don't remember what the outcome of this debate was.

For those people who believe in the New Testament, there obviously also is the story of Pentecost, when Jesus' disciples start speaking in tongues, and this is considered to be a gift of God.

David Eddyshaw writes:

I'm probably one of a rather small proportion of LL readers who actually *is* a Biblical inerrantist.

I can't see myself why believing the Babel story would entail denying subsequent changes in language. I haven't ever heard of anybody who did make this deduction, though I don't doubt there are some out there. (I know a really excellent surgeon, who is highly intelligent and incidentally one of the nicest people I've ever worked with, who believes that Heaven is a cube, based on a totally literal interpretation of the book of Revelation which I would have thought would have astonished the author of the book).

Come to that, as a Brit, I may well not be very like a typical American inerrantist. When I lived in Nigeria I used to know quite a lot of missionaries from the US, and found that in their terms I often counted as a dodgy liberal theologically (I appreciated the thrill).

FWIW there are quite a number of different attitudes to the Bible held by people all of whom would sincerely describe themselves as inerrantists. It's not always clear a priori what "counts" as an error, so denying their presence can amount to rather different things in practice. Evidently parallel passages in Kings and Chronicles, for example, have (very) different numbers for sizes of armies; unless you magic away all such instances as convenient textual corruption, you have to accept that there are errors ("errors"?) which don't matter. In the UK at any rate, I've yet to meet anybody who truly maintained the contrary.

In practice, I've found that most people happily apply labels like "inerrantist" to themselves and blithely leave worrying about the details to theological geeks. It's a pretty standard trope in sermons that pretty few of us who claim to believe in Biblical inerrancy are all that familiar with the actual book in any detail. But mutatis mutandis, that applies to pretty much everybody outside their own areas of geekitude, I suppose.

Well, there's the same Occam's Razor argument as in the case of biological evolution -- some may find it odd to suppose that diversity (of languages or of species) is of two kinds, one natural and the other miraculous. If natural processes can create some languages and species, why not all? I guess that there's no problem here for those who think that the world mostly runs on naturalistic principles, with occasional divine interventions ad libitum.

Bill Poser writes:

M____ R_________ [someone who was in grad school with Bill] experienced a problem with the story of Bab-el when we were students. The orthodox believe that Hebrew was the first language. (God knows all languages, but the angels know only Hebrew, which puts paid to that Joan of Arc nonsense...) It seemed to her that the arguments for a relationship between Hebrew and Arabic were sound, but she wasn't sure that this view was consistent with the Torah: if Hebrew is the first language and existed prior to the sundering of languages at Bab-el, and if Arabic is one of the many languages resulting from the sundering, could there be a relationship between them? She was going to ask her tzadek. I don't know what answer he gave.

George Corley writes:

Reading your post Science and Theology of Language, I was reminded of a passing mention of linguistics buried in the Creation Museum -- which was recently built by Answers in Genesis to present archaeological evidence in the framework of the Biblical creation myth. The museum itself includes pseudoscientific explanations for everything from carnivorous animals to the disappearance of the dinosaurs, all in the framework of that mythos, while explicitly rejecting "Human Reason" (actually stating such in several displays) in favor of "God's Word".

The relevant panel is here:

The section includes a radial diagram of language families with Babel at the center, with the text:

The Bible claims that God created a number of human languages at the Tower of Babel "according to their families." Nineteenth-century linguists argued that languages evolved slowly, one by one. Today linguists recognize languages fall into distinct "families" of recent origin.

I'm sure you can see the fallacies of the statement rather quickly. Notably, the phrase "according to their families" does not appear in the Babel myth, but rather comes from the wrap up to the Flood story, describing the origins of different peoples and languages as originating from Noah's three sons.

The Creation Museum is, of course, among the most extreme creationist organizations there are. I don't think there's any mention of the Babel theory among more moderate Christians.

Sally Thomason discussed the Creation Museum's linguistic theories last summer -- see "Creationist Linguistics", 8/1/2007. As she suggests, the panel's theory seems to be that all the distinct language families -- between whom no relationship can be proved, due to the inevitable decay of evidence with time -- originated at the babelian dispersion, with further subdivision by natural evolutionary processes since then. This seems analogous to the view that some taxonomic level of plants and animals (genera? families? kingdoms?) was created by the intelligent designer, with finer distinctions evolving by darwinian means. I guess that creationists do believe something of the sort, though perhaps it's that the species level was created, with subspecific variation evolving by natural processes.

[Update: Matthew Watson has drawn my attention to this discussion, which basically confirms in a more detailed way the theory that "each distinct language family is the offshoot of an original Babel 'stem language' which did not arise by change from a previous ancestral language".]

Posted by Mark Liberman at 06:08 AM

December 30, 2007


Mark Liberman and I have gone one round on blame Y on X, thanks to the 1915 Funk & Wagnalls Faulty Diction booklet, which categorically rejects the usage.  Through the mediation of MWDEU we were led to the apparent source of the peeve, Alfred Ayres's 1881 book The Verbalist, and now we're into meta-matters -- not so much about the syntax of blame, but about the advice literature on it.

1. Why this antipathy to innovation?  I'm only going to scratch the surface here (see note below on the placement of only), and very briefly, but at least eight attitudes and beliefs might contribute to this antipathy.

[Note: please don't write me about how only is "misplaced" here, because it should be right next to the thing it modifies, scratch the surface.  I mean, think about it: putting only right next to scratch the surface gives you going to only scratch the surface, which is a split infinitive even I am uncomfortable with, and locating the only before to scratch the surface interrupts the very tight idiom of prospective be going to -- going only to scratch the surface, ugh -- so only going to scratch the surface is by far the best version.  In any case, I'm not at all set against "high" placement of only and even (preceding a VP that contains the modified constituent), as in I only saw one dog 'I saw only one dog'.]

(1) Belief: innovation of new variants is pointless elaboration.  Why invent new expressions when old ones do fine?  In the case at hand, why concoct blame Y on X when we already have blame X for Y?  (There are answers to these questions, which I'll sketch later.  Here I'm trying to unpack the reasoning of usage critics like Ayres.)

(2) Belief, lying behind (1): for the most part, alternative expressions are paraphrases, differing at most in style.  In particular, blame Y on X and blame X for Y are (it is assumed) paraphrases.

(3) Belief: when alternative expressions differ in style, a "higher" variant (standard rather than non-standard, of course, but also formal rather than informal, written rather than conversational, and in general use rather than restricted socially or geographically) is intrinsically better than a "lower" variant and therefore should be preferred.  So insofar as blame Y on X is perceived as being "colloquial" (informal and conversational) or restricted to certain groups of speakers, it is to be avoided.

(4) Attitude: variation should not be tolerated.  In any context, there should be One Right Way.  In particular, either blame Y on X or blame X for Y, but not both.

(5) Attitude: the past should be preserved.  In particular, blame X for Y should continue in use.  Putting this together with (4): blame Y on X should not be tolerated.

(6) Belief: innovations threaten older alternatives.  Even if you don't subscribe to (4), this belief in combination with (5) means that innovations are to be rejected.  In particular, blame Y on X should not be tolerated, because it threatens the older variant blame X for Y.

(7) Belief: most innovations arise "from below", from the working class, the uneducated, the uncultured, the frivolous young, and so on, and so are to be resisted on this basis alone.

(8) Belief: most innovations arise from ignorance or laziness or both, and so are to be resisted on this basis alone.

[Note: "Alfred Ayres" is a pseudonym used by Thomas Embly Osmun, described in his 1902 NYT obituary as an "elocutionist and critic of dramatic expression".  He was, with Richard Grant White, one of the great American grammar/usage ranters of the 19th century.]

2. Why innovate?  And why tolerate alternatives?  There's something a bit off-center in all of the attitudes and beliefs above, but here I'll look at just a few, beginning with the question of why people don't just leave the language alone, why people innovate.  Why don't they just preserve older forms, as in (5) above?

One part of the answer is that people are always trying to find ways to express what they want, and they're willing to stretch things a bit for their purposes.  Everyday conversation is full of language play of all kinds, novel figures of speech (metaphors and metonymies), other extensions of meaning, extensions of syntactic patterns, exploitations of implicature, intrusions from other varieties, and more.  So is more elevated speech and writing.  Everybody innovates, all the time.  And that's a good thing.  (Of course, people also use a lot of expressions as wholes, formulas, and prepackaged routines.)

Another part of the answer is that people can't possibly know what the language as a whole is like, so that in an important sense much of the time they won't know whether they're innovating or just using existing patterns.  After all, people say lots of things they've never heard before, or don't recall having heard before.  If there's some backing for a usage, then go for it!

But on to specifics: diathesis alternations in English involving direct objects (DOs) and another complement.  Diathesis alternations are alternative distributions of syntactic arguments of heads (I'll stick to verbs here), where arguably the "same" head -- that is, a head with the same semantics, phonology, and morphology -- occurs with different sets of syntactic arguments:

Kim ate something.  [SU x, DO y]  (where SU = subject)
Kim ate.  [SU x]

Standard English has many patterns of diathesis alternations for verbs (Beth Levin's English Verb Classes and Alternations is a significant beginning of an inventory of them), including a number in which SU remains constant but DO can be switched with PO (prepositional object), among them:

the verb present (recipient x, thing presented y):
  I presented Kim with an award.  [DO x, PO y, P = with]
  I presented an award to Kim.  [DO y, PO x, P = to]

smear-class verbs (material x, location y):
  They smeared paint on the wall.  [DO x, PO y, P = on]
  They smeared the wall with paint.  [DO y, PO x, P = with]

dative alternations (2O is "second object", not marked with P):
  transfers (recipient x, thing transferred y):
    I gave Kim an award.  [DO x, 2O y]
    I gave an award to Kim.  [DO y, PO x, P = to]
  benefactives (beneficiary x, thing affected y):
    I baked Kim a cake.  [DO x, 2O y]
    I baked a cake for Kim.  [DO y, PO x, P = for]

(Please don't write me to tell me about more types -- these are merely illustrative -- or about further details of these types.  The literature on diathesis alternations is staggeringly huge, even just for English, even just for some of the types, like the dative alternations.)

I have to stress here that these alternations are entirely standard in modern English and have been so for some time.  The point is that speakers of English have every right to assume that there can be different ways of "saying the same thing" via different deployments of DO and PO.

But of course -- contra (2) -- these are not quite ways of saying the same thing.  The truth conditions for the alternatives might (or might not -- there are disputes about particular cases) be the same, but the alternatives are not otherwise equivalent; in particular, they do different things in discourses.  This is not just pointless elaboration (cf. (1)).  In contrast to the assumption in (2) that variation is (except perhaps for style) free, I've repeatedly argued that variation is typically unfree: there are contexts in which the alternatives do different things (even if they're mostly interchangeable) -- a moderated version of Dwight Bolinger's position that there is no formal difference without a semantic difference.  (Some discussion here, here, here, here.)

In all of the DO alternations, including the blame case, there's a difference in the status of the DO, versus PO or 2O: it's closely associated syntactically with the V, leading to the expectation that its referent is in some way "focussed on" or foregrounded; and it precedes the other (PO or 2O) argument, leading to the expectation that its referent is a given rather than a new item in the discourse (old before new).  In addition, the linear order favors short NPs (especially pronouns) as DO, as against longer NPs, which are preferable in later constituents (shorter before longer): Blame it on Canada works better than Blame Canada for it, for instance.

All of these effects, and others I haven't mentioned here, are familiar from the literature on diathesis alternations involving DOs and other non-subject arguments of a V.  Put them together with the existence of blame X for Y, and you have a formula for the innovation of blame Y P X, for some preposition P.  It would be a way of promoting the resultant/caused situation NP to the position of prominence/givenness immediately after the V, and postponing the source/cause NP (usually, but not necessarily, referring to a human being) until later in the VP:

the verb blame (source/cause x, resultant/caused situation y):
  I blame Kim for the problem.  [DO x, PO y, P = for]
  I blame the problem P Kim.  [DO y, PO x, P = ?]

The only question is what P to choose.  Here, Faulty Diction alludes to a possible source for the P:

... [either] "I do not blame the President for the defeat," or "I do not lay the blame . . . upon," etc.  Here [in "I do not blame the defeat on the President"] two points of view essentially different are confused.

That is, the idiom family

lay/put/place/fix (the) blame (up)on ...

supplies a model with a candidate P, namely on (earlier upon).

And so it was.  The innovative construction blame Y on X supplies a result/caused situation DO to go along with a source/cause PO, and another construction with similar meaning provides the P on.  It's hard to believe that speakers at the time batted an eyelash over the innovation -- it would have been easily comprehensible, and in fact most people probably wouldn't have recognized it as something they hadn't heard before -- which means that it was able to spread quickly, simply because it was a Good Thing, because it does useful work communicatively. 

(Why it didn't arise earlier is a knotty question, the answer to which is likely to be: sunspots.  Lots of changes are likely to happen, for language-internal reasons, and they do happen, with modest frequency here and there, but only a few catch on -- because the planets are properly aligned, or whatever.)

More generally: most innovations have a communicative rationale -- not necessarily the same one as in the blame example -- and are not just pointless elaborations. 

Yes, I know, some lexical innovations have primarily a social rationale, as when new items mark off the social groups that use them or establish personas for particular speakers in context.  And I know that the spread of variants runs significantly along social lines (I am, after all, some kind of sociolinguist), but I'd also like to point out that a great many variants are out there just because they're good to go.

3. A few brief remarks, on things I'm flagging for possible posting on a later occasion.

3.1. Innovations as threat.  Although neither Ayres nor Faulty Diction makes this explicit for blame on, innovations are routinely seen as threatening to the older variants ((6) above).  If you see alternants as free variants ((2) above), then they're in competition, and the newcomer is a threat to the oldtimer.  (Indeed, there are a fair number of well-known cases where innovative variants have in fact supplanted older ones.)

But when the alternants are differentiated in use, stable variation can result.  This seems to be the situation for blame on, which has co-existed with blame for for at least 125 years.

3.2. The class issue.  Ayres on blame on: "a gross vulgarism, which we sometimes hear from persons of considerable culture".  Faulty Diction: "indefensible slang".  There are several things to unpack here, starting with the assumption that blame on originated "from below" ((7) above), in the vulgar mob, the users of slang, and therefore is to be rejected on that basis alone ((3) above).  No source I've seen provides evidence for this claim -- I assume that it's the result not of observation, but of reasoning from first principles, in particular from the assumption that innovations GENERALLY originate from below -- and I suspect that it might be false.  Innovations that provide communicatively useful variants, as blame on does, could in principle originate with speakers of any sort.  The fact that Ayres in 1881 was already hearing the innovative variant in "persons of considerable culture" suggests that it might have originated among such speakers (and, quite likely, independently among speakers of other sorts as well).

3.3. The education/culture issue.  Why should usage advisers assume that innovations generally originate from below?  Probably because they take innovation to be the result of ignorance ((8) above); the innovators don't know better (they should look it up!).  Educated and cultured people, on the other hand, are assumed to be able to differentiate novelties from established forms, and to know the norms that are recommended in the advice literature.  That is, the assumption is that education and cultural refinement provide a kind of synoptic knowledge of linguistic variants and their social values.  This is, on the face of it, a preposterous idea -- not even scholars of variation have such comprehensive knowledge -- but there it is.

3.4. Fashions in peeves.  The peeves that make their way into the advice literature are not an even sampling of (what are judged to be) "lower" variants (non-standard, informal, primarily spoken, geographically or socially restricted) -- I am constantly coming across such variants that get no, or very little, press from usage advisers -- nor are the variants that get the most press and excite the most passionate criticism necessarily of much, or even any, significance on rational grounds.  There are fashions in peeves, some of which persist more than a century.  (Some are extraordinary.  I have a posting in preparation about one of them, which I'm nominating for a Peevy Award for Lifetime Achievement.  But blame on is certainly a contender for the award.)  As Mark noted in his posting,

Ayres' perspective in this case is an inspired one, precisely by virtue of lacking any pretense of logical or empirical support. There are no grounds for refutation -- grammatical logic is irrelevant, and if many excellent writers often use the construction, that simply shows that their culture, although perhaps "considerable", is in fact inadequate.

He concludes with a wonderfully pointed critique, worth repeating here:

... is there any constellation of facts that could prevent Ayres' little off-hand expression of prejudice from echoing down the centuries in the unconsidered repetitions of his cultural copyists?

From a functional perspective, I suppose that this is an ideal sort of prescriptive norm. Since it's an entirely artificial policy, with no basis in the past century of speaking and writing, there's no way to learn it simply from attending to even the "best" speakers and writers. The only possible source is works on usage. This adds essential value to such works, by giving those who read them carefully and credulously a reason to feel superior to every one else.

Posted by Arnold Zwicky at 02:16 PM

What in the fudge?

It's Bring a Friend to Work Weekend (BaFtWWe) here at Language Log Plaza, and I decided to bring two friends, my UCSD colleagues Andy Kehler and Roger Levy. They've both been dying to meet some of the senior writing staff, but nobody seems to be around. (Funny, since it was a couple of members of the senior staff who told me about BaFtWWe in the first place. Hmm.) Anyway, while I was running around trying to find Mark, Arnold, either of the Geoffs, Bill, Sally, Roger, or anyone at all really, I left Andy and Roger by the water cooler, which we've recently equipped with a recording device in order to capture the unique conversations that so often take place there. Here's an expurgated transcript of Andy and Roger's conversation which I thought Language Log readers might enjoy.

Andy: I just saw this headline on the FoxNews website.
Woman whose diamond ring vanished while she made fudge for bake sale turns up inside piece of the candy she sold
This can only mean that the woman showed up inside the candy, and not the ring, right? Yeesh.
Roger: According to Whitney Tabor it can also mean the bake sale turned up inside the candy...
Andy: That's pretty funny. Somehow that parse didn't jump out at me as much as it does in the "the coach smiled at the player tossed the frisbee" cases.
(notices the recording device; leans in) (Background if this is opaque to any of you: people read the above sentence more slowly than one in which "tossed" is replaced with "thrown", which, according to Whit Tabor, is because "the player tossed the frisbee" is an allowable sentence, even though an incremental parser should know it can't be one in this particular syntactic context.)

The relevant FoxNews story is here; the headline was on the FoxNews homepage at one point, as shown here.


Posted by Eric Bakovic at 01:50 PM

The science and theology of global language change

Some of our satirists need to go back to Sunday school. Dennis Baron writes ("U.N. proclaims 2008 the International Year of Language", 12/30/2007):

While the rest of the world lines up to support the U.N.’s International Languages Year, U.S. ambassador to the United Nations Dr. Zalmay Khalilzad has announced that America’s participation remains problematic.  The Bush administration is claiming that languages were theories, not scientifically-proven facts, and the president himself recently affirmed his belief that God created English in just six days and promised to veto the use of federal funds to teach language evolution to impressionable children.

But surely those who believe in biblical inerrancy accept that the bible treats linguistic differences as facts, e.g.

Neh.13:24 And their children spake half in the speech of Ashdod, and could not speak in the Jews' language, but according to the language of each people.
Esth.1:22 For he sent letters into all the king's provinces, into every province according to the writing thereof, and to every people after their language, that every man should bear rule in his own house, and that it should be published according to the language of every people.
Pss.81:5 This he ordained in Joseph for a testimony, when he went out through the land of Egypt: where I heard a language that I understood not.
Isa.36:11 Then said Eliakim and Shebna and Joah unto Rabshakeh, Speak, I pray thee, unto thy servants in the Syrian language; for we understand it: and speak not to us in the Jews' language, in the ears of the people that are on the wall.
Jer.5:15 Lo, I will bring a nation upon you from far, O house of Israel, saith the LORD: it is a mighty nation, it is an ancient nation, a nation whose language thou knowest not, neither understandest what they say.
Ezek.3:6 Not to many people of a strange speech and of an hard language, whose words thou canst not understand. Surely, had I sent thee to them, they would have hearkened unto thee.

And most important, the world's different languages (presumably including English) were not part of the original six days of creation, but were ginned up much later, in Genesis 11:

[1] And the whole earth was of one language, and of one speech.
[2] And it came to pass, as they journeyed from the east, that they found a plain in the land of Shinar; and they dwelt there.
[3] And they said one to another, Go to, let us make brick, and burn them throughly. And they had brick for stone, and slime had they for morter.
[4] And they said, Go to, let us build us a city and a tower, whose top may reach unto heaven; and let us make us a name, lest we be scattered abroad upon the face of the whole earth.
[5] And the LORD came down to see the city and the tower, which the children of men builded.
[6] And the LORD said, Behold, the people is one, and they have all one language; and this they begin to do: and now nothing will be restrained from them, which they have imagined to do.
[7] Go to, let us go down, and there confound their language, that they may not understand one another's speech.

The international movement to teach this history in the public schools is known as "Wrathful Dispersion Theory". Or, rather, it might be known by that name, if it existed -- the status of linguistic instruction in our schools is so anomalously low that no one has felt the need to create such a movement.

This all leaves me uncertain what the theology of linguistic diversity might be, for those people to whom such things matter. On one hand, the creation of diverse languages was a punishment, not a reward or an example of divine bounty, so eliminating all the world's languages in favor of English might be seen as a good thing, restoring the world to a state closer to the creator's original plan. On the other hand, linguistic diversity is a divine punishment for human technological presumption, and who are we to interfere?

While we're on the subject, I've often wondered whether it's consistent with biblical inerrancy to believe that new languages (such as English) have continued to evolve after the Babelian dispersion? And if so...

In any case, it's lame to mock the fundamentalists without maintaining a modicum of biblical consistency.

One of Baron's digs at our current president did make me laugh, though I've never been a fan of the Bushisms industry in general:

Reacting to a New York Times report that Marvel Comics has just released a bilingual Fantastic Four comic book, the president also told reporters in a Rose Garden press briefing that the United States would not be a signatory to any multinational treaties attempting to reverse global language change. He urged everyone living in the United States to speak English, not Spanish, and, demonstrating his commitment to make English America’s official language, he resolved to begin learning English right away.

During George W. Bush's first term, I wrote to the folks at to suggest that their name, coined in response to Republican exploitation of Bill Clinton's extramarital peccadillos, ought also to be applied to W's linguistic practices. But there are a few good Bush language jokes, just as there were a certain number of good Clinton sex jokes, and this is one of them.

Posted by Mark Liberman at 12:54 PM

You know

Matt Hutson writes:

My older sister used to be chastised by my parents for, like, saying "like" too much, when she was younger. Over Christmas, I realized how much she, you know, says "you know." I wonder if she simply replaced "like" with "you know" at some point many years ago. (She is an extrovert, by the way, perhaps hence not saying "I mean.")

Matt is following up on an earlier on-line conversation about personality, gender and filler phrases ("News flash: the biggest users of 'like totally' are middle-aged men", 8/18/2007; "I mean, you know", 8/19/2007).

I still don't know anything about possible correlations between personality types and choice of fillers; and I don't have anything helpful to say about possible trade-offs among fillers, though there's an interesting literature on functional similarities and differences in such phrases. But I can contribute a quick Breakfast Experiment™, based on LDC Online's collection of English Conversations, to show that there's a general tendency for people to use "you know" somewhat more often in their middle years:

Age      N         Total uses of "you know"   Uses per N
20-39    5,356     58,364                     10.9
40-59    19,920    278,099                    14.0
60+      2,908     33,477                     11.5

The age categories are pretty crude, and not optimized to find differences in any particular variable, so a 28% increase between the "20-39" and "40-59" categories is striking. If we took the time to break out the age effect without grouping into pre-defined bins, the magnitude would probably be even greater.
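The arithmetic behind that 28% figure can be checked with a short sketch. It assumes that the first count in each row of the table is a number of conversational units and that the larger count is the total number of uses of "you know"; the column labels did not survive, so that reading is an assumption, though it is the one that reproduces the printed rates. The names `counts`, `rates` and `increase` are illustrative only.

```python
# Counts copied from the three rows above.
# Assumption: each pair is (number of units, total uses of "you know").
counts = {
    "20-39": (5_356, 58_364),
    "40-59": (19_920, 278_099),
    "60+": (2_908, 33_477),
}

# Uses of "you know" per unit, rounded to one decimal as in the table.
rates = {age: round(uses / units, 1) for age, (units, uses) in counts.items()}

# Relative increase from the youngest group to the middle group, in percent.
increase = (rates["40-59"] / rates["20-39"] - 1) * 100

print(rates)
print(f"{increase:.0f}% increase from 20-39 to 40-59")
```

Under that reading, the rounded rates match the last number in each row, and the jump from the youngest group to the middle group comes out at about 28%.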

Of course, Matt is talking about perception of individual differences, which might be very far from the group norms. But what he perceives as his sister's filler-phrase trajectory is qualitatively congruent with the overall generational pattern, at least as shown in this apparent-time snapshot.

Interestingly, the general tendency of older women to talk more like younger men (see "Young men talk like old women", 11/6/2005; "Busy tongues", 12/31/2006; "What men and women blog about", 7/8/2007) is confirmed again in this case:

Here are the counts in tabular form:

[Table: total uses of "you know"; the counts were not preserved.]

It seems to me that there's something going on here, in the effects of age and sex on linguistic variables, that's worth following up further. We've seen a similar general pattern in use of uh and um, where men use uh more than women, and older people use uh more than younger people, while men use um less than women, and older people use um less than younger people; in amount of talk, where males talk more than females, and older people talk more than younger people; in web-log vocabulary choice, where Schler et al. 2006 found that

Regardless of gender, writing style grows increasingly "male" with age ...

and now in the conversational frequency of you know.

This is reminiscent of the general tendency of the male/female opposition to line up sociolinguistically with informal/formal and lower-class/higher-class oppositions. For linguistic variables that are changing rapidly, we also expect to see male/female lining up with older/younger because women tend to lead most changes. However, it seems very unlikely that any of the variables discussed here are involved in such secular trends.

We might also connect this pattern with the general tendency in our culture, where feminine identity tends to be culturally marked (though biologically the default), and the degree of marking stereotypically decreases with age. But in the case of uh and um, the degree of gender differentiation seems to stay the same across age, although its effects change systematically in ways that see older women and younger men behaving similarly.

[Caveats: Other demographic variables (like education, and geographical region) are not necessarily balanced across sex and age classes in the data given above. And the differences by sex in number of you knows may be partly attributed to the small sex differences in word count per conversation -- see "Gabby guys: the effect size", 9/23/2006.

And as always, we're talking about modest group differences in overlapping distributions of individual characteristics, not qualitative differences in platonic archetypes -- though it remains difficult to find ways to talk and write that aren't misleading in this respect, and (perhaps therefore) deeply problematic for most of us to avoid sliding into conceptual confusion.]

Posted by Mark Liberman at 10:19 AM

December 29, 2007

Grape genetics versus ground characteristics

The Christmas double issue of my favorite magazine, The Economist, has enough bad puns in the headlines and subheads for the various features to fill a box of British traditional Christmas crackers. (They always contain a little slip of paper with a bad pun to read out to the assembled Yuletide company. "How does Santa Claus like his pizza? Deep pan, crisp and even!". And everyone groans to show that they know the words to "Good King Wenceslas".) The baddest subhead in the whole issue? A matter of taste; I'm sure "From mutiny to bounty" (over a story about ending staff discord and acquiring new resources at the World Bank) will appeal to some. But my vote goes to the translinguistic punning contents-page subhead for an article about the genetics of wine grapes. There is a dispute among theoretically-inclined wine experts, you see, somewhat akin to the division between nativists and empiricists in the matter of language acquisition. French winemakers tend to think that the special mix of soil and climate found in the particular territory where the vineyard is located — an environmental factor — is the key determinant of wine quality. But some heretics are saying that the innate characteristics of the grapes are overwhelmingly more important. Already they have mapped the genome of pinot noir. The Economist subhead: "The war on terroir." Everyone groan to show that you know French.

Posted by Geoffrey K. Pullum at 04:46 PM

More on Malaysian Censorship

The Malaysian government has not backed down on its attempt to restrict the use of the term Allah to the Muslim god, so the Roman Catholic newspaper The Herald has filed suit against the government. The Sabah Evangelical Church of Borneo has filed its own suit against the government, which has blocked the importation of religious children's books containing the word.

Posted by Bill Poser at 01:52 AM

December 28, 2007

English-Only Insanity on the Campaign Trail

Republican Presidential hopeful Fred Thompson provided a stunning example of the stupidity of the English-only movement during a campaign stop in Iowa a few days ago, noted by Ed Brayton over at Dispatches from the Culture Wars. He assigned partial blame for the mortgage crisis to illegal immigrants, who, he alleges, get themselves into mortgages that they cannot afford because they lack sufficient knowledge of English to understand the terms, saying "A lot of them couldn't communicate with the people they were getting the mortgage from". He cited no evidence for this claim; to my knowledge there isn't any. Indeed, I wonder how many illegal immigrants are in a position to buy a home.

In the same exchange he agreed with a questioner that "it's sickening" that "everything is in Spanish" and said that English should be made the national language. That makes a lot of sense: if immigrants are causing problems through their lack of knowledge of English, make sure that they have even less opportunity to communicate by reducing the use of other languages. I like him better on Law and Order.

Update: Reader Nancy Jane Moore points to this report on how illegal immigrants are often good risks for mortgages.

Posted by Bill Poser at 02:20 PM

The hectic pace of modern life

... and the resistance of some poets to writing hastily for a mass audience, as described in this passage:

The sincerity of these poets and their devotion to art for its own sake is, in a time when so much writing is done hastily to catch the average person's attention, at least a hopeful sign ...

When was this written?  And who are these poets?

The passage is from Compton's Pictured Encyclopedia of 1922, and the poets are Edgar Lee Masters, Vachel Lindsay, Edwin Arlington Robinson, Robert Frost, and Amy Lowell.  It's not clear to me what such a list would look like today, when there's a lot of poetry being written but not a very large audience for it.

Posted by Arnold Zwicky at 01:05 PM

Catherine Tate to bovver America

The American Dialect Society's vote for Word of the Year is fast approaching, bringing an end to the WOTY season here in the States. In the UK, a Word of the Year is selected annually by Susie Dent, word expert on the popular game show Countdown and author of The Language Report. This year her choice was footprint, as in carbon footprint. That's an eco-friendly word with international appeal, but last year's choice was a head-scratcher for most people outside of Britain: bovvered. Now LeftPondians will finally be enlightened about the source of this puzzling word, with the US premiere of "The Catherine Tate Show" on BBC America tonight. In the show's comedy sketches, Tate skillfully inhabits various personae, including the teenager Lauren Cooper, who responds to any inconvenience with her now-famous catchphrase, "Am I bovvered?"

As the New York Times review explains, fully appreciating the show often depends on "a nuanced sense of cultural distinctions, familiarity with the latest British slang, and the ability to understand it when it is uttered rapid-fire in Ms. Tate’s signature singsong style." Still, that shouldn't scare off American viewers, especially those curious about the state of British English today. As an appetizer, here are some video excerpts from the show that are particularly rich from a linguistic perspective.

Though Lauren is an expert at slinging her own idiosyncratic slang, she can get tripped up by a trendy Americanism:

In a reversal of expected conversational roles, Tony Blair turns the tables on Lauren:

Lauren ain't bovvered in French either:

And finally here's another of Tate's regular characters, eager office worker Helen Marsh, showing off her (faux-)translating skills:

Posted by Benjamin Zimmer at 12:52 PM

Una victoria de mierda

Sim Aberson wrote (from Miami) a couple of days ago to report that

There has been a controversy in the Miami Herald opinion page about the decision to print a translation of Hugo Chavez's reaction to losing in his bid to amend the Venezuelan constitution.

The problem is that the paper translated Chávez's Spanish profanities into (roughly) corresponding English profanities.  The story:

Chávez reacts to loss with profanity
President Hugo Chávez harshly dismissed his defeat Sunday as insignificant.
Posted on Thu, Dec. 06, 2007

Special to The Miami Herald

CARACAS -- A defiant President Hugo Chávez Wednesday repeatedly used a harsh expletive to describe his opponents' victory in a crucial vote Sunday, and suggested that if he had not conceded he might even have won.

Chávez's comments came as the president's supporters and foes traded bitter accusations over the vote, which narrowly defeated his proposals for radical changes to the Venezuelan constitution.

... Claims by some in the opposition that the final tally was the result of a deal made under pressure from elements within the military provoked a furious attack on the opposition Wednesday during a joint news conference by Chávez and the high command.

"You should administer your victory properly, but already you are covering it in shit. It's a shitty victory, and our -- call it, defeat -- is one of courage, of valor, of dignity . . . We haven't moved a millimeter and we won't.''

Chávez used the expletive twice more during the conference.

As we've pointed out many times on Language Log, some publications that would generally not print taboo vocabulary from their writers will allow it in quotations when it seems to reveal something about the speaker's character or style. This is just the bilingual version of that policy. Chávez's wording is first characterized as "profanity" and "a harsh expletive", and then the mierda hits the fan, as English shit. ("A shitty victory" is the paper's translation of "una victoria de mierda", which is translated as "a victory of shit" in most English sources.)

The BBC News version is similar:
Last Updated: Thursday, 6 December 2007, 10:56 GMT

Chavez belittles opposition win
 Mr Chavez insisted he would push on with reform plans

Venezuelan President Hugo Chavez has lashed out at his opponents and vowed to pursue plans for constitutional reform despite his referendum defeat.

Speaking on state television, Mr Chavez used offensive language to heap scorn on the opposition's surprise victory.

Mr Chavez also denied reports he had been pressured by the military to accept defeat in Sunday's vote.

Venezuelans voted 51% to 49% against the proposals, which included ending presidential term limits.

When he first acknowledged defeat, Mr Chavez had adopted a calm and measured tone, accepting the outcome as a "decision the people have made".


But on Wednesday, speaking at a televised news conference alongside armed forces chiefs, he decried the opposition's success as "a shit victory".

First, the characterization as "offensive language", then a translation of "una victoria de mierda" as "a shit victory".

No doubt some other publications quoted Chávez as having said "shit".  But most seem to have avoided the taboo word.  Of these, perhaps the oddest version is the Washington Post's, where "mierda" is alluded to as "a four-letter expletive" (not actually quoted in the report, in either Spanish or English):

Chávez Turns Bitter Over His Defeat in Referendum
Foes of Amending Charter Have 'Nothing to Celebrate'

By Juan Forero
Washington Post Foreign Service
Thursday, December 6, 2007

BOGOTA, Colombia, Dec. 5 -- Venezuelan President Hugo Chávez on Wednesday used a four-letter expletive to dismiss the opposition victory in Sunday's referendum and pledged to press forward with plans to approve constitutional changes that would expand his power in one of the world's leading oil producing-countries.

You can catch various recordings of the speech on YouTube, for instance here.  Meanwhile, the New York Times blogsite reported on the Chávez expletive's YouTube career, without (of course) mentioning the word:

December 6, 2007,  4:22 pm

Venezuelan Leader Climbing YouTube Again


The King of Spain achieved momentary Internet stardom when his tirade at President Hugo Chávez hit YouTube. During the clip, Mr. Chavez seemed to hold his fire, but he did anything but that in another one is climbing the video site's charts today with 90,000 views so far.

During a news conference covered by The New York Times today, Mr. Chávez unleashed the same expletive four times when the subject of his recent election loss came up. Based on a translation from Spanish from The Miami Herald, he appears to dismiss the opposition's victory as unworthy and suggests that he regrets conceding -- to put it mildly.

A final note: several commentators have pointed out that Venezuelan Spanish mierda is more offensive than English shit.  Here's what Daniel -- clearly no fan of Chávez -- on his Venezuela News and Views blog has to say (after quoting yet another expression alluding to, but avoiding, shit):

"using a derogatory term for feces" (Herald Tribune)

If there was one thing that Chavez might had achieved Sunday night was to get back the democratic label sticker on his forehead when kicking and screaming he at least recognized that he had lost. Today an ill wind brought back his inner fascist.

The video below shows toward the end the moment when he called the opposition victory of Sunday "mierda", shit, feces, but much harsher. See, in the US you can, for example, forget your presentation at home and when you reach work a loud "shit!" is allowed and understood as you have 45 minutes left to race back home and get your presentation before your clients arrive. It also works pretty much the same way in France with "merde!". But in Venezuela it does not work. "Mierda" is a very strong word ...

Posted by Arnold Zwicky at 12:41 PM

Language in Pakistan

In the recent coverage of Pakistani political strife, I haven't seen much about the troubled linguistic and ethnic situation there. This is probably because the official ideology is to deny its existence. Thus according to Tariq Rahman ("Language and Politics in a Pakistan Province: The Sindhi Language Movement", Asian Survey, 35(11) 1005-1016, 1995):

Pakistan is a multilingual country with a population in 1994 of about 128 million. While multilingualism is not denied -- though the 1981 census contained no question on language -- the state denies the multinationality thesis endorsed by ethnonationalist leaders. The classical form of this thesis, argued by Gankovsky, is that there are four major nationalities in Pakistan: the Punjabi, Sindhi, Pakhtun, and Baluchi (Bangladesh, the former East Pakistan, has been left out here). To this list, the Siraiki was added in the 1960s and an effort to make Muhajir a nationality began in the 1980s. The official point of view is that there is one Pakistani nation united by the bonds of Islam and the national language, Urdu.

But Urdu is the native language of only about 7% of the country's population. Its imposition has caused "language riots" since the country's foundation, and language-focused unrest was at the center of the struggle that resulted in the former East Pakistan breaking away as Bangladesh in 1971. Other struggles about language policy have continued since that time, especially with respect to the role of Sindhi, the language of the region where Benazir Bhutto's Pakistan Peoples Party has its central power base.

Tariq Rahman ("Language policy, multilingualism and language vitality in Pakistan", Trends in Linguistics, 175: 73-106, 2006) gives this language distribution in Pakistan as of the 2001 census:

Language Percentage of speakers

As Alyssa Ayres points out ("The Politics of Language Policy in Pakistan", in Brown and Ganguly, eds., Fighting Words: Language Policy and Ethnic Relations in Asia, MIT Press):

Conflicts over language identity are not merely about language: They are intertwined with struggles over power and access to it. The vast majority of Pakistan's rulers and policymakers have been Punjabi and mohajirs (settlers), while the military has been ruled by a Punjabi-Mohajir-Pathan nexus.

This leaves out about a third of the population in the remaining western half of the country, with the inhabitants of Sind at the head of the list. Ayres again:

The story of Pakistan's Sindhi language movement (and language riots) parallels that of the Bengali language movement from partition in 1947 through Benazir Bhutto's first regime (1988-90). During Bhutto's first term in office, tensions between Karachi's numerous ethnic groups exploded. ...

Sindhi, like Bengali, enjoyed regional hegemony throughout the time of the British Raj. It has long had a literature and a widespread presence both colloquially and administratively. ... Sind had been a separate province during the Raj. This was due in part to the Sindhi language movement of the 1930s, which had resulted in Sind separating from the Bombay presidency in 1936. This institutionalization of a Sindhi ethnic identity linked directly to language was therefore in place even before partition. Partition would trigger Sindhi ethnic mobilization for two reasons: cultural insensitivity and economic subjugation.

Partition brought massive demographic changes to the subcontinent. Karachi in particular saw an enormous influx of migrants from Uttar Pradesh -- the Urdu-speaking Mohajirs -- as well as from Punjab, Baluchistan, and the NWFP. At the same time, Hindus, who had comprised 64 percent of the population of Sind prior to partition, fled to India. Homes and possessions left behind in Karachi and other urban centers (as well as agrarian lands in the Sindhi interior) were claimed by Mohajirs. The results were striking: In Karachi, Mohajirs comprised 57.55 percent of the population in 1951; in Hyderabad, 66.08 percent. These cities were effectively cleaved in half and then populated by strangers.

When Pakistan came into being, Sindhis, like Bengalis, were surprised to find their language would be subservient to Urdu in the national order. This lower status offended Sindhi cultural pride. The problem was exacerbated by the inherent advantage afforded the newcomer Mohajirs, whose mother tongue was the national language; this gave Mohajirs a considerable advantage in seeking government employment.

A crucial event early in Pakistan's history helped to precipitate Sindhi-Urdu tensions: On July 23, 1948, the provincial government of Sind offered the city of Karachi to the federal government for use as the new capital of Pakistan. The federal government, headed by Jinnah, accepted the offer and then decided to reconstitute the city as a federal territory. When M.A. Khuhro, then chief minister of Sind, objected, he was dismissed by Jinnah on grounds of being both a poor administrator and a corrupt government official. Karachi thus became a federal territory with a heavy Urdu presence. Most important, however, the economic and cultural capital of Sind was perceived as having been hijacked by the Pakistani state. From the Sindhi point of view, these developments created a painful inequality: To obtain government jobs, Sindhis would have to learn a "foreign" language. At the same time, the newly arrived "foreigners" (i.e., Mohajirs) did not have to learn Sindhi to go about their daily lives in urban Sind, where most of them lived. There was no compelling reason for Mohajirs to integrate with Sindhis -- a situation that struck the latter as highly discriminatory.

There's a lot more to the story, but this should be enough to give you the general idea. And it's not only in Sind that language policy is a social and political issue, according to Rahman (see the original text for references):

[T]he privileging of Urdu by the state has created ethnic opposition to it. However, as people learn languages for pragmatic reasons, they are giving less importance to their heritage languages and are learning Urdu. This phenomenon, sometimes called ‘voluntary shift’, is not really ‘voluntary’ as the case of the native Hawaiians, narrated by Daniel Nettle and Suzanne Romaine, illustrates. What happens is that market conditions are such that one’s language becomes a deficit in relation to what Pierre Bourdieu, the French sociologist, would call ‘cultural capital’. Instead of being an asset it becomes a liability. It prevents one from rising in society. In short, it is ghettoizing. Then, people become ashamed of their language as the Punjabis, otherwise a powerful majority in Pakistan, are observed to be by the present author and others ... Or, even if language movements and ethnic pride does not make them ashamed of their languages, they do not want to teach the language to their children because they think that would be overburdening the children with far too many languages. For instance, Sahibzada Abdul Qayyum Khan reported in 1932 that the Pashtuns wanted their children to be instructed in Urdu rather than Pashto. And even this year (2003), the MMA government has chosen Urdu, not Pashto, as the language of the domains of power, including education, in the N.W.F.P. The same phenomenon was noticed in Baluchistan. Balochi, Brahvi and Pashto were introduced as the compulsory medium of instruction in government schools in 1990. Language activists enthusiastically prepared instructional material but on 8 November 1992, these languages were made optional and parents switched back to Urdu. Such decisions amount to endangering the survival of minor languages and they devalue even major ones but they are precisely the kind of policies that have created what is often called ‘Urdu imperialism’ in Pakistan.

In short, the state’s use of Urdu as a symbol of national integration has had two consequences. First, it has made Urdu the obvious force to be resisted by ethnic groups. This resistance makes them strengthen their languages by corpus planning (writing books, dictionaries, grammars, orthographies etc) and acquisition planning (teaching the languages, using them in the media, pressurizing the state to use them ...). Secondly, it has jeopardized additive multilingualism as recommended by UNESCO ... As Urdu spreads through schooling, media and urbanization, pragmatic pressures make the other Pakistani languages retreat. In short, the consequence of privileging Urdu strengthens ethnicity while, at the same time and paradoxically, threatens linguistic and cultural diversity in the country.

Perhaps the most important result of resistance to Urdu is to increase the importance of English, as discussed by Ehsan Masood ("Urdu's last stand", 9/1/2007). Benazir Bhutto's Urdu was notoriously poor ("Benazir's poor Urdu inspires many a joke", ExpressIndia, 12/4/2007), and Pervez Musharraf's Urdu is apparently also often criticized, though Urdu is his native tongue.

[After writing this post, I saw that Tristan Mabry has an excellent Op-Ed in this morning's Philadelphia Inquirer, "In divided Pakistan, not all are mourning Bhutto". Tristan (a friend from his time as a grad student at Penn) knows these issues well. The quality of the information in his article is a striking contrast to the (complete?) omission of these matters from the rest of the media coverage of the situation in Pakistan, both before and after Bhutto's assassination. It would be nice to see his expertise published in more prominent newspapers, and mixed in with the usual talking heads on the national broadcast media.]

Posted by Mark Liberman at 08:12 AM

December 27, 2007

The unkindness of strangers

Several readers have taken me to task for being too hard on the journalists who filed serious stories about spoof articles from the British Medical Journal's Christmas issue ("'Tis the season", 12/21/2007). For example, Ray Girvan, a frequent correspondent and long-time foe of bad science journalism, wrote:

Re the unicycler, it's by no means clear whether it's a spoof or not.

The Christmas BMJ does include many, but it's not solely "a spoof issue". As the editorial for Christmas 2000 says: "The essence of the Christmas BMJ is strangeness. It's our left brain issue. We want everything to be not as it seems".

This includes some spoofs, but also many genuine but quirky topics such as last year's "Sword swallowing and its side effects", the 2003 "How long did their hearts go on?" (an actuarial study to see if the lives of Titanic survivors were shortened by the trauma); or in 2000, "Civilisation and the colon: constipation as the "disease of diseases" (an erudite history of constipation and its treatment).

In this case, Ray is off the mark. The unicycler story was not "genuine but quirky", it was "quirky and preposterous". Let me explain.

The "unicycler", in case you've forgotten, was Sam Shuster, a retired dermatologist who took up riding a unicycle around Newcastle upon Tyne. Finding himself the object of unkind comments (and sometimes more direct interventions) from strangers, he "realised that this indicated an underlying biological phenomenon and set about its study".

True enough, this is probably not the kind of spoof article that is a complete fiction from beginning to end. Let's grant that Dr. Sam Shuster exists, that he did ride a unicycle in Newcastle upon Tyne, that he did record 400 instances of comments from strangers. Let's even grant that the paper ("Sex, aggression, and humour: responses to unicycling", BMJ 335:1320-1322, 22 December 2007) was not meant (entirely) as a parody of articles about the biological determination of social facts, but rather represents Shuster's sincere belief about the causes of his experiences.

Why is this article still essentially a joke, which should never have been presented in the mass media as a serious contribution to the biology of humor? Well, let's say that it would never have been published in a serious scientific journal, if not as a joke; and that its key conclusions are entirely unconnected to its evidence, other than via the logic of jokes.

There are several superficial things that should have signaled to any intelligent reader that something odd is going on. There's the author's apparent lack of expertise -- he's a retired dermatologist writing about a linguistic topic at the intersection of anthropology, sociology, social psychology and physiology. There's the lack of serious footnotes or references -- there are three references, one to Darwin's Descent of Man and two to recent popular-science books on sexual selection. There's the personal and informal presentation ("My wife said 'buy the bloody thing', which I did on the whim of the moment").

None of this necessarily undermines his work, or seriously influences my own evaluation of it -- there's plenty of nonsense produced by credentialed experts, in the standard impersonal style of scientific rhetoric, with a conventional battery of footnotes and references. But you don't normally get past the editors of major scientific journals unless these superficial characteristics are in order -- except as a joke.

A slightly more important set of problems was raised by Dr. Shuster himself. He points out that "because responses were not from a consecutive cohort or sample, the findings are only semi-quantitative", and that "confirmation by another rider, with a different style, appearance, dress, age, sex, and location is desirable".

These problems don't entirely invalidate the research itself, but outside of the Christmas spoof context, they would surely have led the referees and editors to force Dr. Shuster to be more precise in describing its limitations. They might even have made him run a control or two. But the real problem, the thing that could never have gotten past an editor who was not looking mainly for light-hearted quirkiness, is the generality of the conclusions that Dr. Shuster wants us to draw.

Let's review: An elderly upper-middle-class man rides a unicycle through streets frequented mainly by people from lower socio-economic strata, in a culture with a long tradition of class antagonism. The strangers he meets respond with ridicule and even some "attempts to topple the unicycle", as well as expressions of praise and concern. The physical attacks are generally from adolescent males, while ridicule is most salient from adult but not elderly males.

His diary confirms his initial impression, which is that "almost 50% of those encountered ... responded verbally"; that 95% of the responses from post-pubescent females "praised, encouraged, or showed concern", vs. only 25% of the responses from post-pubescent males. These are essentially the only numbers in the article, whose observations are mainly qualitative.

His first-level conclusion is that the "waxing and waning of the male response" suggests sexual competitiveness due to "androgen induced virility". In other words, the male denizens of Newcastle upon Tyne are concerned to prevent Dr. Shuster from riding in on his unicycle and impregnating their women. The youth express this concern by swooping at him with bicycles and kicking soccer balls at him; more mature men do it by making stupid jokes at his expense ("Couldn't you afford the other wheel?").

This all may very well be true, though other hypotheses do come to mind -- for example, in some cultures, rough teasing is a way to make friends. And, as Dr. Shuster points out, the study was not very well controlled. But let's stipulate that he's right about all of this, and go on, because we've reached the real crux of the problem: what in the world does any of this have to do with humor?

On the face of it, his observations are about responses to a stranger doing unexpected things in a public place. It's one particular stranger (Dr. Shuster), one particular kind of odd behavior (an elderly man riding a unicycle), and one particular public place (the streets of Newcastle upon Tyne). Adolescent boys sometimes respond with what might be called aggressive threat displays, like kicking balls at him; mature men often (about .75*.45 = 33% of the time) make teasing remarks, like "I didn't know the circus was in town" or "And what were your other birthday presents?".
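For anyone who wants to check that back-of-the-envelope figure, here is a minimal sketch in Python. The variable names are mine, not Shuster's. It takes .45 as a stand-in for "almost 50%", and it assumes that the overall verbal response rate applies to mature men as well as to everyone else, which the paper does not actually establish.

```python
# Rough check of the ".75*.45" estimate; all names here are illustrative.
verbal_response_rate = 0.45    # assumed stand-in for "almost 50%" responding verbally
male_supportive_share = 0.25   # male responses that praised, encouraged, or showed concern

# Whatever is not supportive is treated as teasing (a simplifying assumption).
male_teasing_share = 1 - male_supportive_share

# Estimated share of encounters with mature men that end in a teasing remark.
teasing_rate = male_teasing_share * verbal_response_rate
print(f"Estimated teasing rate: {teasing_rate:.1%}")
```

The product comes out at roughly a third, and it is only as good as the two rough inputs that go into it.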

But Dr. Shuster generalizes beyond the public teasing of oddly-behaving strangers in British cities: "These findings suggest that humour develops from aggression in response to male hormones". A logically parallel argument would be "I've observed on many occasions that drinking red wine gives me a headache. These findings suggest that all illnesses develop from the physiological response to metabolites of ethanol." Shuster's conclusion is frankly preposterous, and surely would never have been allowed to stand in an article published in a serious scientific journal, other than as a joke.

Now, this is not just a matter of faulty logic -- Dr. Shuster's generalization is obviously false to fact, as any man or woman with a normal experience of life can testify. Women make jokes of all sorts, nice ones and mean ones and neutral ones, at roughly the same overall rate as men. Even in the line of teasing comments that Dr. Shuster focused on, there's a stereotypically female kind of remark described as "catty".

But we don't need to look out the window, because there's a fairly large experimental literature on the subject.

For example, there's DT Robinson & L Smith-Lovin, "Getting A Laugh: Gender, Status, and Humor in Task Discussions", Social Forces, 80(1) 2001, which used "event history techniques to analyze humor attempts and successes in six-person groups", collected "in the early 1980s at the University of South Carolina". There were "approximately four groups in each of seven gender-compositions conditions (all female, one to six men)", in which "the participants were Anglo undergraduates between the ages of 17 and 25 enrolled in introductory sociology". Among the conclusions: "Groups consisting entirely of women have a significantly higher rate of humor [than mixed-sex groups]... Compared to mixed gender groups, all male groups did not joke more frequently ... Likewise, the rates of successful humor [i.e. where others laughed] are higher among all female groups, but not all male groups". However, within mixed-sex groups, "men engage in humor at higher rates than women".

Another relevant study was done in New Zealand by Jennifer Hay. Though there were a number of later publications in refereed journals (e.g. "Functions of Humor in the Conversations of Men and Women", J. of Pragmatics 32 (6) 2000, pg 709-742), an especially accessible source is her 1995 University of Canterbury MA thesis "Gender and humour: beyond a joke". Her data came from 18 hour-long recordings: six involving four males, six involving four females, and six involving two participants of each sex. (Hay took recordings from existing collections where she could, and recorded others to fill in the gaps in the table. She says that "There will no doubt be variation in the intimacy of the speakers across my tapes, but all of the conversations can be described as discourse between good friends.")

Hay divided jokes into 12 categories, one of which was "insult":

An insult is a remark that puts someone down, or ascribes a negative characteristic to them. ... The insult here is likely to be genuine, and the humour stems from the unexpectedness of the statements, which in most circumstances would be unacceptable.

Here's her transcription of one example:

 DF: i usually just um turn off the electric blanket
 BF: yeah well i did
 CF: i don't i roll over alex onto the cold side
(//so [ho]\
AF: /[oh ha]\\
DF: well chris that //just shows that\ you're a=
BF: /good on you\\
DF: =wanton //woman\ [insult]

This category would include the teasing jokes made about Dr. Shuster on his unicycle. Hay observes that

The odds of a speaker using a jocular insult in single sex interaction are more than twice the odds than in mixed interaction, and in both types of interaction, women are slightly more likely than men to use insults.

Specifically, in mixed-sex groups, a female's probability of producing an insulting joke was about a third greater than a male's; in same-sex groups, females were 50% more likely to produce insulting jokes than males were.

There are plenty of other studies out there. The details vary: the locations, ages, ethnicities and social roles of the participants; whether they were friends or strangers; the size of the groups; the goals if any of the conversations; and so on. There are plenty of gender differences, and there are sometimes circumstances like Dr. Shuster's, where jokes are more likely to come from men than from women. But as often as not, women make more jokes than men do, including insulting or aggressive jokes.

Now take a look at Shuster's paper again -- or more to the point, take a look at how it was presented in the mass media: "Humour 'comes from testosterone'", BBC News; "Scientist claims men are funnier than women", The Telegraph; "Aggression 'makes men more humorous than women'", The Independent; "Humor Develops From Aggression Caused By Male Hormones, Professor Says", ScienceDaily; "Is humor tied to male aggression?", World Science; "Quand les testicule régissent l'humour", TF1; "Das Geheimnis des männlichen Witzes", Spiegel; "Humor entsteht aus Aggression", Die Welt; "L'umorismo è maschile perchè legato a testosterone", AGI; "'L'umorismo è maschio', sostiene una ricerca", ANSA; "El humor 'está en la testosterona'", BBC Mundo; "Bewijs: mannen hebben inderdaad meer humor", Elsevier; "Testosteron ger män aggressiv humor", Svenska Dagbladet; "Estudo diz que senso de humor está ligado à testosterona", BBC Brasil; "Smysl pro humor je prý výhradně mužská záležitost", České noviny.

Can you tell me with a straight face that you think this was a serious science story, presented in a responsible way by intelligent reporters and editors?

[An irrelevant note on Ray's quotation from the BMJ Christmas issue for 2000: "The essence of the Christmas BMJ is strangeness. It's our left brain issue." The editors certainly pegged the quirkometer on that one, unless "left" is just a slip of the pen for "right", perhaps due to an overdose of eggnog.]

[Update -- Carl Zimmer writes:

This may be more information than you want about the unicycle story, but I thought I'd pass it on. is a press release service for science stories. It hosts a number of prominent journals, such as PLOS and PNAS and Science and BMJ. Press releases are made available to registered journalists under embargo before papers are published, and then become accessible to the public. BMJ posted a press release on the unicycle paper before it came out, and you can still find it there:

As you can see, it's a fairly detailed, straight-faced press release. I'd be surprised that a Christmas joke would extend as far as an embargoed press release, which is designed to draw attention to research in a journal.

If it wasn't a joke, which is what I suspect, then this appears to be a case of reporters uncritically writing about a paper that has the imprimatur of a peer-reviewed scientific journal. If we're apportioning blame, science writers deserve some for their laziness, but so do science journals that accept deeply flawed research and eagerly draw attention to it with press releases.

When the world's press features a really bad piece of scientific work, or a really bad misrepresentation of good work, the symbiosis of flacks and hacks is usually part of the background, as I've often observed.

But there's something else as well going on in this case.

I can see two possibilities.

One is that the PR folk at BMJ themselves missed the joke, and put out a straight-faced press release on automatic pilot.

Another is that the press release itself was another example of "lighthearted" (if deadpan) Christmas cheer.

But I can't believe that BMJ accepted and printed that paper as the result of their normal review process.]

[Update #2 -- another Zimmer heard from, Benjamin this time:

See this piece for more on Sam Shuster of unicycling fame:

"It seems likely that Shuster is a wind-up merchant. Maybe he just enjoys getting into the papers by publishing wacky stuff in journals that ought to know better."

See discussion therein on his paper about the psychological effects of Karl Marx's boils and carbuncles. (At least that touches on Shuster's area of expertise, dermatology.)]


Posted by Mark Liberman at 09:27 AM

December 26, 2007

Blame it on Kipling

Faulty Diction, featured in Arnold's post on "Be that as it will", is a hoot. Here's the entry for blame on:

It's true that the OED's earliest citation for blame (something) on (someone) is only dated 1903 -- and what's more, it's from the pen of that notorious radical Rudyard Kipling:

1903 KIPLING Five Nations 22 We will blame it on the deep. 1910 —— Rewards & Fairies 175 If you can keep your head when all about you Are losing theirs and blaming it on you.

Earlier examples are easy to find, though their contexts do seem somewhat informal. The earliest example of "blame it on ..." in the New York Times archive seems to be in a quotation from a baseball player -- "Captain" Anson, player-manager of the Chicago White Stockings -- in a story from August 7, 1885 that starts like this:

and ends this way:

As the Chicagos left the field they presented a pitiful spectacle. They strolled along dragging their bats behind them, and looked like mourners returning from a funeral. Capt. Anson was the saddest man in the party. Visions of the championship began to fade from his gaze, and as he journeyed toward his carriage he was evidently in an unhappy mood.

"Tough luck," Abner Dalrymple ventured to remark.

"Yes," was the answer.

"Those fellows played good ball," interpolated right-fielder Kelly.

"But they had the umpire with them," said the corpulent Williamson.

"Let's blame it on the umpire," was Anson's rejoinder.

"That's so," echoed his colleagues.

They seemed to derive a little consolation from this, and all agreed that the umpire lost the game.

A search in APS yields an article from the Saturday Evening Post of June 2, 1849, "Letters to Country Girls", by Mrs. Swisshelm, which starts in a distinctly informal style:

Well positively, girls, you are too much trouble. I do believe girls were created on purpose to annoy susceptible young gentlemen and nervous old ladies! Here, Meg has gone off a visiting, and left me to keep house a whole week, with nobody but Mary for help; she would rather make a "play house" than wash dishes. Well, I have been very industrious, and yesterday was office day! When I went there, what was waiting for me amongst the rest of my troubles but the longest, sauciest letter "from a Country girl!" It is too bad; and if the world continues getting worse at this rate, the people must all move out of it. - Just to think the way girls will talk to old folks! It was not so in my young days!

After some further ritual recitation from the liturgy of Young People Today, Mrs. Swisshelm takes up her main theme, which is that poor health, bad complexions and premature aging are all caused by eating too much ("The Irish famine never killed half as many people as the American surplus has done"). In the middle of this, she deploys a nice example of "blame it on":

Did you never observe the brilliant complexions, beautiful teeth and full forms of Irishwomen when they first come to this country? But they are not long here until they look like the rest of us, and people blame it on the climate; but it is not half so much that as the diet.

Whatever the historical details turn out to have been, the blame it on battle has been over for a long time: Mrs. Swisshelm and Rudyard Kipling won, and Faulty Diction lost. In fact, this is one of those usage battles that has been so thoroughly lost that few people today are even aware that there was ever a battle at all. Even in 1915, I suspect that the objection to blame on was an isolated curmudgeonly prejudice, rather than a spirited defense of formal norms against the colloquial hordes.

Prescriptive strictures against the use of specific complement structures with specific verbs often seem to be of this especially pathetic kind, where the general educated public is not so much disobedient as simply unaware. Thus Edwin Newman, Strictly Speaking, 1974: "You may convince that. You may convince of. You may not convince to."

Nor, when it comes to verbal complements, can you convince not to. But they never learn.

[Arnold Zwicky writes:

OED has

1835 Fraser's Mag. XI. 617, I call this bad management, and I blame it upon you.

which MWDEU takes to be the first attestation of "blame on".

Clearly there were many attestations before the 1903 Kipling, since (according to MWDEU) by 1881 Ayres was ranting that "blame on" "is a gross vulgarism, which we sometimes hear from persons of considerable culture".

I hope to write a note on the topic -- as far as I can tell, the antipathy towards "blame on" was entirely a consequence of its being a 19th-century innovation. But Ayres's opinion was echoed by a pile of usage writers from 1906 through 1983 (MWDEU lists 16 of these).

I left out the 1835 citation, since it was "blame it upon" rather than "blame it on". But I should have checked MWDEU, which as usual has an insightful and informative entry, observing that "Ayres does not stop to explain why it is a vulgarism or how such cultured persons are capable of using such a vulgarism". (Ayres' 1881 work was The Verbalist, "A Manual Devoted To Brief Discussion Of The Right And The Wrong Use of Words, And To Some Other Matters of Interest To Those Who Would Speak And Write With Propriety").

Ayres' perspective in this case is an inspired one, precisely by virtue of lacking any pretense of logical or empirical support. There are no grounds for refutation -- grammatical logic is irrelevant, and if many excellent writers often use the construction, that simply shows that their culture, although perhaps "considerable", is in fact inadequate.

Still, it's shocking to find that OUP and Bryan Garner have retained, in the 2000 edition of The Oxford Dictionary of American Usage and Style, this entry for blame:

In the best usage, one blames a person; one does not, properly, blame a thing on a person. E.g.: "I blame the fires on him." (Read: I blame him for the fires.)

Bryan Garner is not a snob, but "in the best usage" is an empty gesture in the direction of elite culture -- just as irrefutably empty as Ayres' phrase "a gross vulgarism, which we sometimes hear from persons of considerable culture".

On this view, Robert Louis Stevenson demonstrated his imperfect cultural background when he wrote in his Vailima Letters, published in 1896: "I, for one, blame it on Madam Saumai-afe without hesitation". The Wall Street Journal revealed its essential underlying vulgarity when in 1909, it used the following, apparently with editorial approval, as a filler: "President Roosevelt has two convenient formulas for dealing with trouble. One is to blame it on Loeb and the other is to send Taft to straighten it out."

No one ever invited I.F. Stone to a debutante ball, and so it's perhaps no surprise that Stone's 1937 The Court Disposes had a chapter entitled "Can we blame it on the fathers?" It may be slightly more surprising, to some, that in 1937 the New Yorker's Talk of the Town headlined a piece on the World's Fair "Blame it on Jacqueline" -- but Harold Ross was "the son of an Irish immigrant and a schoolteacher", and sooner or later, class will tell.

In a 2001 U.S. Supreme Court opinion, Justice Antonin Scalia framed a sentiment not without relevance to the present case:

Deferring to our colleagues' own error is bad enough; but enshrining the error that we ourselves have improvidently suggested and blaming it on the near-unanimous judgment of our colleagues would surely be unworthy.

But Justice Scalia fell short of the "best usage", and should instead have expressed himself like this:

Deferring to our colleagues' own error is bad enough; but enshrining the error that we ourselves have improvidently suggested and blaming the near-unanimous judgment of our colleagues for it would surely be unworthy.

John Updike's short story "My father's tears", published in the New Yorker in 2006, starts this way:

Come to think of it, I saw my father cry only once. It was at the Alton train station, back when the trains still ran. I was on my way to Philadelphia to catch the train that would return me to Boston and college. I was eager to go, for already my home and my parents had become somewhat unreal to me, and college, with its courses and the hopes for my future they inspired and the girlfriend I had acquired in my sophomore year, had become more real every semester; it shocked me—threw me off track, as it were—to see that my father's eyes, as he shook my hand goodbye, glittered with tears.

I blamed it on our shaking hands: for eighteen years, we had never had occasion for this ritual, this manly contact, and we had groped our way into it only in the past few years.

Updike's fans will be disappointed to learn that his usage is not "the best" -- Shillington, PA was apparently not entirely corrected by Harvard -- so that the last sentence ought to be rewritten as:

I blamed our shaking hands for it: for eighteen years, we had never had occasion for this ritual, this manly contact, and we had groped our way into it only in the past few years.

Abandoning irony for a moment, let's observe that both Scalia's and Updike's sentences were better as they were written, not as Garner would have them re-written to conform to "the best usage". But I doubt that there is any constellation of facts or opinions that could prevent Ayres' little off-hand expression of prejudice from echoing down the centuries in the unconsidered repetitions of his cultural copyists.

From a functional perspective, this is an ideal sort of prescriptive norm. Since it's an entirely artificial policy, with no basis in the past century of speaking and writing, there's no way to learn it simply from attending to even the "best" speakers and writers. The only possible source is works on usage. This adds essential value to such works, by giving those who read them carefully and credulously a reason to feel superior to everyone else. ]

Posted by Mark Liberman at 08:56 AM

December 25, 2007

Christmas and "politically correct(ed)ness"

As in past years, this holiday season has featured numerous gripes about the "politically correct" avoidance of the word Christmas. I noticed an interesting formulation of the complaint in a New York Post review of the television special "Elmo's Christmas Countdown":

And contrary to popular politically correctness, this Christmas special actually mentions "Christmas." (NY Post, Dec. 22, 2007)

Here are a few other recent Yuletide complaints using similar wording:

Unfortunately Christmas is not only in danger of falling victim to the politically correctness zealots, but it has been hijacked by commercial interests. (Northern Life, Sudbury, Ontario, column by Lionel Rudd, Nov. 30, 2007)

"Wouldn't it be nice if we could avoid placing Politically Correctness, including sizes on people other than ourselves. Perhaps setting an example is better than pointing out changes we would rather see in others. Keep Santa Fat, I'm fat and I'm ok with that too." -Tom, Northview (KSPR News, Springfield, MO, "Talk Backs: Slim Santa," Dec. 13, 2007)

I am disappointed with all the people who are trying to do away with Christmas. I am not talking about the religious aspects or the politically correctness or incorrectness. I am talking about the very word Christmas that brings to hearts and minds magical moments. (Evening Sun, Hanover, PA, letter to the editor, Dec. 13, 2007)

"Politically correctness" (as opposed to the more common "political correctness") treats the adverb-adjective combination "politically correct" as a single unit, tacking on the -ness suffix despite the resulting grammatical peculiarity. A Google search reports nearly 5,000 examples, and Google Book Search turns up more than 50 attestations in print. While searching I also noticed another variant, "politically correctedness" (~250 Googlehits). Here are a few in the usual Christmas context:

and yes. I said Christmas. Not Happy Holidays. Not politically correctedness here. Christmas. (link)

Reading the local paper this morning I see that some people in Medway were upset with the school system for succumbing to the PC, politically correctedness, that seems to have warped the minds of the younger than me generation. When did Christmas become a four-letter word? (link)

happy holidays to everyone!...actually....screw politically correctedness and have a Merry Fu@#ing Christmas Season! (link)

Since "politically correct" (or, for some, "politically corrected") has become such a fixed expression, it's not too surprising to see it nominalized by the simple addition of -ness. The rationale behind this formation is more visually apparent when a hyphen is inserted between "politically correct" and the nominalizing suffix:

Anyway, at one point I got fed up with all the 'politically correct-ness' of the holiday season - you couldn't say 'Merry Christmas' to people, because what if they celebrated Hanukkah? (link)

Most "Politically Correct-ness" is just words being replaced so that the person using them will not hurt a single person who might ever overhear. (link)

I wouldn't consider this a case of an adverb modifying a noun, since the ADV+ADJ combo of "politically correct" is still lurking, despite the nominalization. The modification of a noun with an adverb is of course a grammatical no-no, but in my post "Love, adverbially" I considered some apparent exceptions. For most of those exceptions, the adverb turns out to be a "sentence adverb" modifying an entire sentence or independent clause, even if the target of the modification is obscured by an elliptical construction. (For instance, the movie title Love Actually is actually an elliptical form of a line in the movie, "Love actually is all around.")

I went hunting for more examples of ADV+ADJ-ness, careful to avoid collocations that merely consist of a sentence adverb preceding a noun with -ness, e.g., "There is clearly nervousness over the issue," or "Eventually sadness sets in." Here are some that I found (with the aid of a corpus that has part-of-speech tagging and regular-expression searching):

Tomorrow is my last day of absolutely madness. (link)

The shouty voice from downstairs dimly nagged at my barely awakeness. (link)

Postpartum depression does exist, but for many it results in minor symptoms such as not wanting to be near your child, constantly sleepiness or having trouble focusing. (link)

And the strange darkly wonderfulness of A Touch of Daniel. (link)

As he says himself, there is a distinctly uniqueness to Cork people. (link)

These large panels of processing figures are spectacular, but they have that curiously flatness of fascimile that is impossible to overcome. (link)

My flirt is a front for the emotionally emptiness I experience. (link)

But then… we are also capable of incredibly kindness, and creativity. (link)

Silly boys with their toys and their overly drunkenness. (link)

I see Hip-Hop finally being awoken from it's roots in the pursuit of Truth, with Mos Def leading in bringing his socially and politically awareness to the forefront. (link)

Some of the above might simply be cases of poor self-editing, where the writer starts with an ADV+ADJ construction and then adds -ness to the adjective without adjusting the adverb by removing -ly. But note that in some examples such editing wouldn't actually work: you can't change "barely awakeness" to "bare awakeness," or "overly drunkenness" to "over drunkenness."
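For readers who want to try a similar hunt without a part-of-speech-tagged corpus, here is a rough Python sketch. It is not the search used for the examples above; the pattern, the function name find_adv_ness and the short list of sentence adverbs are all assumptions made for illustration. It simply looks for a word ending in -ly directly followed by a word ending in -ness, and drops a few adverbs that tend to modify the whole clause rather than the following word.

```python
import re

# Assumed pattern: a word ending in -ly immediately followed by a word ending in -ness.
ADV_NESS = re.compile(r"\b(\w+ly)\s+(\w+ness)\b", re.IGNORECASE)

# Illustrative (hypothetical, far from complete) list of sentence adverbs to skip,
# so that "There is clearly nervousness..." is not counted.
SENTENCE_ADVERBS = {"clearly", "eventually", "actually", "certainly", "obviously"}


def find_adv_ness(text):
    """Return (adverb, noun) pairs for -ly + -ness sequences, lowercased."""
    hits = []
    for match in ADV_NESS.finditer(text):
        adverb = match.group(1).lower()
        noun = match.group(2).lower()
        if adverb in SENTENCE_ADVERBS:
            continue  # likely modifies the clause, not the -ness word
        hits.append((adverb, noun))
    return hits


if __name__ == "__main__":
    sample = "The shouty voice from downstairs dimly nagged at my barely awakeness."
    print(find_adv_ness(sample))
```

A purely orthographic pattern like this overgenerates (it would happily flag "family business"), which is where part-of-speech tagging earns its keep.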

As with "politically correct-ness," the underlying parsing is sometimes signaled by hyphenization:

Requests always come in just about the time I start getting swamped with pre-term-insanely-busy-ness. (link)

A modifier stack-up like "pre-term-insanely-busy-ness" starts looking like the "extended adjectives" that Mark Liberman has recently considered, as in Dolly Parton's "I got those can't stop crying, dishes flying, PMS blues." So we could think of ADV+ADJ-ness (or a longer version with more modifiers) as a nominalization of an extended adjective, with the addition of -ness rather than a full noun like blues. To some observers these constructions surely seem like nothing more than grammatically incorrectness, but to others they might have a certain distinctively stylishness.

Posted by Benjamin Zimmer at 11:48 PM

Be that as it may have been

Today is a day for enjoying presents, and so I was happy to put a few minutes into helping Arnold Zwicky enjoy his 1915 Funk & Wagnalls Faulty Diction. (Unexpectedly, there seems to be no rock band named "Faulty Diction".)

Commenting on that work's proscription of be that as it will -- said to be "Erroneously substituted for be that as it may" -- Arnold observed that both variants seem to be just about equally represented on the web, and that Fielding, Defoe and Locke are among the authors who used the version with will. Arnold speculated that "[i]t might be that the will variant was current in the 18th century but dipped in use by good authors in the 19th".

As my contribution to Arnold's Christmas cheer, here's a graph of the counts of "be that as it will" vs. "be that as it may", in the prose section of the LION index, in 20-year segments from 1700 to 1900:

This certainly tends to confirm Arnold's notion that "be that as it will" declined after 1800; it indicates that "be that as it may" was a mid-19th-century fashion; and it suggests that both versions of the expression were past their sell-by date in 1915.

If you'd like your very own copy of Faulty Diction ("A Brief Statement of the General Principles Determining Correctness in English Speech and Writing, With Their Application to Some of the More Common Instances of Violation and to Some of the Mooted Questions Regarding Usage"), a .pdf is available from Google Books here.

[Based on this case, some might wonder whether this book's questions regarding usage were really mooted in 1915 ("proposed or brought forward for discussion; talked of, discussed, debated."), or moot ("having no practical significance or relevance; abstract, academic").]

[Note: the counts in the graph are incomplete, because pages 3, 6 and 10 of LION's prose-index results for be that as it may turned up an error message rather than a list of citations, so the dates on those pages were left out. Since the results are in alphabetical order, I doubt that this changed the historical facts very much.]
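For anyone who wants to make a similar tally, the bookkeeping is simple. The Python sketch below is an illustration, not the procedure used for the graph: the function name bin_by_period is hypothetical, and the years in the example are invented placeholders, not LION results.

```python
WIDTH = 20  # segment length in years, matching the 20-year segments described above


def bin_by_period(years, start=1700, end=1900, width=WIDTH):
    """Count how many years fall in each [lo, lo + width) segment from start to end."""
    counts = {lo: 0 for lo in range(start, end, width)}
    for year in years:
        if start <= year < end:
            lo = start + ((year - start) // width) * width
            counts[lo] += 1
    return counts


if __name__ == "__main__":
    # Invented placeholder citation years, for illustration only.
    placeholder_years = [1705, 1719, 1720, 1850, 1899, 1900]
    for lo, n in bin_by_period(placeholder_years).items():
        print(f"{lo}-{lo + WIDTH - 1}: {n}")
```

Running one such tally per phrase, over the citation dates actually retrieved, would give the two series to plot against each other; dates from result pages that failed to load would simply be missing from the input.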

Posted by Mark Liberman at 11:57 AM

One Christmas too long

Happy Christmas from Language Log to all our readers. I'm spending the time with family in southern England, where I was planning to look up an old friend (who had been my very first PhD student, though he was many years my senior): the distinguished scholar of Cariban languages Desmond Derbyshire. I hadn't seen him for a long time. He hadn't been answering email messages, but he always traveled a lot, and I just thought he was away from his Hampshire home, perhaps in Brazil (he still visited sometimes, despite being retired from his work there). But I had waited one Christmas too long. Des died quietly in his sleep in a hospital a few days ago, before I could track him down.

The people who worked with him on language analysis and bible translation in the Summer Institute of Linguistics have sent me many consoling messages about Des having gone to a better place. These didn't actually console me much; but that's not the point, is it? Uttering those words consoled them. I do not share their religion, but I do like them — I think the many SIL members I have met over the years are some of the nicest and kindest human beings I have ever interacted with — and I do share their respect and love for Des.

He was a wonderful human being as well as a superb linguist. And his work was very important — more important than most linguists will ever do in their lives; he documented a whole endangered language that otherwise might have gone into history unrecorded, and worked effectively for the welfare of the Amazonian tribe who speak it. They are now flourishing, as they were not when he first arrived in the 1950s. He translated the entire New Testament for them: they can read the Christmas story in their own language this Christmas time if they wish. If they have heard of his passing, I know they will be thinking about him, as I will, because he was their friend, and often a resident of their village, for over fifty years. I wish I had been able to see him one more time.

[See now the obituary for Derbyshire posted on the LINGUIST List.]

Posted by Geoffrey K. Pullum at 06:24 AM

December 24, 2007

Be that as it will

My friend Steven Levine collects stuff from garage sales, estate sales, and the like, and passes it along to his friends.  My most recent gift from Steven is a 1915 Funk & Wagnalls booklet (only 80 small pages) Faulty Diction, which I expect to be mining for material for some time.  Here's an entry for a proscription that was new to me (p. 18):

be that as it will.  Erroneously substituted for be that as it may.

Consonant with this judgment is the fact that both the Cambridge International Dictionary of Idioms (1998) and the Cambridge Dictionary of American Idioms (2003) list be that as it may but not be that as it will (or the other variants be this as it may/will).

Meanwhile, hoi polloi on the web use all four variants with very similar frequencies:

be that as it may: 387,000    be that as it will: 267,000
be this as it may: 160,000    be this as it will: 160,000

These numbers are not gigantic, which is not surprising, since be that as it may is formal in style, and Google web searches turn up a lot of (very) informal writing.  So I suppose it could be claimed that what the web searches show is that ordinary people have an imperfect command of formal idioms.  Against this is the fact that a lot of the be that as it will cites are from thoroughly respectable sources:

"Well," cries Jones, "be that as it will, it shall be your own fault, as I have promised you, if you ever hear any more of this adventure. Behave kindly to the girl, and I will never open my lips concerning the matter to any one."  [Henry Fielding, Tom Jones]

But be that as it will, the world shall, for once, hear what account an Englishman shall give of Scotland, who has had occasion to see most of it, and to make critical enquiries into what he has not seen ...  [Daniel Defoe, Introduction to the Account and Description of Scotland]

But, be that as it will, this is certain, that whoever pursues his own thoughts, will find them sometimes launch out beyond the extent of body, into the infinity of space or expansion ...  [John Locke, An Essay Concerning Human Understanding]

... Be that as it will, in general I dislike very much the Venetian School and prefer the Flemish to it in its own way, that is to say with an exception to the finest works of Titian, of which the Danae at Naples, I think, is the most pleasing picture in Italy and consequently in the world.  [Lord Holland, letter from Florence, 5 August 1794]

Now I see that be that as it might gets 250,000 hits and be this as it might 128,000, with many from reputable sources.

So what's the source of the judgment that there is only one correct variant?  (Plenty of idioms have variant forms, after all: be/lie at the bottom of something, time hangs/lies heavy (on someone's hands), lay/put your cards on the table, have/hold all the cards, etc.)  And that this is the variant with may rather than will or might (not to mention that rather than this)?  (I find all of the variants unremarkable.)  These are serious questions for the F&W booklet, since it claims to be based on "scientific principles", in particular the principle that

Usage to be good should be reputable, that is, it should have the sanction of good authors or (to be the best usage) of the best authors.  (p. 4)

What's more, the offenses in the booklet are supposed to be serious ones:

The faulty expressions treated are comparatively few, since rigid principles of exclusion have been enforced by the limitations of space.  ... The examples given are sufficient to illustrate the various classes of faulty usage that need to be guarded against.  (p. 3)

(It should be obvious that the Faulty Diction people were not proponents of Avoid Passive, or of avoiding "passive style", and indeed the booklet -- published three years before Strunk's Elements of Style, with its famous antipathy towards the passive voice and passive style -- doesn't warn against either.)

Now I very much doubt that in this case (and in many others) the compilers of the booklet actually consulted the practice of good authors.  The judgment looks to me like an expression of personal taste, a quirk even.  Not that there's anything wrong with expressing personal tastes in print -- but this sort of booklet is not the place to display them.

At the moment, I have no idea what the practice of good authors was then or is now, and I'm not prepared to do the necessary searches with the resources available to me and in the time available to me.  It might be that the will variant was current in the 18th century but dipped in use by good authors in the 19th, though for the entry to occur in Faulty Diction it must have been fairly frequent a hundred years ago.  But I'm inclined to think that the compilers of the booklet just pulled the proscription out of the air.

Posted by Arnold Zwicky at 08:48 PM

Another reversal

Following up on my posting about problems with converses and directionality, Chris Lance wrote this morning to ask about substitute 'replace', as in this Bizarro cartoon from a couple of years ago:

I've written about this case at some length on the American Dialect Society mailing list, over several years, but now see that it hasn't turned up on Language Log.  So here's a very brief account of the phenomenon.

For details of the various uses of substitute and their history, I refer you to a paper by David Denison, still in press but available in .pdf format on his website.

According to Denison, the development comes in three stages.  In a nutshell:

1.  standard substitute: substitute NEW for OLD (i.e., substitute a fried chunk of your left buttock for the pork chop) -- vs. replace OLD with/by NEW (i.e., replace the pork chop with/by a fried chunk of your left buttock).  Note: the prepositions are important.

2.  encroached substitute: substitute OLD with/by NEW (i.e., substitute the pork chop with/by a fried chunk of your left buttock).  The verb substitute encroaches on replace by taking on its argument structure (with the result that the order of the two arguments mirrors the sequence of the two denotata in time and their information status, in both cases old before new).  But standard substitute continues in use; the two meanings are usually distinguished by preposition choice -- though to judge from comments in my e-mail, this is a subtlety that escapes many people.  Encroached substitute has been around since the 17th century, and as MWDEU notes, despite having been condemned by many commentators, it's been appearing in standard writing on both sides of the Atlantic for a long time (and has been recognized as a standard variant in Merriam-Webster dictionaries since WNI2 in 1934).

3.  reversed substitute: substitute OLD for NEW (i.e., substitute the pork chop for a fried chunk of your left buttock, as in the Bizarro cartoon).  This one -- a blend of standard and encroached substitute -- is genuinely recent, apparently becoming widespread in the U.K. only about twenty years ago, though now spreading to the U.S., as in the cartoon (the cartoonist Dan Piraro is American).  Denison argues that the vector for its spread in the U.K. was the language of sport, in particular football/soccer: the spread of reversed substitute beginning in the mid-80s follows the institution of tactical substitution in soccer in the 1966-67 season.  As Chris Lance put it in e-mail to me:

If a manager makes a substitution during the course of a game, then the player taken off is said to have been substituted. From there, it's a small step to say that the player taken off has been substituted for the player who replaced him. This usage now seems to have spread to other contexts.

Understandably, many speakers have trouble interpreting reversed substitute, which functions as the converse of the standard verb.  You have to rely on context to figure out which meaning is intended.

Plenty of detail, documentation, and discussion in Denison's paper. 

Posted by Arnold Zwicky at 06:35 PM

mo' juiced

I wouldn't dispute Geoff's assertion that claims about the expressive superiority of French words over their English synonyms are usually pure humbug. Still, there are times when a French word undeniably adds a soupçon of meaning that eludes its English equivalent. As in the banner headline that appeared over the lead story in the New York Times Week in Review in January, 1998, just after the Lewinsky story blew open: SCANDALE! Why do you think they felt they needed a final e on the word?, I asked my friend Rachel. Oh, she said, that's so you can tell it's about sex and not money.

Posted by Geoff Nunberg at 04:24 PM

More on the early days of "eggnog"

Just in time for the holiday season, Heidi Harley wrote here on the discovery of an early citation for eggnog, apparently antedating the first OED cite of 1825 by about fifty years. On the American Dialect Society mailing list, independent scholar Joel S. Berson posted some follow-up findings based on a search of historical databases. Below is an edited version of Joel's post with additional material.

Yes, that must have been a pleasant discovery while reading before the fire, but (as Margot Charlton wrote) the databases are the Scrooges. In the "bah humbug" tone so appropriate to the day, I note that the OED is unlikely to be able to use a c1774 date for Heidi Harley's student's find, since Boucher only reports orally for that date. The OED can use the 1806 date (I think actually 1807, as I'm told by Harvard's on-line catalogue for the Houghton Library copy, and by WorldCat), which is still an antedating of the OED's 1825, but —

E.A.N. Scrooge (more fully known as the Early American Newspapers database) tells me the two earliest occurrences of "egg-nog" (with or without hyphen) locatable in 18th century colonial American papers are:

(1) The Independent Gazetteer (Philadelphia), 1788 Oct. 16, page 3, col. 1 (within an essay whose discourse is somewhat mysterious to me):

Rummaging now the brain, many conceits may be found, much truth of all kinds, whole store rooms of curses and unmentionable damns, with devils of all shapes and colours, thousands of encomiums on oysters, hot suppers, and devilish fine wines; and there are so many different qualities and dispositions that intestine wars are never over; when wine and beer, punch and eggnog meet, instantly ensues a quarrel, and it is raised so high, that the brains boil like mush in a pot with heat, and was it not for the holes I before mentioned, which let out the steam, the skull must be cracked.

(2) A few years later, in the Virginia Chronicle (Norfolk) of 1793 Jan. 26, p. 3, col. 2, it appears seasonably:

Messrs. Baxter & Wilson,
     On last Christmas Eve several gentlemen met at Northampton court-house, and spent the evening in mirth and festivity, when EGG-NOG was the principal Liquor used by the company. After they had indulged pretty freely in this beverage, a gentleman in company offered a bet that not one of the party could write four verses, extempore, which should be rhyme and sense; and when it was taken up by a gentleman present, who wrote the first five [sic; I count six] verses following; to which the subjoined answer was immediately given. As I think them applicable to the occasion, you will oblige me by inserting them in your next week's paper.
     January 14, 1793.                S———————

The trembling Muse with anxious care to please,
May wish, perhaps, to appear with grace and ease;
But vain, alas! are all the powers of art,
When awful Dullness hangs upon the heart.
Each brisk emotion which the soul receives,
And quickens fancy with the wit it gives,
Must cease to flow, when leaden slumbers bind,
And quell the transports of the glowing mind.
For strength or splendor never yet arose,
From foggy brains which languish'd for a dose.
In pity then permit the strain to end,
In kind compassion to your drowsy friend.
      The ANSWER.
Let Wine, alas! resign its boasted praise
To rouse the Muse, and prompt the Poet's lays,
Since rival worth now boasts superior art,
To infuse the transports of the glowing heart.
'Tis Egg-Nog now whose golden streams dispense
Far richer treasures to the ravish'd sense.
The Muse from Wine derives a transient glare,
But Egg-Nog's draughts afford her solid fare.
The first escapes by exhalation's power,
And leaves the Muse more languid than before.
The latter, firm, remains her steady friend,
Sustains her talk, nor quits her to the end.
On old prescription one relies for fame,
While solid merit props the others claim.

Of course, we have only the submitter's claim, a full month later, that the two poems were actually written on the occasion, and by two separate persons (or is that personae?). Colonial poets and wits, such as Mather Byles, or Honest Ben, were not above dissembling.

In its issue of 1793 April 6, page 3, col. 2, the Virginia Chronicle printed a response "To the Northampton Poets, on their Poetry". It begins:

ILLUSTRIOUS bards! what pen can write your praise

and, comparing them to Homer and Virgil, praises the paper for publishing

                                 each sublime matter,
Who drinks egg-nog, who mixes wine with water,
Who died last, as also who was married,
Whose wench brought forth a son, and whose miscarried.
These are fit subjects for poetic brains.
Great WASHINGTON, his country'd [sic] pride and boast,
Will never in your poetry be lost.
Even TOMMY PAINE will live to future ages,
And be the Hero of poetic pages.
And be assur'd, of this there is no telling:
Which is the best your diction, or your—spelling."
April 1, 1793.

I wonder about the above reference to George Washington, considering Heidi's discussion of his recipe.

[Guest post by Joel S. Berson]

[Update #1: And if you're curious what went into eggnog at the turn of the 19th century, Joel Berson posted to ADS-L a citation with "perhaps the earliest list of ingredients":

The American travellers, before they pursued their journey, took a hearty draught each, according to custom, of egg-nog, a mixture composed of new milk, eggs, rum, and sugar, beat up together.

Travels through the States of North America, and the Provinces of Upper and Lower Canada, During the years 1795, 1796, and 1797. By Isaac Weld, Junior. Fourth Edition. London: Printed for John Stockdale, Piccadilly. 1800. Page 81. [Google Books, title page viewed. ESTC lists a 1799 edition; Houghton has a copy of that.]

Joel adds, "This is at an inn near Baltimore, on the road to Philadelphia. Perhaps another hint of a Middle Atlantic origin."]

[Update #2: Also on ADS-L, Joanne Despres of Merriam-Webster points out that Mitford Mathews' Dictionary of Americanisms and William Craigie's Dictionary of American English both quote the Boucher passage, dating it ca. 1775. Merriam-Webster uses that date for its eggnog entry, though as Joel Berson notes above it might not pass muster as an OED first citation date.]

[Update #3: Fred Shapiro supplies two more early citations:

1788 New-Jersey Journal 26 Mar. 2 (America's Historical Newspapers)
A young man with a cormerant appetite , voraciously devoured, last week, at Connecticut farms, thirty raw eggs, a glass of egg nog, and another of brandy sling.

1795 Freneau, Philip Morin. Poems written between the years 1768 & 1794 (Eighteenth Century Collections Online) 345
To the sign of the Anchor we then were directed,
Where captain O'Keef a fine turkey dissected;
And Bryan O'Bluster made love to egg-nog.

The March 1788 cite would appear to be the earliest example from a primary source found thus far, just beating out Joel Berson's October 1788 cite above.]

Posted by Benjamin Zimmer at 03:29 PM

Pseudo-intellectual francophilic nonsense

James Kirkup, in his obituary for the French novelist Julien Gracq in today's Independent, refers to Spengler's view of the future (in Decline of the West) as

"a soulless expansionist Caesarism — a vision strikingly realised today in our all-enveloping nationalist, commercial and industrial "mondialisation"

and adds a baffling parenthetical remark about the latter word:

(the French term is so much more expressive than our banal "globalisation").

What could possibly be the justification for this supposed contrast?

French mond-ial-is-at-ion and English glob-al-iz-at-ion are not just synonymous but morpheme-by-morpheme equivalent, and etymologically cognate except for the roots: the French uses Latin mundus "world" and the English uses Latin globus "globe", probably via Old French. We now use "the globe" to refer to the world. But the former word is "so much more expressive", and the latter is "banal"?

I'll tell you what I think (have I ever let you down?): I think this is pseudo-intellectual francophilic nonsense. I think that if anyone in a literary context asserts that some French word has some property like expressiveness or poeticness that its English counterpart does not have, and/or that some English word is not as good as some French word of the same meaning, they are simply assumed to be saying something justifiable and probably true, because nobody dares to call them on it. I say there is absolutely no objective sense of "expressive" under which French mondialisation and English globalization could be determined to be non-equivalent.

I'm not saying you can't find contexts in which one is used and the other is not; indeed, that is true roughly 100 percent of the time, since virtually all the contexts for mondialisation are French contexts. I'm saying the claim of a difference in expressiveness is unsubstantiatable nonsense, designed mainly to cow you into thinking you are not intellectual enough to see the difference. Do not be cowed. You are just fine as you are. You cannot see the difference in expressiveness between these two words, and Language Log is here to give you the comforting news that there isn't one.

Talking of the Independent, I did write them a letter pointing out that their Health Editor had fallen for a hoax. I wrote:

It was good fun to see health editor Jeremy Laurance taken in by one of the British Medical Journal's famous Christmas jokes (22 December, page 19). Did no one on the editorial desk feel the slightest qualm of suspicion about a dermatologist correlating testosterone levels with unicycle jokes? And has Jeremy Laurance been tipped off yet, or is it going to be done at the Independent's Christmas party?

And did they print my letter? Why don't you try to guess.

Posted by Geoffrey K. Pullum at 04:44 AM

December 23, 2007

Proceed with caution

A long-standing topic on the American Dialect Society mailing list is the use of ancestor to mean 'descendant'.   Recently, our attention has expanded to include successor 'predecessor'.  Which reminded me of precede/proceed as used by some students in intro linguistics courses.  And then of problems with the technical terms progressive and regressive assimilation.

Directionality is hell.

I briefly mentioned ancestor/descendant, along with yesterday/tomorrow and subsequent/prior, back in 2005.  And then, last July, I commented on forward-looking, backward-looking, and double-sided lexical items, taking off from temporal since and before in a case where the former seems to have been used for the latter:

Standard English has forward-looking since and backward-looking before, but no double-sided temporal P, one covering both directions.  In a roughly similar fashion, standard English has forward-looking tomorrow and backward-looking yesterday, but no double-sided temporal adverb, meaning 'one day from today'.  Such lexical items aren't unnatural [and double-sided temporal adverbs, meaning 'a day from today', do occur in some languages, with the actual reference determined from context] ... but we'd expect them to be relatively rare, since they're less informative than the more specific items.  Still, a double-sided temporal P would have its uses, allowing speakers to view things from either end of a time span ...

One way to view apparent switches of directionality is to see them as a reflection of a desire for double-sided lexical items.  One item in a pair (A and B) is chosen (on the basis of frequency or salience) to serve for both, giving the effect of a switch, though in fact the chosen item now serves in both its original use and the reversed use: ancestor 'ancestor, descendant', yesterday 'yesterday, tomorrow'.  Typically, most people "switch" in one direction, using A for B, though some people will go in the other direction, using B for A.  So far as I know, you don't find people with a true reversal, A for B AND B for A.

1. Ancestor.  A couple of examples from ADS-L discussions.  From David Bergdahl on 12 June, citing an article on tracing descendants of the Lost Colony in Virginia:

Fred Willard, director of the Lost Colony center, said some colonists may have migrated inland to what are now East Lake, Chocowinity and Gum Neck. Researchers plan to use cheek swabs taken from possible ancestors to test the paternal and maternal DNA lines.

And from Bonnie Taylor-Blake on 24 June, citing a front-page story in the Atlanta Journal-Constitution that day:


An unlikely, close-knit bond develops between ancestors of slaves and the ancestors of their slave masters.

I'm assuming that these writers would also use ancestor for actual ancestors.  That is, for them ancestor means 'lineal kin'.

Finally, from Brians's Common Errors under ancestor:

When Albus Dumbledore said that Lord Voldemort was "the last remaining ancestor of Salazar Slytherin," more than one person noted that he had made a serious verbal bumble; and in later printings of Harry Potter and the Chamber of Secrets author J. K. Rowling corrected that to "last remaining descendant." People surprisingly often confuse these two terms with each other. Your great-grandmother is your ancestor; you are her descendant.

2. Successor.  On 17 December, Charlie Doyle posted:

It's somewhat like the confusion of "ancestor" with "descendant"--

In yesterday's newspaper an AP story by Hillary Rhodes reported that the name "Emma" for new babies, after three years in first place, has been overtaken by "Sophia" and "Isabella":

Sophia has more of a Latin or continental appeal than its proper English successor, Emma  (Athens [GA] Banner-Herald, E12).

Unlike the "ancestor"/"descendant" pair, however, "successor" and "predecessor" do have the same root ...--a circumstance that might contribute to their confusability.

It's not clear that most speakers appreciate that successor and predecessor have the same root, but no doubt they do appreciate the phonological similarity, and phonological similarity is a strong contributor to word confusions.

3. Proceed.  This one's in Brians and in MWDEU, but in short entries that treat the issue as mostly a matter of spelling.  But the situation is more interesting than that.

My experience in intro linguistics classes is that when we're talking about phenomena that have to do with the context in which some element occurs and with the influences of surrounding elements in such contexts, and I use the terms precede (as in "X has the variant Y when it precedes Z" -- i.e., Y for X in the configuration XZ) or is preceded by (as in "X has the variant Y when it is preceded by Z" -- i.e., Y for X in the configuration ZX), some students press the verb proceed into service as the converse of precede.  So: "word-final vowels are voiceless when proceeding a voiceless consonant; vowels are nasalized when proceeded by a nasal consonant."  (I THINK the passive version is more common than the active, but I haven't collected real data.)  The usage appears especially in phonology, but sometimes in morphology and syntax as well.

Now, the converse of precede that I myself use is follow.  The students aren't getting proceed from me (or any other linguistics instructor, I'd wager).  What they're doing, I think, is "fixing" the non-parallelism of precede (Latinate) and follow (Anglo-Saxon) by extending the 'next' sense of Latinate proceed (We proceeded to the ballroom 'We went next to the ballroom', I'll proceed to the conclusion 'I'll go next to the conclusion').  (In my experience it's hopeless to ask these people where they got their use of proceed -- as, in fact, it's usually pointless to ask people where they got ANY usage.  Both the innovation and spread of variants are largely below the level of consciousness of ordinary speakers.)

Some contribution to the development of proceed 'follow' probably comes from the burden of technical terminology that attends learning linguistics.  Some of it -- reduplication, rhotic, morpheme, clitic, anarthrous -- is obviously special to linguistics, but most of it -- onset, liquid, constituent, definite, subject -- is ordinary English vocabulary used in very special ways, and some of it is not really technical terminology, but "terms of art", conventions within the field that prefer certain usages over other, equally available, ones.  Like precede and follow rather than come before and come after.  (Further complexity: in linguistics, precede and follow by default mean not merely 'come before' and 'come after' but 'come immediately before' and 'come immediately after'.  This is so in ordinary language as well, but the convention in linguistics is so strong that linguists ordinarily have to mark the non-default case: "if followed anywhere within the word by a voiceless consonant" and the like.)

Not only do my (mostly American) students spell precede and proceed 'follow' differently, they usually pronounce them differently as well: for them, both verbs usually have a weak accent on their first syllable, so that vowel reduction is blocked, with the result that the verbs have something close to tense [i] and something close to tense [o], respectively, in their first syllables.  So, for the most part, my students are not "confusing" words at all; they're differentiating them scrupulously, but in a way that conforms neither to the conventions of ordinary English nor to the conventions of the guild of linguists.

I wish I could say that this was the end of it, but of course Brians and MWDEU wouldn't have the entries they do if it were: there's a complex pattern of proceed/precede "confusion" (in pronunciation or spelling or both) that linguists can't be held responsible for, and that I don't know the details of.

You can get some appreciation of the complexities from John Wells's blog entry of 23 May 2006, where he reports student misunderstandings going in several directions at once.  Note: the distinguished phonetician Wells is British, and a professor at University College London, so his experiences are bound to be a bit different from mine.

Every now and again my students reveal that they are confused about the words precede and proceed. Accounts of phonetic processes and allophonic rules often refer to a preceding consonant, being preceded by a vowel, and so on. In the sequence ABC, B precedes C and is preceded by A.

But students sometimes write this as a 'proceeding' consonant, or being 'proceeded' by a vowel. Worse, since to proceed from can mean to follow, they sometimes interpret my spoken precede, which they imagine to be proceed, as meaning 'follow', so that they also have the meaning the wrong way round.

In theory, the distinction between the [I] of precede and the [ǝ] (weakened from [ǝU]) of proceed ought to be robust. After all, in my own speech and in that of most of my students valid doesn't rhyme with salad nor rabbit with abbot; the initial syllables of finesse and phonetics differ. But in practice, clearly it may not be: weak pre- and pro- can get confused.

And I have just detected one of my favourite authors, Jared Diamond, committing the reverse mistake. In his marvellous book Collapse (Penguin 2006), on page 501 of the UK paperback edition, we read that "LA smog generally [gets] worse as one precedes inland". Oh dear. What a precedent.

4. Assimilation.  Assimilation is the phenomenon of (at least partial) agreement in phonological features between two segments.  Given XY, X (as target) can pick up features from Y (as trigger), or Y (as target) can pick up features from X (as trigger).  So, in many languages, vowels pick up the feature of nasality from a following nasal consonant (with different details in different languages).

Once again, there's a directionality.  How do we describe the facts?

The conventional terminology in linguistics takes the point of view of the trigger: if the trigger follows the target, it's exerting its influence BACKWARDS onto the target.  So this is called regressive assimilation.  And if the trigger precedes the target, it's exerting its influence FORWARDS onto the target.  So this is called progressive assimilation.

But suppose we look at things from the point of view of the target.  Then if the target precedes the trigger, the target is picking up its properties from what FOLLOWS -- which could reasonably be called progressive (or forward-looking) assimilation.  And if the target follows the trigger, the target is picking up its properties from what PRECEDES -- which could reasonably be called regressive (or backward-looking) assimilation.
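The two points of view can be laid side by side in a few lines of Python. This is only a sketch: the function names, and the idea of representing trigger and target by their integer positions in the string, are illustrative assumptions rather than anything standard.

```python
def _check(trigger_pos: int, target_pos: int) -> None:
    # Trigger and target must be two different segments.
    if trigger_pos == target_pos:
        raise ValueError("trigger and target must occupy different positions")


def trigger_view(trigger_pos: int, target_pos: int) -> str:
    """Conventional label: which way does the trigger push its influence?"""
    _check(trigger_pos, target_pos)
    # Trigger follows target: influence runs BACKWARDS onto the target.
    return "regressive" if trigger_pos > target_pos else "progressive"


def target_view(trigger_pos: int, target_pos: int) -> str:
    """Alternative label: which way does the target look for its features?"""
    _check(trigger_pos, target_pos)
    # Target precedes trigger: the target looks FORWARDS to what follows.
    return "progressive" if target_pos < trigger_pos else "regressive"


# A vowel (position 0, the target) nasalized by a following nasal
# consonant (position 1, the trigger):
print(trigger_view(trigger_pos=1, target_pos=0))  # regressive
print(target_view(trigger_pos=1, target_pos=0))   # progressive
```

The same configuration gets opposite labels depending on where the point of view is located, which is exactly the source of the trouble.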

In my experience, students have a shitload of trouble in choosing between the two viewpoints.  A lot of them just don't get the regressive/progressive terminology.

Long ago, I moved to using a clearer directional metaphor, borrowed from the analysis of speech errors: anticipation (something changes to fit what is to come) vs. perseveration (something changes to fit what has gone before).  Real-life examples, involving word choice, from the 2005 NWAV meetings:

anticipation: ... with only a few phrases [for switches] to English for words or short phrases.  [Carol Myers-Scotton]

perseveration: ... distinguishing that variation that is universal from that universal that is not... from that variation that is not.  [Ron Horvath]

This terminology doesn't always work, but it works better than regressive/progressive (with its unclarity about where the point of view is located).  On the other hand, anticipatory and perseveratory are six syllables each, while regressive and progressive clock in at only three syllables each.  So relentless devotees of Brevity should prefer regressive/progressive.  Hey, it's a trade.

Posted by Arnold Zwicky at 02:19 PM

Myth is truth (p < .05)

In this week's Tom the Dancing Bug, Ruben Bolling explains "how great journalism is done" (the full strip is here). He's talking about political journalism, obviously. Science journalism is generally simpler, since the scientific equivalents of political parties are usually too diffuse and too weak to call a publication effectively to account for unfairness. In fact, most of the time there aren't any powerful voices at all to call you to account if you write something wrong or foolish about a scientific topic, just a lot of scientifically-educated readers cursing into their oatmeal.

So you're pretty much free to write what you want, based on a popular book, a lecture you've heard, a press release or another story in the popular press. Your editor will like it if you figure out how to add a local or topical angle. If you're at a high-end publication, you might slot in a comment from an expert source -- though in general, you don't want that source to confuse everyone by casting doubt on the main story line, or at least on your interpretation of it.

In science journalism, there are several other prominent story lines beside "Up is Down". One of them is "Up is Up, Science Shows", where some common-sense truth is demonstrated by means of expensive equipment or clever mathematical analysis. This doesn't always work, but it's pretty spectacular when it does. My current favorite example is one that was featured in Matt Hutson's NYT Year in Ideas piece on "neurorealism":

... a Boston Globe article about how high-fat foods activate reward centers in the brain. The Globe headline: "Fat Really Does Bring Pleasure." Couldn't we have proved that with a slice of pie and a piece of paper with a check box on it?

Matt's (entirely appropriate) reaction is an instance of another common story line, "That's what I always thought", or more succinctly, "well, duh." Ironically, this one is most often deployed in reaction to claims that are not really true, at least as the "well, duh" articles present them. This happens especially often with respect to stories about sex differences, where the structure of received ideas is so extensive, complex and deeply felt.

This characterized much of the reaction last year to Louann Brizendine's claims about male-female differences (see e.g. "Bible Science stories", 12/2/2006; "David Brooks, neuroendocrinologist", 9/17/2006). The worldwide reaction to the "gender happiness gap", a few months ago, was another nice example (see "Gender-role resentment and Rorschach-blot news reporting", 9/27/2007; "The 'gender happiness gap': statistical, practical and rhetorical significance", 10/4/2007).

The current world-wide blossoming of articles about testosterone and humor, based on an article in the BMJ's Christmas spoof issue, is also mostly of this kind. Aside from the small problem that the original article was (literally) a joke, it was a great opportunity for the world's journalists, because it gave them a complex structure of biological differences and local cultural norms to play with. You can see this at work in some of the local adaptations around the world, e.g. "Men make more gags than women", Sify, 12/22/2007:

Can you fathom why do we have so many male comedians in Bollywood? Well, scientists have the answer to it.

According to a recent study, male hormones such as testosterone is the reason behind a good sense of humour.

These male hormones fuel aggression which, in turn, develop humour, said the study conducted by a former consultant dermatologist Sam Shuster.

The author also offers a (rather inaccurate) neurorealist interpretation of the same study that set Christopher Hitchens off last year (see the links in "Flacks and hacks and Hitchens", 12/14/2006, for details):

About two years ago, researchers at Stanford University claimed on the basis of studies of brain patterns that a gender divide do exist while it comes to appreciating humour. Women generally place a greater emphasis on the language, and use a more analytical approach.

The first commenter, Santosh, gets right to the point:

Men always will remain one step ahead with respect to women,be it in any field of life.Women should better accept this fact.

About a century ago, Rudyard Kipling offered an explanation based on evolutionary psychology for the idea that women are humor-deficient:

She who faces Death by torture for each life beneath her breast
May not deal in doubt or pity—must not swerve for fact or jest.
These be purely male diversions—not in these her honour dwells—
She the Other Law we live by, is that Law and nothing else.

Kipling's verse version of this theory is out of fashion, since it was deployed in opposition to the idea of women's suffrage, now a settled issue in western countries. But it stands in a long line of more-or-less serious attempts to understand the biological and social nature of sex differences, in humor as well as in other areas. Some of the attempts have been transparent political rationalizations, like Kipling's; others have been decked out in more scientific attire, like Brizendine's, despite being equally without empirical support; others have been somewhere between spoof science (like Shuster's paper) and plain old bad science (like the recent claims about sex differences in the American electorate's response to the current presidential candidates, based on an fMRI study of 20 subjects recruited at Stanford: see "Flacks and hacks and brainscans", 11/23/2007.)

But there's also quite a bit of good, solid science that bears on the questions raised by Kipling, Hitchens and Shuster. Some of the results are interestingly non-obvious -- though the set of received ideas in this area is so extensive and so internally contradictory that almost any result will make someone react the way Hitchens did to his (mis)interpretation of an fMRI study of sex differences in the appreciation of cartoons:

Slower to get it, more pleased when they do, and swift to locate the unfunny—for this we need the Stanford University School of Medicine?

This post is already far too long, but some other time, I'll review some of the good work on jokes, laughter and sex.

Posted by Mark Liberman at 10:31 AM

December 22, 2007

Antedated eggnog

Like Arnold Zwicky, I'm not the kind of linguist who is heavy into antedating and sourcing, but a student and I just recently verified our very own antedation, of eggnog, and it's so seasonal I thought I should share it.

The OED's oldest quote for eggnog is from 1825:

Sara Kraft, an Arizona undergraduate taking an independent study course with me this fall, was reporting to me on a chapter in an anthology of Allen Walker Read's articles on ok.1 Read had occasion to quote from a 'pastoral' poem written around 1774 by 18th-century clergyman and philologist Jonathan Boucher, which quote I happened to notice contained the words 'egg-nogg':

Fog-drams i' th' morn, or (better still) egg-nogg,
At night hot-suppings, and at mid-day, grogg,
My palate can regale:

I don't know why it occurred to me to check in the OED to see what the earliest quote they provided was, but I did, and since it was so much later, and from a British source to boot, I got interested. We requested an Inter-Library Loan of Boucher's "A Glossary of Obsolete and Provincial Words; forming a supplement to the dictionaries of the English Language."2 After a false start with an incomplete microfiche from Yale, Harvard obligingly sent us the actual volume. Amazing...

The pastoral poem, quite a long one, is a footnote in the 60-page Introduction to the Glossary. The full title of the poem, in the verbose convention of the day, is "Absence: A Pastoral: drawn from the life, from the manners, customs and phraseology of planters (or, to speak more pastorally, of the rural swains) inhabiting the Banks of the Potomac, in Maryland".

In the introduction to his Glossary, (p. xlix), Boucher writes the following:

"A List of some of the most remarkable and common [words], collected during my residence in Virginia and Maryland nearly thirty years ago, is here set down at the foot of the page. To this list I will subjoin a copy of verse, which I have ventured to call a Pastoral, written during my residence in America; written solely with the view of introducing as many of such words and idioms of speech, then prevalent and common in Maryland, as I conceived to be dialectical and peculiar to those parts of America."

The italicized terms in the quoted lines above (lines 69-71) are italicized in the original, and are the 'words and idioms of speech' that Boucher considered to be "dialectal and peculiar to those parts of America", including egg-nogg. As Read notes, from Boucher's reference to 'nearly thirty years ago', we can be sure that the pastoral was written around or before 1774, since Boucher died in 1804. Read notes that he left his Maryland parish in 1775, being a loyalist.

We wrote of our finding to the OED, and received a nice note back from Margot Charlton, saying in part:

I see that the Dictionary of American Regional English has some interesting eighteenth-century evidence for EGG POP, which seems to be similar....The entry goes back to the first edition, and was originally published in 1891; as we revise the text we are finding that we can antedate most entries with the help of the large historical databases now available to us, but it is always helpful to receive such precise information.

Boucher is a fairly well-documented fellow, having been rector in George Washington's parish, tutor to Washington's stepson, and his personal friend. A bit of quick googling on eggnog revealed a startling connection between Boucher, Washington, and eggnog: apparently a recipe for eggnog was found among George Washington's "kitchen papers" at Mount Vernon. It doesn't say whether the recipe was titled with the word "egg nog(g)" or not -- the terms "egg flip" and (according to DARE) "egg pop" were also in use, or maybe it wasn't titled -- but given that Boucher noticed Marylanders in Washington's parish using it, it's likely that Washington called it that too. Perhaps Boucher even sampled some of Washington's concoction at a holiday party around this time of year in the early 1770s.

Update: The far more sophisticated antedaters and sourcers over at ADS-L have been hot on the trail of 'eggnog' since this post went up! You can follow the discussion by going here and entering 'eggnog' as a search term. The upshot is that a) Sara and I were far from the first to spot that Boucher used 'eggnog' very early on, and b) there are several other instances of the word in published sources from the late 1700s (i.e. whose actual date of publication is earlier than the Boucher publication date of 1807).

So, apologies for wasting everyone's time! but at least it was seasonal. And, as in my independent study, I learned a lot. Happy new year everyone --

1Read was the kind of linguist who is heavy into a/s; I learned a lot from that independent study! If you think the recent rash of abbreviationism is purely technology-driven -- email, texting, all these newfangled electronics driving the kidz of today crrrazy -- you should check out his discussion of the mania for abbreviations in 19th-century New England. Larry Horn describes some of the amazing parallels in detail in this 2002 post to the ADS-L listserv.

2The volume we consulted was indeed titled as above, with "Obsolete" instead of "Archaic", and was from the first (1807) publication of part of Boucher's overall Glossary. It had been bound together by Widener with another dictionary, by Nares, dated 1822. The same piece of Boucher's glossary, up until 'Blade', was published again in 1832, as described in the following excerpt from the "Works of the Camden Society," published by Camden Society (Great Britain), Royal Historical Society (Great Britain) in 1868 (image below from GoogleBooks):



Posted by Heidi Harley at 10:23 PM

I'll teach you to undernegate!

Caught on television yesterday:

That'll teach you to blow your quarters on the arcade.  [Walker, Texas Ranger episode "The Covenant"]

conveying 'that'll teach you not to blow your quarters on the arcade'.  Undernegation, of a sort we haven't seen on Language Log since a brief mention in a 2004 posting by Chris Potts on "negation-indifferent items".

It's hard to get a feel for how common will teach someone to 'will teach someone not to' is, because references to literal teaching are so frequent.  Undernegative teach takes a complement VP denoting an action that the speaker views as undesirable, but there's no way to search for such VPs in general.  You can, however, search for some specific VPs of this sort, for instance talk back 'reply defiantly or insolently'; {"teach X to talk back"}, for various pronouns X, gets a modest number of hits, among them:

I'll teach you to talk back to me. You've got too big for your place for the last time. I've been taking too much from you, but I ain't doing it no more.  [Erskine Caldwell] (link)

I'll teach you to talk back to your StepMother, I'll teach you to talk back to me. From now on you'll do whatever we tell you with no back talk, ...  (link)

"That will teach him to talk back", Banzi muttered.  (link)

That'll teach her to talk back to me!  (link)

Guess that'll teach me to talk back to a couple Diablo fiends.  (link)

(The subject of teach in these examples is either that or I.)

Presumably, undernegative teach arises from "negation by association", as in the account Mark Liberman gave for undernegative could care less some time ago.  That is, teach by itself comes to be seen as a sufficient sign of negativity, and the not becomes dispensable.  There's no easy way to gauge the relative frequency of undernegative teach versus explicitly negative teach, but with the VP talk back, it looks like the undernegative is ahead.
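For anyone who wants to tally the two variants over a set of collected hits, here is a minimal sketch. The pronoun list, the regular expression, and the function names are my own assumptions for illustration, not anything a search engine provides; the pattern only covers the specific VP talk back, with an optional not before the infinitive.

```python
import re

# Hypothetical pronoun list for the X slot in "teach X to talk back".
PRONOUNS = ["me", "you", "him", "her", "us", "them"]

# Matches "teach X to talk back" and "teach X not to talk back".
PATTERN = re.compile(
    r"\bteach\s+(?P<pronoun>" + "|".join(PRONOUNS) + r")\s+"
    r"(?P<neg>not\s+)?to\s+talk\s+back\b",
    re.IGNORECASE,
)


def classify(sentence):
    """Return 'explicit', 'undernegative', or None if the pattern is absent."""
    match = PATTERN.search(sentence)
    if match is None:
        return None
    return "explicit" if match.group("neg") else "undernegative"


def tally(sentences):
    """Count how many sentences fall into each of the two categories."""
    counts = {"undernegative": 0, "explicit": 0}
    for sentence in sentences:
        label = classify(sentence)
        if label is not None:
            counts[label] += 1
    return counts


if __name__ == "__main__":
    hits = [
        "I'll teach you to talk back to me.",
        '"That will teach him to talk back", Banzi muttered.',
        "That'll teach her to talk back to me!",
        "Guess that'll teach me to talk back to a couple Diablo fiends.",
        "That'll teach you not to talk back.",  # constructed example, not a real hit
    ]
    print(tally(hits))
```

A count like this says nothing about literal-teaching readings, which still have to be weeded out by hand.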

Posted by Arnold Zwicky at 12:17 PM

A New Approach to Censorship

The government of Malaysia has announced what to my knowledge is a hitherto unknown form of censorship: they have prohibited the use of certain words with non-Muslim referents. The Associated Press reports that the government has informed The Herald, a Roman Catholic newspaper, that its license to publish will not be renewed if it continues to use the word Allah in reference to God.

The Internal Security Ministry is of the view that the word Allah refers specifically to the Muslim god and may not appropriately be used in reference to the Catholic god. In addition to allah, the Malaysian government has prohibited the use of three other words by non-Muslims: solat "prayer", Kaabah "the shrine in Mecca", and baitulah "the house of god". Although the report is not entirely clear on this point, I think that what is prohibited is the application of these words to entities distinct from those to which they are applied by Muslims rather than, strictly speaking, their use by non-Muslims. That is, it is presumably acceptable for a Catholic to say that Muslims worship Allah.

As far as I know, such restrictions on the use of particular words are novel. Arabic-speaking Christians and Jews use the word Allah as their word for God. Muslim countries frequently discriminate against non-Muslims, but so far as I know, no other restricts their choice of religious terms.

The basis for this policy is also curious. If one believes in the existence of multiple deities, it makes sense to distinguish one from the other. If your favorite deity is Thoth and mine is Isis, it makes sense to keep their names distinct to avoid confusion between the two. But if, as Muslims believe, you believe that there is only one god, you can believe that other people have false ideas as to what God is like and what she wants but you cannot reasonably believe that the god that someone else worships is different, for that would imply the existence of two gods. I suspect that the Internal Security Ministry has not thought this out very carefully.

Posted by Bill Poser at 03:13 AM

December 21, 2007

'Tis the season

It's an old tradition for the British Medical Journal to publish spoof articles in its Christmas issue. This week's BMJ continues the tradition, as the cover image reprinted on the right suggests. For example, one of the featured articles is Sreeram V Ramagopalan et al., "Origins of magic: review of genetic and epigenetic effects", BMJ 335:1299-1301, 22 December 2007:

Objective To assess the evidence for a genetic basis to magic.
Design Literature review.
Setting Harry Potter novels of J K Rowling.
Participants Muggles, witches, wizards, and squibs.
Interventions Limited.
Main outcome measures Family and twin studies, magical ability, and specific magical skills.
Results Magic shows strong evidence of heritability, with familial aggregation and concordance in twins. Evidence suggests magical ability to be a quantitative trait. Specific magical skills, notably being able to speak to snakes, predict the future, and change hair colour, all seem heritable.
Conclusions A multilocus model with a dominant gene for magic might exist, controlled epistatically by one or more loci, possibly recessive in nature. Magical enhancers regulating gene expression may be involved, combined with mutations at specific genes implicated in speech and hair colour such as FOXP2 and MCR1.

I can also recommend the article by Gareth Williams and Poonam Dharmaraj, "Dissent of the testis", which complains that the redesign of certain popular British candies means that they are no longer a suitable substitute for 8 ml orchidometer beads "to gauge testicular volume" ("This is a major setback for paediatric endocrinology, and the manufacturer's decision to change the sweets' morphology without consulting the medical profession is a further kick in the Teasers.")

The practice of running spoof stories in the BMJ Christmas issue is not exactly a secret. The lead article in this year's issue includes a pie chart showing the topical distribution of spoofs over the past two decades:

Apparently, it's an equally old tradition for the reporters and editors at BBC News to swallow such jokes hook, line and sinker. A letter printed in the BMJ in 2003, from John Attia and Kichu Nair, notes:

Every year the BMJ and Medical Journal of Australia use their Christmas edition to inject some medical humour into the normally serious scientific literature.

In this spirit we put together a fictional study entitled "Evidence based physicians' dressing: a cross-over trial," in which we documented the effect of "retro" dress (flared jeans, Hawaiian shirts, moussed hair, and nose rings) on patients' confidence.

Tongue in cheek, we described the Kolmogorov-Smirnoff test as two statisticians eyeballing the data over a glass of vodka, and we created a "fashion-operator characteristic" curve which defined a zone of "fashion limbo." We also calculated a number needed to dress (NND), analogous to the number needed to treat (NNT).

Despite what we and others saw as the obvious lightheartedness of this story, it has been reported as a serious research finding by the BBC ...

This year, the BBC news team somehow missed the stories on the genetics of magic and the testicular shape of Teasers, but did cover at least two spoof stories as if they were serious. One was Lee Graves et al., "Comparison of energy expenditure in adolescents when playing new generation and sedentary computer games: cross sectional study":

Procedure Participants were fitted with a monitoring device validated to predict energy expenditure. They played four computer games for 15 minutes each. One of the games was sedentary (XBOX 360) and the other three were active (Wii Sports).
Main outcome measure Predicted energy expenditure, compared using repeated measures analysis of variance.
Results Mean (standard deviation) predicted energy expenditure when playing Wii Sports bowling (190.6 (22.2) kJ/kg/min), tennis (202.5 (31.5) kJ/kg/min), and boxing (198.1 (33.9) kJ/kg/min) was significantly greater than when playing sedentary games (125.5 (13.7) kJ/kg/min) (P<0.001). Predicted energy expenditure was at least 65.1 (95% confidence interval 47.3 to 82.9) kJ/kg/min greater when playing active rather than sedentary games.
Conclusions Playing new generation active computer games uses significantly more energy than playing sedentary computer games but not as much energy as playing the sport itself.

This was picked up as "Wii players need to exercise too" ("Playing 'active' computer games on consoles such as the Nintendo Wii is no substitute for playing real sports, warn UK experts.") Curiously, the BBC reporter was unable to transcribe the spoof article's statistics correctly -- the third sentence of the article reads:

Wii players used only 2% more energy than players of regular computer games.

which is a strange way to report the statement "Predicted energy expenditure was at least 65.1 (95% confidence interval 47.3 to 82.9) kJ/kg/min greater when playing active rather than sedentary games."
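To see how far the "2%" figure is from the abstract's own numbers, here is a quick back-of-the-envelope sketch. It uses only the mean values quoted in the abstract above; the variable and function names are mine, and computing a simple relative difference against the sedentary mean is my assumption about what "more energy" would naturally mean, not something the paper or the BBC specifies.

```python
# Mean predicted energy expenditure (kJ/kg/min) as quoted in the abstract.
SEDENTARY_MEAN = 125.5
ACTIVE_MEANS = {"bowling": 190.6, "tennis": 202.5, "boxing": 198.1}


def percent_increase(active, baseline):
    """Relative increase of `active` over `baseline`, in percent."""
    return 100.0 * (active - baseline) / baseline


# Percentage increase of each active game over the sedentary game.
increases = {
    game: round(percent_increase(mean, SEDENTARY_MEAN), 1)
    for game, mean in ACTIVE_MEANS.items()
}

# Smallest absolute difference between an active game and the sedentary game.
smallest_gap = round(min(ACTIVE_MEANS.values()) - SEDENTARY_MEAN, 1)

if __name__ == "__main__":
    for game, pct in increases.items():
        print(f"{game}: +{pct}% over sedentary")
    print(f"smallest absolute gap: {smallest_gap} kJ/kg/min")
```

On these means, the relative increases come out between roughly 52% and 61%, and the smallest gap reproduces the abstract's "at least 65.1".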

The second spoof article picked up by the BBC news hounds was from the "Retirement" section -- Sam Shuster, "Sex, aggression, and humour: responses to unicycling", BMJ 335:1320-1322, 22 December 2007. The abstract:

After retiring from a busy university department in Newcastle upon Tyne, and with the time and the need for more than the usual consultancies, I was able to follow some of my more extreme inclinations. As a cyclist, I had occasionally thought of using more or fewer wheels, but it was only when choosing a grandson’s gift that I got seriously lost in contemplation of a gleaming chrome unicycle. My wife said "buy the bloody thing", which I did on the whim of the moment. After months of practice at home, I graduated to back streets, a small paved park, and finally town roads. I couldn’t avoid being noticed; in turn, I couldn’t avoid observing the form that notice took. Because at the time there were no other unicyclists in the area, such sightings would have been exceptional, yet I soon found that the responses to them were stereotyped and predictable. I realised that this indicated an underlying biological phenomenon and set about its study.

This was rendered as "Humour 'comes from testosterone'", 12/21/2007:

Men are naturally more comedic than women because of the male hormone testosterone, an expert claims.

Men make more gags than women and their jokes tend to be more aggressive, Professor Sam Shuster, of Norfolk and Norwich University Hospital, says. [...]

Professor Shuster believes humour develops from aggression caused by male hormones.

In these last two studies, it's possible (though not certain) that the reported data were not made up as a joke, even though the presentations are certainly light-hearted. And in fact, in the Australian case cited earlier, the news department of the BMJ itself was apparently fooled. The letter that I quoted before continues:

[Despite what we and others saw as the obvious lightheartedness of this story, it has been reported as a serious research finding by the BBC] and now the BMJ, both with a commentary from a medical expert.

We are both amused and alarmed by these occurrences and have been puzzling about their interpretation. Is this a sad commentary on:

* The sense of humour (or lack of it) in the medical scientific community?
* How thoroughly news media check their sources?
* The state of evidence based medicine? Does writing up something as a randomised trial give it such credence that it overrides common sense?
* How people read articles? Is an abstract sufficient to create the perception of veracity?

We wish to set the record straight that this was a fictional study and was simply intended to be, and was labelled as, medical humour. From the amount of interest it has generated it may be a fertile area for real research.

I believe that the news department at the BMJ has now gotten the hint to consider the date and the content before accepting silly-seeming stuff.

However, the journalists and editors at the BBC seem to be quite resistant to the idea of applying any such faculty of judgment -- though I've occasionally speculated that the whole BBC News enterprise is a sort of vast Pythonic joke, a secret subtle Onion, organized entirely for the private amusement of those initiated into its mysteries.

[Hat tip: Cosma Shalizi]

[Update -- Holly at Feministing doesn't think that the BBC's journalistic incompetence is funny at all ("The BBC says: humour 'comes from testosterone.' Holly says: bad reporting 'comes from the BBC.'", 12/21/2007). She's right, of course. But the BBC is insulated from market forces, and apparently impervious to complaint, so what can reasonable people do but try ridicule?]

[Update by Geoff Pullum: Sadly, it's not just the BBC, and it's not about insulation from market forces (the World Service of the BBC was spun off into a profit-making separate entity, and a lot of the bad science reporting seems to come from there). The Independent is a profit-making (or at least, profit-attempting) newspaper, and a comparatively good one; but on page 19 of the issue for today, Saturday December 22nd, 2007, you will find an earnest report by Jeremy Laurance, the Health Editor, reporting the story about the dermatologist on a unicycle correlating male jeering with testosterone level. He swallows it hook, line and sinker. It looks to me as if people are being put into science and medicine reporting when they do not have good science degrees. Or a sense of humor.]

[Update #3 -- the "humor from testosterone" story is certainly spreading vigorously around the world, presented as a serious scientific result. And the BBC is certainly not the only credulous (or opportunistic) media outlet: "Scientist claims men are funnier than women", The Telegraph; "Aggression 'makes men more humorous than women'", The Independent; "Humor Develops From Aggression Caused By Male Hormones, Professor Says", ScienceDaily; "Is humor tied to male aggression?", World Science; "Quand les testicules régissent l'humour", TF1; "Das Geheimnis des männlichen Witzes", Spiegel; "Humor entsteht aus Aggression", Die Welt; "L'umorismo è maschile perchè legato a testosterone", AGI; "'L'umorismo è maschio', sostiene una ricerca", ANSA; "El humor 'está en la testosterona'", BBC Mundo; "Bewijs: mannen hebben inderdaad meer humor", Elsevier; "Testosteron ger män aggressiv humor", Svenska Dagbladet; "Estudo diz que senso de humor está ligado à testosterona", BBC Brasil; "Smysl pro humor je prý výhradně mužská záležitost", České noviny. ]

[Update -- more here.]

Posted by Mark Liberman at 11:04 PM

What's not the case?

Negation can be a tricky thing, as has been discussed many times here on Language Log before. Back in grad school, my first-semester semantics course was with Veneeta Dayal and our textbook was (the first edition of) Chierchia & McConnell-Ginet's Meaning and Grammar, in which "more colloquial" forms of expressing propositional negation were typically replaced with the somewhat more stilted "it is not the case that", so that a sentence like "Pavarotti is not hungry" is expressed as "It is not the case that Pavarotti is hungry" -- or, if you prefer, "That Pavarotti is hungry is not the case".

(Lots of examples in Meaning and Grammar involve Pavarotti, James Bond, and Sophia Loren, commenting on their hunger and boringness, asking whether they like each other, claiming that Sophia Loren thinks Pavarotti is French, etc. Don't ask.)

Why the stilted substitution? (I hear you ask.) Well, part of what one learns early on in Meaning and Grammar is that a sentence like "It is not the case that Pavarotti is hungry and Bond likes Pavarotti" is true [corrected from "false"; thanks, Steve] if either or both of the conjoined clauses ("Pavarotti is hungry" and "Bond likes Pavarotti") is false. In other words, the sentence must be true whenever any of the following three more colloquially-expressed sentences is true:

  1. First clause negated: "Pavarotti is not hungry and Bond likes Pavarotti."
  2. Second clause negated: "Pavarotti is hungry and Bond doesn't like Pavarotti."
  3. Both clauses negated: "Pavarotti is not hungry and Bond doesn't like Pavarotti."
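The relationship between the negated conjunction and the three colloquial sentences can be checked by brute force over the four possible truth-value assignments. What follows is a minimal sketch in plain propositional terms, with p standing for "Pavarotti is hungry" and q for "Bond likes Pavarotti"; the function names are mine, and treating the English sentences as simple truth functions is of course an idealization.

```python
from itertools import product


def negated_conjunction(p, q):
    """'It is not the case that (p and q)'."""
    return not (p and q)


# The three colloquial sentences from the list above.
COLLOQUIAL = {
    "first clause negated": lambda p, q: (not p) and q,
    "second clause negated": lambda p, q: p and (not q),
    "both clauses negated": lambda p, q: (not p) and (not q),
}


def entails(a, b):
    """True if b holds under every truth assignment under which a holds."""
    return all(
        b(p, q)
        for p, q in product([True, False], repeat=2)
        if a(p, q)
    )


def any_colloquial(p, q):
    """Disjunction of the three colloquial sentences."""
    return any(sentence(p, q) for sentence in COLLOQUIAL.values())


if __name__ == "__main__":
    for name, sentence in COLLOQUIAL.items():
        print(name, "entails the negated conjunction:",
              entails(sentence, negated_conjunction))
        print("negated conjunction entails", name + ":",
              entails(negated_conjunction, sentence))
```

Each colloquial sentence entails the negated conjunction, but not the other way around: the negated conjunction is equivalent only to the disjunction of all three.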

This generally holds for conjoined clauses, but not for embedded clauses. That is, the sentence "It is not the case that Loren thinks that Pavarotti is French" must be true when "Loren doesn't think that Pavarotti is French" (first clause negated) is true, but not necessarily when "Loren thinks that Pavarotti is not French" (second, embedded clause negated) or "Loren doesn't think that Pavarotti is not French" (both clauses negated) are true. Right? Right.

Update, 12/28: Wrong! The real difference is that "It is not the case that Pavarotti is hungry and Bond likes Pavarotti" does not entail any of the three more colloquially-expressed sentences listed above, while "It is not the case that Loren thinks that Pavarotti is French" does entail (and only entails) "Loren doesn't think that Pavarotti is French". Thanks to Jonathan Weinberg for writing on 12/21 to point out my mistake here, and my apologies for taking so long to correct it -- you know, what with the holidays and all that. (Just goes to show you that a phonologist can be easily confused by semantics.) In any event, there's a difference, and what follows still holds.

OK, with that background, you'll see why it surprised me to read the following quote from the President's press conference yesterday:

"Are we satisfied with progress in Baghdad? No, but to say nothing is happening is not the case," Bush said. (link)

At first I wondered whether this might be a misquotation, so I checked the White House transcript, where it's rendered only slightly differently: "Are we satisfied with the progress in Baghdad? No. But to say nothing is happening is just simply not the case." (audio)

Of course, what Bush was trying to do was negate the second clause: "it is (just simply) not the case that nothing is happening", meaning that "something is happening" -- i.e., that there is indeed progress in Baghdad. But the problem is that he embedded the "nothing is happening" clause under the "(for someone) to say" clause, so, if anything, his statement must be true only when "(for someone) to not say that nothing is happening" (first clause negated) is true.

Of course, I'm putting aside the fact that the stilted "it is not the case that" type of negation simply doesn't work with this type of sentence in the first place, which can perhaps be better appreciated by moving the negation to the beginning: "it is not the case that to say nothing is happening" (compare the grammatical-though-stilted "it is not the case that someone said that nothing is happening"). Then again, I'm also ignoring the possibility that one might analyze Bush's statement as if it had been rendered thusly: "... to say nothing is happening, well, it's just simply not the case", where the "it" specifically refers to the clause "nothing is happening". We're not ones to simply point out Bushisms here on Language Log, but in this case, I guess I just couldn't resist.

[ Comments? ]

Posted by Eric Bakovic at 11:43 AM

It all depends on what "see" means

And also what "march with" means. The Romney campaign is in the uncomfortable position of having to explain that the candidate said something that isn't literally true, without admitting that he said something false. This all works out for them if see means "be aware of", and march with means "support", or maybe "march for the same cause, but on a different day and in a different city".

According to Michael Levenson, "Romney never saw father on King march", Boston Globe, 12/21/2007

Mitt Romney acknowledged yesterday that he never saw his father march with Martin Luther King Jr. as he asserted in a nationally televised speech this month, and historical evidence shows that Michigan's Governor George Romney and the civil rights leader never did march together.

Romney said his father had told him he had marched with King and that he had been using the word "saw" in a "figurative sense."

"If you look at the literature, if you look at the dictionary, the term 'saw' includes being aware of in the sense I've described," Romney told reporters in Iowa. "It's a figure of speech and very familiar, and it's very common. And I saw my dad march with Martin Luther King. I did not see it with my own eyes, but I saw him in the sense of being aware of his participation in that great effort."

Here's the phrase from Governor Romney's 12/6/2007 "Faith Speech":

The Globe story clarifies what actually happened back in 1963:

Susan Englander, assistant editor of the Martin Luther King Jr. Papers Project at Stanford University, who is editing the King papers from that era, told the Globe yesterday: "I researched this question, and indeed it is untrue that George Romney marched with Martin Luther King."

She said that when he was governor of Michigan, George Romney issued a proclamation in June 1963 in support of King's march in Detroit, but declined to attend, saying he did not participate in political events on Sundays. A New York Times story from the time confirms Englander's account.

A few days after that march, George Romney joined a civil rights march through the Detroit suburb of Grosse Pointe, but King did not attend, Englander said. A report in the New York Times confirms Englander's account of that second march, mentioning George Romney's attendance but making no mention of King.

So George Romney publicly supported -- but wasn't present at -- a march in Detroit that King attended; and participated in a march a few days later in Grosse Pointe, which King wasn't present at. It's not clear from the Globe article whether Mitt participated in either of these marches; as far as I can tell from his biography, he was at boarding school at Cranbrook, an hour or so away, so it's possible.

Romney has repeated the story of his father marching with King in some of his most prominent presidential campaign appearances, including the "Tonight" show with Jay Leno in May, his address on faith and politics Dec. 6 in Texas, and on NBC's "Meet The Press" on Sunday, when he was questioned about the Mormon Church's ban on full participation by black members. He said that he had cried in his car in 1978 when he heard the ban had ended, and added, "My father marched with Martin Luther King."

Mitt Romney went a step further in a 1978 interview with the Boston Herald. Talking about the Mormon Church and racial discrimination, he said: "My father and I marched with Martin Luther King Jr. through the streets of Detroit."

I can appreciate his current frustration. It's certainly clear that his father supported the civil rights movement in 1963, which was not a position to be taken for granted for a politician at that time. And "I read about my father issuing a proclamation in support of a march that Martin Luther King participated in", or "my father (and I ?) marched in Grosse Pointe a few days after Martin Luther King marched in Detroit" are wimpy statements -- "I saw my father march with Martin Luther King" is much punchier. Its only drawback is that it's not exactly true.

[Update -- predictably, this is the presidential campaign story of the day:

David S. Bernstein, "Was it all a dream?", The Boston Phoenix

John Gibson, "Mitt Romney's Big Oops", Fox News

Michael D. Shear, "Romney the Elder and Martin Luther King Jr.", Washington Post (The Trail)

Michael Luo, "Romney Learns that 'Facts Are Stubborn Things'", NYT 12/22/2007 (original head: "Romney shows a tendency to imprecision")

Dawson Bell, "George Romney was at march, King wasn't, activist says", Detroit Free Press

etc. ...]

[Update -- Romney seems to have made things worse by claiming the expert status of a former English major, and then getting the World Series and the Super Bowl mixed up:

CBS News: "Did you actually see -- with your own eyes -- your father marching with Martin Luther King?"
Romney: "My own eyes? You know, I speak in the sense of I saw my dad become president of American Motors. I wasn't actually there when he became president of American Motors, but I saw him in the figurative sense of he marched with Martin Luther King. My brother also remembers him marching with Martin Luther King and so in that sense I saw him march with Martin Luther King."
Later he said, "I can't even give you the time frame. I just remember that we talked about it. My brother also remembers my dad having spoken about the fact that he did not do political events on Sunday but that he decided at the last minute that he was going to break that self-imposed rule and participate and I think he did so on a Sunday as I recall."
He added, "You know, I'm an English literature major as well. When we say, 'I saw the Patriots win the World Series,' it doesn't necessarily mean you were there -- excuse me, the Super Bowl. I saw my dad become president of American Motors. Did that mean you were there for the ceremony? No, it's a figure of speech."

The Romney campaign has done a better job of making the case than the candidate has.]

Posted by Mark Liberman at 10:54 AM

Couric on the primaries: too close to call, tight as a tick

As the early rounds of presidential primaries and caucuses approach, CBS Evening News anchor Katie Couric has been emphasizing just how close the races are. On the Dec. 18 edition of the program, Couric had this to say to chief Washington correspondent Bob Schieffer about the upcoming Iowa caucuses:

You and I talked a little earlier this afternoon, Bob, we were saying Iowa is really getting interesting. As they say in the business, too close to call. In fact, that phrase was invented here at CBS between 1962 and 1964. I thought you might find that interesting.

Then the next night, Couric opened the Evening News broadcast with a teaser about a close primary race in another state:

The first presidential battle in the South. A new poll shows it's tight as a tick in South Carolina between Clinton and Obama.

First, let's consider Couric's assertion that the phrase "too close to call" was a CBS newsroom invention of the early '60s. After The Hotline made this their quote of the day, Couric felt obliged to support her claim on the Couric & Co. blog:

At first it sounds like something out of a horse race – a literal one, not the political races in Iowa or New Hampshire. Curious if that was the case, I asked Couric & Co. to check it out. Sure enough, language guru William Safire had pondered the same thing in one of his “On Language” columns back in 1996. Safire writes:

Daniel Schorr of National Public Radio remembers the phrase from the early days of television, and directed me to Martin Plissner of CBS, a pioneer of electronic election coverage.
"That phrase was invented at CBS between 1962 and 1964," says Plissner with the confidence never shared by lexicographers. "During that period, instead of using the exit polling we have today, we used a model we had devised for predicting or calling elections based on certain reported-precinct results. That gave us a sample to which we could apply mathematical formulae to determine a call. When we had a situation in which all the votes were reported but there was no clear winner, we called that election too close to call."

A little poking around inside CBS News today revealed more: that Lou Harris, who worked for CBS News in the ‘60s, is said to have first uttered those words on air when reporting on a tight race for governor of Massachusetts in 1962.

So Couric wasn't pulling this factoid out of thin air: Martin Plissner told Safire that the phrase was "invented at CBS," and another unnamed newsroom source gave Lou Harris the credit for having "first uttered those words" in 1962. But if Couric's fact-checkers had dug a little deeper, they might have discovered that there was something to her initial hunch that "too close to call" might have come out of a (literal) horse race or some other similar sporting context. In fact, "too close to call" shows up in the sports pages going back to the 1930s. Of the citations below, the first three relate to boxing and the fourth to college football:

For once they admitted that there was a fight too close to call.
Zanesville (Ohio) Signal, Nov. 3, 1932, p. 2 ("Today's Sport Parade" by Henry McLemore)

Canzoneri or Ross? It's too close to call. Let the referee decide it. Or the judges.
New York Times, Sep. 12, 1933, p. 30 ("Sports of the Times" by John Kieran)

That may make a difference when they step through the ropes and start firing. It makes it too close to call with any confidence in advance.
New York Times, May 6, 1938, p. 30 ("Sports of the Times" by John Kieran)

Missouri is favored over Iowa State; Nebraska should rout Kansas; Oklahoma and Kansas State are too close to call.
Burlington (NC) Daily Times-News, Oct. 16, 1940, p. 7

So was CBS at least the first to use "too close to call" to refer to a close electoral race? Evidently not, as it appears in print well before the putative 1962 invention. The phrase was applied to various close races in 1958:

The Senate struggle [in Minnesota] between Republican Sen. Edward J. Thye and Democratic Rep. Eugene McCarthy looks too close to call.
Albuquerque (NM) Journal, Oct. 12, 1958, p. 31 (AP wire story)

Maryland: Too close to call is the current verdict on the race between Senator J. Glenn Beall, Republican, seeking a second term, and Baltimore's Mayor Thomas D'Alesandro.
New York Times, Oct. 13, 1958, p. 24

Two weeks before election, its [sc. New Jersey's] race for United States Senator is considered too close to call.
New York Times, Oct. 21, 1958, p. 1

A veteran newspaper political writer said the gubernatorial fight [in Wisconsin] between incumbent Republican Vernon Thomson and [Gaylord] Nelson is "too close to call."
Daily Review (Hayward, Cal.), Oct. 24, 1958, p. 25 (UPI wire story)

So by 1962, "too close to call" had been established in political reporting for at least four years. If CBS can lay any claim to the phrase, it's possible that the network was the first to use it as a categorization of close races in election-night coverage as voting returns came trickling in. But that's a far cry from saying it was "invented" or "first uttered" by Lou Harris or another CBS correspondent.

Couric's other turn of phrase, "tight as a tick," might actually have a more distinct CBS provenance, since it's widely attributed to her predecessor Dan Rather (known for his colorful Texanisms):

The new Iowa Polls show the caucus fights are "tight as a tick," as Dan Rather might say. (Des Moines Register, Dec. 1, 2007)

The conclusion to be drawn from this mass of data is that -- in the words of Dan Rather -- it is "tight as a tick" in Iowa. (Washington Post blog, Dec. 7, 2007)

In 2000, CNN host Jeff Greenfield attributed a more elaborate version to Rather:

And yet, either because of the challenger's appeal or the insider's weakness or maybe, maybe, because of an enduring American hunger for the new, this contest remains tight as a tick in Grandma's corset. I have no idea what that means, but it works for Dan Rather. (CNN transcript, Oct. 31, 2000)

I was unable to find the "Grandma's corset" variant in any of the CBS transcripts available on LexisNexis, but I did find another version from Rather:

There are races in the presidential races in individual states that are as tight as a tick on a dog's ear here. (CBS News transcript, Nov. 5, 1996)

The political use of "tight as a tick" isn't always credited to Rather, as in this recent usage by Democratic strategist Jamal Simmons on CNN:

So, I think you can't count John Edwards out yet. It's tight as a tick, as they say. (CNN transcript, Dec. 10, 2007)

And in the last presidential election cycle Gordon Fischer, the chairman of the Iowa Democratic Party, used yet another variant on the Fox News show "Big Story with John Gibson":

Well, you know, frankly from the polls, this is looking as tight as a tick on a deer. (Fox News transcript, Jan 19, 2004)

"On a deer" is an interesting variation on the theme, since a quick Google search finds the tick appearing chiefly in canine locations, in accordance with Rather's 1996 usage: on a hound, bloodhound, hound dog, coon dog, dog, dog's ear, dog's back, etc.

Though Rather tends to get the credit for "tight as a tick" with reference to close political races, it's a regional idiom with a wide array of other possible meanings. These meanings have lately been dissected on the American Dialect Society mailing list, in a discussion sparked by Couric's recent usage. Darla Wells, who first caught the broadcast, was familiar with the sense of "tight as a tick" meaning 'having had too much to drink,' and Dennis Preston collected this sense for his 1975 article, "Proverbial comparisons from southern Indiana" (Orbis 24,1:72-114). Ron Butters, meanwhile, suggested the idiom could mean 'miserly,' with tight related to tightwad. Finally, Gregory McNamee supplied some local knowledge from the region of Virginia where Couric was raised, saying that "tight/full as a tick" in that area refers to someone who has overeaten, "gorged to the point of popping, like a tick full of blood."

So Couric's Virginian understanding of the simile may have been influenced by Rather's north Texan dialect (or his own peculiar idiolect), allowing a semantic transferral to the realm of political campaigns. Considering the often contentious relationship between the current CBS anchor and the former one, it's interesting that Couric might be taking her cue from Rather when it comes to this particular idiom.

Posted by Benjamin Zimmer at 01:11 AM

December 20, 2007

Whether either

It's been a time of puzzles about whether and either: a case where either is used in place of standard whether, and an assortment of coordinations, of varying levels of oddness, involving these two items.

1. Concessive either.  First, either for whether -- in an ADS-L posting by Phil Cleary on 9 December, intended to document a possible eggcorn (advance someone's course for advance someone's cause) but incidentally providing an instance of concessive either X or not 'whether X or not':

From a NY Times blog: "Either Hillary likes it or not, Black voters are going to wake up and I don't see them voting for Hillary in South Carolina. Hillary has not done anything to advance their course let alone understand their need."

It looks like advance ... course is a moderately common error -- an eggcorn, perhaps, or a flounder, a word confusion based on similar pronunciation (especially, in this case, in non-rhotic varieties) and overlapping meaning.  Here are a few more examples (found on 12/9):


He wanted to use pro-Norman and thus Kamajor position to advance his course. Just after few weeks, he lost the steam as the Party moved aggressively to undo ...  (link)

For me and many other rational people, Gore has all the credibility to advance his course on climate change. But his public campaign would be a chunk better ...  (link)

That day I also found plenty of hits for {"either you like it or not"} in the appropriate sense, e.g.:

All I am saying is that "it's the truth" either you like it or not.  (link)

Either you like it or not Europe will meet her destiny - sooner than later.  (link)

Well, this is my accent, and either you like it or not, it cannot be changed.  (link)

Back on subject, that article on which the comic is based on is, either you like it or not, pretty damn acurate on the behaviour of a totalitarian, ...  (link)

(You can find more examples with {"either you like it or you don't"} and other variants.)  I hadn't noticed this before, and it seems not to have caught the attention of the usage advisers, but it's definitely out there.

Now, this is not just "confusing" either and whether.  It has a basis, I think, in one of the ways you can use "either you like it or not" (equivalent to "you either like it or not"), namely to convey indifference: 'it doesn't make any difference whether you like it or not'.  (Similarly for examples with clauses other than you like it.)  That's just what whether ... or not conveys.  So either bleeds into whether's territory.

[Long digression: This would be a good place to point out that people who revile whether ... or not in complement clauses --

I don't know whether we will come or not.

because the or not is "unnecessary" (Omit Needless Words, or ONW) don't always recognize that the or not is OBLIGATORY in standard English in concessive clauses:

 Whether we come or not, you should go on with the party.
*Whether we come, you should go on with the party.

Scrupulous usage advice recognizes the distinction.  (And sensible usage advice doesn't insist on the omission of or not.  MWDEU: "this use of or not is more than 300 years old and is common among educated speakers and writers.  It is, in short, perfectly good, idiomatic English.")

A further wrinkle: there are plenty of occurrences of the non-standard truncated concessive (without the or not), e.g.:

My biscuit is gonna pop, whether you like it you not ever gonna play me motherfuckers get shot ...  (link)

I just get out a Bible and read it and whether you like it you need it ...  (link)

Whether you like it, you are 'public figures.' It's not our fault that so many TV news anchors and reporters feel special privileges of attention and fame.  (link)

we are the knowing, and whether you like it you will love us, and will genuinely believe we are good enough to sell lots of records, tour the world and ...  (link)

That would be like the truncated as far as of

As far as your ideas on this subject, I think they're nonsense.

which I've mentioned a number of times on Language Log (for instance, here).  Here we have two situations where vernacular speakers omit words because they're needless in the context, but guardians of the standard insist that you must Include All Necessary Words (IANW).  The guardians' judgment is in fact based on social criteria -- who uses the variant, an antipathy to what's perceived as innovation -- but the criticism is couched in terms of "efficiency" (ONW) or "logic" (Two Negatives Make a Positive) or some other abstract principle.  The rationale is SECONDARY, as I termed it in my discussion of at about a while back.  Omitting needless words is ok only if you do it in tune with the prescribed standard.]

2. Correlative either ... either.  A few weeks before, inspired by the data I'll talk about in section 4 below, I looked for examples of correlative either ... either.  Searching on {"either you * or either"} on 15 November got many hits, among them:

Favorite Quote. It's a beautiful world, everyone's insane. Either you swim or either you fade.  (link)

either you believe god or either you don't.  (link)

either you win or either you loose. The possibility of buying your own land and building on there, say a house/villa/pool whatsoever is great, ... (link)

Two weeks ago I told her enough with these games and I really love you and I want you back so it's either you're in or either you're out.  (link)

For those who don't, ms warren is this songwriter who either you love or either you hate, no in between ...  (link)

I'd noticed correlative either in student writing for years.  It's non-standard, but not (so far as I can tell) proscribed in the manuals, although it would be a natural object for ONW scorn.

It's easy to see how it would arise.  The standard disjunctive correlative either ... or has an accented word, either, marking the first disjunct, but normally unaccented or in the second.  In speech you can put an accent on the or to convey that the alternatives are of equal status, but in writing that's hard to do.  Using either on both disjuncts does the trick.  Yes, it's non-standard, but it's communicatively effective, and you can see why people might like it.  (It's also likely that correlative whether, as in the next section, promotes correlative either.)

[More digressions.

In addition to disjunction marked with either, English has (of course) unmarked disjunction, as in

You're in or you're out.

Here the two disjuncts have equal status, just as conjuncts do:

You're in and I'm out.

Somewhat surprisingly, there is little antipathy towards marked disjunction in the advice manuals, though you'd expect ONW to be invoked against either.  To my mind, this is a good thing: relentless ONW legislates implicit (rather than explicit) marking of content, and this is just an irrational prejudice.  Saving a word means a lot when you're writing telegrams or writing to some limit on word counts, but in plenty of other contexts that extra word does some work.  Explicit marking is often just what you want.]

Notice that the problem with correlative either ... either is not a failure of parallelism.  From the point of view of standard English, it's TOO MUCH parallelism.

3. Correlative whether ... whether.  Formally similar to correlative either ... either is correlative whether ... whether, as in these examples Neal Whitman sent me last month:

B2B Means Back to Basics: Whether It's the Net or Whether It's Not, Business Is Business  [book title]

"Reply: Whether It's Right, Or Whether It's Written, He Just Doesn't Get It: A Reply to Gregg"  [1997 journal article in Second Language Research]

Whether you're a mother or whether you're a brother  [from the song "Stayin' Alive"]

Whether you think you can or whether you think you can't, you're right.  [attributed to Henry Ford]

Neal found these examples (all concessive) a bit edgy, but I find them fully grammatical, and in fact I generally prefer them to coordination of full clauses WITHOUT the repeated whether, like:

Whether you think you can or you think you can't, you're right.

This sentence is entirely grammatical, but I prefer the explicit marking on both disjuncts, because it gives equal status to the disjuncts.

[Yet another digression.  Either and whether have a lot in common, but the details of their distribution are significantly different.  Among these differences is that in formal standard English either marking a first disjunct is generally optional, but whether in the same context is obligatory:

*You think you can or you think you can't, you're right.

But in informal English, such concessives are just fine.  This is another sort of implicit marking, a kind of parataxis -- one sentence glommed onto another.  Such implicit marking, which depends on the hearer or reader supplying the connection between the two bits of content, is a good thing when the people involved are generally on the same page about what's going on.

The really big point here is that it's loopy to legislate in general against explicit marking (ONW) OR implicit marking (IANW) when both are available.  Each has its own virtues, depending on the context.]

Neal's examples are all concessives, but correlative whether is equally at home in complements:

I don't know whether he died or whether he's still working.  (link)

I don't know whether he was joking or whether he was serious.  (link)

Jack starts to wonder whether he's being paranoid or whether things are not quite what they seem.  (link)

Researchers have not yet answered whether beets produce geosmin themselves, or whether it is produced by symbiotic soil microbes living in the plant.  (link)

4. Correlative subjects.  Now we get to some possibly WTF coordinations.  It all started when I heard the following:

In the meantime, it's far from clear whether the marketplace or whether global warming will win the race. In Bio-Town USA, I'm Sam Eaton for Marketplace.  ("Home-grown energy independence",  American Public Media's Marketplace 11/13/07)

(This was the impetus for my investigations in section 3, which in turn took me to the stuff in section 2.)

On 14 November I googled up a few more examples, among them:

"But I don't know whether you, or whether any one, can assist me."  (link)

You had a particular concern in April 1989 as to whether you or whether Titan would be paying the full cost or part of the cost of travel per charter from ...  (link)

These have correlative SUBJECTS marked with whether, in a Right Node Raising (RNR) configuration that goes beyond the ordinary, since it groups together a subordinator whether with the subject of its associated complement clause, as against the VP of that clause; the disjuncts are not constituents.

There are similar examples with other introductory WH elements (some involving disjunction, some conjunction):

what:  But, in the end, it really doesn't matter what I or what you think, it only matters what Best Buy and/or Comcast and/or Hughes, or your neighbor thinks, ...  (link)

when:  So when you, and when I, come to God, have we many things that we could say to Him, so many things in fact that He must hear from us and be certain exactly ...  (link)

where:  In other words, where you and where I fit into the grand scheme of "it all." The picture we have of God is still out of focus.  (link)

who:  ... its never entirely stable and this is where the pain of not really knowing who you or who they really are comes in, it grows dimmer and darker.  (link)

And with if:

... or if an Australian security force detains an individual for whatever reasons to do with terrorism, if you, or if I report that, that we can go to jail ...  (link)

... and if he or if she doesn't come in every day, here's the Labour Code -- non-culpable dismissal," or "You're a nice lady, ...  (link)

And with that:

... wouldn't be around, if they did not have an enormous amount of raw survival tactics that you and that I would describe in very entrepreneurial ways.  (link)

(I haven't found any examples with either or with subordinating conjunctions like although and because.)

Non-constituent disjuncts/conjuncts are just what we expect in RNR: in

... give money to, and/or take support from, the party

the object NPs money and support are grouped together with a preposition associated with another argument of their verbs, even though the two don't make a syntactic constituent.  RNR is like that.

My first reaction to the whether examples was to shrink back, and probably many people will find them unacceptable.  But I've grown to find these RNR examples that "cut into" clauses not so bad at all.  As far as I can tell, the advice literature hasn't noticed them.  (How would the manuals label them?  They're certainly not failures of parallelism -- once again, they are in a sense more parallel than they'd have to be -- though they could of course be faulted as violations of ONW, if you care passionately about omitting every single omissible word.)

5. Bonus WTF coordination.  And now a real WTF coordination case (involving disjunction, so there's a tenuous tie to the preceding sections), from Bob Ray on 21 November:

Do you or your spouse have or applied for Medicare?    ___yes  ___no.  [on a retirement form from the State of Minnesota]

Ray noted that there was plenty of room on the form for a more complete sentence, in particular:

Do you or your spouse have or [have you or your spouse] applied for Medicare?    ___yes  ___no.

That is, the have is functioning both as a main verb (have Medicare) and as an auxiliary, in the perfect construction (have applied for Medicare).

This looks like a one-off error, the result of pasting together two formulations of related ideas, with have as the hinge, much as in the Elmore Leonard quotation Mark Liberman posted about earlier today:

Joe Aubrey thought he knew what Walter had in mind, but no idea how he'd pull it off.

The other cases I looked at above are systematic -- though maybe non-standard or in the gray area between standard and non-standard -- and not inadvertent errors.

Posted by Arnold Zwicky at 10:30 PM

Linguists on Jeopardy

Following up on Bill's post from last night: Kevin Holbert may be the first self-identified linguistics student to be a Jeopardy! contestant, but he's certainly not the first linguistics student to be a Jeopardy! contestant tout court. Eight years ago this month, there was a 1-time Jeopardy! champ named Todd O'Bryan. Todd self-identified as "[a] graduate student originally from Louisville, Kentucky" (thus also following up on my post from yesterday), but more specifically, Todd was then a linguistics student in what became my own home department at UCSD -- as Todd's web page clarifies.

Posted by Eric Bakovic at 10:49 AM

Colloquial looseness

Breffni O'Rourke writes:

I thought either the Elmore Leonard desk or the WTF coordination department might be interested in these, from Up in Honey’s Room, 2007. Both are on p. 62 of the Weidenfeld and Nicolson edition (end of chapter 6):

(1) Joe had an airplane, a single-engine Cessna he’d fly to Detroit and take Walter for rides and show him how to work the controls.

Three coordinated VPs within a relative clause (“a single-engine Cessna [which] he’d X and Y and Z”), but only the first has the requisite NP gap.

(2) Joe Aubrey thought he knew what Walter had in mind, but no idea how he’d pull it off.

The subject of the second clause is supplied by the first, but the verb “had” has gone missing – maybe under the influence of “had” in the indirect question (“what Walter had in mind”). Which might also explain why it seems OK at first glance.

They’re both perfectly comprehensible and have that colloquial looseness that makes the dialogue vivid, but I was surprised to find it in a narrative passage.

But these two sentences occur in a passage of free indirect style, where Leonard presents his character's thoughts in a mixture of perspectives, tenses and pronouns. Here's what immediately precedes Breffni's first quotation:

Walter had better things to do, work toward becoming as well known as Himmler, perhaps even a Nazi saint. He had finally decided, yes, of course tell Otto and Jurgen what you intend to do. They were Afrika Korps officers, heroes themselves. Tell them they are the only ones in the world who will know about the event before it happens.

The only ones if he didn't count Joe Aubrey in Georgia, his friend in the restaurant business who owned a string of Mr. Joe's Rib Joints, all very popular down there. Though lately Negro soldiers from the North were "acting uppity," Joe said, coming in and demanding service, and he was thinking of selling his chain.

This passage, although not quoted, does have some of the aspects of Elmore Leonard's dialogue, in which parataxis tends to be preferred over syntaxis. So the first quote might be an indirect-style echo of something like

Joe's got this airplane, he flies [it] to Detroit and takes me for rides and shows me how to work the controls.

The object it is optional here, even in a main clause.

But parataxis doesn't seem to be involved in Breffni's second quotation. In that case, there's definitely a missing "had" or "he had"; and I agree that the immediately preceding "had in mind" helps explain why the sentence goes down so easy anyway.

However, stranded "no idea how" is fairly common out there on the internets:

The editor is in my phone but no idea how to make it work!
I've encouraged my Ajax guru cousin to investigate shimming a HTML5 Audio wrapper overtop his library to promote cross-browser programming, but no idea how feasible that is...
Have you ever known what you wanted to say, but no idea how to enunciate it?
You managed to raise the floor but no idea how to turn on the fans?
Here's a bracketless one (no screws at least) but no idea how reliable it is.
There are alot of tools and macros for Radio which I imagine would work here, but no idea how to properly implement.

Maybe for some people, "no idea how" is on the verge of becoming a construction that can introduce a clause without requiring any overt preceding subject and verb. There's a hint of this in the distribution of the initialism nfi:

i got so wasted saturday night i woke up in a red cross clothing bin nfi how i got in there.
If you havent got console (nfi how you can play without it) go to options, click the keyboard tab, advanced button and make sure "enable developer console"
its a marketing ploy of thiers to make the stores as ugly as posible during sales... nfi how that works tho
my posts are on direct anime as well nfi how it got there,
ive been having this problems for months now, nfi how to fix it.
Currently in beta, but nfi how much it'll cost.

But nfi whether this is plausible in the 1940s Detroit of Leonard's novel.

Posted by Mark Liberman at 07:49 AM

December 19, 2007

A Linguist on Jeopardy

We've chronicled the ups and downs of Jeopardy's handling of linguistic topics, which seems to have improved: recently they've had entire categories devoted to linguistic topics, as on December 14th, when two of the categories were "Suffixes" and "Let's Learn Hebrew". But they've reached a new high: one of the contestants is a linguistics graduate student. He's Kevin Holbert, who won two games before losing today. You can check out his performance at the Jeopardy Archive. Alex Trebek chatted with him about his work on Khoi and San languages, in the course of which Holbert actually got in a good description of the production of clicks, for which Khoi and San languages are known. To my amazement, Trebek spontaneously (that is, not in imitation of Holbert) produced several clicks!

Posted by Bill Poser at 11:13 PM

Binge, don't purge

I'm spending the holidays with my wife's family in Louisville, KY this year. At breakfast this morning I picked up yesterday's Courier-Journal and started laughing so hard I almost coughed up my oatmeal. I wasn't reading the comics -- it was the headline for the top story in the business section that had me going.

Yum executive Hearl will retire
Eaton to be new development chief

At the risk of over-explaining the joke, more context may be necessary.

"Yum" refers to Yum! Brands, Inc., based here in Louisville, spun off from PepsiCo and home to such familiar restaurants as KFC, Taco Bell, and Pizza Hut (read more about it on Wikipedia). Given that kind of business, I should think that a chief operating and development officer with a name that sounds like eatin' is an improvement over one with a name that sounds like hurl.

Speaking of Louisville: the t-shirt pictured on the right (click on the picture for a larger version) is an example of how local folks are both aware and proud of at least one particular feature of their speech: the different pronunciations of the name Louisville itself. Here's what the Wikipedia entry for Louisville, KY has to say about this variability in pronunciation:

The variability of the local pronunciation of the city's name can perhaps be laid at the feet of the city's location on the border between the Northern and Southern regions of the United States. Louisville's diverse population has traditionally represented elements of both Northern and Southern culture.

There's probably more to it than this, but this sounds about right.

Go to the pronunciation section of the Wikipedia entry to listen to examples of these pronunciations -- none of them very natural-sounding, in my view, but you get the general idea. The description of the pronunciation is, linguistically speaking, very unsophisticated ("The name is often pronounced far back in the mouth, in the top of the throat."), but of course a more sophisticated description would not really matter much to most folks visiting that page.

This section also points out that "[n]o matter how Louisville is pronounced, the 's' is always silent", which makes Louisville, KY different from other Louisvilles in the U.S. (in Colorado, Georgia, Mississippi, and Tennessee). This is no doubt due to the fact that Louisville, KY is explicitly named after King Louis XVI of France, and of course in French the 's' is also silent.

One interesting aspect of the pronunciation of Louisville as (roughly) "Luhvul" is the quality of the vowel in the first syllable. To me, the vowel tends to sound pretty much identical to the vowel in the second syllable, right down to the l-sound at the end -- you might represent this orthographically as "Lulvul", but that doesn't quite capture it. The phonetic issue involved here happens to be discussed (with respect to something else entirely) in this post by Ben Zimmer last month.

Posted by Eric Bakovic at 01:13 PM

December 18, 2007

The meaning mining business

Says my colleague Bob Ladd in an email about prepositions (heaven knows why we email each other when we can just walk across the hall between each other's offices):

I can't resist pointing out "the best band in the world" vs. "the best band on the planet". I had noticed some time ago the extent to which "the planet" is taking over from "the world" in such superlative expressions, which I think reflects some change of perspective induced by globalization and/or the space age, but I'd never noticed the prepositions before. Does Lawler's analysis explain this?

The answer is, as far as I can see, no. Perhaps I'm doing Lawler's analysis an injustice, but I think this is the kind of case where the metaphorical meaning mine doesn't self-evidently offer up the explanatory ore that we were hoping for. (John Lawler agrees with me, by the way; this isn't a criticism of his analysis. He and I both think the metaphorical basis of preposition choice only gives you a rough rule of thumb.)

Of course, we could always post-hoc it. We could say that people who claim the Rolling Stones are the best band in the world are using an older metaphor from earlier centuries, when air travel was unknown and the world, experientially, was a 3-dimensional space we are all embedded within, the hills and trees above us and the valleys and streams below us, so in is used, locating the Stones within the container-like 3-dimensional area that is the world of our consciousness; and we could say that when we claim the Stones are the best band on the planet we are speaking like 21st-century, topologically sophisticated people who have seen satellite pictures and Google Earth, so on is appropriate, locating the Stones on the 2-dimensional surface of a 3-dimensional sphere.

But I think it's post-hoc. I just don't believe it could have been predicted in advance that the words world and planet would behave exactly this way, given merely semantic information about them. Though I could be wrong: we linguists don't have any privileged access to the real and correct semantic information about a word, we have to hypothesize what it might be and then draw conclusions from our hypotheses and see if they predict incorrectly.

My hunch is that English preposition choice often comes close to being explainable in terms of semantics and metaphors, but when you get near the grammaticized edges it starts to let you down.

Why do we laugh at someone but pour scorn and ridicule on them? Are they more like a point target when we are laughing and more like a flat surface when we dole out ridicule as if it were a gooey humiliating liquid? Could you really predict, if you didn't already know, that given what a joke is, we would say The joke is on you (not at you)?

You could just say "Yes" to all this, I suppose, and I don't know what to tell you about why I don't believe you. I can only supply more examples of where I think metaphor is of doubtful predictive power.

Why do we trust, or believe, or have faith in someone but rely or count on them? Are they more like a container when trust or belief or faith is the issue and more like a surface when reliability is the issue? Don't you put your trust in the ice when you walk out on it? Aren't you relying on a plane when you get in it?

You see where I'm going with this? There is a vein of truth in the semantic/metaphorical analysis of preposition meanings and preposition choice, and it is a very useful guide. But it is important not to work that vein until it runs out (I'm deep in a mining metaphor here, you see) and then try to save its predictive power by sifting through the mine tailings looking for nodules of post-hoc explanatory gold that you can take into town and sell to some sucker who can be talked into staking his fortune on the meaning mining business.

By the way, why do we invest in a mine but stake our fortune on it?

Posted by Geoffrey K. Pullum at 09:49 AM

The biggest typo in history

At least, it's got to be the biggest one that I ever saw, in purely quantitative terms. Today's NYT Science Section has an article by Dennis Overbye ("Laws of Nature, Source Unknown") that mentions

"...string theory, the alleged theory of everything, which apparently has 10500 solutions..."

This is a reference to the so-called string theory "landscape", which is said to have about 10^500 solutions. Presumably Overbye wrote the number correctly, and it was typeset wrong, at least in the online version. And 10^500 is such a big number that it's hard to explain how big a typo this is.

If memory serves, the number of elementary particles in the known universe is believed to be about 10^80. So if every elementary particle contained its own individual universe of 10^80 particles, and every particle in every one of those sub-universes contained its own sub-sub-universe of the same size, there would be 10^80 x 10^80 x 10^80 = 10^240 particles in this three-level hierarchy of meta-universes. If every one of the elementary particles in this baroque construction met every single one of the others, that would be 10^240 x 10^240 = 10^480 pairings (minus 10^240 self-pairings, but never mind). And if every one of these possible pairs of particles went out a quadrillion times, that would be 10^495 interactions. And if the participants posted 10,500 pictures of each of these events on Facebook, there would be about 10^500 pictures in all.
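For anyone who wants to check the bookkeeping, here is a minimal sketch in Python, whose integers are exact at any size. It reads the figures as powers of ten (10^80 particles per universe, a quadrillion as 10^15); the function and variable names are made up for illustration. Strictly speaking the last step comes out at 1.05 x 10^499, a factor of ten or so shy of 10^500, which is close enough at this scale.

```python
# Exact-integer check of the nested-universe arithmetic.
# All names here are illustrative; the figures are the ones used in the post.

PARTICLES = 10**80        # rough particle count for one universe (from the post)
QUADRILLION = 10**15      # number of "dates" per pair
PICTURES = 10_500         # pictures posted per date


def order_of_magnitude(n: int) -> int:
    """Return the exponent k such that 10**k <= n < 10**(k+1), for n > 0."""
    return len(str(n)) - 1


def meta_universe_counts():
    """Return (particles, pairings, interactions, pictures) as exact ints."""
    total = PARTICLES**3                    # three nested levels: 10^240
    pairings = total * total                # 10^480, self-pairings not removed
    interactions = pairings * QUADRILLION   # 10^495
    pictures = interactions * PICTURES      # 1.05 x 10^499
    return total, pairings, interactions, pictures


total, pairings, interactions, pictures = meta_universe_counts()

# How far off the typo is: the intended number divided by the printed one.
typo_factor = 10**500 // 10500

for label, value in [("particles", total), ("pairings", pairings),
                     ("interactions", interactions), ("pictures", pictures),
                     ("typo factor", typo_factor)]:
    print(f"{label}: about 10^{order_of_magnitude(value)}")
```

The last line of output is the size of the goof itself: printing 10500 for 10^500 understates the number by a factor of nearly 10^496.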

[There are, of course, much bigger numbers out there. I discussed some of them earlier this year, in a post on MIT's Independent Activities Period ("Charm School (and battling logicians)", 2/3/2007). There's a somewhat more serious discussion here.

But the current competition is for the (quantitatively) largest typographical error in a serious publication. I won't be surprised to learn that Dennis Overbye can be bested in this competition. Pending documentation of a bigger mistake, though, this is the largest goof that I know about.]

[Update -- Andrew Greene writes:

I was just reading that article in my dead-tree edition of the Times, and the 500 was properly superscripted there. I've noticed before that when they convert articles to HTML the Times often fails to convert superscripts and subscripts to the appropriate tags.]


[Update #2 -- though I didn't notice it at the time, there's actually a typo of exactly the same sort, but much larger consequence, in a story that I linked to in my post on the Large Number Competition at MIT last January. The article in The Tech ("Professors Duke It Out in Big Number Duel", 1/31/2007) describes the winning effort as follows:

Near the end of the duel, Rayo furiously scribbled on the whiteboard: "The smallest number bigger than any number that can be named by an expression in the language of first order set-theory with less than a googol (10100) symbols."

Although this definition took a bit of tweaking, including what Rayo described as his "second order logic trick," it soon won him the duel.

In this description, the numerical representation of a googol should of course be 10^100 rather than 10100. No doubt the author was bitten by the same online production bug that lowered the superscript 500 in the case of Dennis Overbye's article.

The award might still be contested, since The Tech's error is in parentheses next to a written-out form of the number...]

[Update #3 -- Brett Altschul writes:

As both a theoretical particle physicist and a former news and production editor at The Tech, I thought I could provide a little more information about how typesetting errors with exponents occur on the Web.

First, when I read the Times article this morning, it was certainly not the first time I've seen the 10^500 number for the number of flux compactification vacua in string theory mangled as 10500. A search of the Times archive comes up with two other articles on string theory with the same mistake. It's sufficiently common in online information about strings that I barely even notice it now.

As to how it happens, I can tell you exactly how it occurs at The Tech's Web site. The Tech uses custom-written scripts to convert the article files into HTML with as little human interaction as possible. The scripts have evolved and gotten much more sophisticated over the years, but the part that creates the HTML from the article copy was basically unchanged from 1994 (when The Tech launched the Web's very first news site) to 2005 (when I last checked on it), and it probably is the same to this day. The script works by extracting the meta-tags in the article file and creating the corresponding HTML tags.

There are a number of quirks to the process though. First of all, every type style that we wanted to be able to convert had to be put into the script by hand. While boldface and italic are supported, most other styles are not, because they were not deemed useful for journalistic prose. That included superscripts, which do come up occasionally (and more often in The Tech than in most papers, I would think), but not very often. The software also doesn't know how to handle odd typefaces. In my obituary for Prof. Henry Kendall, I was talking about his and other scientists' work on quarks. I mentioned the Omega^- particle, using the Symbol font and a superscript. The Web version renders this as W- (which is a different particle altogether). Another problem was that the style sheets used by the Quark Publishing System software that produced the print pages could do things that just couldn't be done in HTML. The title (e.g. "NEWS EDITOR") in the bylines is supposed to be all capitals and italicized. Both those functions can be implemented by the style sheets, but there's no way to specify all caps in HTML. So we were supposed to type our titles in all caps, so it would appear correctly in the online version, but not everybody did, and this is reflected in the online versions of some articles.

I assume that the problem with the Times' algorithm is exactly the same as The Tech's. While it's possible to implement superscripts in HTML, nobody bothered to do it.]
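
[To make the mechanism concrete, here is a minimal sketch of the kind of converter Brett describes. It is not The Tech's actual script: the curly-brace style markers, the `to_html` function, and the `SUPPORTED` table are all invented for illustration. The only thing taken from his account is the behavior: styles that were entered by hand get an HTML tag, and anything else loses its marker while its text passes through.

```python
import re

# Hypothetical style markers such as {italic}...{/italic}; the real
# article-file format used by The Tech is not described in the text.
SUPPORTED = {"bold": "b", "italic": "i"}  # styles someone added by hand

MARKER = re.compile(r"\{(/?)(\w+)\}")


def to_html(copy, styles=SUPPORTED):
    """Turn known style markers into HTML tags; silently drop unknown ones."""
    def swap(match):
        closing, name = match.groups()
        tag = styles.get(name)
        if tag is None:
            return ""  # unsupported style: marker vanishes, text stays
        return f"<{closing}{tag}>"
    return MARKER.sub(swap, copy)


article = "about 10{superscript}500{/superscript} {italic}vacua{/italic}"
print(to_html(article))
# about 10500 <i>vacua</i>
print(to_html(article, {**SUPPORTED, "superscript": "sup"}))
# about 10<sup>500</sup> <i>vacua</i>
```

Under these assumptions the fix is one more entry in the table, which fits Brett's point that the omission was a matter of nobody bothering rather than of any technical obstacle.]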


Posted by Mark Liberman at 07:37 AM

Drawl with this

Way back on National Preposition Day, Roger Shuy wondered what things would be like if we just got rid of those pesky prepositions. In the meantime, I was corresponding with a student who wanted to enroll in a course of mine next quarter. When I granted permission, the student wrote (emphasis on the eggcorn added):

I thought I would end up going through linguistics with drawls next quarter.

I think Roger needn't worry. Even if we tried to get rid of prepositions, they'll just keep coming back.

Posted by Eric Bakovic at 04:53 AM

December 17, 2007

Languagehat revealed!

After five and a half years of prolific (yet always thoughtful) linguablogging, our esteemed colleague Languagehat has finally divulged his "real-life" identity to the world in a recent post. He's Stephen Dodson, and he's the coauthor of a new book entitled Uglier Than a Monkey’s Armpit: Untranslatable Insults, Put-Downs and Curses from around the World. In the past I've been skeptical of compendia collecting putatively "untranslatable" tidbits from the world's languages, since they tend to be sloppy and linguistically misinformed (see here for a takedown of a recent example, The Meaning of Tingo). But Stephen and his coauthor Robert Vanderplank (Director of the Oxford University Language Centre) are just the type of careful and well-rounded scholars that you'd want for such a task, as illustrated by the selection of entries posted on Languagehat.

pissant
This satisfying word came over from England as a mere name for an ant, but Americans made it a contemptuous epithet for an “insignificant, contemptible, or irritating person”. From H.L. Davis’s 1935 novel Honey in the Rock, about pioneer Oregon: “Anybody who called owning horses disorderly conduct was a liar and a pissant.”

prumphænsn (PRUHMP-hine-s’n)
This delightful insult literally means ‘fartchicken’.

And a Slovak one they cut from the manuscript:
Pojebali kone voz! (POH-yeh-buh-lee KOH-nyeh VOHZ) (Slovak)
This lively expression, ‘May the horses fuck the carriage,’ illustrates the fact that Slovak cursing makes greater use of sexual terms than that of the Czechs.

Unfortunately the book is not yet available in the US (and the American version of Amazon doesn't seem to know about it yet), but readers in the UK, Australia, and New Zealand can already pick it up. I look forward to Stephen's introduction to the US edition, in which he quotes "Pushkin, Mark Liberman, and [his] nonagenarian mother-in-law"! A formidable lineup indeed.

Posted by Benjamin Zimmer at 10:11 PM

From the Departmenths of Cupertino

Passed on by Victor Steinbok, from Eugene Volokh's blog of 12 December:

I Smell an Auto-Correction Glitch:

From the American Association of Law Schools program:

In 1934, in the Departmenth of the greatest depression in history ....

I have a posting coming on Scunthorping in automated taboo avoidance.  One of its points is that substring search-and-replace programs have the potential for considerable evil, as above.

[Correction, 12/18: Loads of people have written to explain that the error here was probably not in automated correction (involving search-and-replace), but in automated COMPLETION.  As Jim Gordon put it (in the anthropomorphic way we have all come to talk about programs): "the program thinks it knows what the word will be as you begin to type it. This is then compounded, perhaps because the typist is looking at either the draft text or the keyboard, rather than reading what he/she writes."  The idea is that you type "dep" and that's expanded to "Department" (complete with caps), but you press on with the rest of "depth".  This is not an entirely satisfying account, however, because it would predict "Departmentth", while automatic substitution of "Department" for "dept" would (correctly) predict "Departmenth".]
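
[The two predictions can be checked mechanically. The sketch below is only an illustration of the two hypotheses, not a model of any particular word processor; the function names and the trigger strings "dept" and "dep" are assumptions taken from the accounts above.

```python
def search_and_replace(word, abbreviation="dept", expansion="Department"):
    """Substring substitution: every 'dept' becomes 'Department'."""
    return word.replace(abbreviation, expansion)


def autocomplete(word, trigger="dep", completion="Department"):
    """Completion fires once the trigger has been typed; the typist
    then presses on with the remaining letters of the intended word."""
    if word.startswith(trigger):
        return completion + word[len(trigger):]
    return word


print(search_and_replace("depth"))  # Departmenth
print(autocomplete("depth"))        # Departmentth
```

Only the substitution account yields the attested "Departmenth".]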

Posted by Arnold Zwicky at 08:18 PM

It's not just to God's ear(s)

Kyle Hutchinson writes:

I read your Language Log posting about "From X's mouth to God's ear" and you had me pretty thoroughly convinced that the phrase is not very snowclonish at all, until a few moments later when I happened to read a post from (writer/actor/Mac guy) John Hodgman's blog. In response to an email hoping for an end to the screenwriters' strike, he writes, "FROM WALTER'S ELECTRONIC MOUTH to the writers' and producers' ears."

So it's not just to God's ear(s); for some writers, there's now an open Recipient slot as well as an open Source slot:

From X's mouth/lips to Y's ear(s). 

It's looking pretty snowclonish.

Hutchinson (who was of course primed by my posting to notice such things) continues:

... a Google search on "from your lips to the" (or similar truncated versions) makes it look like the entity to whom the ears belong is a second open slot in the phrase. All the alternates for the "God" slot do seem to be authority figures or at least entities with decision-making power.
Here are a few of the ear-owning entities I came across:

the Hockey Gods
the Flying Spaghetti Monster [the deity of the Pastafarian religion -- AMZ]
the CIA
the Florida legislature
the publisher
TPTB [The Powers That Be -- AMZ]
my Muse
my editor
my seller

The X slot seems to be mostly second-person, but here's one like the Hodgman example, with inventive items in both slots:

From [Paul] Krugman's lips to the Flying Spaghetti Monster's ears... Hopefully, Mr. Krugman's words are not just the ravings of a wild-eyed economist.  (link)

Posted by Arnold Zwicky at 07:52 PM

Getting to know jack

Recently, six of my friends and I were talking at our bi-monthly men's group meeting (so now you know how retired men spend their time) and one man, a retired Army colonel with some definite opinions about military and diplomatic misadventures overseas, told us that he was mystified when his 20-year-old son informed him, "Dad, you know jack about what's going on." My friend thought his son had misspoken by leaving out the "don't" here. But he was wrong. His son said it exactly the way he wanted to. The colonel wondered aloud, "How can you say something positive that means a negative?" It happens. Language is a forever changing, creative, dynamic thing and it surprises us all the time with what it is capable of doing. But I had to admit that the positive "know jack" was new to me too. Language Log has dealt with issues of negative polarity and negative by association several times before, especially with expressions such as "care less," "still unpacked," "knows squat," and "cannot be underestimated," for example here and here.

But, as far as I can tell, we haven't dealt with the issue of "he doesn't know jack" versus "he knows jack," which seem to mean the same thing. So I went to my handy Google and typed in "knows jack about" and got over 9,000 hits, most of them conveying a negativity that, at least to my sheltered mind, would seem to need a "doesn't." For example:

Carnivore knows jack about DSL, LLU or telecommunications in general...the guy knows jack. Go home little boy.

...nary a one of us knew jack about some of the finer points of...

Kornheiser knows jack about me or my crew.

Lol, another Gamestop employee who knows jack about design...

Turns out the guy knows jack about football.

So I went back to Google and tried, "doesn't know jack about" and discovered that I was right. It's more commonly used, but not by a whole lot--only by about 4,000 more hits. For example:

Nicholson doesn't know jack about women.

CBS doesn't know jack about women.

Rudy Guliani doesn't know jack about Iraq.

Bill Simmons doesn't know jack about bad calls.

Concerning the way "could care less" is conveyed as "couldn't care less," Mark Liberman says that the "not" was added by means of what John Lawler calls negation by association. Similarly, "I could give a damn" becomes "I couldn't give a damn" and "he knows squat" becomes "he doesn't know squat."

So here's another one to add to the collection you may or may not know jack about.

UPDATE: Boy, was I wrong here. Several readers have enlightened me about something I never knew--that "jack" stands for "jack shit." I need to get out more. What this seems to mean, as John Cowan so helpfully pointed out, is that "jack" is polysemous, with two senses that are actually very similar but have very different polarity implications: "nothing" and "almost nothing." One has a negative polarity, as in "he knew jack about it," and the other has a positive polarity, as in "he doesn't know jack about it," which means he knows very little. So "jack" can substitute for "anything" in wider scope polarity cases. Miles Townes and Josh Millard suggest that there may also be a politeness feature in the exchange between the son and his father--a kind of parental respect. All of which goes to show that even 77-year-old fossils like me can keep on learning. Thanks, readers.

Posted by Roger Shuy at 05:24 PM

Coming to a theater near you?

Earlier this year I told you about the publication of the book When Languages Die by K. David Harrison, then I told you about some of the media coverage of the work behind the book, and then I told you about Harrison's appearance on The Colbert Report. I'm now happy to report to the Language Log readership that an NSF-supported documentary starring Harrison and his colleague Greg Anderson is scheduled to premiere at the Sundance Film Festival in January!

The documentary, produced by Ironbound Films, is simply called The Linguists (follow the link for a preview). Here's a brief synopsis, copied from this LINGUIST List announcement [update: modified here] from the producers:

It is estimated that of 7,000 languages in the world, half will be gone by the end of this century. THE LINGUISTS follows David Harrison and Gregory Anderson, scientists racing to document languages on the verge of extinction. In Siberia, India, and Bolivia, the linguists' resolve is tested by the very forces silencing languages: institutionalized racism and violent economic unrest. David and Greg's journey takes them deep into the heart of the cultures, knowledge, and communities at risk when a language dies.

This is the first time that an NSF-supported documentary has made it to Sundance. Congrats and kudos to David and Greg, to Ironbound Films, and to the NSF for supporting the work in the first place. Let's hope it leads at the very least to increased awareness of the dangers facing the diversity of language and culture, and to more work being done by more people to manage those dangers and that diversity wisely.

Posted by Eric Bakovic at 04:16 PM

The right to keep and bear adjuncts

Now that the U.S. Supreme Court has taken on District of Columbia v. Heller, commas and clauses are in the news. Adam Freedman explains why ("Clause and effect", New York Times, 12/16/2007):

The outcome of the case is difficult to handicap, mainly because so little is known about the justices’ views on the lethal device at the center of the controversy: the comma. ... The official version of the Second Amendment has three of the little blighters:

A well regulated Militia, being necessary to the security of a free State, the right of the people to keep and bear Arms, shall not be infringed.

Freedman observes that "there could scarcely be a worse place to search for the framers’ original intent than their use of commas", because "in the 18th century, punctuation marks were as common as medicinal leeches and just about as scientific". Instead, he argues,

The best way to make sense of the Second Amendment is to take away all the commas (which, I know, means that only outlaws will have commas). Without the distracting commas, one can focus on the grammar of the sentence.

Three years ago in Language Log, Geoff Pullum applied the OICTIQ principle to those same second-amendment commas, taking the distracting first and third commas out of play and leaving the second one intact, consistent with modern punctuation standards. And in passing, Geoff also offered a brief analysis of the crucial grammatical issue:

The sentence begins with what is traditionally known as an absolutive clausal adjunct — a gerund-participial clause functioning as an adjunct in clause structure. It is understood as if it began with since or because or in view of the fact that (notice that Our situation being hopeless, we surrendered means "Since our situation was hopeless, we surrendered").

Freedman suggests that the decision that the Supreme Court is reviewing is based on a different grammatical analysis:

The decision invalidating the district’s gun ban, written by Judge Laurence H. Silberman of the United States Court of Appeals for the District of Columbia Circuit, cites the second comma (the one after “state”) as proof that the Second Amendment does not merely protect the “collective” right of states to maintain their militias, but endows each citizen with an “individual” right to carry a gun, regardless of membership in the local militia.

How does a mere comma do that? According to the court, the second comma divides the amendment into two clauses: one “prefatory” and the other “operative.” On this reading, the bit about a well-regulated militia is just preliminary throat clearing; the framers don’t really get down to business until they start talking about “the right of the people ... shall not be infringed.”

Freedman associates this analysis with work by Nelson Lund:

Nelson Lund, a professor of law at George Mason University, argues that everything before the second comma is an “absolute phrase” and, therefore, does not modify anything in the main clause. Professor Lund states that the Second Amendment “has exactly the same meaning that it would have if the preamble had been omitted.”

And Freedman strongly disagrees:

The best way to make sense of the Second Amendment is to take away all the commas (which, I know, means that only outlaws will have commas). Without the distracting commas, one can focus on the grammar of the sentence. Professor Lund is correct that the clause about a well-regulated militia is “absolute,” but only in the sense that it is grammatically independent of the main clause, not that it is logically unrelated. To the contrary, absolute clauses typically provide a causal or temporal context for the main clause.

The founders — most of whom were classically educated — would have recognized this rhetorical device as the “ablative absolute” of Latin prose. To take an example from Horace likely to have been familiar to them: “Caesar, being in command of the earth, I fear neither civil war nor death by violence” (ego nec tumultum nec mori per vim metuam, tenente Caesare terras). The main clause flows logically from the absolute clause: “Because Caesar commands the earth, I fear neither civil war nor death by violence.”

Likewise, when the justices finish diagramming the Second Amendment, they should end up with something that expresses a causal link, like: “Because a well regulated militia is necessary to the security of a free state, the right of the people to keep and bear Arms shall not be infringed.” In other words, the amendment is really about protecting militias, notwithstanding the originalist arguments to the contrary.

Prof. Lund's analysis seems to be developed at greatest length in "A Primer on the Constitutional Right to Keep and Bear Arms", (Virginia Institute for Public Policy, 2002). But in this document, at least, Lund offers essentially the same grammatical analysis as Freedman, though with a different rhetorical emphasis:

... the grammar of the Second Amendment emphasizes the indefiniteness of the relation between the introductory participial phrase and the main clause. If you parse the Amendment, it quickly becomes obvious that the first half of the sentence is an absolute phrase (or ablative absolute) that does not modify or limit any word in the main clause. The usual function of absolute phrases is to convey information about the circumstances surrounding the statement in the main clause, such as its cause. For example: "The teacher being ill, class was cancelled."

Although Lund's grammatical analysis is the same as Freedman's, his conclusion is different:

The importance of this can be illustrated with a simple example. Suppose the Constitution provided:

A well educated Electorate, being necessary to self-governance in a free State, the right of the people to keep and read Books, shall not be infringed.

This provision, which is grammatically identical to the Second Amendment, obviously means the following: because a well educated electorate is necessary to the health of a free state, the right of the people to keep and read books shall not be infringed. The sentence does not say, imply, or even suggest that only registered voters have a right to books. Nor does the sentence say, imply, or even suggest that the right to books may be exercised only by state employees. Nor does the lack of identity between the electorate and the people create some kind of grammatical or linguistic tension within the sentence. It is perfectly reasonable for a constitution to give everyone a right to books as a means of fostering a well educated electorate. The goal might or might not be reached, and it could have been pursued by numerous other means. The creation of a general individual right, moreover, would certainly have other effects besides its impact on the electorate's educational level. And lots of legitimate questions could be raised about the scope of the right to books. But none of this offers the slightest reason to be mystified by the basic meaning of the sentence.

In fact, Lund, Pullum and Freedman are all in agreement about the grammar of the crucial sentence, and even about some aspects of its interpretation. All three see the first clause as an absolute phrase (Lund) or an absolutive clausal adjunct (Pullum) or an absolute clause (Freedman). Both Lund and Freedman cite the relationship to the rhetorical effect of the Latin ablative absolute. All three indicate that the meaning of the participial first clause is similar to that of a tensed clause introduced by because.

Could anyone possibly disagree with this analysis? Apparently so -- Freedman suggests that "a group of anti-gun academics" once argued that the subject of the main clause is actually "a well-regulated militia":

In a 2001 Fifth Circuit case, a group of anti-gun academics submitted an amicus curiae (friend of the court) brief arguing that the “unusual” commas of the Second Amendment support the collective rights interpretation. According to these amici, the founders’ use of commas reveals that what they really meant to say was “a well-regulated militia ... shall not be infringed.”

This would be an extraordinary example of analytic incompetence, if true, worthy of its own Language Log post. But given Freedman's somewhat unfair representation of Lund's argument, it needs to be checked. Unfortunately, his description of the case and the brief is too vague to allow me to find it during what little remains of my breakfast hour. If you can provide a more specific citation, please tell me.

Aside from this bizarre possibility, all parties to the argument seem to agree about the syntax and basic semantics of the amendment. Everyone agrees that the crucial sentence might as well be recast as

Because (or since, or in view of the fact that) a well-regulated militia is necessary to the security of a free state, the right of the people to keep and bear arms shall not be infringed.

The question is, to what extent does the meaning of the adjunctive clause contextually restrict the interpretation of the words right and people in the main clause? (And perhaps the meaning of the phrase bear arms as well.) The commas certainly don't decide this. And as far as I can tell, the syntax doesn't decide it either.

[Update -- a footnote in this brief does seem to make the argument that Freedman cites, more or less, via the rather vague assertion that

Under ordinary usage, the first and third commas in the Amendment are unnecessary. If these commas had not been inserted, it would be possible to understand the Well Regulated Militia Clause as simply explaining the rationale for the Bear Arms Clause (the Amendment would then read: "A well regulated Militia being necessary to the security of a free State, the right of the people to keep and bear Arms shall not be infringed."). But the commas are in fact in the text proposed by Congress and ratified by the states, and they prevent this reading.

The author's preferred grammatical analysis is not clear to me, and I don't have time for further investigation this morning. In any case, this document is dated 1999, so presumably Freedman had something else in mind.]

[Jon Weinberg writes:

The brief you identified in your update is indeed the one; it was filed in 1999 and signed by some of the biggest names in the Con Law teaching biz, in connection with a case that the Fifth Circuit didn't end up deciding until 2001. Be aware that the footnote you found continues onto the next page, so that the rest of it reads:

The first unusual comma -- between "Militia" and "being" -- forces the reader to search for a verb for which "Militia" is the subject. That verb does not appear until "shall not be infringed" near the end of the Amendment. The second unusual comma -- between "Arms" and "shall" -- sets off the verb phrase "shall not be infringed" from the preceding language; it suggests that the subject for this verb phrase is not simply "the right of the people to keep and bear Arms." The grammatical effect of these two unusual commas is to link "A well regulated Militia" to "shall not be infringed" to emphasize, in other words, that the goal of the Amendment is to protect the militia against federal interference. The Constitution was drafted with great care, and (unlike much legal writing from the Founding period) its use of punctuation generally conforms to modern conventions, suggesting that the commas in the Second Amendment are not haphazard but rather deserve scrupulous attention.

It should be clear from looking at the brief that the authors' focus wasn't grammatical (orthographical?) or linguistic analysis: the reasoning in the footnote looks like a stray thought, not necessarily well-developed, that was in a footnote precisely because the authors didn't think it was powerful enough to include in text. The main thrust of the brief is an argument from historical evidence (like the drafting history of the amendment) that the folks who drafted and enacted the second amendment didn't in fact understand it to convey a right to bear arms except for the purpose of service in state-run militias.

(Jon is certainly right that the focus of the brief is historical rather than grammatical. But the footnote is indeed making a grammatical claim along the lines that Freedman suggested, namely that "a well-regulated militia" is part of the subject of "shall not be infringed".)

Several others have sent useful information as well, which alas I won't have time to summarize until this evening.]

Posted by Mark Liberman at 06:39 AM

Victims and etymology

A side point in my posting on Daniel Cassidy and how the Irish invented (American English) slang is Cassidy's presentation of himself as a maligned victim, unrecognized and disdained by the Anglo academic linguistic establishment, especially that demon, the OED -- a stance that makes him appealing to (some) Irish-Americans and indeed to (some) Irish-in-Ireland.  After all, the Irish have legitimate historical grievances against the English, so Cassidy can tap into them in advancing his absurd etymologies.

Now that That Holiday is approaching, it's time to consider, once  again, victim-playing by fundamentalist Christians over people who avoid the word Christmas.

But first, Cassidy.

The Irish were badly treated (to put it mildly) by the English, and the Irish in the U.S. fared poorly for a long time ("No Irish Need Apply"): they didn't even count as "white" for some time, which is how we could get a book entitled How the Irish Became White (Noel Ignatiev, 1996).  There's plenty of room for grievance here.

So you can understand Cassidy's in-your-face defiance.  That doesn't, however, lend credence to his etymological fantasies.

As for Christmas, this topic has come up every December for a while now.  Two years ago on Language Log, Geoff Pullum weighed in, mocking Jerry Falwell's claim that people who said "happy holidays" instead of "merry Christmas" were "trying to steal Christmas" and treating Christians with contempt.

Last year, Geoff Nunberg added his voice (this time the complainant was Bill O'Reilly, describing the use of "happy holidays" as "insulting to Christian America"), noting how bizarre it was for some Christians to insist that everyone should be greeted in the way that they (these Christians) wanted.  Surely this is a kind of Christian triumphalism: why should store clerks be obliged to treat everyone as if they were Christians?  [Stop!  Don't write me about how Christmas, the holiday, really isn't Christian; I'll get to that.  I can't do everything at once.]  This is a religiously diverse society, so it makes sense for people who engage random members of the public to use, if they wish, neutral formulations of what they say; that's merely polite.  This is not necessarily a matter of avoiding offense.  I'm not OFFENDED by the word Christmas, or by the holiday itself, though I no longer celebrate the holiday and get mightily annoyed by people who insist that this is the most warm and wonderful time of the year and I should get with the program.

(In a previous life, I celebrated the holiday in splendor.  Why I no longer do this is a possibly interesting story, but, frankly, it's my business, and it's not relevant here.  My point is that if I now want to opt out of the Christmas scenario, that's my right.  I'm not lashing out at people who wish me "merry Christmas", nor do I view those who wish me "happy holidays" as allies of some sort.  [As for the holiday itself, my current stance is that it's best celebrated by a meal at a Jewish delicatessen or an excellent Chinese restaurant.  Since there's one of the latter a few blocks away from my house and none of the former anywhere close, my little family goes Chinese.])

Summing up, as Geoff P. pointed out two years ago, the holiday of Christmas has become, in the U.S. and U.K. at least, primarily a commercial event, with its own customs, few of which have recognizable connections to Christian beliefs or practices.  (Note: "few of which".  Nativity scenes, for example, might give anyone pause.  As would the intensely Christian content of the Christmas carols -- not "Christmas songs" -- that are everywhere in public during the month or so before Christmas.)

The War on Christmas folks object to the removal of Christ from Christmas.  (If you're going to be an etymological Feinschmecker, please note that Jesus is  a name -- cf. Jehoshua, Joshua -- while Christ is a title or epithet.  Which is to say that if you're seriously into the Etymological Fallacy, then insisting on the title Christ, and words related to it, is insisting that other people recognize Jesus as "the anointed one", the Son of God and Our Lord.)

And now we get to 2007.  Jon Lighter (on ADS-L, 12/1/07) reported the latest episode of victimology and etymology, which I repeat here with only minor editing:

Each year, Fox News's front-line correspondents fan out across the world to report on the terrifying "War on Christmas," as "the PC crowd" tries to intimidate Christians into denying their God or Santa or something.  Today they reported the latest offender - Janet Napolitano, the elected governor of Arizona, who described the state's Christmas tree as "the Holiday Tree."

To understand the social, political, and theological ramifications of the Democrat governor's stance, one must turn to radio talk-show hosts Inga somebody and somebody Ryan, who debated the nuances at some length, Ryan saying that, as a Christian, he thinks people should call their own tree whatever they want and forget about the issue. "Holiday tree" is fine with him. (He must be closet PC). Inga, however, who is presumably even more of a Christian, argued that a Christmas tree is "definitely a secular symbol" because "I couldn't find any references to a Christmas tree in the Bible."  Therefore the secular governor should have stayed strictly secular by calling the tree a "Christmas tree."  Follow me so far?

Inga then revealed that the governor was, in fact, illicitly mingling church and state by using the "PC" terminology "holiday tree."  Why?  So obvious! "Because 'holiday' means 'holy day,'" she stated. Thus the governor was trying to sneak her own (unidentified but possibly "Secular Humanist") religious beliefs into the lighting ceremony by declaring Christmas a "Holy Day." Follow me so far?

What distinguished this discussion from the usual level of political debate in America was Inga somebody's brilliant observation that "holiday" really means "holy day," an assertion somebody Ryan could not refute, moot, or dispute. Like a deer in the headlights, he never dared to say, "No, Inga. That's what it meant many centuries ago, but it means that no longer.  Wise up, troll."

I believe Inga's point was essentially to honor the Etymological Fallacy. Keep it holy.

I'll content myself with noting that the United States currently has ten "federal legal holidays" (when federal offices are closed, and your bank probably is too):

New Year's Day
Martin Luther King Day
Presidents Day
Memorial Day
Independence Day
Labor Day
Columbus Day
Veterans' Day
Thanksgiving Day
Christmas Day

Exactly one of these, Christmas Day, is in any sense a "holy day", for Christians or anyone else.  Meanwhile, people in the U.K. can use holiday where people in the U.S. use vacation ("go on holiday", the old "holiday camps", etc.).  The holy in holiday is at best the whiff of a ghostly presence these days, and it can't be revived by people who stamp their feet and insist that etymology is destiny.

Posted by Arnold Zwicky at 02:07 AM

December 16, 2007

Getting rid prepositions

National Preposition Day is now almost over and it's  time to stop celebrating and to reflect on the many problems prepositions give us. I'm thinking maybe, just maybe, it's time to get rid of them altogether. So I'll take a crack at it here:

There have been proposals reducing and improving English years and years. One these was eliminate the past tense, example. Years failure have shown us that this is little value. It's not feasible do this. We can't neglect what's happened the past. But the use prepositions is hardly important. One thing, they're difficult children learn and non-native speakers have lots trouble them. As me, I'm all it. Dumping prepositions trash basket would shorten my emails and this would save paper that we use produce books. But you may ask what this would look like. Consider the title this important book linguistics:

The Cambridge Grammar the English Language

Huddleston and Pullum should like this one. Okay, only one word is saved, but this book is far too expensive anyway most us purchase and every word saved will bring down the price.

Think also Lincoln's Gettysburg Address, the beginning:

Four score and seven years ago our fathers brought forth this continent a new nation, conceived liberty, and dedicated the proposition that all men are created equal.

Not bad, eh?

And also the way it would end:

--that the nation shall, God, have a new birth freedom--and that governments the people, the people, and the people, shall not perish the earth.

I kinda like the repetition, don't you?

And finally, think the Lord's Prayer:

Our father, who art heaven, hallowed be thy name. Thy kingdom come, thy will be done, earth as it is heaven. Give us this day our daily bread and forgive us our trespasses, as we forgive those who trespass us. Lead us not temptation, but deliver us evil, for thine is the kingdom, the power, and the glory forever. Amen.

It seems okay, except maybe for the "deliver us evil" part. Hmm. Maybe this isn't such a good idea after all.

Posted by Roger Shuy at 11:37 PM

What have you done with God's ear?

Back on 23 September I surveyed some items on the fringes of Snowclonia, among them

God's Ear: From X's lips/mouth to God's ear.

Body Snatcher: Who are you and what have you done with X?

Time for an update.

God's Ear.  What I said before:

... I just noticed "from X's lips/mouth to God's ear" in a posting of Geoff Pullum's: "From [Stanley] Fish's mouth to God's ear."  Substantial number of hits, but it's not at all clear what people are using the figure to convey: there are some occurrences of "from your lips to God's ear" that seem to convey nothing more than that God hears everything you say, but in most occurrences of the figure something more complex is going on.

Almost immediately, Mark Mandel supplied a lead:

"From your lips to God's ear" is common among at least some Jews as a reply to an expression of hope or good wishes. I speculate that it may be a translation of a common Yiddish expression ...

(Also appearing in English as "from your mouth to God's ear".)  That is, what's conveyed is 'may God hear your words', contextualized as polite receipt of praise.  The expression has clearly been extended for some speakers of English (many of them not Jewish) to a wider set of contexts.  And at some point people varied the expression by allowing possessives other than your, to convey 'may God hear X's words'.  That would be the snowcloning moment (but see below).

Ben Sadock then supplied a Yiddish original: 

Fun dayn moyl in gots oyern.  'From your mouth to God's ears.'

Some versions have the expanded in gots oyern arayn.  The ones I've found all have moyl 'mouth'.   Most have the plural oyern, rather than the singular oyer, and the second-person possessive, but other versions have been reported:

from your mouth to God's ear (or ... to the Gates of Heaven).  May God hear what I/you say and act upon it.  Or, as defined in The Taste of Yiddish by Lillian Merwin Feinsilver (1970): 'Fun zayn moyl, in Gots oyer.  Lit, From his mouth into God's ear.  May God hear what he has said (and fulfil it)!'  The 'Gates of Heaven' may be an Arab version. ...  The first expression may stem from Psalm 130:2: 'Lord, hear my voice: let thine ears be attentive to the voice of my supplications'.  The phrase also appears in the orthodox Jewish prayer book.

Goldeneye is the best movie in the series since Diamonds Are Forever ... I told him I thought it would take $30 million in its opening weekend, to which he replied: 'From your lips to God's ears.'  Evening Standard (London) (4 October 1995)

(Nigel Rees, Cassell's Dictionary of Word and Phrase Origins (2002), p. 90)

No doubt there are other variants in Yiddish.

In English, the raw Google webhits favor lips over mouth and show a favoring of concordance in number between the two nouns: lips ... ears (9,100) over lips ... ear (3,360), and mouth ... ear (2,330) over mouth ... ears (650).
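To make the comparison concrete, the four counts just quoted can be turned into proportions. This is only a small Python sketch: the variable and function names are mine, and raw Google hit counts are rough estimates at best, so the percentages should not be read as anything more than a summary of the figures above.

```python
# Raw Google hit counts quoted above, keyed by (first noun, second noun).
hits = {
    ("lips", "ears"): 9100,
    ("lips", "ear"): 3360,
    ("mouth", "ear"): 2330,
    ("mouth", "ears"): 650,
}

total = sum(hits.values())

def share(pairs):
    """Fraction of all hits accounted for by the given noun pairings."""
    return sum(hits[p] for p in pairs) / total

# Variants with "lips" versus all variants.
lips_share = share([("lips", "ears"), ("lips", "ear")])
# Variants whose two nouns agree in number (plural-plural, singular-singular).
concordant_share = share([("lips", "ears"), ("mouth", "ear")])

print(f"lips: {lips_share:.1%}, number-concordant: {concordant_share:.1%}")
```

On these figures, lips accounts for roughly 81% of the hits and the number-concordant pairings for roughly 74%.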

Now, is this a snowclone -- or an idiom with an open slot in it?  In principle, a similar question arises for every putative snowclone that has only one open slot, for instance:

What is this X of which you speak?
That's why they call it X.
We are all X now.
X is hard; let's go shopping!
X, call your office.

The once and future X.
Stupid X tricks.

Most of these are full sentences, though some are NPs.  Now, most expressions traditionally labeled "idioms" are on a smaller scale, but many of them have open slots in them.  Here are a few such VP idioms with make:

make something seem like a picnic
make someone sit up and take notice

make someone's skin crawl
make someone's day

(Note that the last two have a possessive open slot, just like the God's Ear formula.  There's a huge number of idioms with possessive open slots in them.  Please don't write me about further examples of open-slotted idioms; idiom dictionaries are packed with them.)

Formulaic expressions that are full sentences are traditionally labeled "proverbs" rather than idioms, and they are nearly completely fixed, with no open slots in them (though ANY formula is available for the occasional playful variation, of course):

A fool and his money are soon parted.
Fools rush in (where angels fear to tread).
A stitch in time saves nine.
He who pays the piper calls the tune.

So one way to think of God's Ear is as a cross between an idiom with an open slot in it and a proverb -- that is, as a big idiom (a sequence of two PPs conveying a proposition) or as an open-slotted proverb (of relatively recent invention).  Indeed, snowclones in general might be seen this way.

Body Snatcher.  What I said before:

... over on ADS-L I recently started a discussion of "Who are you and what have you done with X?", which I'd contemplated using in a recent posting that mentioned my granddaughter's alarm at being confronted by her mother speaking German: "Who are you and what have you done with my mother?"  The figure is canonically used in situations where the speaker is confronting someone who appears to be X but observes that this person lacks some property or properties historically characteristic of X; think Invasion of the Body Snatchers.  Things are a bit tricky, though, because there are perfectly straightforward uses of such expressions, as sequences of ordinary questions.  In any case, everybody seems to think that the figure originated in a specific quotation (not in IBS, so far as I can tell), but so far no one has a good candidate.  And there are lots of instances out there.

In what follows, I'll refer to the literal understanding of the two questions in sequence as the "at face" interpretation and the figurative understanding as the "impostor" interpretation.

The ensuing ADS-L discussion (and e-mail following up on my September Language Log posting) went in many directions.

First, no one has found the figure, or anything like it, in any version of Invasion of the Body Snatchers.  The IDEA is there, but not any formula.

Second, lots of people suggested some influence from the famous James Thurber cartoon, with the caption "What have you done with Dr. Millmoss?":

But the question in the cartoon is entirely literal; there's no suggestion that the hippopotamus is trying to pass itself off as Dr. Millmoss (and every suggestion that the hippopotamus has eaten Dr. M.).  The Thurber cartoon, wonderful as it is, is a red herring.

Next, a similar digression that turns on mere similarity in form: Linda Wilkinson citing the 1971 movie title Who Is Harry Kellerman and Why Is He Saying Those Terrible Things About Me?  Though there might be the beginning of another formula here.

Meanwhile, the hounds of ADS-L were on the antedating trail, trying to find earlier and earlier instances of the impostor use of the expression.

[Side note: antedating and sourcing can be fascinating, but for a variety of reasons they're mostly not my thing.  Establishing that some usage is in fact current for some group of speakers shouldn't commit you to a study of where, when, and why it originated and how it spread.  Not that these aren't interesting questions.]

This search into the (recent) past pulled up an awful lot of examples that could be seen as at-face uses of Body Snatcher.  That's what happens when you search on strings.  But several landmarks appeared:

Chris Waigl noted that there was a small flurry of impostor cites in 1997/8, possibly based on its appearance in the film Ever After (1998).  From the imdb:

Henry: Mother, Father, I want to build a University, with the largest library on the continent, where anyone can study, no matter their station!
King Francis: All right... Who are you... and what have you done with my son?
Henry: [laughs] Oh, and I want to invite the gypsies to the ball!

Chris and I supposed that this was a secondary spread of an existing snowclone.  And there are earlier impostor cites.  A few, going back in time:

Who are you, and what have you done with the real Jessica?  (Deadly Voyage by Francine Pascal, 1995, found by Jeff Prucher)

WHO ARE YOU AND WHAT HAVE YOU DONE WITH NOT AL CRAWFORD?  ("A. Jing Hippy" responding to "Not Al Crawford" on alt.peeves, 1991, found by Ben Zimmer)

Who are you really ? And what have you done with Ross Alexander ?  (on, 1990, found by Towse)

... when I took 500 mg. of diphetocaine which made me alert at bedtime when Joey discussed his day at the office -- after which I took 750 mg. of osculavenol and slept soundly through the night. This morning I woke up, dragged myself to the bathroom -- and found that I was out of everything. When Joey came down for breakfast he screamed: 'Who are you and what have you done with Dorothy?' (Hold Me! by Jules Feiffer, 1977, found by Towse)

Somewhere along the line, I noted that there was no particular reason for the expression I started with to be the sole way of conveying the impostor idea.  Had it really become formulaic?

This turned out to be a hard question to answer.  One thing quickly became clear: other ways of expressing the impostor idea were around:

who are you, and where is my husband? I came home from work today and the house was sparkling clean...  What do I check to make sure he hasn't become a pod person? (link)

I'm pretty sure there isn't a chapter in a parenting book as to what to do when your daughter wakes up one morning and says "I'm not eating those ribs and please remove the pork fat and back from my collards. Thanks." My father would look at me like, who are you and what happened to my daughter who could suck down a plate of ribs? (link)

I have no easy way of determining how often one formulation is used versus others.  It might well be that the variant I started with is the most common.  But it's beginning to look like this is not so much a figure of speech, tied to particular wording, as a figure of thought -- much like systems of metaphor.

Posted by Arnold Zwicky at 09:37 PM

Dimensions, metaphors, and prepositions

National Preposition Day continues with a guest post by John Lawler, who gives the following brief account of how the prepositions at, on, and in work in English:

The usage of in/on/at, like that of most prepositions, is metaphorically locative and, in the case of these three, dimensional.

The basic principles are simple:

in relates to a 3-dimensional container
on relates to a 2-dimensional surface
at relates to a 1-dimensional location

Naturally, usages can get very complex, especially the idiomatic varieties. However, the metaphor theme "time is space" extends these principles pretty straightforwardly.

The experiential key here is that a day (one's current waking period) is metaphorized as a surface on which one is walking (the slogan is "ontology recapitulates physiology"). That accounts for on Thursday, on the seventh, on his birthday.

The smaller time units are then locations on that surface, whence at noon, at the moment, at 8:15:44.23, 2/17/44, while the larger ones are containers for days, whence in March, in 2007, in the twentieth century.

Similar remarks apply to non-temporal uses (see Charles Fillmore's Deixis Lectures):

on the lawn ~ in the yard
in space ~ on (the) earth,
in the air ~ on the ground
on (the) sea ~ in (the) water ~ on (the) land
in some non-Abelian cohomology group

Interesting special cases come from vehicles:

in a car/plane/bus/canoe/boat/kayak
on a raft/bicycle
on public transportation
on a (scheduled) boat/train/plane/bus

I suspect that the schedule metaphorizes a (2-D) passenger list, though I'm not yet convinced that this is the right analysis.

Guest post by John Lawler

Posted by Geoffrey K. Pullum at 06:21 PM

In the car, on the bus

Continuing with our celebration of National Preposition Day, let me note the following judgments about prepositions with objects denoting vehicles. The preposed asterisks do not necessarily mean that the string of words is utterly ungrammatical, but just that it is grammatical only on some fairly bizarre understanding — it is not just an alternative to the other example in its pair. The degree of acceptability varies: (1b) is thoroughly strange and suggests that she was strapped to a roof rack, and (2a) is even more bizarre. The ‘?*’ in front of (5a) and (6a) indicates that (in my judgment) the status is not so clear.

(1)a. She left that morning in a car. b. *She left that morning on a car.
(2)a. *She left that morning in a bike. b. She left that morning on a bike.
(3)a. She left that morning in a taxi. b. *She left that morning on a taxi.
(4)a. She left that morning in a van. b. *She left that morning on a van.
(5)a. ?*She left that morning in a bus. b. She left that morning on a bus.
(6)a. ?*She left that morning in a train. b. She left that morning on a train.
(7)a. She left that morning in a boat. b. She left that morning on a boat.
(8)a. *She left that morning in a ship. b. She left that morning on a ship.
(9)a. She left that morning in a single-seater light plane. b. *She left that morning on a single-seater light plane.
(10)a. *She left that morning in a United Airlines 747. b. She left that morning on a United Airlines 747.

The point is not to get people to send me hundreds of emails about how they found a place where someone says "have you ever wanted to live in a bus" or "see a band record their album in a bus" (please, no emails). The conditions here are not absolute, and I know that. But there really are some gross differences between the ways we refer to travel in or on various modes of transportation. And it really does not look like they are fully and clearly predictable on the basis of meaning, or vehicle architecture, or a combination of both. Language is not that orderly.

One other thing. Consider (4b). Suppose there were a scheduled van service between a central location and some factory on the outskirts of the city. Now it sounds a bit better, doesn't it? My initial hunch is that (i) normal position for a person with respect to the vehicle, (ii) size of the vehicle relative to a person, and (iii) regularly scheduled nature of the transportation may all be relevant to some extent.

Posted by Geoffrey K. Pullum at 04:56 PM

Preposition day at Language Log

Nobody informed me that this might be National Preposition Day, but since we've already had two posts about English prepositions, I thought I'd chime in with another one.

Mark Liberman's post about the use of "on" in the sentence, "Java is found everywhere--on mobile phones, desktop computers, Blu-ray Disc players, set top boxes, and even in your car," reminded me of an insurance contract dispute I once worked on. Mark found it odd for Java to be "on" a disc player. It sounds odd to me too. It seems better to say it's "in" a disc player. English prepositions can drive many people to distraction, non-native speakers in particular. For example, some people say "I'm sick to my stomach" while others say "I'm sick at my stomach." Still others, like me, say "I'm sick on my stomach." Okay, there are some dialect differences here but oddly enough, I've found that it's mostly non-native English speakers who choose what might be considered the most logical way to say this: "I'm sick in my stomach." But I digress.

The insurance case that Mark's post called to my memory took place in the early 1990s. The owner of a jewelry sales business took out a policy to cover loss and theft. Soon after this, a salesman for this company parked and locked his car outside of a jewelry store and someone broke into the locked trunk and made off with a large quantity of expensive jewelry from it. The salesman looked out of the window of the store, saw the theft taking place, and ran out to try to stop the thief, but he got away. The owner of the company, the policyholder, thought his insurance policy would cover the loss. But wait. Like most policies of that type, this one contained an exclusion section containing ten items, the tenth of which said:

10. We do not cover property in or on a vehicle that is not attended. An attended vehicle has a person actually in or on the vehicle. This person must be you, your employee or a person whose  sole duty is to attend it.

The dispute was whether the car was "attended," whether the jewelry was in the salesman's custody, and whether he was "in or on" his car when the theft took place. The policyholder believed that the salesman was attending his car because he had locked the jewelry in the trunk and was "in or on" his vehicle at the time the theft occurred. To him, "in or on" meant that the salesman was in or around the immediate area and therefore was attending the car. The insurance company disagreed, claiming that the salesman was not attending his car, and that the jewelry was, therefore, not in his custody. At the center of the dispute was the meaning of the prepositional phrase, "in or on a vehicle," along with the meaning of the verb, "attended."

The prepositions "in" and "on" are part of the locative complements associated with the roles of source, goal, and location (Huddleston and Pullum 2002, 258). The prepositions "from" and "to" mark the source and the goal, while the prepositions "in" and "on" are normally part of the location expression, as illustrated in exclusion 10 above. When considering the locative meaning of the prepositional phrase, it's necessary to analyze not only the meaning of the prepositions, but also the noun to which they refer, in this case the "vehicle." The selection of particular prepositions is in part dependent on the nouns referred to, as in the examples that Geoff Pullum gave today. We have no idea how this distribution originated, but it did, and that's that.

It would appear that the nouns specify the use of different prepositions. The prepositions "in" and "on" in exclusion 10 indicate a relationship between a person and a vehicle. If the subject is a person and the locus is a vehicle, "in" means a person contained in that vehicle and "on" means a person supported by that vehicle--on it. It's hard to imagine that the policy meant that the person should be "on" his car. On the other hand, "on" might cover a policyholder whose vehicle was a motorcycle, where being "in" it seems unlikely. The policy seems to point to the insurance company's intent to convey the fact that the relationship between the person and the vehicle is one that is a customary position for operating it, no matter what type of vehicle it might be.

If it is clear that the meaning of the otherwise non-specific word, "vehicle," is specified as either an automobile or truck (something that a person can be "in") or a motorcycle (something that a person can be "on"), the meanings of the prepositions "in" and "on" in exclusion 10 can become clear to native speakers of English. They refer to the types of vehicles and not to the general or immediate area (in front of the jewelry store). And "attend" means attending to the vehicle, not attending in the immediate area around it.

Posted by Roger Shuy at 02:41 PM

At that second, on that day, in that year

Talking of the interaction of syntactic preposition choice and meaning, as Mark just was, the Movable Type software used by Language Log automatically puts at the bottom of each post a little message like this:

Posted by Mark Liberman at December 16, 2007 08:15 AM

But of course that is ungrammatical. We don't say at December 16, ever. In fact the principles of preposition choice for specifying temporal locations (as opposed to durations extending from now into the future, like in a few seconds) seem to be as follows:

at for seconds (at that very second)
at for minutes (at 15 minutes past the hour)
at for hours (at eight in the morning)
on for days (on December 16)
in for weeks (in the third week of December)
in for months (in December)
in for years (in 2007)
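The list above amounts to a lookup from time unit to preposition, which can be sketched in a few lines of Python. The names `TEMPORAL_PREPOSITION` and `temporal_phrase` are purely illustrative, and the table covers only the seven units listed; it makes no attempt at idioms or specialist usage.

```python
# Illustrative lookup encoding the seven rules listed above; not a full grammar.
TEMPORAL_PREPOSITION = {
    "second": "at",
    "minute": "at",
    "hour": "at",
    "day": "on",
    "week": "in",
    "month": "in",
    "year": "in",
}

def temporal_phrase(unit: str, expression: str) -> str:
    """Prefix a time expression with the preposition that its unit selects."""
    return f"{TEMPORAL_PREPOSITION[unit]} {expression}"

print(temporal_phrase("day", "December 16"))
print(temporal_phrase("hour", "eight in the morning"))
print(temporal_phrase("year", "2007"))
```

A footer generator built on such a table would pick on for a date, which is exactly the choice the Movable Type message fails to make.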

But why? Why is it at for short periods of time recorded on a clock face, on for medium ones punctuated by the cycle of sunrise and sunset, and in for longer ones recorded in the calendar? One day I might set that as an essay question in a course on grammar and meaning. I have no idea what the right thing to say about it would be, though. I suppose my email account will be deluged with messages from people who want to explain it to me. Please be tolerant with me if it proves simply impossible for me to respond to them all. I have no staff for this sort of thing, you know; and I have many other duties, from writing letters of recommendation (sorry about the delay, Frank) to Christmas shopping. But thank you in advance for whatever you send.

[Update: Nicole Perrin tells me that accountants do say "at December 16"; fair enough. I'm talking about ordinary non-accountant usage. Jonathan Lundell points out that we actually don't have any way to talk about being located within a certain 60 minute period, as opposed to being located at an exact instant like at 8 a.m.; I think he's right. John Lawler has sent a brief but detail-packed essay about dimensionality and metaphoricity in uses of at and on and in, which I may post separately.]

Posted by Geoffrey K. Pullum at 10:59 AM

Contrition, conditionally speaking

After his name turned up in the Mitchell Report on the use of performance-enhancing drugs in baseball, New York Yankees pitcher Andy Pettitte issued the sort of non-apology apology we've come to expect from baseball stars, from Pete Rose to Ozzie Guillen. Pettitte, who admitted to using human growth hormone (HGH) on two occasions in 2002 to help him recover from an elbow injury, couched his statement in conditionals and other hedge words:

If what I did was an error in judgment on my part, I apologize. ...
If I have let down people that care about me, I am sorry, but I hope that you will listen to me carefully and understand that two days of perhaps bad judgment should not ruin a lifetime of hard work and dedication.

Pettitte won't actually admit that his HGH use was indeed "an error in judgment," but if it happens to have been, he apologizes. He chooses only to own up to "two days of perhaps bad judgment." Pettitte's ambivalence about the quality of his judgment appears to be based on the fact that HGH is not an anabolic steroid for building muscle mass of the type that others such as his teammate Roger Clemens have been accused of using: "I wasn't looking for an edge; I was looking to heal," he says later in the statement. And he is also seeking legal cover in the fact that HGH wasn't declared a banned substance in baseball until 2005. (Of course, Pettitte might still very well have committed an illegal act by purchasing or using HGH without a prescription.)

Pettitte's second conditional apology expresses contrition "if I have let down people that care about me." That's even more indirect than Pete Rose's grudging acknowledgment about his gambling problem: "I'm sorry for all the people, fans and family it hurt." Pettitte merely suggests that people who care about him might have been let down, but if they would listen to him carefully (or rather, listen to his statement released through his attorney), they wouldn't be let down at all.

Pettitte's half-hearted words haven't impressed many commentators in the New York area. "A sorry excuse," says Dan Graziano of the Newark Star-Ledger. "Crocodile tears and contrived 'regret,'" says Mike Lupica of the New York Daily News. I suspect that sportswriters, more than anyone, have a fine-tuned ear for non-apology apologies, conditioned to hear the artifice of conditionals.

Posted by Benjamin Zimmer at 10:51 AM

In or on? Experience the power of splash screens

Here's the setup screen for a recent Java update that kept pestering me until I let it run:

A couple of things about the text struck me as odd, starting with the preposition choice:

Java is found everywhere -- on mobile phones, desktop computers, Blu-ray Disc players, set top boxes, and even in your car.

Is Java "on" a disc player? As I waited for the upgrade to finish, it seemed to me that it should be in it, and probably in a mobile phone as well. At least the anonymous copywriter agrees with my intuition that Java is in a car rather than on it.

Google gives 217,000 hits for "Java is on", vs. 368,000 for "Java is in". And the top hit for "Java is in" is the download page at, which uses in rather than on for the cell phone case:

Java is in cell phones, automobiles, the Mars Rover, and many other places. By downloading it to your computer, you will be able to experience the power of Java.

What's the difference between on and in here? I'm not sure, but it seems to me that one issue is control. If you can install something or remove it, at your pleasure, then it's on the device in question. But if the manufacturer just put it there, in firmware or whatever, and there's nothing you can do about it one way or the other, then it's in there. That may be why it seems strange to me to say that Java is on a Blu-ray Disc player, since we don't have any choice in the matter.

The odd on was not the only interesting choice in (on?) this little pop-up screen.

By installing Java, you will be able to experience the power of Java.

What is it about this phrase that seems at once so strange and so familiar?

One piece of it is the lame advertiser's collocation "experience the power of", which has 690,000 Google hits this morning, most of them about things other than Java. On (in?) the first two pages of web-search results, we're invited to experience the power of Automator and Apple Remote Desktop 3, PC gaming, grace, suggestion, Adobe Acrobat, conversion, winning justice, visual learning, AllyCAD, MIDT, rhetorical eloquence, digital entertainment, GIS, the Pause, Information on Demand Solutions, Sedation Dentistry, Native Transactions, Instant Rapport, Resonance Repatterning & Deeksha, and my creator.

Is "experience the power of Java" meant ironically? It doesn't seem that way. Apparently advertising copywriters really think that this phrase impresses people in a positive way, and their clients believe them.

The other curious thing about this phrase is the repetition of the proper noun Java. I guess that advertising copywriters also believe that more brand-name repetitions are better, no matter what. In this particular case, the result reminds me of the now-defunct Fafblog, where ironic invocations of the ad-copy repetition trick were a characteristic shtick ("Fafblog! the whole worlds only source for Fafblog").

[Update -- Victor Steinbok suggests a different rationale for preposition choice in such cases:

This is just a personal opinion, so I don't know how much it would reflect the general sentiment. I find the reasoning for the on/in distinction that you cite somewhat unsatisfying.

If you can install something or remove it, at your pleasure, then it's on the device in question. But if the manufacturer just put it there, in firmware or whatever, and there's nothing you can do about it one way or the other, then it's in there. So it's strange to say that Java is on a Blu-ray Disc player, since we don't have any choice in the matter.

When I first read the sentence that prompted the inquiry, I had no problem with it. To me, the analogy of "on mobile phone" would be "on your computer". There are things that are on the computer and other things that are in the computer, but I don't quite see how the installation options make the distinction.

Instead, I thought of something else. If you think of Java as software (yes, I know, it's a "software environment"), then you'd be hard-pressed to refer to it as being in something, even if it's a mobile phone or a DVD player. "Built-in software support" is not the same creature--and the phrase actually points to the distinction that I want to make. The "built-in" bit refers not to software but the technology. So we don't say that I have a particular technology on my computer (I am sure some geeks would be tempted, but, I suspect, they would be outnumbered). For example, did Intel advertise their preceding generation of Centrino chips as the Centrino technology being in or on the computer? My memory may be faulty--and the examples may be sparse--but, I believe, it was the former. Certainly, when it comes to hardware, it's going to be "in".

So here's the $64,000 question--is BlueTooth technology on your computer, in your computer, or does your computer simply come with BlueTooth? Next, the same question but as applied to mobile phones instead of computers. The issue is not trivial since BlueTooth has both a software and a hardware component and you can install or deinstall both (even though, lately, most models come with the built-in option). So this might be a good test for both your explanation and mine.

I don't have a great deal of confidence in my rationalization for this little corner of the distribution of extended meanings of English spatial prepositions. To some extent it seems simply arbitrary. Thus an engine is "in" a car, while tires are "on" it, which more or less goes along with my rationalization -- but guys I've known for whom an engine swap was not a big deal would still talk about putting a small-block 350 "in" -- not "on" -- a vehicle. My intuition says that a carburetor is "on" an engine, while spark plugs are "in" it, even though it's easier to change plugs than to change a carb. Presumably this has something to do with the fact that the carburetor is bolted onto the outside of the engine, while the spark plugs are partly inserted into it -- but this sort of explanation seems more like a post-hoc rationalization than a predictive theory.

There's an extensive and interesting literature on this question, or at least on the factors that influence the choice of spatial prepositions in extended meanings, both in English and across other languages. Unfortunately I don't know it well enough to cite the pieces that bear on the description of software and firmware.]

[Magnus Bakken:

My intuition about whether software is "in" or "on" a device has always been that the device is referenced as its disk (or disc) by synecdoche. Since we generally refer to software as being "on" a disk, we also refer to software as being "on" a device when the data is stored permanently on its disk.

And we refer to things being "in" memory -- computer as well as human -- although this is a temporary condition that we can control to some extent, contrary to my rationalization for "in". Maybe what's different about cell phones and disc players is that their storage media are likely to be devices that information is "in" rather than "on".]

[Dozens of people have pointed out to me that if you think of a cell phone as a "platform", as many engineers do, then it's obvious by metaphorical extension that software runs "on" it. Many others have brought up the "layers" metaphor (as in the OSI protocol stack), though they are divided between those who think that software runs "in" a layer and those (the majority) who think that it runs "on" a layer. And some have pointed out that not only do programming languages and environments, operating systems, etc., run "in" or "on" various devices, but applications and other programs also run "in" (or sometimes "on") the languages, environments and systems, like Swift's infinite regression of fleas.]

[Update 12/17/2007 -- Rupert Goodwins at ZDNet has posted some fascinating counts:

"Running on DOS": 879
"Running in DOS": 9530
"Running under DOS": 25600
"Running with DOS": 5290

"Running on Windows": 472000
"Running in Windows": 289000
"Running under Windows": 263000
"Running with Windows": 27300

"Running on Vista": 82200
"Running in Vista": 11500
"Running under Vista": 20300
"Running with Vista": 33100

"Running on Linux": 329000
"Running in Linux": 122000
"Running under Linux": 77100
"Running with Linux": 916

His comment:

So, 'on' has become firm favourite after a very slow start, 'under' is far more popular with Vista than Linux (which reflects an interesting point about the philosophies of the two platforms), yet 'with' seems to be almost unknown with the open source OS - perhaps reflecting the fact that 'with' doesn't reflect any architectural aspect of the relationship between software and its host OS and thus is distrusted by the more technical.

"Running under..." -- ah, those were the days, when an OS wasn't a "platform" or an "environment", but just a sort of overseer that read your bits into memory and waited around for them to get finished.]
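Raw hit counts are hard to compare across platforms, since the totals differ by more than an order of magnitude. As a rough sketch (the dictionary layout and the function names `shares` and `favourite` are my own illustrative choices, not anything from the ZDNet post), the counts quoted above can be turned into per-platform proportions like this:

```python
# Hit counts as quoted above: platform -> preposition -> number of hits.
counts = {
    "DOS": {"on": 879, "in": 9530, "under": 25600, "with": 5290},
    "Windows": {"on": 472000, "in": 289000, "under": 263000, "with": 27300},
    "Vista": {"on": 82200, "in": 11500, "under": 20300, "with": 33100},
    "Linux": {"on": 329000, "in": 122000, "under": 77100, "with": 916},
}


def shares(by_prep):
    """Return each preposition's fraction of the platform's total hits."""
    total = sum(by_prep.values())
    return {prep: n / total for prep, n in by_prep.items()}


def favourite(by_prep):
    """Return the preposition with the most hits for one platform."""
    return max(by_prep, key=by_prep.get)


for platform, by_prep in counts.items():
    pct = ", ".join(f"{prep} {share:.1%}" for prep, share in shares(by_prep).items())
    print(f"{platform}: {pct} (most common: {favourite(by_prep)})")
```

Web hit counts are notoriously noisy, so the proportions are best read as a rough indication of relative preference within each platform, not as precise measurements.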

Posted by Mark Liberman at 08:15 AM

December 15, 2007

Political neologisms

I feel that we haven't done enough to document the made-up words and phrases coming out of the run-up to the 2008 U.S. presidential election. I did note the flurry of shag-derivations over at Talking Points Memo, and there have probably been a few others that slip my mind at the moment, but many seasonal blends and other acts of politico-linguistic creativity have passed us by. So before the chance is gone, I'll note that yesterday Ross Douthat offered this gloss for Huckenfreude (n):

Pleasure derived from the outrage of prominent conservative pundits over the rising poll numbers of Mike Huckabee.

One of his commenters cited a list of other huckavocabulary items, including Huckaboom, Huckabomb, Huckabinge. And another suggested

Huckaventionism: misguided, well-meaning government intervention in private life meant to enforce some sort of virtue such as smoking cessation, weight loss, outlawing foie gras, etc.

Past seasonal coinages range from the obvious (Obamania) to the obscure (Ron Paul Effect).

If you've seen some especially interesting specimens, please send links and I'll add your observations to the end of this post.

[Richard Bell sent in a link to Huckacide at The Stump. Looking around a bit at The Stump also turned up Hill-a-copter and Barackabee.]

[Victor Steinbok contributes Huckaplomacy, along with some others that I don't have time to add just now.]

Posted by Mark Liberman at 09:58 AM

WOTY outside the Anglosphere?

It's that Word Of The Year time of year again, reflected in a sprinkling of WOTY press releases from various organizations, WOTY stories in the world's media, and WOTY-related posts at Language Log. Last month, Ben Zimmer discussed the background of OUP's WOTY, locavore. And Arnold mentioned that the American Dialect Society is accepting nominations for its WOTY, which will be chosen amid traditional pomp and circumstance at the society's annual meeting in January.  We seem to have neglected to mention Merriam-Webster's choice of w00t, which was the source of some puzzlement elsewhere. And others have announced their WOTY choices, or will be announcing them soon.

But WOTY fever, like word rage, seems to be mainly an anglophone cultural tradition.

I can't find any WOTY action in French, Spanish, Italian or Portuguese: web searches for {"mot de l'année"}, {"palabra del año"}, {"parola dell'anno"}, {"palavra do ano"} seem to turn up mostly discussion of English WOTY choices, along with a few mentions of the Gesellschaft für deutsche Sprache's Wort des Jahres -- which for 2007 was Klimakatastrophe, by the way.

Are there WOTY events in Arabic, Chinese, Czech, Finnish, Hebrew, Hindi, Japanese, Korean, Navaho, Polish, Russian, Swedish, Thai, Turkish, Urdu, Vietnamese, Yoruba, etc.? Inquiring minds want to know -- if you can report any specific WOTY sightings in other languages, please tell me.

[Marc van Oostendorp writes:

The Dutch publisher of dictionaries Van Dale, the association of language lovers Onze Taal and the newspaper De Pers together organise a Woord van het jaar election.

Van Dale has been publishing a yearly book with a selection of 'new' Dutch words of the previous year at least since 2000, and the journalist Ewoud Sanders publishes a list of such words in the prestigious newspaper NRC Handelsblad also at least since that year.

Of course, I should have known that the Dutch would be in this game. I believe that Onze Taal ("Our Language") is the world's most impressive language-related periodical aimed at a general audience -- and Marc has been Onze Taal's webmaster since 1997.

Herman Boel also wrote to tell me about the Dutch WOTY project, adding "Mind you, all ten words they have nominated are Dutch and are NOT applicable nor used in Flanders where we speak Flemish even though there are still a whole load of nitwits who believe Flemish and Dutch are the same language. "]

[Jo Lumley writes from Osaka:

I think that something quite similar to WOTY exists here in Japan, although I must admit my knowledge is far from complete. My subjective impression is that the media here are very fond of talking about newly coined words (新語 shingo) and fashionable / ubiquitous words (流行語 ryuukougo), and some quick checking confirms the existence of at least three WOTY-type prizes.

The first is the Shingo Ryuukougo Taishou (新語・流行語大賞 Prize for New and Popular Words; see Wikipedia), which has been awarded annually since 1984, presumably primarily as self-publicity, by the publishers of Gendai Yougo no Kiso Chishiki (現代用語の基礎知識 the Basics of Modern Terminology), which seems to be an annually published lexicon of current media terminology. This year the Governor of Miyazaki Prefecture was awarded a prize for his utterance "Dogenka sen to ikan" (どげんかせんといかん), a remark he apparently made in the prefectural assembly. The other prize went to a young golfer, Ishikawa Ryou, although in fact the word in question is not a coinage of his own, but his nickname, Hanikami Ouji ("the bashful prince"), which it seems was given to him by a news presenter (see Wikipedia). This is itself based on one of last year's popular coinages, Hankachi Ouji ("the handkerchief prince"), a nickname given to the popular young baseball player Saitou Yuuki.

Secondly, Asahi Shimbun's publication Chiezou (知恵蔵 - one of Gendai Yougo no Kiso Chishiki's rival publications) has annually awarded a prize called "Word of the Year" (the title is in English) since 2000. In 2006 they awarded it to Hankachi Ouji, which I mentioned above. 2007 seems as yet unannounced; see official site.

Lastly, Brazil Inc., an internet company connected to the founder of the infamous 2-channel, began awarding the Internet Word of the Year Prize (ネット流行語大賞 Net Ryuukougo Taishou) this year. They awarded first prize to Asahi Shimbun for the coinage "asahiru", a verb made from the word Asahi. It means "to fabricate or invent something, to bully perseverantly, to attack perseverantly those who disagree with you". This comes from accusations that Asahi Shimbun falsely reported the existence of *another* new coinage, Abe-suru (from the name of former PM Abe, meaning to abandon one's responsibilities, in this case because of Abe's resignation). As with Hanikami Ouji (above), I strongly doubt that the Asahi Shimbun coined this term itself, even though it is the named recipient of the prize.]


[From Morgan Giles:

There are a couple of Japanese WOTY-esque events, and you're in luck, because this year's rankings have just come out.

Nihon Kanji Noryoku Kentei Kyokai, the Japan Kanji Aptitude Testing Foundation, has a kanji of the year vote, with this year's winner being 偽 ("nise"/"gi"), meaning "fake" (English article here).

It also seems that the publishing company called Jiyu Kokuminsha, who publish the Gendai Yougo no Kisochishiki (Encyclopedia of Contemporary Words), does a "buzzword of the year" (English article here; explanatory English blog post here).]


[Michael Mann wrote:

There are WOTY events in German-speaking countries besides Germany itself:

In Spain, in 2006 and 2005 the readers of the newspaper "20 minutos" elected 20 words of the year -- and -- so maybe there will be 20 words for 2007 within the next days.]


[Matthew Watson writes:

I'm writing in response to the post "WOTY Outside the Anglosphere?" on Language Log. My mom's side of the family are Palestinian/Lebanese, and my grandfather often talks about new words that have been approved for Arabic. If I understand right, there are a few associations--the most prominent of which is the Academy of the Arabic Language in Cairo--that publish official lists of needed modern words like "cellphone." The words are generally derived from Arabic or classical roots, and are promulgated to avoid the need to borrow from other languages. It's not exactly a WOTY event, but that makes sense given the nature of Arabic, which is a formal standard language not subject to regional or democratic innovations. (The more dynamic aspects of the language are limited to the dialects.)

As Matthew indicates, this is a somewhat different thing. There are many groups who are given -- or assume -- the task of official or semi-official language planning for new vocabulary. These groups have some mix of three goals: to make standard words available for official documents such as laws and regulations; to avoid what may be confusing variation of terminology in general public discourse; and to prevent or at least regulate borrowing words from other languages, which may be seen as culturally undesirable. This process sometimes works, but sometimes it's more or less ignored, especially if the official terms are cumbersome or late in arriving. ]

[Marivic Lesho writes:

The Filipinas Institute of Translation had their Salita ng Taon (Word of the Year) contest back in August. A lot of the entries from the past couple of years have been English borrowings. This year, the winner was miskol 'missed call', followed closely by roro, which is a Filipino way of shortening "roll-on, roll-off" (referring to an inter-island transportation system).

The Salita ng Taon website (in Tagalog) is here. An article about it from the Philippine Inquirer, in English, is here. And an interesting opinion piece by anthropologist Michael Tan, about the state of English in the Philippines, is here.]


[Elsebeth Flarup writes:

It seems we do have something equivalent for Danish: "Årets ord", as described here in the Danish Wikipedia.

They don't have a 2007 word yet, but list the winner and runners-up for 2006. The winner, "ommer", is quite difficult to render briefly in English; it's shorthand for something that has to be done over, because the first attempt was botched. It's a noun created from the compound verb "gå om", which means that something has to be done again, from the beginning.

Anyway, the Danish Wikipedia entry about the WOTY in Danish traces its roots back to an influence from Germany, Austria and Switzerland rather than the anglosphere.]


[From Marc Naimark:

I have never noticed a WOTY in France. There are news stories when a dictionary (one of the Roberts or Larousses) adds new words, but no well-publicized elections of WOTYs.

A Google search for "mot de l'année" provides many responses, but always to English WOTYs. There is however a festival that elects a WOTY: the Festival du mot. I'd never heard of this festival.

It's very hard to find the actual winner... I managed to find out that the winner in 2005 was "précarité", referring to a borderline socio-economic situation, for example a temp worker, a short-term contract, seasonal worker, living in furnished lodgings (thus without renter's protection), etc.

In 2006, it was "respect".

I haven't found the 2007 WOTY from this festival.

But when you see such duds as "précarité" and "respect", you can see that WOTYs do not generate great enthusiasm here.]


[Tilman Stieve:

In Germany there is, besides the "Wort des Jahres", a negative award, the "Unwort des Jahres", for words that are wildly inappropriate to the matter at hand and may even impinge on human dignity. This award, which is roughly comparable to the Doublespeak Award in the US, was also awarded under the auspices of the Gesellschaft für deutsche Sprache until 1994. In that year, after a dispute between the jury and the Gesellschaft over the coinage "kollektiver Freizeitpark" by then-chancellor Helmut Kohl (it ended up as number two on the list for 1993), the jury became an independent organisation. You can check out their homepage at

Here you can also find a list of the "Unwörter des Jahres" from 1991 to 2006 ("Unwörter des Jahres seit 1991"). There has not yet been one awarded for 2007, as the deadline for nominations is January 7.]


Posted by Mark Liberman at 09:02 AM

December 14, 2007

Going with the glow

Most cities (probably all cities) have official boards that control signage in their communities. It's obvious that signs can be an effective  instrument for identifying the names, functions, and products of businesses. The problem with signage begins, however, when one business wants to outdo another one with its signs. Even in small cities like Missoula, where I live now, businesses go through endless battles in front of their signage boards. Arlington County, Virginia, just across the Potomac from DC, seems to be struggling with this issue right now. Like other areas, it has rules that restrict businesses from putting up massive billboards and huge neon signs. Just how massive and how neon is usually the sticking point.

There may be a parallel here with written language, where we use underlining, CAPITAL LETTERS, italics, exclamation marks!, color, and increased print size to emphasize the points we are trying to make. In speech, we mostly just talk louder. But too much emphasis in either writing or speaking tends to work against the communicator's purpose. Defining the line between enough and too much can become a problem. The goal of signage is a bit different. It seems to require shouting and visually calling attention to itself. Like written language, signage involves the size of the message but also stuff like electric lighting, symbols, effective placement, directionality, intermittent flashing, architectural appropriateness, and a strong commercial need to attract rather than simply to identify or emphasize. Perhaps more important, signs have to compete for attention with those of business competitors. They turn out to look like shouting, even screaming, which brings me to an acronym I learned from the Post article--"LED."

The Consumer Electronics Association--the people who brought the petition to the Arlington County Board--laughed when one of the board members admitted that he didn't know what LED stands for. Dummy! We all know this, don't we? It's a kind of electroluminescence called "light-emitting diode." I guess you're supposed to be up on such terms if you sit on a signage board.

Arlington County is a rapidly growing community trying to attract new commerce. It's in the same pickle as most towns, especially those who don't want to look like Las Vegas or Times Square. The signage board members scorn "visual clutter" but they also want to be business friendly and not too bland. So they argue about all the signage issues that other communities find so difficult to resolve.

It's probably time for Language Log to erect a huge, flashing LED sign here at the Plaza because we too want to be business friendly and not too bland--although I'm having a hard time thinking who our competitors might be.

Posted by Roger Shuy at 01:58 PM

A bit more on Senator Williams

I cited the Abscam case of Senator Harrison A. Williams yesterday as an example of how people sometimes don't take "no" for an answer. It was just an example and I didn't plan to get into the late senator's whole sad experience with the legal system, but several readers have asked me to say a bit more about that case. I wrote about it in my 1993 book, Language Crimes: The Use and Abuse of Language Evidence in the Courtroom (Blackwell), but here's a shorter answer to one reader's question. Marc Joanisse did some research of his own in which he found that the quid pro quo used at Williams' trial was that the senator received stock certificates in a defunct titanium mine in exchange for his efforts to help some friends (a group that included his own lawyer) locate some investors for that project. That's true, and I didn't get into that issue in my earlier post because I was illustrating a different point. But since readers have requested it, I'll talk about the stock certificate point a little.

After it was reasonably clear that Senator Williams' saying "no" in that meeting with the "sheik" would not be good enough evidence, the agents  developed a different scenario using the stock certificates as bait. The prosecution never mentioned that these fake stock certificates were handed to Williams with no explanation as he was hurrying out of a meeting, or that from the get-go he thought that reopening the closed titanium mine was a stupid and impractical pipe-dream, or that he considered these certificates worthless (which they were). Nor was it mentioned that he put these stock certificates in the gym bag he was carrying at the time and casually left it unguarded on the floor of his office until the time he was arrested. If he was involved in a bribery plan, he didn't seem aware of it.

Williams believed that the whole mining episode was misguided. His friends had this wild dream of resurrecting a Virginia titanium mine to make some money. They asked him to help them find investors but, as the tapes give clear evidence, he showed little interest in doing so and never actually tried to find any. But these were his friends and, unfortunately for him, he treated them in a friendly manner. Incidentally, the titanium that this mine was capable of producing was the kind that Sherwin Williams used to make paint; not the type used for the shields in spacecraft, as was imagined by some. The FBI made dozens of video tapes of meetings of his friends with undercover agents, but Williams was on only a handful of them. My analysis of his contributions in those meetings, outlined in my book, showed that he was largely silent and uninformed about their plans.

But Williams was a big fish for the government to catch and getting him involved would be a capstone to their Abscam prosecution. His involvement could have been explained properly if he had had a competent criminal lawyer to represent him at trial. Unfortunately he was represented by a civil attorney who, in the opinion of many, simply dropped the ball. Williams was convicted of bribery and sentenced to three years at Allenwood.

My own involvement with Williams came after he was convicted and waiting to be sentenced. I then lived a block away from his townhouse and I met him one morning while he was out walking his dogs. When he asked what kind of work I did at Georgetown University, I explained that I consulted on criminal and civil law cases as a linguist and his eyes lit up. He gave me copies of the undercover tapes and I spent the next thirty days or so working on them, eventually producing a long linguistic analysis which his office made copies of and distributed to all 100 senators to read in preparation for his impeachment hearing. It's doubtful that many of the senators actually read my report but I got a chance to offer a summary of it on the floor of the senate during his impeachment  hearing. But even this was difficult because only senators, the senate chaplain, and the senate clerk are permitted to speak on the floor of the senate. So my report was read aloud by the senate clerk while I sat next to Williams and silently pointed to the appropriate parts of the large charts I had made while the reading took place. After the clerk finished reading, the senators were allowed to ask me questions. I wasn't permitted to voice my responses so I whispered them to Williams, who repeated them aloud to his senate colleagues. By the third day of the hearing, it had become very clear that the senate was going to vote for his expulsion. During a lunch break, Williams and I walked through a park near the senate office building and I suggested that it would probably be wise for him to tender his resignation rather than be expelled. He agreed, went back in, and resigned.

It was one of the saddest experiences of his life. Mine too.

Posted by Roger Shuy at 09:56 AM

The non-existence of Kilpatrick's Rule

In response to my recent post "Authoritarian rationalism is not conservatism" (12/11/2007), Breffni  O'Rourke has come to the defense of James Kilpatrick. I think Breffni's points are good ones, so I've given the whole note below, along with my response.

But first, a bit of background. Back in July of 2006, James Kilpatrick complained in print about the "horrid" headline "Mass Transit Not An Option for All Drivers", on the grounds that "if mass transit is not an option for 'all' drivers, it cannot be an option for even one driver". He added, "Even a little ambiguity is a dangerous thing. The problem with this Horrid Example is that it creates a nanosecond of uncertainty."

Neal Whitman and I ignored the "nanosecond of uncertainty" business, since a literal application of this idea would put pretty much all of the English language off limits. But we did take literally the assertion that "if mass transit is not an option for 'all' drivers, it cannot be an option for even one driver"; and we took this to be a claim about the interpretation of not and all in phrases of the same general structure as the "horrid" headline. Neal discussed the semantics of the question, in terms of the scope of operators in the corresponding logical formulae; and I added a set of historical citations, showing that Kilpatrick's prescription has been frequently and consistently violated by the likes of William Shakespeare, Herman Melville and T.S. Eliot.

At the time, I got a certain amount of negative email from Kilpatrick partisans; and recently, a related issue came up in Jan Freeman's column in the Boston Globe, which brought another few emails accusing me  of pandering to the vulgar mob, subverting truth by mere empiricism, and so on. So I wrote about the "strange tangle of ideas... according to which an appeal to the historical authority of usage by elite writers is seen as a step in the direction of mob rule", and suggested that Kilpatrick's partisans, though calling themselves conservatives, are actually authoritarian rationalists.

Here's Breffni's note:

First of all, I'm entirely with you on "authoritarian rationalism".

However, I think you were a little unfair to Kilpatrick in the case of the headline "Mass transit not an option for all drivers". I think there really is something going on there. I don't think Kilpatrick explained himself very well (he barely explained himself at all), I don't think he thought the thing through, and certainly he wildly overstated the "horridness" of the headline. Even so, he didn't try to generalise to a "Kilpatrick's Rule": he only complained about *this specific headline*.

So, assuming good faith on his part, and assuming no real grammatical rule is being breached, where does his "nanosecond of uncertainty" come from?

You and Neal Whitman looked at negation and quantification alone, but I think on closer inspection there is a (mild) problem arising from the interaction of scoping ambiguities with a conflicting bracketing preference. In the headline, "not" is part of what I think is apt to be understood as an idiomatic chunk, "[is] not an option" - as in "Failure is not an option", "Surrender is not an option". It wants to be understood and parsed as a unit. On a crude Google search, three out of seven first-page hits for "is an option" (excluding "What is an option?", and also duplicates) are instances of "Failure is an option", which I take to be stunt allusions to "Failure is not an option".

So I think "not an option" is not just the syntactic negation of "is an option". If this is right, then "not an option" in the headline is open to being understood, if only on a fleeting preliminary parse, as occupying a single slot:

"Mass transit is (not an option) for all drivers"

...which conflicts with the intended interpretation:

"Mass transit is not (an option for all drivers)".

This would mean that one *possible* first-reading comparator for the headline "Mass transit not an option for all drivers" is "Mass transit unaffordable for all drivers" - the interpretation Kilpatrick claims to prefer (for polemical purposes, doubtless). The examples you use to counter the putative "Kilpatrick's Rule" don't have the same ambiguous bracketing brought about by "not an option". "The solution of the mystery was not known to all" (Trollope) is obviously fine, but the "mass transit" headline has a possible reading that's more akin to "The solution of the mystery was unknown to all" - hence the "nanosecond of uncertainty" that Kilpatrick felt, and which I think I may also have felt, before spending all this time staring at it. How strongly you feel the conflict will presumably depend on how formulaic "not an option" feels to you, and I imagine that will vary from speaker to speaker.

PS: I tried to use Google to back up my claim about "not an option" being a chunk, but Google declined to endorse me unambiguously:

"is not an option" OR "isn't an option" -"is not an option which" -"is not an option that" -"isn't an option which" -"isn't an option that" -- 264,000

"is an option" -"is an option which" -"is an option that" -"what is an option" -- 205,000

Still, it sounds like a formula to me - a mainly American one.

[Back to me.]

Breffni makes two excellent and related points. The first is that Kilpatrick may not have meant to claim anything about the interpretation of not and all in general. The second is that "not an option" is a common collocation, and this may change the normal scope preferences in examples like the cited headline.

With respect, I disagree with the first point. Although Kilpatrick's ire was aroused by the specific headline in question, he explains the problem in terms that apply to any sentence of similar structure. And it's true that his discussion is brief and allusive, but a similarly brief complaint by John Dryden was part of the process that has led to the slaughter of millions of innocent clause-final prepositions.

As for Breffni's second point, I'm less certain. It's certainly true that "not an option" is a common collocation. For example, from this morning's Google News:

  a/an __ not a/an __ ratio

And obviously a "local" interpretation of such a collocation would be incompatible with wide scope of negation. Thus "Mass transit is impossible for all drivers" can't possibly mean "It's not the case that mass transit is possible for all drivers". So if "not an option" tends to be interpreted as if it were a single word, like impossible, then Kilpatrick would be right about this case.
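For readers who like to see the two readings spelled out, here is a small sketch. The driver names and the function names are invented purely for illustration. The wide-scope reading negates the whole universally quantified claim, while the narrow-scope ("local") reading applies the negation to each driver individually:

```python
# Hypothetical data: is mass transit an option for each driver?
option_for = {"Ann": True, "Bob": False, "Carla": True}


def wide_scope_negation(option_for):
    """not (for all drivers x: transit is an option for x)"""
    return not all(option_for.values())


def narrow_scope_negation(option_for):
    """for all drivers x: not (transit is an option for x)"""
    return all(not is_option for is_option in option_for.values())


print("Not an option for every driver:", wide_scope_negation(option_for))
print("An option for no driver at all:", narrow_scope_negation(option_for))
```

With at least one driver on each side, the two readings come apart: the wide-scope reading is true and the narrow-scope one is false. When there is at least one driver, the narrow-scope reading entails the wide-scope one but not the other way around, which is why treating "not an option" as a unit like impossible yields the stronger claim.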

But there are many common examples of other constructions that point in the opposite direction. For example, contrastive substitution of option seems to be no problem at all:

Renovation is not an option, it's a necessity.
The County states that recycling is not an option, it's the law.
Firestopping is not an option but a requirement.

Preventive diplomacy is not an option but a necessity

Obviously, forms like "Renovation is impossible, it's a necessity" are completely out of the question here -- but in reading the examples cited above, I can't detect the predicted instant of hesitation, putatively caused by a desire to interpret "not an option" as if it were a single word.

And in phrases of the form "not an option <preposition> all <plural nominal>", the interpretation seems without exception to give wide scope to the negation -- exactly as the surface-structure order predicts:

There is the possibility of camp for school-age children, but it is not an option for all working parents...
But card checking is not an option for all Las Vegas unions.
Out-wintering beef suckler cows on hill grazings is not an option for all farmers but there are situations where it can work well...
It does require some user exertion, meaning the iBOT is not an option for all wheelchair users.
She is generally a proponent of the cageless approach but warns that it is not an option for all dog owners.

Unfortunately this is not an option in all email clients.
Flex time is not an option in all positions ...
This is not an option in all jurisdictions.
This is not an option with all xenobiotics.
Heating substrates to elevated temperatures can help reduce columnar growth, but this is not an option with all polymer substrates.
TIF is not an option on all digital cameras so please check your owner’s manual.

Phrases of this form are reasonably common, and I have yet to find a single one whose interpretation goes in the direction that Kilpatrick claims to prefer.

We'd have to design and implement some reaction-time or eye-gaze experiments to try to determine whether the claimed ambiguity is causing any processing delay. But if the ambiguity were a seriously active one, I'd expect to see a reasonable percentage of examples in which the interpretation goes the other way, as we do for genuine cases of psychologically-valid scope ambiguity.

Posted by Mark Liberman at 08:04 AM

Leave if you like

Leave, if you like. Do you see anything odd or difficult about the syntax of that sentence? Because it does have an odd feature. Let me explain.

As a general rule, the verb like is strictly transitive: it must have a direct object.

Suppose you're trying to decide whether to buy an expensive new coat. It's not grammatical to say, *I'm not sure if I like.

Suppose someone asks you whether you get along well with a certain colleague. It's not grammatical to say, *Oh, yes, I really like.

The transitive verb like just cannot in general be grammatically used without an object. Yet in Leave, if you like, it doesn't have one.

It is true that there are certain syntactic contexts that can provide syntactic excuses for absence. Complements (such as objects) can be preposed, as in This, I like or This, I think you'll like. There the object is not so much missing as relocated to the beginning of the sentence for special discourse effect.

A variant of this is seen in Tell me who you like: this contains an open interrogative content clause beginning with who. The object is understood as a variable x that the question requests a (human) value for: it means "Identify for me a human being who is a value for x that will make ‘You like x’ true." The syntax of open interrogative content clauses requires a wh-phrase at the beginning and an absent noun phrase in some later position. So Tell me who you like is not a counterexample: there is an object, which is understood as an object (it denotes the entity that is the focus of the liking), but special syntactic conditions require that it not be overt.

Something very similar is true of You're the one that I like. The object is required to be non-overt here because of the syntax of integrated relative clauses: a relative clause in English is like a clause with a missing noun phrase, and optionally a that or a wh-phrase at the beginning.

But in Leave if you like, or closely similar sentences like You can go to sleep if you want, we do not have complement fronting or an open interrogative clause or a relative clause or any similar syntactic context providing a general excuse for absence of an object. There simply is no object.

What we have here, then, is a special construction with odd syntax that doesn't follow the rules that govern English syntax in general. It's out on its own. Sometimes I wonder just how many special constructions like this there are out there. Often Standard English syntax seems to me like a broad and apparently familiar landscape that is actually quite alien, only in ways that people typically don't notice. A strange syntactico-semantic Narnia that users of the language mostly don't realize they have walked through the back of their wardrobe into.

I am not sure yet [as of about 5 a.m. Eastern USA time] whether if you like is covered in The Cambridge Grammar of the English Language (it does, I confess, take a little time to check its 1860 pages; it seems to me that if this construction were covered it should have turned up somewhere in the special cases of object omission with transitive verbs in section 8.1.3 of chapter 4, pp. 300ff).

Update three hours later [about 8 a.m. Eastern USA time]: Aha! Thanks to Randy Alexander in Jilin City, China, the reference in CGEL has now been found. My gratitude to him. I do not claim that everything about English is covered, but most topics are covered somewhere in the book. What Randy found is that there is a reference on page 1529 regarding special ellipsis in complement clauses after conditional if. Other examples covered include We could make it Tuesday if you prefer __. It is noted that you get contrasts between conditional adjuncts and other kinds of adjunct; for example: Come if you want __ versus *Come because you want __.

I'm sorry I didn't recall that passage. Page 1529 is in chapter 17, on deixis and anaphora, which was mostly written by Lesley Stirling and Rodney Huddleston in Australia while I was teaching in Santa Cruz. It'll be a long time before I can say I have instant recall of all of the 1860 pages prepared by Rodney and me and our 13 additional collaborators. I can't hold all of CGEL in my head at once. I'm just glad that nearly everything, large or small, does seem to be in there somewhere.

By the way, those of you who are emailing me, don't think that putting this down as a special ellipsis construction associated with conditional clauses explains anything. It doesn't. It just notes the puzzle. Lots of people are writing to me to suggest paraphrases of Leave if you like with extra words added in various places ("Leave if you [would] like [to leave]", or "Leave if [that is what] you like", or "Leave if you [would] like [that]", etc.), as if somehow that made everything clear. This totally misses the point. Certainly, some sort of ellipsis is involved here, as CGEL says on page 1529. But ellipsis of what? Just anything you feel like leaving out? Why can't Leave if you like mean "Leave [only] if you [would not] like [anyone else to] leave [the party]"? You can't just supply additional words that would complete a meaning that you think is the right one, and call that an explanation!

And anyway, why is any ellipsis permitted? As noted above, the verb like doesn't normally allow its object to be omitted. It only happens in conditional adjuncts. Why would that be? I have no idea. You have no idea. None of us has an explanation. The comment on page 1529 of CGEL is in a shaded blue box headed "Some special cases". It might just as well have been in a gold chest emblazoned with "All you know is about to change". This is still a mysterious alien landscape.

Saying the magic words "There's no place like home" may get you out of Oz, but saying "Oh, it's just ellipsis!" doesn't spirit you away out of the syntactic Narnia that is English grammar.

Posted by Geoffrey K. Pullum at 05:16 AM

December 13, 2007

Not taking "no" for an answer

It's common for us to be rewarded when we agree or say "yes" to people. Parents want to hear it, teachers expect it, and police demand it. At about 18 months children begin to be fluent in saying "no" but they have to unlearn this when they get to school, where compliance seems to be the norm. This newly developed compliance can then get kids in trouble with their high school peers when they don't say "no" to drugs, alcohol, and other unnamed things. This Washington Post article takes it from there.

It's all well and good to learn to say "no." But it's often what happens after you say "no" that matters. Some people just won't take "no" for an answer. Used car salespersons leap to mind, as do many marriage partners (please understand that I'm not speaking from personal experience here; my wife is a marriage therapist who tells me stuff like this). I describe the conversational strategy of not taking "no" for an answer in my book, Creating Language Crimes (Oxford U Press 2006), citing cases where it was used by undercover officers and cooperating witnesses who tape-recorded suspects in sting operations.

One of the most blatant examples of the police not taking "no" for an answer can be found in the FBI's famous Abscam operation some three decades ago. A conversation between New Jersey's US Senator Harrison A. Williams and an agent posing as a sheik from the United Arab Emirates was recorded and later used as evidence that the senator was willing to take a bribe--a quid pro quo.

The quo part of the alleged quid pro quo was to get the senator to personally agree to sponsor legislation that would permit the sheik to get permanent legal residence in the US. Despite the sheik's efforts, this never happened. On tape the senator says only that he is willing to help the sheik do the things that would bring the sheik's request to proper consideration through the prescribed legal channels. Williams carefully explains that the sheik has to meet the legal criteria, that this is not easy to do, that it must go through a committee, and that both bodies of congress would have to pass a bill for this ever to happen. He concludes saying, "Quite frankly, I personally can't issue that. It goes through the whole dignified process of passing a law."

The quid part of the agent's plan was for the sheik to offer the senator $25,000 for his services in this effort. Undaunted by Williams's careful (and quite proper) description of the whole process involved, the sheik goes merrily on to represent the quid, what the senator would get in exchange. Here's the actual text of the most crucial part of that conversation:

Sheik: (with cash in his hand) I will, for your help, uh, assistance, I would like to give you some money for, for permanent residence.
Williams: No. No. No. No, when I work in that kind of activity, it is purely a public, not, uh. No, within my position, when I deal with law and legislation, it's not within my--(telephone rings, interrupting, but Williams continues)--My only interest is to see this come together.

Senator Williams said "no" to the money at least five times, but to no avail. And he never took a cent. Like a used car salesperson, the agent ignored the "no" just as though Williams had actually agreed to personally sponsor a bill for him. The evidence on the tape didn't seem to matter to the agent nor to the prosecutor who brought the charges, nor, for that matter, to the jury that eventually convicted Williams of bribery. Saying "no" doesn't always convince law enforcement about a target's unwillingness to agree with the scenario they had in mind.

The Post's explanation of how hard it is to say "no" is only part of the story. It's often what happens after you say "no" that really matters.

Posted by Roger Shuy at 11:07 AM

Antarctic WTF

Said Mr Brian Lee of Telford, Shropshire, speaking to reporters after his aborted Antarctic cruise ended in a night in a drifting lifeboat a couple of weeks ago:

I am a bit disappointed we didn't get to finish our trip, but how many people have been on a ship that's hit an iceberg in the middle of the night, sunk, and lived to tell the tale?

Language Log knows exactly how many. The answer can be deduced from the syntax and semantics of Mr Lee's rhetorical question.

It all hangs on the syntax and concomitant semantics of the coordination of verb phrases in Mr Lee's utterance. Let me explain.

One possible syntactic analysis would be that the coordination begins after the word have. That would mean that the verbs that have head function in the three coordinates are been, sunk, and lived (all of them are in the past-participial form that is required for verbs governed by the auxiliary verb have marking the perfect tense). Under that analysis, the relevant sentence has this structure:

How many people have
[ been on a ship that's hit an iceberg in the middle of the night ],
[ sunk ],
[ and [ lived to tell the tale ] ]?

On that understanding, we are talking about people who sank. People who sink in iceberg-strewn Antarctic waters do not tell tales. So the number of people meeting the three specified conditions is zero.

But there is another syntactically permissible analysis. It has a coordination of three past-participial VPs, but the form of the auxiliary have that precedes them is has, in its reduced form "’s" on the end of the orthographic word that’s. The verb form hit is the head of the first of the three coordinates. (The verb hit is unusual in that its past participle is identical in form with its plain form, so hit can serve as a past participle like been as well as a plain form like be.) In that case this would be the structure:

How many people have been on a ship that's
[ hit an iceberg in the middle of the night ],
[ sunk ],
[ and [ lived to tell the tale ] ]?

Here the VP coordination is the predicate of an integrated relative clause attached to the noun ship. So on this understanding we are looking for a ship with three properties, not a person. And the third of the three properties is having lived to tell a tale. Ships don't tell tales. Therefore the number of such ships is zero.

On both syntactically possible analyses, then, the answer to Mr Lee's question is clear: absolutely none. No one has ever (i) been on a ship that's hit an iceberg in the middle of the night, (ii) sunk, and (iii) lived to tell the tale, including Mr Lee. And no one has ever been on a ship that has (i) hit an iceberg in the middle of the night, (ii) sunk, and (iii) lived to tell the tale, including the ship that Mr Lee was on.

I'm not dinging him for grammar faults, by the way. For one thing, there is always the possibility that the reporter transcribed his utterance incorrectly (the first rule of attributional abduction). Brian might actually have said ...have been on a ship that's hit an iceberg in the middle of the night and sunk, and lived to tell the tale, which would be fully grammatical, and then been incorrected (as we say here at Language Log) by some sub-editor who, as Martin Hardcastle put it in an email to me today, "removed the first 'and' because of the rule about multiple 'and's in a list, not realising that in this case it completely breaks the grammar." With the extra and, everything would be fine, because we would have two coordinations. In the following I number the outermost brackets of the two coordinations, and show the smaller one (which is inside the larger one) in blue:

How many people have
[1been on a ship that's
[2 [ hit an iceberg in the middle of the night ] [ and [ sunk ] ] 2]
[ and [ lived to tell the tale ] ] 1]?

But in addition to the fact that he may not have misspoken at all, I admire Brian's pluck and fortitude. This dude is cool. He and his wife were cruising Antarctica and looking at wildlife on a ship called the Explorer. It hit an iceberg at night, Titanic-style, ripping the side open, and water started filling the cabins. Having seen Titanic and learned from it, the captain gave the "Abandon ship" order early enough, and they got the lifeboats launched before the ship rolled over on its side and eventually sank.

(By the way, why is the noun phrase ship in Abandon ship anarthrous? Because in an emergency at sea there is no time to say the? Surely not. It's just another of those mystery constructions where the usual principle — that singular count nouns in English require a determiner constituent of some kind — is waived.)

Brian wasn't woken initially by the alarm. His wife Gillian woke him up. "I thought she wanted a cuddle," he told the reporters. (These people who talk about British men being uninterested in sex! Ha! They don't know what they're talking about! Brian was ready for a nice bit of cuddle time even in the small hours of the morning on a doomed vessel in the Southern Ocean as it began to fill with icy salt water.) He and Gillian spent a harrowing night in an open lifeboat among ice floes, and the following night in a hut on King George Island, and he must have been a little bit weary when he finally talked stoically to the press (he had no complaints about the wonderful holiday trip). And (if indeed his utterance was correctly reported) he got the plan for his sentence a bit messed up and produced what Language Log readers will know is called a WTF coordination. You probably would too, if you had hit an iceberg, sunk, and lived to tell the tale.

[Update: Kris Rhodes at UC Irvine pointed out that Mr Lee could have intended sunk as what The Cambridge Grammar of the English Language (chapter 15) calls a supplement, a parenthetical interpolation embellishing the point about what the ship had done, in which case the reporter would have done better to transcribe the utterance thus:

... how many people have been on a ship that's hit an iceberg in the middle of the night — sunk — and lived to tell the tale?

That would be fully grammatical, and it is certainly a possible analysis.]

Posted by Geoffrey K. Pullum at 04:51 AM

December 12, 2007

Happy International Human Rights Week

In case you haven't already marked it on your calendar, you might note that this is International Human Rights Week, celebrating the 59th anniversary of the signing of the United Nations Declaration of Human Rights on December 10, 1948. If you haven't looked at it lately, you can read all 30 Articles here.

Particularly interesting is the document's 28 uses of the positive, "everyone," in those articles that assure equality and its 8 uses of the negative, "no one," in the articles describing things that should be excluded. I leave it to readers to evaluate how well we're doing with this declaration, but it seems that we all might want to reflect a bit on Articles 5 through 12 these days.

(Hat tip to Margaret vanNaerssen)

UPDATE: Jonathan Lexxox writes that the Unicode project is working on providing electronic text of translations of the UDHR in every language available. Check what they have so far here.

Posted by Roger Shuy at 10:27 AM

What a difference a /d/ makes

Phonetically, many of the various ways to render what is called an /r/ are similar to ways of pronouncing what is called a /d/. Recently, however, this small difference may have tipped the balance in a major international sporting event. According to Frank Keogh, "Anthem gaff 'lifted Croatia'", BBC Sport 11/23/2007:

Croatia rose to the occasion in their crucial Euro 2008 defeat of England - after an apparent X-rated gaffe by an English opera singer at Wembley.

Tony Henry belted out a version of the Croat anthem before the 80,000 crowd, but made a blunder at the end.

According to the lyrics posted here, the second quatrain of the Croatian national anthem should go as follows:

Mila kano si nam slavna,
Mila si nam ti jedina,
Mila kuda si nam ravna,
Mila kuda si planina!
Dear, you are our only glory,
Dear, you are our only one,
Dear, we love your plains,
Dear, we love your mountains.

However, as the recording here indicates, Mr. Henry rendered two repetitions of "kuda" in the last two lines as "kura".

What Keogh's article says about this is

He should have sung 'Mila kuda si planina' (which roughly means 'You know my dear how we love your mountains').

But he instead sang 'Mila kura si planina' which can be interpreted as 'My dear, my penis is a mountain'.

Regular Language Log readers won't be surprised to find that the BBC's translation is suspect -- and was probably cribbed from other media sources, which seem to be in broad agreement on this error. See the comment at Naked Translations for a closer approximation to the truth:

First, the lyric "mila, kuda si planina". Literally, this means something like "my dear, what mountains you have!" (with poetic license). The (admittedly lame) translation from the Croatian Ministry of Foreign Affairs is "Dear, we love your mountains". But there is no "you know" in the lyric.

Secondly, "mila kura si planina" could be translated as "dear penis, you are a mountain". But I have no idea how you would get "My dear, my penis is a mountain" from this. In this translation, "Kura" is third person, even though the verb "si" is second person. Someone, presumably non-native, must have been trying to reason by analogy, which doesn't always work with translation.

And Neven Mrgan sent me some further analysis:

First, "kura" is not a common word for "penis" in Croatian. At best it could be called odd slang. The word "kurac" is equivalent to "cock", and "kura" is a variation used as often as, say, "peener".

The translation of the lyrics you used is rather bland. Translated literally, "mila kuda si planina" means something like "dear, where are your mountains" (but not quite). As a phrase, it might be close to "how great are your mountains!" I bring this up because without the word "where", the offending last line neither makes sense literally nor works as a phrase:

  Mila kura si planina

The best approximation I can come up with in English is

  Dear peener are a mountain

The "are" is "incomplete"; you'd have to say "you are" ("Mila kura, ti si planina") to make a proper sentence. There is no implied "MY penis" anywhere in the sentence, by the way.

I'm sure this was entertaining to the Croatians who heard it, but the BBC article makes the gaff a little too perfect.

Clearly, this is one of a class of stories that mainstream media types view as "too good to check".

Whatever the amount of interpretive willingness required to work a mountainous penis into the last line of the quatrain, the preceding line appears to have had an analogous error, as you can plainly hear above. However, the oxymoronic phallic plain seems to have been ignored or forgotten -- not only Keogh, but also the Croatian fans and players were apparently more impressed by the singer's second slip than by the first one.

In any case, Mr. Henry apologized, but the Croatians are having none of it:

"It was the last thing that I would intentionally do, and all I can say is if I have offended any Croatians, then they have my deepest apologies."

On the contrary, Henry is becoming a cult hero in Croatia, but denies he played a part in England's exit.

"I can't take the blame for that. The last thing I would do is brag about my parts like that - especially to make it so public," said Henry.

According to The Register (Lester Haines, "England flops shafted by enormous todger", 11/23/2007),  the mistake

evidently delighted Croatian players Vedran Corluka and Luka Modric, who were seen "grinning at each other" at the gaffe, and fans claim the slip helped relax the team before its 3-2 drubbing of McClaren's lacklustre side.

Accordingly, Croatians are now calling for Henry to be awarded with a medal and appointed their team's official mascot for Euro 2008. Mate Prlic, of Croatian footie mag Torcida, suggested: "He obviously relaxed the players so why not invite him to Euro 2008 to keep the winning streak going?"

Henry's agent, Douglas Gillespie, said: "Tony had a great reception from the Croatian fans and already feels part of their campaign for Euro 2008."

[Hat tip: Alexa Mater]

Posted by Mark Liberman at 09:26 AM

The secret Saussure

A fascinating article on the life of Ferdinand de Saussure, a central figure in the emergence of modern linguistics, appeared in the Times Literary Supplement a few weeks ago, and Language Log is a bit late in providing a pointer to it: The poet who could smell vowels (alternate title "The secret Saussure"), by John Joseph (Professor of Applied Linguistics, University of Edinburgh). Saussure became known very early in his life for a student thesis in which he deduced the existence of two abstract phonological segments that must have been present in Proto-Indo-European (he called them "sonants", but they became known as "laryngeals"). These were sounds whose effect on later evolution of sounds was clear, but they were not thereby identified phonetically — mystery consonants that no one alive has ever heard or ever will. He is primarily known for a series of lectures he gave at the University of Geneva between 1907 and 1911. He never published them, and by 1913 he was dead. But his students took such careful notes that a few of them were able to reconstruct the lectures almost in full, and publish them as a book, the famous Cours de linguistique générale. The title of that course of lectures, "general linguistics", appears on both my Ph.D. certificate and the title of the Professorship at Edinburgh that I now hold. Saussure's course introduced, or at least popularized, the name of the field that gave me my career. So I raise a glass to him in this, the 150th anniversary of his birth.

Posted by Geoffrey K. Pullum at 04:57 AM

December 11, 2007

Best neuroquip of the year?

In the New York Times' Year in Ideas for 2007, Matt Hutson discusses Eric Racine's coinage "neurorealism":

You've seen the headlines: This Is Your Brain on Politics. Or God. Or Super Bowl Ads. And they're always accompanied by pictures of brains dotted with seemingly significant splotches of color. Now some scientists have seen enough. We're like moths, they say, lured by the flickering lights of neuroimaging — and uncritically accepting of conclusions drawn from it. [...]

Eric Racine, a bioethicist at the Montreal Clinical Research Institute, coined the word neurorealism to describe this form of credulousness. In an article called "fMRI in the Public Eye," he and two colleagues cited a Boston Globe article about how high-fat foods activate reward centers in the brain. The Globe headline: "Fat Really Does Bring Pleasure." Couldn't we have proved that with a slice of pie and a piece of paper with a check box on it?

The "slice of pie and a piece of paper with a check box" is a lovely turn of phrase. But on Matt's weblog SilverJacket, he has a better one.

... last week, the neuropsychologist Daniel Amen, who makes commercial use of SPECT, published an op-ed in the LA Times arguing that we should scan the brains of all potential presidents so we can spot the types of "brain pathology" that would make one forget like Reagan, philander like Clinton, or flub words like Bush. He advocates the technique (and practically demands that the People employ his clinics) essentially as a form of Lite-Brite phrenology. [emphasis added]

The only possible problem with this terrific coinage is that the Lite-Brite, introduced in 1967, may be unknown to some younger readers. If you're one of them, or if you're in the market for some on-line nostalgia, you can try Hasbro's Lite-Brite Simulator.

In the same weblog post, Matt mentions that he "considered titling the piece Crockusology, after the elusive Dr. Alfred Crockus", with a link to the crockusological saga here on Language Log.

Posted by Mark Liberman at 09:13 PM

A gene by any other name?

According to David Biello, "Culture Speeds Up Human Evolution", Scientific American, 12/10/2007:

Homo sapiens sapiens has spread across the globe and increased vastly in numbers over the past 50,000 years or so—from an estimated five million in 9000 B.C. to roughly 6.5 billion today. More people means more opportunity for mutations to creep into the basic human genome and new research confirms that in the past 10,000 years a host of changes to everything from digestion to bones has been taking place.

"We found very many human genes undergoing selection," says anthropologist Gregory Cochran of the University of Utah, a member of the team that analyzed the 3.9 million genes showing the most variation. "Most are very recent, so much so that the rate of human evolution over the past few thousand years is far greater than it has been over the past few million years." [emphasis added]

Analyzing 3.9 million human genes, no matter how much variation they show, would indeed be an amazing feat -- the current standard estimate is that humans have about 20,000-25,000 genes altogether ("How many genes are in the human genome?", Human Genome Project Information site).

The University of Utah press release ("Are Humans Evolving Faster?", 12/10/2007) clarifies what you've probably already guessed, namely that the researchers analyzed not genes but "single nucleotide polymorphisms" (SNPs):

The study looked for genetic evidence of natural selection - the evolution of favorable gene mutations - during the past 80,000 years by analyzing DNA from 270 individuals in the International HapMap Project, an effort to identify variations in human genes that cause disease and can serve as targets for new medicines.

The new study looked specifically at genetic variations called "single nucleotide polymorphisms," or SNPs (pronounced "snips") which are single-point mutations in chromosomes that are spreading through a significant proportion of the population.

Imagine walking along two chromosomes - the same chromosome from two different people. Chromosomes are made of DNA, a twisting, ladder-like structure in which each rung is made of a "base pair" of amino acids, either G-C or A-T. Harpending says that about every 1,000 base pairs, there will be a difference between the two chromosomes. That is known as a SNP.

Data examined in the study included 3.9 million SNPs from the 270 people in four populations: Han Chinese, Japanese, Africa's Yoruba tribe and northern Europeans, represented largely by data from Utah Mormons, says Harpending.

But here's a problem. The University of Utah press release says that the study was "published online Monday, Dec. 10 in the journal Proceedings of the National Academy of Sciences". But it wasn't. It's not in the Dec. 11 issue; nor was it published online in the PNAS Early Edition for Dec. 10, or Dec. 11, or any other previous date, as far as I can tell.

I've complained about this before (for example, "PNAS embargo policies considered annoying", 7/24/2007). The result of this policy by PNAS -- and I suppose it's some combination of inefficiency and thick-headedness rather than malicious intent -- is that interesting papers like this one are first exposed to the public, for as much as a week, only via press releases in which DNA base pairs are amino acids, and stories by journalists who can't be counted on to know the difference between genes and SNPs.

[Update -- we can also learn about the paper on the weblog of one of the authors, John Hawks, here and here. But it's still a scandal, in my opinion, for PNAS to impose an embargo, and then end it as much as a week before general readers can get access to the paper.]

[Michael Watts writes that a preprint of John Hawks et al., "Recent acceleration of human adaptive evolution", can be found on the web site of the University of Utah anthropology department -- no thanks to PNAS.]

[Update 12/13/2007 -- Stephen Powell points out that the Sciam article now has a footnoted correction reading:

*This article wrongly characterized the HapMap genotype dataset used for this analysis as "genes" rather than "DNA sequences."

But this is still not quite right, since SNPs are not DNA sequences, they're "single nucleotide polymorphisms", i.e. substitution of a single nucleotide in a DNA sequence.

And the body of the article now reads

...the team that analyzed the 3.9 million DNA sequences* showing the most variation.

But as far as I can tell, they did not choose 3.9 million SNPs showing the most variation -- they simply started with a database of 3.9 million SNPs, period -- because that's all there is, at the moment -- and based their analysis on the distribution of those variants.

Meanwhile, Empty Pockets wrote to point out to me that the University of Utah press release makes just as big a boo-boo:

The University of Utah press release that you quote in your recent post, "A gene by any other name?" refers to a DNA base pair as being made of "amino acids." Perhaps it was not the best example to cite as a clarification of Scientific American's confusion!

FYI, this mistake is (surprisingly, to me) not uncommon, for example here and here.

In fact, I've reached the point where I am more astonished when I find a story about genes, DNA, and proteins that gets all the terminology right than I am when I find one that makes a mistake! Just this week, for example, in the NYT, a short article got things exactly backwards by confusing the function of a gene with the consequences of loss-of-function mutations in that gene: "BRCA1, the scientists reported online Sunday in Nature Genetics, prevents PTEN from doing its work."

(In fact, it is just the opposite -- loss of BRCA1 prevents PTEN from doing its work. I won't get started on the NYT's causative interpretation of what are mostly correlative data...)

Your larger point about preprint embargoes is well appreciated.

Yes, I noticed the curious allusion in the press release to "a twisting, ladder-like structure in which each rung is made of a 'base pair' of amino acids", and put a snarky comment about it into a review of the rest of the MSM coverage of this research, which I wound up deleting because I didn't have either the time or the energy to finish it. The state of science reporting is depressing, it really is.

As for the PNAS "fuzzy embargo", it's now 9:00 on Thursday evening. The "embargo" on the Hawks et al. paper was lifted Monday morning -- and the paper is still not available on the PNAS "early edition" web site. This is apparently always the pattern, and I have to believe that it's a deliberate policy, though what PNAS thinks it's accomplishing by this policy is entirely opaque to me.]

Posted by Mark Liberman at 08:22 PM

Banned in Iran

A friend of mine, whom I cannot name, at an undisclosed location in Iran which will have to remain undisclosed, has risked telling me this in a private (let us hope) email message:

I just spent a good hour going through about a hundred illegal proxy sites to bypass the government's internet restrictions. Not for porn, not for politics... but for Language Log (which, I guess, isn't so different). I thought you'd find it interesting that languagelog is banned in Iran. Or at least, it is in the town I'm in. What makes it so blasphemous, I don't know. But now I'm on the computer at 4am secretly getting my fix of linguisticky goodness with a dial-up connection and a computer that turns off when I open too many windows. It's worth it though.

We here at Language Log Plaza are indeed interested to learn that our basically educative, informational, and humorous efforts have been banned. (What my friend means by that "isn't so different" crack I cannot imagine. Our pornographic posts are few and far between and always have redeeming social worth. And our politics is kept firmly in check.) What on earth could possibly induce the government of a country of such extraordinary natural and human resources as the Islamic Republic of Iran to imagine that it might be in danger from us? There are threats of many kinds in their region of the world; but can it be that they actually fear... Language Log?

By my count we have mentioned Iran in 59 posts on this site, including the one you're reading now (in the first version of this post I thought it was more like 83, but that appears to have been because of careless searching that picked up words like "antiperspirant"; see if you can see why!).
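For readers who want to see the pitfall in miniature, here is a minimal Python sketch. It is not the search that was actually run on the site; the function names and sample strings are made up for illustration. It contrasts a case-insensitive substring test with a match anchored at the start of a word.

```python
import re

# Hypothetical helpers for illustration; not the site's actual search tool.


def careless_match(text: str) -> bool:
    """Case-insensitive substring test: finds 'iran' anywhere, even inside other words."""
    return "iran" in text.lower()


# \b anchors the match at the start of a word, and the capital I is required,
# so 'Iran', 'Iranian' and "Iran's" count but 'antiperspirant' does not.
IRAN_WORD = re.compile(r"\bIran")


def careful_match(text: str) -> bool:
    """Match only words that begin with capitalized 'Iran'."""
    return IRAN_WORD.search(text) is not None


if __name__ == "__main__":
    samples = [
        "an undisclosed location in Iran",
        "the official Iranian news agency",
        "a new brand of antiperspirant",
    ]
    for s in samples:
        # Columns: careless result, careful result, sample text.
        print(f"{careless_match(s)!s:5} {careful_match(s)!s:5} {s}")
```

The careful version still assumes that a capitalized word beginning with "Iran" is a genuine mention, which seems a fair assumption for English text but would need checking against real data.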

We haven't always been entirely grovelling in our approbation of the Islamic Republic, I suppose, but we can hardly be described as having gone all out for regime change. Sometimes our critical remarks have cast aspersions on the USA rather than Iran; and mostly our remarks have been about distinctly un-saucy topics like developments in the study of Old Persian, or the importance of modal semantics. In every case, language is our topic; and it seems utterly fantastic to think that our remarks might have the power to harm a large and important modern country if its population were allowed to read our site without going through proxy servers.

Several times (Aiding and editing the "enemy"; Jail copy editors for the right reasons; OFAC Censorship Still in Place; Submit a manuscript, go to jail; and OFAC Relents) we mentioned the policy of the Office of Foreign Assets Control of the US Treasury Department, which held the view (and may still hold it) that it is a criminal offense for an American even to engage in collaborative scholarship with Iranians; but we were and are entirely against that policy, so that can't be why we're banned.

Once I mentioned "Iran's suspicious experiments in transuranic chemistry", but for heaven's sake, take a look at the post: it was entirely devoted to providing me with a chance to comment on the pronunciation of the abbreviation "IAEA" at normal speech rates, so I could put in a couple of goofy jokes like the title "IAEA yippy yippy yie" (which still makes me giggle; sorry, but I just love re-reading my own stuff). No threat to the state there.

There was a post that called the Persian Gulf "the Gulf" (Another view of Americans and Arabic in the Gulf), but we then discussed the nomenclatural issue even-handedly (The (so-called) Gulf and (so-called) friendly fire). That could hardly be it.

Mark once mentioned that the official Iranian news agency IRNA had announced that Jesus had given instructions that Iranian Jews should defend the national interest of Iran (see Help me, I'm trapped in an Iranian news agency). But he was only quoting IRNA, and asking how and why Jesus had issued these surprising instructions. Inquiring minds want to know.

I mentioned once that Iran seemed to be innocent of sending me spam, so that was a friendly comment.

Mark discussed the case of an art professor who was fired for possibly saying (or possibly not) something about hair or chadors or something (Spinning Fish: Mullahs defend Herodotus), but only in the context of journalism exegesis.

The absolute most critical thing about Iran that I have been able to find on Language Log is Bill Poser's remark in passing in one paragraph of his Kurdish to be co-official in Iraq that "In Iran under the Shah Kurdish was suppressed, as described in Margaret Kahn's book Children of the Jinn. Kurdish is now in public use to some extent, but the use of the language is banned in schools and the Kurds are still oppressed. Among other things, since most are Sunni Muslims they are not permitted to vote." That's the absolute worst Language Log has to offer, I think. But as far as I know, it is the simple, verifiable truth; and notice that Poser adds no value judgment or moral critique. If we have said anything worse, I have not yet tracked it down.

[Added later: plenty of people have written in to suggest that it might be the occasional references to gay sex in various contexts that got us a triple-X rating in Iran, and I suppose that is quite possible.]

Anyway, you decide: below is a complete list of all of the posts prior to today in which the word "Iran" occurred.

  • Banned in Iran:   A friend of mine, whom I cannot name, at an undisclosed location in Iran which will have to remain undisclosed, has risked telling me this in a private (let us hope) email message: I just spent a good hour going...   (December 11, 2007 12:03 PM)
  • Modal semantics in National Intelligence Estimate:   The page I've copied below the cut (page 5 from the summary National Intelligence Estimate, Iran: Nuclear Intentions and Capabilities , as reproduced in NYTimes Dec 3, 2007) struck me as remarkable in showing how important modal semantics is in...   (December 4, 2007 02:16 AM)
  • What old linguists do after they retire:   The energetic staff here at Language Log Plaza tries to deal with all aspects of language life, including geezerdom. For some unknown reason, I was assigned to the Geriatric Desk and I've posted about how geezerdom feels in the past....   (November 21, 2007 11:13 AM)
  • Linguistics at Guantanamo Bay:   The Standard Operating Procedures manual for Camp Delta at Guantanamo Bay has been leaked. You can download it (238 page pdf document) here. (Feel free: it isn't classified - just "for official use only".) Most of it is rather...   (November 14, 2007 06:15 PM)
  • Ask Language Log: is "regime" a loaded word? :   Joseph Kynaston Reeves wrote: If you have the time and inclination, I’d be interested to hear your opinion on the word “regime”.  To me, the word, applied to a government, implies a lack of freedom and is implicitly critical...   (November 8, 2007 06:45 AM)
  • You say potato, I say bologna:   In the October 22 New Yorker, Michael Schulman reports on his conversations with Majella Hurley, an English dialect coach who is coaching Claire Danes as Eliza Doolittle in a revival of Pygmalion ("You say potato"). Halfway through the piece, Schulman...   (October 18, 2007 10:50 PM)
  • The Islamic language family:   One small point went unnoticed in Sally Thomason's very sharp two-part critique (here and here) of Tecumseh Fitch's recent short article in Nature. It was spotted by my sharp-eyed Edinburgh colleague Bob Ladd. The artwork accompanying Fitch's article depicts the...   (October 17, 2007 07:11 AM)
  • Zolf bar baad :   Today's NYT has a piece on Mohsen Namjoo (Nazila Fathi, "Iran's Dylan on the Lute, With Songs of Sly Protest", 9/1/2007). Searching on YouTube for Namjoo turns up this lovely song, a setting of a 14th-century ghazal by Hafez, as...   (September 1, 2007 05:58 AM)
  • From the headline desk at Language Log Plaza:   Here at the headline desk at LLP, we don't write headlines, we analyze 'em.  The latest headline episode began on Wednesday with Ben Zimmer's puzzlement at the following head on a Reuters story: Taliban say kill Korean hostage, set...   (July 28, 2007 08:55 PM)
  • Old Persian News:   There has been an interesting development in the study of Old Persian. Old Persian is the language of the royal inscriptions of the Achaemenid kings, such as the Behistun inscription of Darius, and is known to us almost exclusively...   (July 3, 2007 08:15 PM)
  • Omit needless needless:   George H.W. Bush, quoted in the NYT (story by Stephen Engelberg, 5/5/89): NORTH GUILTY ON 3 OF 12 COUNTS; VOWS TO FIGHT TILL 'VINDICATED'; BUSH DENIES A CONTRA AID DEAL   'No Quid Pro Quo,' President Insists ... The...   (June 3, 2007 02:26 PM)
  • Unilateralism:   This is a country that has failed to implement any of the recommendations of a six-year-old report to a U.N. committee dealing with key human rights issues; a country that stands almost alone in refusing to ratify international agreements on...   (May 29, 2007 07:53 AM)
  • Spinning Fish: Mullahs defend Herodotus:   In Stanley Fish's most recent NYT essay ("The All-Spin Zone", 5/6/2007), he unspins unSpun, Brooks Jackson and Kathleen Hall Jamieson's book about how to cope with political rhetoric: The book’s subtitle tells it all: “Finding Facts in a World of...   (May 8, 2007 07:52 AM)
  • Growing ice cream in the Russian winter :   After the build-up about U.S.-Iran discussions at the recent conference in Sharm el-Sheikh, journalists were left to interpret some scant and ambiguous shards of interaction (Lee Keath, "Suspicions remain after Iraq conference", AP (via Houston Chronicle), 5/4/2007): Baghdad also...   (May 6, 2007 07:43 AM)
  • Help me, I'm trapped in an Iranian news agency:   Was it an April Fool's joke? A simple translator's mistake? A subtle attempt at sabotage? A call for help? I'm baffled, frankly, but under the headline "Iranian Jews ready to defend national interests", the Islamic Republic News Agency (IRNA) ran...   (April 8, 2007 08:57 PM)
  • BCCI in the news again:   Every once in a while a law case I worked on years earlier reappears in the news. This time it's about the Bank of Credit and Commerce International (here) and (here). The current news article suggests that in the old...   (April 7, 2007 02:29 PM)
  • The jagged, dash-strewn syntax of Robert Fisk:   Breathlessly urgent syntax from Robert Fisk in The Independent the other day: Oh how pleased the Iranians must have been to hear Messers [sic] Blair and Bush shout for the "immediate" release of the luckless 15 — this Blair-Bush insistence...   (April 5, 2007 10:45 PM)
  • Linguistic intervention in Iran :   It's not quite as bad as the spammers' "I need of your assistance" or "within the nearest time", but L/S Faye Turney's most recent letter of "confession", released by the Iranian embassy in London on March 30, really doesn't read...   (April 2, 2007 06:40 AM)
  • The Theory of Date Formats:   Arnold's discussion of the clever forensic use of date order by the Iranian government made me think a little about date formats. An analysis along the lines of Optimality Theory seems appropriate....   (February 27, 2007 01:43 PM)
  • Tell-tale date format?:   In a letter published in the New York Times on 2/26/07 (p. A24), M. A. Mohammadi, the Press Secretary in the Mission of Iran to the United Nations, attacks U.S. allegations about his country: The United States media should...   (February 27, 2007 10:18 AM)
  • Aramaic in the Tomb of Jesus:   Today's New York Times has a report on a new documentary by a team that claims to have found the graves of Jesus and his family and that they show that he was married to Mary Magdalene and was...   (February 27, 2007 04:45 AM)
  • The (so-called) Gulf and (so-called) friendly fire:   Earlier today Sally Thomason entitled a Language Log post, "Another view of Americans & Arabic in the Gulf." Will Language Log Plaza now face a worldwide boycott over an omitted word? Let me explain. In today's Guardian, reader's editor Ian...   (February 19, 2007 05:19 PM)
  • Grammatical parables at the Pentagon:   In yesterday's Pentagon Roundtable (transcript here), with SecDef Robert Gates and the Chairman of the Joint Chiefs of Staff, General Peter Pace, the first question that came up was about the Iranian role in supplying EFPs to Iraqi insurgents. Or...   (February 16, 2007 07:33 AM)
  • Whom shall I say [ ___ is calling ]?:   Commenting that "a little knowledge is a dangerous thing", Gene Buckley offered me the following example from the New York Times of 1/15/07: (1) The answer, shaped in the National Security Council, is for the American military to make...   (January 23, 2007 10:31 AM)
  • Mailbag:   Well, I stepped on a few corns yesterday. Rather than add updates to the posts in question, as I usually do, I'll collect the email and my answers in one place here, and link forward from the earlier posts....   (January 16, 2007 07:07 AM)
  • "Every 52 seconds": wrong by 23,736 percent?:   Well, I wasn't going to blog this, because it's got nothing directly to do with speech and language. But it does have to do with rhetoric, and with the use of authoritative-sounding assertions backed up by empty references to scientific...   (October 13, 2006 08:02 AM)
  • Comma:   On CNN's The Situation Room, aired 9/20/2006, Wolf Blitzer interviewed President George W. Bush: BLITZER: We see these horrible... BUSH: Of course you do. BLITZER: ... bodies showing up, torture, mutilation. BUSH: Yes. BLITZER: The Shia and the Sunni, the...   (September 25, 2006 07:23 AM)
  • Roses of Mohammad for breakfast, elastic loaves for lunch...:   Back in January, the edict came down in Tehran that Danish pastries (shirini danmarki) should henceforth be called "roses of Mohammad" (gul-e-muhammadi). At the time, this seemed to be a specific reaction to the Danish cartoon controversy; but it seems...   (July 29, 2006 02:14 PM)
  • Blogging from the seat of power :   In a recent debate with other New York Times columnists (Times Talks, U.S. Politics: What's Next?, July 17, 2006), Maureen Dowd got a big laugh when she said I don't think Bush is stupid either, but I think that they...   (July 22, 2006 12:47 PM)
  • Phony Oriental wisdom in the 12th century :   [Guest post by Victor Mair] About thirty-five years ago, I encountered for the first time the following saying (translated from Persian): “Seek knowledge even as far as China.” When I first heard this maxim, it immediately struck me as being...   (July 18, 2006 07:33 PM)
  • Recycling grammatical terminology:   Christopher Hitchens' latest fighting words column for Slate ("Don't Talk to the Mullahs", 5/15/2006) directs a few desultory insults towards his recent virtual debating partner Juan Cole, while describing Mahmoud Ahmadinejad's letter to President Bush: It then turns to a...   (May 16, 2006 05:52 PM)
  • The alcoholic orientalist thief vs. the tenth-rate syntactical train wreck:   Christopher Hitchens and Juan Cole have been arguing about how to interpret the anti-Israel rhetoric of the Iranian leadership. As I understand it, Cole believes that the Iranians are calling for an end to the Israeli occupation of the territories...   (May 7, 2006 02:06 PM)
  • Arabic machine translation from Google Labs :   Franz Och at Google Labs has announced interactive sites where you can try Arabic-English and English-Arabic machine translation. I tried a random story on the Al-Hayat web site. Cutting and pasting from the story worked pretty well: the first...   (April 30, 2006 02:03 AM)
  • Adventures in celebrity onomastics:   When Tom Cruise and Katie Holmes announced the birth of their daughter on Tuesday, celebrity-watchers were eager to find out what to call TomKat's offspring (besides TomKitten, of course). The couple's publicist revealed that the baby's name is Suri, further explaining...   (April 21, 2006 10:45 PM)
  • I found my snowclone in Palo Alto:   Continuing our discussion of what's a snowclone and what's just a lot of playful allusion to and creative variation on some original (most recently: X-back Mountain?), I take up the legend on a flyer for the Artfibers Yarn Millshop...   (March 23, 2006 07:00 PM)
  • The perils of semiotic speculation :   During the recent controversy over the Danish cartoons, it's struck me how many people think that the events are messages of some sort, though they rarely agree about who has been communicating what to whom. The riots and embassy attacks...   (March 14, 2006 02:44 PM)
  • Respect :   Callimachus at Done with Mirrors represents his fellow Fluffians right back at Mahmoud Ahmadinejad: "Yo, Iran!": We took it from W.C. Fields, because he was one of us. We're not going to take this from you, Iran. What'd they do?...   (March 9, 2006 02:19 PM)
  • Roses of Muhammad, bread of Vienna:   You've probably heard that the Tehran confectioners union has ordered "Danish pastries" (shirini danmarki in Farsi) to be renamed gul-e-muhammadi, or "roses of Muhammad", in an echo of "freedom fries" and "liberty cabbage"....   (February 17, 2006 01:55 PM)
  • A guest rant: "All we want are the facts" :   Just as I was posting my little complaint about Technology Review, this fine rant was submitted by Paul Kay for publication in Language Log: Is truth really under attack in American society? Or does it just seem that way? Does...   (January 19, 2006 06:14 PM)
  • The [sic]ing of the President:   In November, when the White House Press Office sought to change transcripts of a briefing by Scott McClellan (who either thought that it was "accurate" or "not accurate" that Karl Rove and Scooter Libby were known to have had conversations...   (January 12, 2006 01:20 AM)
  • International Spam:   As you probably know I'm a fan of internationalization, but it can go too far. When I checked my email a few minutes ago I found two spam messages right in a row. One was in Japanese, advertising software....   (November 1, 2005 06:47 PM)
  • Kurdish to be Co-official in Iraq:   Article 4 of the draft of the new Constitution of Iraq makes Arabic and Kurdish co-official languages. Both languages are to be used in legislative bodies and courts, schools, and official publications and correspondence. This is great news for...   (October 12, 2005 07:52 PM)
  • [expletive discussed]:   In today's NYT story on Mahmoud Ahmadinejad's role in the 1979 U.S. embassy seizure in Tehran, there's an unusually cumbersome form of bowdlerization: At 6:45 p.m. Monday, after seeing the picture on the Web site of The Washington Post, Mr....   (July 1, 2005 10:32 AM)
  • Bill Clinton has a blog:   [Or maybe he doesn't -- several people have written in to suggest that this is a fake. That probably makes more sense than the hypothesis that Bill Clinton would really write as frankly as he seems to in the entries...   (March 27, 2005 12:03 AM)
  • OFAC Relents:   Last May we discussed the US Treasury Department's view that publication in the United States of material originating in countries embargoed by the United States is illegal and the effect that this has had on scientific publication. I am...   (December 15, 2004 06:36 PM)
  • An Escape from Election News into Brahui:   Lately I've been finding creative ways to take my mind off the political news, and one of them involved checking Dravidian references for a student. This took me to one of the books I inherited from my father, the atheist...   (November 2, 2004 08:04 PM)
  • The Axis of Spam:   America is under attack. Unwanted linguistic material, possibly hazardous, is being launched at us in vast quantities. Spam received at my email address, which I now never advertise anywhere in machine-readable form (though it is of course too late)...   (October 27, 2004 04:43 PM)
  • Fear North Dakota:   The Oct. 10 edition of NBC's Meet the Press was partly devoted to a sort of debate between the Colorado Senate candidates Ken Salazar and Pete Coors, moderated by Tim Russert. At one point, Coors got a little tangled up...   (October 16, 2004 11:20 AM)
  • hard opponent must hope: today even respect issues:   2004 is the 40th anniversary of Gerald Salton's SMART system for full-text information retrieval, or at least of the earliest documentation of it that I've seen. One of the key insights of this system was that the content of a...   (October 2, 2004 05:59 AM)
  • Birlashdirilmish yangi Turk alifbesi:   Spurred by Language Hat's post on origins of Bishkek, the (current name of) the capital of Kyrgyzstan, I typed in some sections of a fiction set against the background of Stalin-era Central Asia, featuring the effort to develop a "New...   (September 27, 2004 05:14 PM)
  • Submit a manuscript, go to jail:   I know, I'm perverse; but after reading Bill Poser's revelation that coauthorship with a citizen of an embargoed country is a Federal crime, I have suddenly developed a yearning, a positive lust, to collaborate on a little scientific paper of...   (May 18, 2004 01:15 PM)
  • OFAC Censorship Still in Place:   Last month I reported that the Office of Foreign Asset Control of the US Treasury Department had abandoned its position that journals published in the United States could not edit papers submitted by residents of countries with which the...   (May 18, 2004 01:06 AM)
  • Jail copy editors for the right reasons:   The news that copy-editing a paper before it appears in a journal may be a criminal offense if they come from one of the Bad Guy countries (further details here) is perhaps the most astonishing I have encountered in months...   (March 4, 2004 03:07 PM)
  • Aiding and editing the "enemy":   The New York Times reports that the U.S. Treasury Department has dreamt up yet another novel form of censorship: Treasury Department Is Warning Publishers of the Perils of Criminal Editing of the Enemy. The New York Times, February 28,...   (February 29, 2004 10:37 AM)
  • The Passion:   Actor Mel Gibson has produced a controversial new movie The Passion about the last 12 hours of Jesus Christ which will appear in theatres in two weeks. It is controversial for several reasons. Some people consider it to stimulate...   (February 10, 2004 11:37 AM)
  • Inflections, genes and western Iran:   As a gloss on the discussion a while back on Indo-European origins, I have just discovered the neatest thing. Persian is odd among Indo-European languages in how low on inflection it is. In particular, it is one of the few...   (February 5, 2004 06:51 PM)
  • He used all his French:   According to the 12/15 New York Times story on Saddam Hussein's conversation with four members of the Iraqi Governing Council, Mowaffak al-Rubaie asked "When they arrested you why didn't you shoot one bullet? You are a coward. " The response,...   (December 15, 2003 11:21 AM)
  • Irresponsible Punditry:   The paper "Language Tree Divergence Times Support the Anatolian Theory of Indo-European Origin" discussed in a previous posting was the subject of an article by Boston Globe staff writer Gareth Cook in the Thanksgiving Day issue (p. A16). The...   (December 10, 2003 10:33 PM)
  • It's like, so unfair:   Why are the old fogeys and usage whiners of the world so upset about the epistemic-hedging use of like, as in She's, like, so cool? The old fogeys use equivalent devices themselves, all the time. An extremely common one...   (November 22, 2003 02:14 AM)
Posted by Geoffrey K. Pullum at 12:03 PM

Authoritarian rationalism is not conservatism

A couple of days ago, someone emailed to complain about a post that I wrote a year and a half ago:

If you choose to speak to the 'vulgar mob,' you'll do fine, but don't include yourself among the 'better-informed' pedestrians.

What had I done to appeal to the vulgar mob? Did I defend the over-use of like totally by middle-aged men? Did I assert that final rises on declarative clauses are often strong and aggressive rather than weak and self-doubting? No, my culpable populism consisted in quoting William Shakespeare, Herman Melville, Anthony Trollope and T.S. Eliot.

There's a strange tangle of ideas out there, according to which an appeal to the historical authority of usage by elite writers is seen as a step in the direction of mob rule. As my correspondent explained,

You can cite a million examples of poor construction, but that doesn't make it right!

And what does "make it right"? Apparently, it's the authority of a "rule" invented by a self-appointed expert, who has concluded that the world would be a better place if it were to be run according to his prescriptions. By a curious series of historical associations, some people have come to view this perspective as a "conservative" one, although in fact it's much more reminiscent of the Jacobin Club and the Khmer Rouge.

Here's the background of this particular example.

James J. Kilpatrick, presiding last year in what he calls "the Court of Peeves, Crotchets and Irks", condemned the headline "Mass Transit Not an Option for All Drivers". According to Kilpatrick, the headline's obvious and sensible meaning is grammatically illicit, because "if mass transit is not an option for 'all' drivers, it cannot be an option for even one driver" ("Even a little ambiguity", 7/9/2006).
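One way to state the disagreement is in terms of quantifier scope. The notation below is a sketch added for illustration, with D(x) for "x is a driver" and O(x) for "mass transit is an option for x" as assumed predicate names; neither Kilpatrick nor the headline uses it.

```latex
% Ordinary reading of the headline: negation takes scope over "all"
\neg\,\forall x\,\bigl(D(x) \rightarrow O(x)\bigr)

% Reading that Kilpatrick's dictum requires: "all" takes scope over negation
\forall x\,\bigl(D(x) \rightarrow \neg O(x)\bigr)
```

The first formula is true as soon as a single driver lacks the option; the second is true only if no driver has it. The dictum treats the headline as if the second reading were the only one available.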

Neal Whitman at Literal Minded tried and failed to make logical sense of Kilpatrick's dictum ("If It's Not for Everyone, It's Not for Anyone", 7/21/2006). I supported Neal by providing a list of historical examples that violate the dictum, from authors such as William Shakespeare, Herman Melville and Newt Gingrich ("James J. Kilpatrick, grammarian" 7/29/2006).

My reward, back in the summer of 2006, was a brief but emphatic spurt of negative email questioning my analysis ("All drivers equates to everyone who drives a car with none excluded. That's what the qualifier all means. Unless, of course, you're Bill Clinton."), my motives ("You are a college professor of English, and this man Kilpatrick has debunked one of your arguments. If Kilpatrick is right, you are wrong. And if there is one thing I learned in college, it's that college professors can never ever be seen as wrong in their field to their piers."), and my qualifications ("When you have written books on English usage, ... when you are a nationally recognized expert on the English language and it's usage, and not just another College professor, then perhaps you will have the credentials to debate the issue.").

I had observed that my views on semantic scope are in fact conservative ones, but my correspondents were having none of that: "A Conservative college professor, now there's a real oxymoron."

In fairness, I did cast aspersions, at least by implication, on Kilpatrick's motives and qualifications:

What led Kilpatrick to open his column so confidently with such a spectacularly wrong assertion about how the English language works? I won't speculate about his psychology, and I don't know about possible precursors in the prescriptivist literature for this particular piece of weird semantics. But my impression is that artificial rules about usage often start when a half-educated commentator with more self-confidence than insight, and with no respect for either demotic or elite traditions, decides that some common practice is inefficient or illogical. Why such pronouncements occasionally gain widespread acceptance is a question that could be the subject of several dissertations in intellectual history or social psychology. My own guess, FWIW, is that more insight will come from the natural history of religion than from rational choice theory.

And Kilpatrick is known as a conservative columnist, so anyone who disagrees with him on a point of usage must, by the logic of conceptual polarization, be motivated by political liberalism, and perhaps even some sort of grammatical socialism.

A few weeks ago, Jan Freeman responded to a reader who complained about the advertising caveat "not available in all areas" ("Entirely wrong", Boston Globe, 10/28/2007). The complainant argued that this must mean "not available in any area". Jan disagreed, and in her explanation, she cited Kilpatrick, Whitman and me.

This is apparently what led to the letter that I opened by quoting. Let me try -- probably in vain -- to strengthen my conservative credentials by closing with another violation of Kilpatrick's Rule, Samuel Rowlands' 1620 dedication to his "Reader":

This Crystall sight is not for all mens Eyes,
But onely serues for the iudicious wise,
Fooles, they may gaze as long as ere they will,
And be as blind as any Beetle still:
A purblinde Momus fleeringly will looke,
And spie no knaue but's selfe in all the Booke.
A Sicophant, that slaues himselfe to all,
Will his owne Knaue-Companions honest call,
And wilfull winke, because he will not see,
With diuers sorts of Buzzards else that be:
But these we leaue to their defectiue sight,
With Bats and Owles that blinded are by light.

[Update -- Stephen Jones writes:

Kilpatrick is syndicated in the Arab News so I have read his columns for many years (we get a double feast as his column is opposite Dave Barry's). In language matters Kilpatrick is genuinely a liberal; his normal rule is if it sounds right it is right. He has however no linguistic training whatsoever (he's an ex-political journalist) and on occasion it shows.

One thing I have noticed about American grammar mavens is that there is not the link between political conservatism and false linguistic conservatism that you would expect.

I guess the question is, "sounds right" to whom? And what should we do when different ears give different answers?

There are several dimensions involved in attitudes towards linguistic norms, and these have become oddly (and sometimes irrationally) tangled with political labels and allegiances. In the particular column that I quoted, Kilpatrick seems to support a particular type of authoritarian rationalism that is often associated with those on the left who believe that society can and should be reorganized along logical lines whose justice and benefits are obvious to them. Such people believe that everyone should be compelled to obey certain "rules", because -- they assert -- these rules are logically correct, even -- or perhaps especially -- if most people have always behaved differently.

In linguistic matters, however, contemporary "liberals" tend to be more like libertarian conservatives, who believe that it's generally best to let social groups evolve their own norms, without central planning. For such people, the first thing to ask about a contested usage is "what are the historical precedents and the current patterns of behavior?"

In my opinion, there's space for both attitudes in the social ecology of meta-linguistic discussion. But it's ironic that people who believe in laissez-faire political economy seem so often to be linguistic dirigistes; and that people who generally support government regulation and social engineering tend to be linguistic libertarians.]

[Joshua Jensen writes:

I don't have any data for what I'm about to say -- just introspection and personal reflection -- but I suspect that at least one motivation for conservatives' reactions is based on a disdain for postmodernist deconstructionism. When I was growing up, this academic heresy was a big deal, and the primary danger was its denial of determinate meaning. The first time I heard my older cousin (an English major at UNC Chapel Hill) tell me that language meaning is determined by usage, I was appalled that anyone could think something so obviously false.

Secondarily, the concept of Truth is very important to conservatives (and they often think that it matters to them only). Because truth is communicated primarily through language, then language's properties must be such that -- when used according to its innate rules -- it states categorical truths unambiguously.]


Posted by Mark Liberman at 09:16 AM

Trope-watch, Oslo edition

With dreary inevitability, Al Gore dusted off his favorite language-related trope for his speech accepting the 2007 Nobel Peace Prize (text, video):

In the Kanji characters used in both Chinese and Japanese, "crisis" is written with two symbols, the first meaning "danger," the second "opportunity." By facing and removing the danger of the climate crisis, we have the opportunity to gain the moral authority and vision to vastly increase our own capacity to solve other crises that have been too long ignored.

Faithful readers will recall that Gore has relied on this canard about the Chinese word wēijī on numerous occasions, such as a Vanity Fair article in May 2006 and congressional testimony last March. I'll point again to Victor Mair's careful debunking on, though the trope is so firmly implanted in the rhetoric of Gore and other political figures that debunkage is rather beside the point. As I explained here, the wēijī story was first traded among Christian missionaries in China as early as 1938 and made its way stateside by 1940, later to be popularized by John Foster Dulles and John Kennedy, among others.

Last time around I wrote, "After nearly seven decades of increasingly hackneyed use, isn't it time to retire poor overworked wēijī?" Apparently not quite yet.

(Hat tip, James Hargrave.)

Posted by Benjamin Zimmer at 01:11 AM

December 10, 2007

It's, like, war

It's a war on like.  On this spoof site for the "Acadamy of Linguistic Awarness" that quotes an unlikely-to-be-20-year-old "Cathy" from Santa Barbara (yes, young and female and in southern California) using like in ways that ordinary people could only dream of, and cites a bogus research finding from the (of course) non-existent "National Survey of Academic Affairs" about the academic dangers of using like:

(Hat tip to Levi Self.)

Plenty of commentary on the Reddit site.

[Shamefaced addendum: Ben Zimmer points out that Eric Bakovic posted on this spoof and a related one back in 2005.  So treat this as a reminder.  I plead the enormous number of Language Log postings and the difficulty of searching on like.]

Posted by Arnold Zwicky at 11:15 PM

VPE three-way

Ordinarily I'd have appended this to my last posting on Verb Phrase Ellipsis, but I think that this three-way example (from "The Rhinoceros" by Michael Flanders and Donald Swann) deserves a posting of its own (ellipses indicated as before):

Yet a sensitive heart the Rhinoceros owns.
If you doubt it, here's the proof;
That thing on his nose is for taking stones
Out of a horse's hoof:
He seldom, if ever meets a horse
(It is this that makes him sad)
When he does then it hasn't a stone in its hoof,
But he would ___ if he did ___ and it had ___!

(Hat tip to John V. Burke.)

The ellipses are, in order: base-form VP take stones out of a horse's hoof, base-form VP meet a horse, and NP a stone in its hoof.  The last of these really works only in British English, where "possessive have" can function as an auxiliary, as you can see by the negative inflection in it hasn't a stone in its hoof (where current American English would have it doesn't have a stone in its hoof -- and the last line would go But he would if he did and it did, which isn't as entertaining as the Flanders and Swann version).

Flanders and Swann were wonderfully playful with language, as Mark Liberman noted a while back in a discussion of syllepsis.

Posted by Arnold Zwicky at 10:50 PM

This blogging life: Not doing, failing to do, neglecting to do

Another posting about aspects of my life blogging on Language Log (after the first, which mostly elicited hostile e-mail; at least two more are to follow).  This one is about getting e-mail from readers following up on some posting of mine and offering something more.  There are three main variants:

NEG: You didn't mention X.
FAIL: You failed to mention X.
NEGLECT: You neglected to mention X.

In my understanding of things, these three are on an increasing scale of implied responsibility, and hence culpability, on my part.

The NEG variant is in some ways the most complex, though lexically the least rich.  Often it's clear that e-mail in the NEG form is just offering more examples of the sort I mentioned in my posting.  I try to stress that I'm giving examples that illustrate some point(s) and not assembling an inventory of all examples of this sort, but, still, people want to tell me about their favorite examples.  I try to be generous about these things, but such messages add a lot to my burden of e-mail.  [For another day: why I don't enable comments on Language Log.]

In other cases, X is some topic connected to the topic of my posting -- often interesting in its own right, but not directly relevant to the point(s) of my posting.  My correspondent is saying, in effect, "I wish you had posted on X", or just "Your posting reminds me of an interesting question".  Usually, I've CHOSEN not to post on X, because it would take me far afield, because I'm saving X for its own posting, or because I'm just plain ignorant about X.

On the other hand, the NEG variant, which SAYS merely that I didn't do something, can IMPLICATE that I should have, so that there's a suggestion of some culpability on my part.

The FAIL variant conveys much more strongly, to my mind anyway, that I should have mentioned X.  Not mentioning X is, well, a failure on my part.

The NEGLECT variant, to my mind, is like the FAIL variant, but adds the suggestion that I knew about X and knew that X was relevant, but nevertheless, perversely, didn't mention X.

E-mail with the NEGLECT variant in it usually annoys me a lot when I first read it.  Occasionally, I have indeed neglected to say something relevant I knew and should have said, in which case I amend the original posting or write (or plan to write -- there are an awful lot of items in my queue for posting) a new one, but usually it's not neglect, but (as I said above) a choice on my part.  I try to avoid snapping back at the authors of the NEGLECT messages, but I'm afraid I'm often rather curt, because I feel unfairly attacked.

Even when I am in fact ignorant, I bridle at the accusation that I SHOULD know these things, as though I should know all about the histories of all English expressions, the ins and outs of many different computer programs and systems, the details of youth slang in various places, everything about current popular culture throughout the world, the history of ideas in Western culture since the Greeks, and so on.  No one could possibly encompass all of this (and no one ever could, despite people's claims that there was a time when educated people could know EVERYTHING there was to know about the world; there have always been experts, and no single person has ever possessed everyone's expertise, and that's a good thing).

I freely admit that I'm ignorant about an enormous number of things, even some that I "ought", as a highly educated person, a professor at an elite university, to know.  In my defense, I sometimes say (accurately, I think) that I'm an ambitious child of the working class and so selected areas where I thought I could advance by building on my talents.  But mostly I think it's because no one can encompass all of this.  I'm astounded at true polymaths (especially those who bridge natural science, social science, engineering, the humanities, and/or the arts), though I do recognize that every polymath has huge gaping holes of ignorance in their knowledge.  In any case, I am not a polymath.

I'm also astounded at people who can reel off serious writing at a great clip -- like my old friend, now sadly departed, Jim McCawley, who turned out high-quality stuff in print at rates unimaginable to me.  As I said in my earlier posting, I work slowly, thinking things through many times.

When I've posted quickly, I've almost always regretted it.  I've had to amend these postings, sometimes several times, and often have had to do follow-up postings, or at least plan them (there are dozens in the works right now).  This doesn't encourage me to toss stuff off.

Several friends have mentioned to me the notion of e-mail bankruptcy, in which a poster "writes off" the accumulated debt of response to e-mail.  I'm reluctant to do such a thing; I always hope to revive topics in the queue.  Hey, I started this posting eight weeks ago.

Posted by Arnold Zwicky at 08:09 PM

Dual VPE

Here at Language Log Plaza, we collect remarkable coordinations and we collect remarkable instances of Verb Phrase Ellipsis (VPE), and sometimes we get both in one package.  Back in May, Eric Bakovic produced one of these deliberately:

All completely unnecessary, if you ask me (though, of course, nobody did ___ or is ___).

Both of the coordinated ellipses (indicated by the underlines) have the same antecedent, the finite VP ask me, but the understood VPs have different verb forms in them: base-form ask me as the complement of did, present-participial asking me as the complement of is.  The conjuncts are in a sense not parallel.

And now John Lawler has pointed me to a somewhat more complex example of dual VPE -- non-parallelism of conjuncts just as in the Bakovic example (base-form ellipsis plus present-participial ellipsis), but with different antecedents for the ellipses -- in a Partially Clips cartoon:

That is: I did ___ and I am ___!, with elliptical jump (base form) and running away in terror (present participial), respectively.  This is probably over the line for some people, but Lawler and I are ok with it.

Here's another sort of dual VPE, with one of the antecedents contained within the other, from the television show Dante's Cove:

Corey: You don't want to leave the Cove.
Kevin: I do ___, and I will ___.

The elliptical VPs (both in base form) are want to leave the Cove and leave the Cove, respectively.  Entirely comprehensible, I think, but I did notice it.

One final example.  This is a story -- I don't recall where I first heard or read it -- about a young man's first sexual encounter with another man.  As the young man tells it, in the version I remember:

I was holding his hard cock, staring at it, and he said to me, "You put your mouth on that and you're a cocksucker."  So I did ___ and then I was ___.

(with base-form VP put my mouth on it and predicative NP a cocksucker as the ellipses; remember that the ellipses in VPE are not always VPs, despite the name).  The only version of the story I've been able to find on the net is in the "queer quotes" section of the Queer Resources Directory, and it's less syntactically exciting than the dual VPE version I recall:

The first time I was naked in a bed with another boy for the express purpose of having sex, I paused, holding his erection, thinking "Put your mouth on that and you're a cocksucker!" And then I was. -- Ronald Ramage

(No, I don't know who the Ronald Ramage in question was or where he wrote this.)

Posted by Arnold Zwicky at 01:32 PM

North American Computational Linguistics Olympiad

Here's an important message from Drago Radev -- tell all the high-school students you know!

The 2008 edition of the North American Computational Linguistics Olympiad will take place on February 5, 2008, at live sites in select cities in the USA and Canada as well as remotely. The top finishers will be invited to participate in the team selection round which will take place on March 11, 2008.

To register or for more information, go to

Lori Levin, NACLO Co-chair

Thomas Payne, NACLO Co-chair

Dragomir Radev, Program Chair and US team coach

Posted by Sally Thomason at 11:20 AM

More fun with machine translation

A reader asks:

Babelfish translates 'wow' into French as 'défaut de la reproduction sonore'.

I'm wondering: how on earth does that happen?

Well, "defect in sound reproduction" is one of the meanings of the word -- although there's no reason for a citizen of this digital age to know it. The online Merriam-Webster dictionary gives us:

Main Entry: 4wow
Function: noun
Etymology: imitative
Date: 1932

: a distortion in reproduced sound consisting of a slow rise and fall of pitch caused by speed variation in the reproducing system

The superscript 4 means that this is the fourth option -- the others are:

1. interjection -- used to express strong feeling (as pleasure or surprise)
2. noun: a striking success : HIT
3. transitive verb : to excite to enthusiastic admiration or approval

A machine translation system needs to decide which of these to use. Without going into the theory and practice of machine translation, let's just say that Babel Fish makes the wrong choice here, just as Kingsoft so spectacularly did in the case of 干:


This doesn't really answer the question, but just pushes it back to another one: why do machine translation systems make the particular wrong choices that they do? This implies that such errors are somehow abnormal or unexpected, which is a natural reaction to notably silly mistakes such as Babel Fish's translation of wow into French, or Kingsoft's (former) translation of 干 into English.

But in fact, mistakes of that general kind are all too easy for computer algorithms to make, and very difficult to eliminate entirely -- though uniformly translating wow as "défaut de la reproduction sonore", or 干 as "fuck", is unnecessarily and indeed inexcusably bad practice. The real question should be, how can an MT system possibly make such choices correctly, not just sometimes or even most of the time, but almost all of the time?

There is an answer, but the margin of my breakfast time is too small to contain it.
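Short of that full answer, the shape of the problem can at least be sketched. The toy Python below is only an illustration, not a description of how Babel Fish or any real system works: the four senses are the ones in the dictionary entry quoted above, but the cue words and the function names are invented for the example. It contrasts a chooser that always returns one fixed sense with one that looks at the neighboring words.

```python
# Toy sense inventory for "wow". The definitions come from the dictionary
# entry quoted above; the cue words are hypothetical, chosen for this example.
SENSES = {
    1: ("interjection", "used to express strong feeling", set()),
    2: ("noun", "a striking success", {"hit", "success", "show"}),
    3: ("transitive verb", "to excite to enthusiastic admiration",
        {"audience", "crowd", "critics"}),
    4: ("noun", "a distortion in reproduced sound",
        {"cassette", "tape", "flutter", "turntable", "player"}),
}


def tokenize(text):
    """Lowercase the words and strip surrounding punctuation."""
    return {w.strip(".,!?;:\"'").lower() for w in text.split()}


def choose_sense(sentence, default=1):
    """Pick the sense whose cue words overlap most with the sentence.

    Falls back to `default` (the interjection) when no cue word appears.
    """
    words = tokenize(sentence)
    best, best_score = default, 0
    for sense_id, (_pos, _definition, cues) in SENSES.items():
        score = len(words & cues)
        if score > best_score:
            best, best_score = sense_id, score
    return best


def context_free_sense(sentence, fixed=4):
    """Ignore the sentence entirely and always return one fixed sense."""
    return fixed


if __name__ == "__main__":
    for s in ["Wow, that was fast!",
              "The cheap cassette player introduced a lot of wow and flutter."]:
        print(s, "->", choose_sense(s), "vs fixed:", context_free_sense(s))
```

With these invented cues, choose_sense picks the sound-distortion sense only when words like cassette or flutter are nearby, and falls back to the interjection otherwise; context_free_sense returns sense 4 for everything, which is roughly the kind of uniform choice complained about above. A real system needs far more than a hand-made cue list.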

To illustrate a closer approximation to the state of the art, here's what Google's current public system does with the Chinese title that Kingsoft 2002 translated as "Expand Enterprising and Really Grasp Solid Fuck and Continuously Expand and Great the New Situation of Buildings of Western Region":

Google's current English-to-French offering likewise gets wow-the-interjection right, at least in my concocted example:

However, when the meaning should actually be the "speed distortion" one, things don't go so well at all:

Leaving aside the translation of wow as "plaire" (which is a verb, as far as I know, as well as having entirely the wrong meaning), we can note that cheap is sometimes "bon" (= good), but not here.

[Sending the English-to-French output back through the French-to-English system, we get "The good cassette player, introduced many pleasing and flicker."]

Posted by Mark Liberman at 07:38 AM

December 09, 2007

The IronPigs mascot

The Lehigh Valley of Pennsylvania has a new minor league baseball team (class AAA, International League; affiliated with the Philadelphia Phillies; to play in Coca-Cola Park, still under construction, in Allentown).  The team has been playing in Ottawa, as the Ottawa Lynx.  Now it has a new name, the Lehigh Valley IronPigs (that's how it's spelled on the website, though you can find Iron Pigs in print), alluding to the Bethlehem Steel Company.  And it has a mascot, the pig pictured below:

The question is: what to call the mascot?  You might want to imagine some possibilities before reading on about how the first choice was abandoned as offensive.

Here's what happened when the mascot's name was announced, according to the Morning Call:

When Guillermo Lopez picked up his newspaper Sunday morning, he froze at the sight of a prominent headline, alongside a photo of a fuzzy, human-sized pig.

There, in bright red ink, was the name of the IronPigs new mascot: PorkChop.

The word brought back uncomfortable memories of Lopez's time working at Bethlehem Steel, where he said racism was overt.

''If someone wanted to talk to me in a derogatory way, they'd call me 'pork chop','' said Lopez, whose family came from Puerto Rico to the Lehigh Valley in the 1940s to work at Bethlehem Steel.

... Lopez, who is vice president of the Latino Leadership Alliance, theorizes the term is regional, and came into use during the period when Latinos began arriving to work at the steel plant. Though Lopez was often called pork chop when he began working in 1973, by the time he left in 1998 he rarely heard the slang.

The paper cites the Urban Dictionary on pork chop:

pork chop: A racist term referring to Puerto Rican people; also slang referring to Portuguese people. It is not offensive to call someone of Portuguese descent a pork chop because it is considered humorous and endearing.

So the name was changed.  From the Lehigh Valley Line a few days ago:

Some say the Lehigh Valley Iron Pigs' decision to change the name of its mascot from PorkChop to Ferrous is commendable. After Latino members of the community tipped off the team's management about the derogatory connotations of the name PorkChop, the Iron Pigs chose the word that means, "of or containing iron" instead.

(The Morning Call on-line has a video interview with Ferrous.)

Reactions to the abandonment of PorkChop (or Pork Chop), and to the new name, have been decidedly mixed.  Many fans, including some Latinos, don't see what the fuss is about.  But the management, "trying to avoid the appearance of evil" (as Roger Shuy put it in the most recent Language Log posting on the Washington Redskins and related nomenclatural issues) chose to not risk offending even some fans.

Notice that Ferrous is a pig of color, not your ordinary pink porker.  He is, in fact, (appropriately) more or less iron- or steel-colored.

(Very big hat tip to Ned Deily, who grew up in Bethlehem.)

Posted by Arnold Zwicky at 03:18 PM

Antedating hyphenation

In response to "American Indian Hyphens" (12/1/2007), David Nash has a guest post at Transient Languages and Cultures, "The hy-phen at Port Jackson" (12/9/2007), showing that

Here in Australia, by about 1791 hyphens between syllables were common when the Sydney Language was being written down by the English colonists (who had arrived in 1788).

David cites the example of the word lists in David Collins' 1798 work An account of the English colony in New South Wales. However, he observes that

Lt William Dawes, the best recorder, did not use hyphenation this way, as can be seen in the facsimile sample from his notebook illustrating HRELP's Dawes online.

The first vocabularies recorded in Australia were Cook's and Banks' lists taken down at Endeavour River in 1770. Those did not use hyphens. They are similarly absent from the 1777 vocabulary recorded for Cook by his surgeon Mr Anderson at Adventure Bay (Tasmania), at least as published in 1821.

Several people have suggested to me that the North American hyphenation practices, apparently developed in the 1820s, might have been influenced by Sequoyah's highly successful syllabary for Cherokee, whose development is described in an account from the Cherokee Phoenix, "Invention of the Cherokee Alphabet", 1(24) p. 2, August 13, 1828. A compilation of some other early sources is available in this 1930 article from the Chronicles of Oklahoma.

[Update 12/16/2007 -- the erudite commenters over at Language Hat have taken the practice back to some 17th-C examples. Mike McMahon writes there:

Eliot's Indian Primer (1669) proceeds as follows:

1. Alphabet.
2. Possible syllables.
3. Brief passages with hyphens.
4. Lord's Prayer.
5. Longer passages.

So, that third section seems to be following the convention from the earliest time; later ones dispense with the hyphens. It begins:

Wa-an-tam-we . uſ-ſeonk. ogke-tam un-at . Ca-te-chi-ſa-onk.
Ne-gon-ne . og-kee-taſh. Primer.
Na-hoh-to-eu . og-kee-taſh.
Ai-uſ-koi-an-tam-o-e . weh-kom-a-onk.
Ne-it . og-kee-taſh . Bible.

Which I believe is recommending a graduated reading program from among his many publications. If you've got access to Early American Imprints : Evans, and many American libraries do (the BPL has a proxy), it's page 8 of that scan.

It seems that it is indeed, as marie-lucie hypothesized, a scheme for pedagogy and narrow transcription that leaked over into normal writing for a while.

Here's the section that Mike cites:

And here's a sample of the main part of the primer, which (like other publications of the period) lacks the hyphenation:

(The poor quality of the scans is original, alas.)

I'm still curious about exactly who made the change to pervasive hyphenation, apparently in the 1820s, and why.]

Posted by Mark Liberman at 07:54 AM

The Etiology and Elaboration of a Flagrant Mistranslation

[Guest post by Victor Mair]

A series of earlier Language Log posts have discussed the curious phenomenon seen in the grocery-store sign on the right: absurdly crude English mistranslations in bizarrely inappropriate contexts.

In "Gan: whodunnit, and how, and why?" (5/31/2006), I explained one of the sources of this phenomenon: several Chinese characters pronounced GAN1 or GAN4 -- and meaning such widely disparate things as "dry," "calendrical sign," "to do," and much else beside -- all got collapsed into one simplified character: 干. This has led to enormous confusion, especially when people who know next to no English rely on machine translation software to convert Chinese into English. The chaos caused by this combination of circumstances is vastly exacerbated by the fact that this little, three-stroke symbol also has a vulgar meaning when pronounced in the fourth tone, GAN4, namely "fuck," which is probably an extension of the regular sense of "do." Because GAN4 ("do") and GAN1 ("dry") are now both written with that little, three-stroke character, the damage is compounded by the enormous range of intended senses of GAN1/4 ("dry," "do," "act," "work," "undertake," "shield," "have to do with; be concerned with," "edge of a body of water," "be rude, impolite, blunt," "embarrass or annoy," "give the cold-shoulder to," "empty, hollow," measure word for a group of people, "trunk, stem, main part," "cadre," "competent, capable, able, talented," "go bad," "be a disaster," etc.), all of which are capable of coming out of the translation software as "fuck."

People who see signs employing the f-word all over China, even in large stores and fancy restaurants, are not only aghast, they wonder how the dickens such a gross mistranslation could have originated and proliferated. I believe that the explanation given in the previous paragraph adequately and accurately accounts for the origin of the basic GAN1/4 = "fuck" mistranslation. The question that remains, then, is how did this virus spread? Theories abound, to say the least. I have intelligent colleagues who believe that naughty people do this on purpose just to scandalize customers and clients. Others hold that it is done to make English look like an uncouth language. Still others maintain that devious foreign translators plant these mistranslations all over the place to make Chinese look stupid and crude. And there are so many additional theories that attempt to account for the GAN1/4 = "fuck" misrendering that I can't keep track of them.

Most of us, however, have all along suspected that this phenomenon resulted from reliance on faulty translation software. Indeed, it is easy to prove that absurd English translations are being spewed out daily in China when individuals who don't know English merely plug Chinese sentences into the software and expect it to come up with reasonable renditions.

For example, one of my students has told me about a sign in Tai'an, in Shandong province, near Qufu, the legendary birthplace of Confucius. The Chinese text reads "Liang2shi4 zheng3gu3 tui1na2", which means "Liang Family Bone-Setting (Medical) Massage". (For a description of TUI1NA2 in traditional Chinese medicine, see the Wikipedia entry.) The English text on the sign reads "Whole Bone of the Beam Surname Pushes to Take"!

After trying for more than a year to find proof that the GAN1/4 = "fuck" mistranslation was indeed the result of relying on poor translation software, I am now able to demonstrate that this really does seem to be the case. Here is the evidence.

There is a bulletin board called Lequ Yuan (which means something like "Pleasure Garden"), and on that site there is a page called "Lüyou Wenhua Tan" (which punningly means both "Donkey Friends Forum" and "Travel Friends Forum"). On January 14, 2003 at 00:51:23, somebody calling him/herself "kailash" made a post entitled "Gaoxiao fanyi (Ridiculous Translations)." The contents of kailash's post (in the top left) are the following, in my English translation:

Truly ridiculous!

Location of photograph(s): The new X supermarket at Hexi in Changsha (Hunan).

The one that is most classic is that for GAN1HUO4 [= "dry goods" -- it must have actually been "fuck foods" in the photograph, which is no longer accessible on the web site]. Can GAN1 really be translated that way?

Below are some (translated) comments on that post, with the commenter's name after each one:

"I suspect that somebody's been playing tricks." Balang

"You go buy a Jinshan Ciba with Added Jinshan Kuaiyi." Huayuan ("Flower Garden")

"Really!" kailash

"I still don't believe it." windy

"Ha, ha, ha!" Knight yvan

"Is it true or fake? Somehow I suspect that it has been faked." By No Means a Lonely Ostrich

Huayuan followed up with another post later that same day, at 11:05:21 to be precise. His response to the doubters begins thus: NI QU MAI GE JINSHAN CIBA JIA JINSHAN KUAIYI ("You go buy a Jinshan Ciba with Added Jinshan Kuaiyi").

Now, just what is this Jinshan Ciba that can be equipped with a Jinshan Kuaiyi? Literally rendered, these names mean "Gold Mountain Word Hegemon" and "Gold Mountain Fast Translator." Jinshan Ciba is an electronic reading aid (a sort of glorified bilingual [Chinese and English], and to a certain extent multilingual [since some Japanese is included], dictionary), while Jinshan Kuaiyi is a machine translation device. These tools are widely used all over China. Nearly all of the PRC students at the University of Pennsylvania have brought theirs with them, although they realize that Jinshan Kuaiyi usually produces English that is laughable.

Huayuan's post continues (emphasis added):

Then let it (Jinshan Ciba / Kuaiyi) translate this sentence: NI3 XIANG3 GAN4 SHEN2ME? ("What do you want to do?")

It will come out with: "what do you want to fuck?"

This is an experiment that can be repeated; scientifically it is sufficiently rigorous.

Now, that was in 2003. I have checked the latest version of Jinshan Ciba / Kuaiyi, and it does not return the vulgar mistranslation recorded by Huayuan, although it still comes out with a lot of other gibberish (see below). It would appear that somebody told the makers of Jinshan Ciba / Kuaiyi about this gross problem with their translation software and they have taken steps to remedy it. However, the damage has already been done and the cancer continues to spread throughout the Chinese body linguistic. There still must be countless earlier versions of the Jinshan software out there that are being used daily. The only way to halt this ludicrous phenomenon is never to paint, print, or publish an English translation without first checking with someone who is fluent in the target language. That, of course, is unlikely in the foreseeable future, because there just aren't enough skilled speakers of English to go around for the huge demand that has engulfed China.

I tested the latest version of the Jinshan software, and here are some samples of what it now produces (screen shots taken a few days ago):

Jinshan: "What do you want to do?"

Jinshan: "What do you want to do?"

Jinshan: "You just at stem what?"
corrected translation: "What are you doing right now?"

Jinshan: "What do you do?"
corrected translation: "What are you doing?"
or "What do you think you're doing?"

Jinshan: "What are you just dry?"
corrected translation: "What were you doing just now?"

Jinshan: "What do you do to say so?"
corrected translation: "What do you mean by talking like this?"
or "Why are you speaking this way?"

Jinshan: "Dry goods"

Jinshan: "The stem adjusts area"
corrected translation: "Dry Seasonings Section."
(see below for a brief discussion)

Those who travel around China know for certain that the GAN1/4 = "fuck" abomination is not fake. It is real, and now -- though it remains thoroughly deplorable -- I think I understand how it happened.
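The mechanism described earlier, several distinct characters collapsing into the single simplified character 干, can be sketched in a few lines of Python. This is a hand-built illustration under stated assumptions, not the Jinshan software's actual lexicon or logic: the three-entry table covers only a few of the senses listed above, and the function names are hypothetical.

```python
from collections import defaultdict

# Traditional character -> (simplified form, pinyin with tone number, gloss).
# A tiny illustrative table, nowhere near the full range of senses.
TRADITIONAL = {
    "乾": ("干", "gan1", "dry"),
    "幹": ("干", "gan4", "to do"),
    "干": ("干", "gan1", "calendrical sign"),
}


def merge_by_simplified(table):
    """Group entries by simplified form, showing the many-to-one collapse."""
    merged = defaultdict(list)
    for trad, (simp, pinyin, gloss) in table.items():
        merged[simp].append((trad, pinyin, gloss))
    return dict(merged)


def naive_gloss(simplified, merged):
    """Return the first listed gloss with no regard for context.

    This is the failure mode: one output for every use of the character.
    """
    return merged[simplified][0][2]


MERGED = merge_by_simplified(TRADITIONAL)

for trad, pinyin, gloss in MERGED["干"]:
    print(trad, pinyin, gloss)
```

Once the distinctions are gone from the written form, whatever gloss happens to come first in the lookup gets applied everywhere; in this toy table that is "dry", and a lexicon that listed the vulgar sense first would fail in the same way.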

Here are just two of the countless instances of the GAN1/4 = "fuck" paradigm that have spread throughout China:

Pinyin:     GAN1TIAO2 QU1
English: "Fuck to adjust the area"

Correct translation: "Dry Seasonings Section"

Note: It is no wonder that machines get confused by this expression (see the less salacious machine translation above, no. 8 of the screen shots, "the stem adjusts area"), since every Chinese to whom I've shown this sign has hesitated in their pronunciation (the second syllable could also be read DIAO4) and in their interpretation of its meaning -- there are many different possibilities.

Pinyin:    SAN3 GAN1GUO3
English: "spread to fuck the fruit"

Correct translation: "Loose Dried Fruit"

Note: More often GAN1GUO3 actually refers to nuts instead of fruit.

Many other examples of GAN mistranslated as "fuck" can be found on the internet -- several have been featured in the earlier Language Log posts linked above, for example this sign from a hotel buffet:

And this restaurant sign:

Two more: what should be "dry foods price counter" is rendered in large letters as "FUCK THE CERTAIN PRICE OF GOODS":

And what should be "dried foods" becomes "FUCK GOODS":

As a sample of the other widespread effects of unwise reliance on dictionaries, digital or otherwise, here are some incredibly fine examples of Chinglish in Shanghai, probably also caused by excessive reliance on poor-quality machine translation. Among other precious items, notice the "Bang products" and the "antithetical couplet ocean."

Well, enough for now. I hope that I do not have to spend the rest of my life documenting and explaining Chinglish. For the moment, however, it would appear that I still have much work to do.

I wish to express my gratitude to Jonathan Smith for helping me track down early occurrences of the GAN1/4="f*ck" atrocity on the Web and for the fabulous TUINA sign in Shandong, and to Jiajia Wang for making the screen shots of Jinshan Ciba / Kuaiyi in action, also for scouring Facebook for interesting examples of Chinglish.

[Above is a guest post by Victor Mair]

[Update -- Joel Martinsen writes:

I dug up a CD of Kingsoft 2002 and tested it on some of Prof. Mair's example sentences. Thought it might be nice to have a screenshot of how the software used to work to compare to the more acceptable translations of recent versions of Kingsoft QuickTrans:

The last line of the attached image is part of the translation of the title of an academic paper that was published in the Economic Administrative Cadre Bulletin in 2003, and discussed here in the EastSouthWestNorth blog:

* [019] A Solid Fuck (12/08/2007) (KDnet) In issue 2, volume 16, June 2003 of the Kansu province Economic Administrative Cadre Academy Bulletin, there appeared an essay titled <开拓进取,真抓实干,不断开创西部大开发的新局面>. The author Song Chaosu is the Kansu provincial party secretary and the chairman of the Kansu provincial people's congress standing committee.

In the database, the title of the article was translated into English as: Expand Enterprising and Really Grasp Solid Fuck and Continuously Expand and Great the New Situation of Buildings of Western Region.

If there is a "solid fuck," it is the translation job. A more appropriate translation is: Develop and Forge Ahead with True Understanding and Effort in order to Continuously Create New Situations to Open Up the Western Region.


Posted by Mark Liberman at 06:59 AM

December 08, 2007

Due to <NounPhrase> and that <Sentence>

Here's one for Arnold Zwicky's collection of Coordination of Unlikes, from Frank Cho's comic strip Liberty Meadows for 12/7/2007.

I don't follow the strip closely enough to get the joke -- and maybe more context wouldn't help -- but I believe that the content of this message is not to be taken literally. If anyone censored Cho, it was Cho himself.

As for the form of the message, I don't have any trouble with most of Arnold's C. of U. examples, but this one is definitely in WTF grammar territory. The only other examples that I've been able to find of NP and that S coordination are in quasi-incoherent passages like this one:

Industries running primarily on coal-driven energy is bad yet unavoidable in the meantime due to the drought and that water-turbine energy simply isn't all too grand in the scheme (due to drought and that water-turbines alone simply can't generate the energy Australia uses).

The rest of Friday's Liberty Meadows is here:

[Update -- Bruce Webster explains:

Actually, the joke with the Liberty Meadows strip is that there are plenty of Liberty Meadows strips that newspapers have refused to run -- so many that Cho has a website dedicated to just those strips:

If you're wondering why some of these strips were dropped -- well, so do the rest of us ("ovarian cyst"?). But you can usually figure out what the likely source of objection was.]


[Several readers have observed that the coordination changes its nature if "the fact" is inserted in an appropriate place. Exactly -- then it would be simply a coordination of noun phrases.]

[And Craig Russell wrote to suggest that it might all be Strunk & White's fault:

Did any of these readers mention that Strunk & White have a whole subsection, under the classic Omit Needless Words slogan, about "the fact that":

"'The fact that' is an especially debilitating expression. It should be revised out of every sentence in which it occurs."

That might be it! ]

Posted by Mark Liberman at 10:39 AM

Journalistic exegesis

Over at Talking Points Memo, Josh Marshall is applying a hermeneutical microscope, not to the constitution or the bible, but to one sentence from a story in the New York Daily News ("Editorial Dispute over Shag Fund", 12/07/07):

... there's some controversy over the specific meaning of the reports that then-Giuliani mistress Judi Nathan's NYPD valet and security escort also helped walked Nathan's dog.

Now, the original report, or at least the most recent one came in this December 1st article in the New York Daily News which detailed how the NYPD was dragooned into being Nathan's personal limo service when they ferried her on trips to her parents house 130 miles away in Pennsylvania.

The key passage comes midway through the article ...

At the time, it was not uncommon to see Nathan being chauffeured around the city in an undercover Dodge with two detectives, who sometimes even helped to walk her dog.

Now, since the article was published, I've taken this to mean that the NYPD officers detailed to take care of Nathan sometimes walked her dog for her.

We can characterize this as the transubstantiation position: the role of dog-walker was entirely taken over, at least in substance, by a cop (though this was not plain to the eye, since these were undercover officers).

But Paul Kiel disagrees. He interprets the statement as meaning that the cops escorted Nathan while she walked her dog.

This is the consubstantiation position: during dog-walking, a cop was present alongside Ms. Nathan, who remained present as well.

As often in doctrinal disagreements, the arguments hinge on subtle details of semantic and pragmatic interpretation, as sentence meaning interacts with contextual facts in counterfactual reasoning as to what the writer's intentions must have been:

To me, Paul's reading is redundant. A security detail escorts you everywhere you go in public. So saying they were helping her walk her dog if they were only escorting her while she did it would be misleading.

In other words, I read this as implying direct cop to dog contact, that they were walking her dog.

And the hair is split even more finely as others get into the discussion:

Meanwhile, TPM Editor-at-Large David Kurtz basically takes my position but does a little bit of annoying hedging. From our work chat area, I quote David writing, "I imagine them holding leash while she talks on phone, that kind of thing."

I take it that the elaboration of this argument on Josh's blog is basically a display of fatigue-induced silliness. Still, it must be an interesting experience for the writers of the Daily News article, Michael Saul and David Saltonstall, to find their sentence dissected and examined as if it were a clause in the Bill of Rights or a verse in Leviticus.

Posted by Mark Liberman at 10:09 AM

December 07, 2007

Recursive news

The latest nominee for the prestigious Trent Reznor Award for Tricky Embedding is Michael Sneed in the Chicago Sun-Times ("Stacy told clergyman Drew killed ex", 11/29/2007):

Sneed hears Stacy Peterson told a clergyman in August that her husband had claimed to have killed his former wife Kathleen Savio and made it look like an accident.

A source close to the investigation tells Sneed the 23-year-old, who had been pregnant and living with Peterson when Savio was found dead in an empty bathtub in 2004, also told two other people close to her about her husband's statements regarding Savio's demise.

The trickiness here is mainly epistemological. (Michael Sneed often needs epistemological camouflage -- for example, she was the author of the anonymously-sourced false story that last April's VA Tech shooter was Chinese.)

The clausal embedding is only three levels of communication verbs plus a causative in the first sentence, and a different stacking of the three communication predicates in the second sentence, along with a two-level relative clause for lagniappe -- but it does illustrate the theory that the evolutionary force behind syntax is gossip:

[Sneed hears
   [Stacy Peterson told a clergyman in August
      [that her husband had claimed
           [to have killed his former wife Kathleen Savio and made
                 [it look like an accident] ] ] ] ]

[Hat tip -- Headsup: The Blog]

Posted by Mark Liberman at 08:05 AM

December 06, 2007

Speak xkcd or die

As always, xkcd says it very, very well.

National Language

(Hat-tip to Margo Schwartz.)


Posted by Eric Bakovic at 04:01 PM


Bob Ladd follows up on my posting about the sentence

Queen Elizabeth will be opening the British Parliament ... and deliver a speech.

with some suggestions about further factors that might be playing a role in interpreting (or failing to interpret) the coordination in this example, beyond the structural considerations I posted about: pragmatic and performance factors.

To recap in some detail: the QE example has a VP with a coordination in it:

will be opening the British Parliament and deliver a speech

The question is: what are the conjuncts?  The second conjunct is clearly the VP deliver a speech (hereafter, (1)).  What's the first conjunct?  There are three possibilities:

(2) will be opening the British Parliament
(3) be opening the British Parliament
(4) opening the British Parliament

Possibility (2) is a dead loss: (1) would be coordinated with the "highest" of the three alternatives (against the default tendency Low Attachment, to associate expressions with the structurally lowest available alternative); and the two conjuncts would not be structurally parallel ((2) is a finite VP, while (1) is a base-form VP, thus running against the default preference Internal Parallelism, favoring internally parallel conjuncts); and, fatally, though the subject of the sentence (Queen Elizabeth) is interpretable with the finite VP (2) as its predicate, it doesn't fit with the base-form VP (1), against what is at the very least a strong default preference -- often treated as a rigid requirement -- that the "factor" in coordination (the subject Queen Elizabeth in this case) should distribute equally over each conjunct (Distributivity): the whole thing fails on the bizarreness of Queen Elizabeth deliver a speech as a finite clause (parallel to Queen Elizabeth will be opening the British Parliament).

I remind you that all three of these principles -- Low Attachment, Internal Parallelism, Distributivity -- are preferences, not laws of nature.  (Even Distributivity is violable in certain circumstances; see below.)

There are still two candidates left.  Low Attachment recommends (4) + (1), opening the British Parliament and deliver a speech, and this is the interpretation I got at first.  But this loses points on Internal Parallelism ((4) is a present-participial VP while (1) is in the base form) and is very bad news on the Distributivity front: (4) is fine as a complement of be, but (1) is unacceptable in that context in standard English (o.k.: be opening the British Parliament, bad: be deliver a speech).

On to (3) + (1).  This loses points by not having Low Attachment (o.k., Low Attachment is just a preference, and there are tons of perfectly fine examples where it's violated), and also by not having Internal Parallelism, since (3) is progressive aspect while (1) is plain, or unmarked, aspect (though again, Internal Parallelism is just a preference, not a requirement).  Meanwhile, (3) is fine on Distributivity, with (3) and (1) both serving as base-form complements of the modal will, and this was the intended interpretation of the QE sentence.
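The three-way comparison above is easier to survey as a table. The following is only a sketch in Python that transcribes the judgments given in the preceding paragraphs; the names `JUDGMENTS` and `satisfying` are invented for illustration, and nothing here is computed from the syntax itself.

```python
# Judgments transcribed from the discussion above (not derived by any parser).
# True means the preference is satisfied when the candidate is conjoined
# with (1), "deliver a speech".
JUDGMENTS = {
    "(2) will be opening the British Parliament": {
        "Low Attachment": False,
        "Internal Parallelism": False,
        "Distributivity": False,
    },
    "(3) be opening the British Parliament": {
        "Low Attachment": False,
        "Internal Parallelism": False,
        "Distributivity": True,
    },
    "(4) opening the British Parliament": {
        "Low Attachment": True,
        "Internal Parallelism": False,
        "Distributivity": False,
    },
}


def satisfying(preference):
    """Return the candidate first conjuncts that satisfy the given preference."""
    return [cand for cand, marks in JUDGMENTS.items() if marks[preference]]


for name in ("Low Attachment", "Internal Parallelism", "Distributivity"):
    print(name, "->", satisfying(name))
```

Laid out this way, no candidate satisfies all three preferences: only (4) has Low Attachment, only (3) has Distributivity, and none has Internal Parallelism.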

Now Bob Ladd introduces two further factors.  The first of these is the event structure denoted by the coordinated VPs.

... the point about the Queen opening Parliament is that her speech ("the Queen's Speech") is to all intents and purposes the opening of Parliament. It's not really two separate events, which is clearly suggested by your "improved variant" with the tense reversed (Queen Elizabeth will open the British Parliament ... and then be delivering a speech).  Comparable North American examples might be:

On January 20th the new president will be taking the Oath of Office and swear to uphold the constitution.
The President will be opening the new baseball season today and throw out the first pitch.

To me, these sound a lot better than, say,

Senator Clinton will be meeting her senior staffers today and fly to a campaign rally in Iowa.
Mark Liberman will be writing three posts for Language Log today and meet with the Dean.

This is a general fact about coordinated expressions, especially in "reduced" coordinations: they will ordinarily be understood not merely as denoting two independent things, but as denoting two CONNECTED things -- connected either by temporal sequence, as in

Kim entered the room and surveyed the damage.

or by association as parts of a larger entity, as in

Kim collects Hello Kitty items and generally likes things in pink.

The tighter the syntactic association of the conjoined expressions -- the more "reduced" the coordination is -- the stronger the implicature of sequence or association in a coherent whole is.  The sentence

I met Magic and Philip Johnson at the party.

is bizarre (but entirely grammatical), even though it's possible that the speaker met both Magic Johnson (the basketball player) and Philip Johnson (the architect) at this party.  It's bizarre because it suggests that the two men constituted some sort of unit.

So Ladd's first point is that (3) + (1) isn't so bad, because opening Parliament and delivering a speech together constitute a single event.  (My "improved variant" has the sequence interpretation, clearly marked by then, so it's golden on almost all fronts.)

Ladd's second point:

As for the performance factor, another thing that makes the original sentence about the Queen sound funny is that the second conjunct ("and deliver a speech") is too short. I think the following is a lot better, even for an American who doesn't know about the institution of the Queen's Speech:

Queen Elizabeth will be opening the British Parliament ... and deliver a speech in which the government's legislative plans for the coming year are spelled out.
Mark Liberman will be writing three posts for Language Log today and meet with the Dean to discuss the shortfall in the budget for the phonetics lab.

Again, this is a familiar effect: in a variety of contexts (the details are very complex) longer-before-shorter is not as good as shorter-before-longer.  This "law of increasing members" -- the term is not my invention, but a genuine technical term, translated from German, and entertaining because of the mild raciness of "increasing members"; no actual "law" is involved, however -- shows up all over the place: salt and pepper is a bit better than pepper and salt, I gave up this fruitless quest for truth is a lot better than I gave this fruitless quest for truth up, and so on.  (There's an enormous literature on manifestations of the tendency.)

I agree with Ladd that the QE example is improved when the second conjunct is longer; there's more time to process the second conjunct before the sentence comes to an end.

And now for still another possible take on things.  On the Low Attachment parsing, the QE example has the present-participial VP (4), opening the British Parliament, conjoined with the base-form VP (1), deliver a speech.  Although (as I pointed out in the earlier posting) this is not a GoToGo example (as in I'm going home and take a nap), it has one significant point in common with GoToGo: the coordination of a present-participial VP with a base-form VP, in which the former fits the larger syntactic context (be takes a present-participial complement in the progressive construction) and the latter does not.  Distributivity fails in both.

The fact is that Distributivity ALWAYS fails for GoToGo; present-participial going (simultaneously representing prospective (be) going (to) and also motional go in construction with a goal adverbial) is a feature of the construction, and so is a base-form VP as the second conjunct.  This is just the way the construction works, for those of us who have it.  (One more time: for a fair number of people, GoToGo is NOT an inadvertent error, but a regular, though non-standard, part of their linguistic system.)

Here's the point: Distributivity is not some sort of natural law or logical necessity.  It's just a way parts of people's linguistic systems can work.  It makes sense, because it's iconic of the semantic parallelism of the conjuncts: each conjunct is treated the same way formally.

But Distributivity isn't the only possible game in town.  For instance, individual idioms and "small constructions" (like GoToGo) can work any way they want to.  People learn them as special cases, overriding more general conditions like Distributivity.

And there are alternatives to Distributivity -- in particular, a Single Marking scheme, in which one conjunct gets the marking appropriate to the context, and the other conjuncts appear in some default form.  This is exactly as "logical" and "natural" as Distributivity.  In fact, Distributivity can be seen as wasteful and redundant: why mark every conjunct for some feature, when one would do?  Compare this situation to the marking of negation within clauses: many languages have negation distributed to all eligible constituents -- this is "multiple negation" or "negative concord", as in non-standard English I didn't see nobody, and quite generally in Romance and Slavic languages -- but others mark negation only once in the clause, as in standard English I didn't see anybody and I saw nobody.

(I chose the words "wasteful and redundant" deliberately, to counter those who would say that Distributivity is "only logical".  Note that multiple negation in varieties of English is widely, though absurdly, labeled "illogical" -- exactly the opposite value judgment from an insistence on Distributivity in coordination.)

In fact, Single Marking is well attested in (various corners within) the world's languages.  (Confession: I know this is so, but I haven't had the time to do many days of library research surveying the matter, and I might never have it.  But I recall allusions to such systems in the ancient Indo-European languages, and I'm familiar with VP chaining in various languages, in which VPs occur in sequence -- without overt conjunctions -- with the first marked for the appropriate categories in the syntactic context and the rest with much reduced marking.  And the phenomena of agreement "with the nearest" and government "by the nearest" -- not as inadvertent errors, but as parts of a linguistic system -- are clearly related.)

Now back to the QE example.  Maybe this is just Single Marking in the (preferred) Low Attachment configuration: the first conjunct (opening the British Parliament) is marked appropriately (in the present participle form) for the syntactic context, while the second conjunct (deliver a speech) defaults to the base form.  It would be like GoToGo, but on a much larger scale.

Why do I suggest this?  Because the GoToGo Crew (a loose association of linguists who've thought about the construction) occasionally come up with things that look like Single Marking in current English VP coordination, with base-form VPs in non-initial conjuncts.  Here are two from Joel Wallenberg:

[Linguistics professor in class, 2003]  It's a way of looking at the big chi-squares and see if we can figure out ...

[New York Times article, 2004]  "The way you do that is by having hearings, find out who is responsible, get it done and get it behind us," Mr. McCain said.

I have a few more of these squirreled away in places I can't at the moment locate.  They might, of course, be inadvertent errors.  But they might be indications that Single Marking is around, as a minor constructional option, for some speakers in modern English.

Posted by Arnold Zwicky at 01:39 PM

Stir-fried Wikipedia, with pimientos or salamander

Pekka Karjalainen sent in a link to this apparent Chinese menu image:

A quick Google search suggests that it is a genuine picture of a genuine menu from a genuine restaurant in Beijing, originally presented back in October on Evolving Web ("Jimmy Wales Grows Them Good and Organic", 10/10/2007).

The author suggests this plausible etiology:

"Hey I'm making the new menu, what's the english name for those flat crispy mushrooms?"

"Um, there isn't one."

"Well what should I put down here?"

"I don't know, look it up in wikipedia."



The comments on that same post will take you to a picture of "barbecued congo eel with wikipedia and Fermented bean curd":

I've heard that wikipedia is only safe to eat in months that contain a 'd' (or is it months that don't contain a 'd'?).

In other Chinese-English translation news, Victor Mair has hinted to me that he'll soon supply a definitive scholarly exegesis of the GAN phenomenon: watch this space. But I worry that the ingenuity of Chinese menu translators may overwhelm the collective capacity of the international scholarly community.

[Note that you can also get "Wekipedia bread" from Beijing Wekipedia Foods Co., Ltd.]

[Update 12/7/2007 -- Barbara Zimmer's friend Emily suggests: " That would go perfectly with eBay soup. Or maybe moo google gai pan".]

Posted by Mark Liberman at 01:19 PM

December 05, 2007

They doesn't give a damn

Never mind a full implementation of subject-verb agreement. For the Facebook case Arnold Zwicky just noted, all we need is a replacement of "is" by "are" when no sex specification is available. For heaven's sake, even I, who cannot program in anything fancier than the Unix C shell language, can program that:

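# Assumes $sex was set from the profile; any value other than male/female falls through to "They are".
set predicate = "now looking for friendship, a relationship and networking"
if ( "$sex" == male ) then
  echo "He is $predicate."
else if ( "$sex" == female ) then
  echo "She is $predicate."
else
  echo "They are $predicate."
endif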

It's still a bit ugly to use singular they with a proper name antecedent, but don't tell me the programmers couldn't have set things up to do what the above piece of code does. This is not about verb agreement complexities; this is about not giving a damn.

Posted by Geoffrey K. Pullum at 11:42 AM

They is looking for ...

Back in April, the Youth and Popular Culture desk at Language Log Plaza, managed at the time by Eric Bakovic, reported on "singular they" in Facebook.  Facebook users can leave their sex unspecified in their profiles.  If they do, other items in their profiles will be produced with the pronouns they, them, their rather than the sex-appropriate pronouns.  So (to quote Eric) we get things like

Kim Doe added "burger" to their favorite foods.

I'm not really comfortable with this (though I have no problem with singular they with indefinite antecedents), but I understand how it happens, and Language Loggers have reported on similar cases over the years.

Facebook's handling of pronouns turns out to lead to much worse problems, though, because there's no mechanism for subject-verb agreement.

This we discover from a message to ADS-L by James Harbeck yesterday, who noted that the Facebook messages about a friend of his (who had not specified sex) included

[Name] updated their profile. They is now looking for friendship, a relationship and networking.

Way over the line for me, though you can see how it came about: there's a program that pulls the sex item (called "sex", and not "gender", by the way) from the user's profile and then plugs in she/her/her, he/him/his, or they/them/their according to what it finds there.  So if someone with no sex specified checks one or more boxes in the "Looking for" item -- Friendship, Dating, A Relationship, and Networking are the choices offered -- that will be reported as

[Pronoun] is looking for ...

with the program supplying the "appropriate" pronoun, in this case they.  Result:

They is looking for ...

This is akin to those annoying reports like

You have 1 messages.

which result from a lack of number-marking rules within NPs; a number is simply inserted in a template.
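The fix for the message-count case is about as small as a fix can be. Here is a minimal sketch in Python, assuming a hypothetical helper called `message_count` (nothing here reflects how any real site generates its messages): the noun form is chosen from the number before the template is filled in.

```python
def message_count(n: int) -> str:
    """Report an inbox count with the noun marked for number."""
    # English uses the singular only for exactly one; zero takes the plural.
    noun = "message" if n == 1 else "messages"
    return f"You have {n} {noun}."


for n in (0, 1, 2):
    print(message_count(n))
```

The same idea extends to the pronoun case: the template needs to carry the verb form along with the pronoun rather than hard-wiring "is".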

[Full disclosure: I am a Facebook user, which is why I can tell you about profile items.  Among my Facebook friends are Steven Bird, Dan Jurafsky, and Ben Zimmer here at LLP, plus Michael Erard, Jesse Sheidlower, and Chris Waigl (whose names have come up on Language Log several times), a number of friends from the newsgroup soc.motss, and various current or recent students.]

Posted by Arnold Zwicky at 10:33 AM

December 04, 2007

Extramarital toes

The 1 December Economist entertained me by beginning a story ("Labour pains", p. 17) on the latest British political scandal with a wonderful coordination containing a figurative surprise in the final conjunct:

As British political scandals go, this one is not particularly juicy.  No honours seem to have been sold, no politician's Parisian hotel bills picked up, no extramarital toes sucked.

Well, no toes were sucked extramaritally; nevertheless, the reference to extramaritality is in an adjective modifying toes, rather than in an adverb modifying toes (were) sucked.  This is a figure of speech known as the transferred epithet or displaced epithet -- or as hypallage (for the pronunciation, think of allergy).  I promised you back in April that I would post on hypallage, and now I am.

Also back in April, Michael Quinion's World Wide Words newsletter (#541) cited a somewhat different sort of example:

I was at a meeting on Thursday that included a sandwich lunch. Mine was Italian Chicken, whose other ingredients were Italian pesto, sun-dried tomatoes, freshly-ground black pepper, and free-range mayonnaise. It was sad to think of those cute little mayonnaises, running around unconstrained and happy until it was time for them to join the rest of the ingredients in my sandwich.

That free-range mayonnaise would be mayonnaise made with free-range eggs, which are in turn eggs from free-range chickens (NOAD2 has both the poultry sense of free-range and the sense with the adjective displaced from poultry to their product, eggs, but not of course any senses with the adjective further displaced to foodstuffs made with eggs).  This sort of transfer, of an adjective "from noun to noun", is sometimes taken as definitional for hypallage, as in the Wikipedia entry:

A transferred epithet or hypallage is the transfer of an epithet from one noun to another.

with the examples "restless night", "happy morning", and Thomas Gray's "The ploughman homeward plods his weary way".

Strictly speaking, the transfer is not really from one noun to another, but from one referent to another: an adjective that syntactically modifies one noun is understood as applying semantically not to the referent of that noun but to some other referent (one not necessarily named by a noun in the discourse). 

In any case, in addition to the "noun-to-noun" cases, there are also many like the extramarital toes example I started with, with adjectives interpreted adverbially.  Here's one that Marc Sacks posted to ADS-L on 30 July, from e-mail to him:

Internet Radio in the United States dodged a very narrow bullet yesterday when SoundExchange, the thuggish lobbying arm if the Recording Industry Association of America, backed off on demands which would have virtually silenced this exciting and original kind of Radio due to the imposition of a fee schedule which was roundly, and rightly, criticized as excessive and unfair.

That is, dodged a narrow bullet 'narrowly dodged a bullet'.  Larry Horn then said that he thought of this as the "cocked an inquisitive eyebrow" construction (i.e., 'cocked an eyebrow inquisitively'), and I supplied the technical terminology, plus a reference to Robert A. Hall, Jr.'s 1973 squib in Linguistic Inquiry 4.92-4, "The transferred epithet in P. G. Wodehouse" (there's also Hall's 1974 book, The Comic Style of P. G. Wodehouse).  Wodehouse was fond of hypallage.

On 14 October, Larry added an odd example from Cris Collinsworth on NBC, discussing a big fumble: it turned the complete game around 'it turned the game completely around' (or 'it completely turned the game around').

(Note: English has a class of adjectives understood adverbially that aren't figurative, in expressions like previous president 'person who was previously president'.  There's no displacement here; these adjectives semantically modify the nouns they are in construction with, but in a more complex way than ordinary adjectives do.)

My interest in hypallage was piqued back in April by e-mail from Daniel Hulme asking about examples like a cold cup of tea, which I took at first to involve displacement ('a cup of cold tea').  After further discussion with Hulme and later on ADS-L, I concluded that cup here is just a measurement noun, so that cold is in fact modifying cup of tea; it's hard to interpret a cold plate of toast as 'a plate of cold toast', because plate isn't ordinarily used as a measurement noun.

Then came free-range mayonnaise, dodging narrow bullets, and, most deliciously, sucking extramarital toes.

Posted by Arnold Zwicky at 05:53 PM

GOP cell phones?

Here's a rather startling headline for a recent Associated Press article, as hosted by Google News:

If you click through to the article, you'll see that Montclair State University is requiring students to carry cell phones with GPS tracking devices, not cell phones from the Grand Old Party. But somehow GPS got changed to GOP by the AP headline writer or an intervening editor. It no longer reads that way on the AP site or on most of the news outlets that have hosted the article, but at the moment it still lingers on Google News and some foreign outposts of Yahoo! News.

Thanks to Eric Jusino for sending this in. Eric suspects that the Cupertino effect is to blame here, and he may well be right that this is a spellchecker-induced slipup. GPS is a relatively new abbreviation, so it's conceivable that a spellchecker wordlist still lacks it and has GOP as a suggested correction. That wouldn't be a very good wordlist, though, as GPS has been around since 1974, according to the OED, and has been in all the leading American dictionaries for a while now.

(And thanks to all the other readers who have sent in possible Cupertino-isms. I hope to do another Cupertino roundup in the near future.)

[Update, Dec. 6: Kilian Hekhuis writes in with a more likely scenario:

It may very well be possible the spell checker did not correct GPS to GOP (since I'd expect all references to GPS to be affected), but a misspelling of GPS, most likely GSP. GSP would be a common misspelling since GS are types with one hand, and P with the other. Being a bit too fast with the S, the P lags behind creating GSP. And I can see GSP being 'corrected' to GOP (although personally I think GPS should be higher on the correction list than GOP, given that letter reversal is more common than typing an S where an O should be).

So that would group this example with other incorrections of actual misspellings, like aquainted getting changed to aquatinted instead of acquainted, or dentified getting changed to dentrified instead of identified.]
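One way to see why a suggestion list might put GOP ahead of GPS is to compare edit distances with and without letter transposition counted as a single step. The sketch below is in Python and is purely illustrative: the function name `edit_distance` and its `transpositions` flag are made up for this example, and nobody has established which algorithm the spellchecker in question actually used.

```python
def edit_distance(a: str, b: str, transpositions: bool = True) -> int:
    """Edit distance between a and b.

    With transpositions=True this is the optimal string alignment distance
    (insert, delete, substitute, or swap two adjacent letters); with
    transpositions=False it is plain Levenshtein distance.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[rows - 1][cols - 1]


for candidate in ("GPS", "GOP"):
    print(candidate,
          edit_distance("GSP", candidate),
          edit_distance("GSP", candidate, transpositions=False))
# GPS 1 2
# GOP 1 1
```

If swaps count as one edit, GPS and GOP are equally close to GSP; if they do not, GOP is strictly closer. A checker of the second kind would behave as Kilian describes, though that remains a guess about this particular case.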

Posted by Benjamin Zimmer at 02:23 PM

Language Materials Forbidden at Guantanamo Bay

I recently discussed what the leaked Standard Operating Procedures manual for Guantanamo Bay revealed about linguistic operations. A more recent version of the manual, from 2004, is now available, along with a systematic comparison of the two versions and summary of the differences. The one difference I noted of linguistic interest is in the list of materials to be excluded from the Detainee Library. In the earlier version of the manual, the forbidden categories are unsurprising: works promoting jihad, anti-Americanism, etc. or dealing with sex. The new manual adds "Technology/Medical Updates" and "Geography", which seem a bit odd, and "Dictionaries" and "Language Instruction". Why are language materials considered dangerous?

Posted by Bill Poser at 02:09 PM

I done 'em wrong

Yesterday a friend forwarded a link to Elizabeth Little's charming essay "Ablative, Allative, Adessive, Compulsive", and I followed it and read it, thinking it was current. But I was surprised that there was no associated plug for her recent book (Biting the wax tadpole: confessions of a language fanatic). So I jumped to the conclusion that she'd been betrayed by careless editors in the NYT travel section, and this morning, I dashed off a little note under the title "They done her wrong".

What I didn't notice -- until several readers pointed it out to me -- was that the NYT published her essay back in November of 2006, whereas the book was published in November of 2007. In fact, it's plausible that the essay, far from failing to promote the book, was the seed from which it grew.

I haven't read Biting the wax tadpole, but judging by Little's NYT essay, it should be fun. In my original post, I ventured to suggest that her enthusiasm might exceed the boundaries of mere fact, as in her mention of "at least nine" Hungarian "locative cases", and the anecdote underlying her title, which appears to be an urban legend. However, another reader quickly set me straight:

Actually, she accurately ascribes the "wax tadpole" not to Coke but to Chinese shopkeepers.

And her "nine Hungarian locatives" is actually eighteen Hungarian cases, with a sample declension of "bar", included in the context of jokingly worrying how to remember that you weren't in the bar or at the bar or even gone to the bar when you came home late and drunk.

It's an engaging little book, really.

So I'm going to shut up and wait to read the book before I make any more mistakes. Here's the only part of my original post that wasn't an error: it's nice to see a popular book about language that promotes enthusiasm about morphology and syntax as well as lexicography and etymology.

Posted by Mark Liberman at 09:18 AM

Rational inquiry: past, present and future

Remember that National Geographic spread last year about the newly-discovered "Gospel of Judas Iscariot"? An opinion piece by April DeConick in the New York Times ("Gospel Truth", 12/1/2007) argues that its essential argument was untrue and perhaps even dishonest:

It was a great story. Unfortunately, after re-translating the society's transcription of the Coptic text, I have found that the actual meaning is vastly different. While National Geographic's translation supported the provocative interpretation of Judas as a hero, a more careful reading makes clear that Judas is not only no hero, he is a demon.

Several of the translation choices made by the society's scholars fall well outside the commonly accepted practices in the field. For example, in one instance the National Geographic transcription refers to Judas as a "daimon," which the society's experts have translated as "spirit." Actually, the universally accepted word for "spirit" is "pneuma" — in Gnostic literature "daimon" is always taken to mean "demon."

Likewise, Judas is not set apart "for" the holy generation, as the National Geographic translation says, he is separated "from" it. He does not receive the mysteries of the kingdom because "it is possible for him to go there." He receives them because Jesus tells him that he can't go there, and Jesus doesn't want Judas to betray him out of ignorance. Jesus wants him informed, so that the demonic Judas can suffer all that he deserves.

Beyond the pattern of translations "well outside the commonly accepted practices in the field", Prof. DeConick described a mistake that seems to rise to the level of outright fakery:

Perhaps the most egregious mistake I found was a single alteration made to the original Coptic. According to the National Geographic translation, Judas's ascent to the holy generation would be cursed. But it's clear from the transcription that the scholars altered the Coptic original, which eliminated a negative from the original sentence. In fact, the original states that Judas will "not ascend to the holy generation." To its credit, National Geographic has acknowledged this mistake, albeit far too late to change the public misconception.

She leaves the question of motivation unresolved; or, to put it another way, she leaves the National Geographic dangling in an ethical limbo:

How could these serious mistakes have been made? Were they genuine errors or was something more deliberate going on? This is the question of the hour, and I do not have a satisfactory answer.

You can read more about this textual controversy, and additional background on the Tchacos Codex, on The Forbidden Gospels Blog. Her arguments about the translation seem persuasive to me, though I haven't tried to evaluate them in detail. But this case underlines an important respect in which the norms of inquiry in traditional humanistic scholarship are superior to those of modern science.

When scholars like Prof. DeConick debate a point, they normally do so in a context where all participants have access to all of the underlying data. If someone mis-translates a crucial word, or leaves out a negative, other scholars will catch it -- because they have independent access to the same texts.

But in this case, as Prof. DeConick observes in her NYT OpEd piece, the National Geographic violated those norms, at least temporarily, in pursuit of a scoop:

National Geographic wanted an exclusive. So it required its scholars to sign nondisclosure statements, to not discuss the text with other experts before publication. The best scholarship is done when life-sized photos of each page of a new manuscript are published before a translation, allowing experts worldwide to share information as they independently work through the text.

Another difficulty is that when National Geographic published its transcription, the facsimiles of the original manuscript it made public were reduced by 56 percent, making them fairly useless for academic work. Without life-size copies, we are the blind leading the blind. The situation reminds me of the deadlock that held scholarship back on the Dead Sea Scrolls decades ago. When manuscripts are hoarded by a few, it results in errors and monopoly interpretations that are very hard to overturn even after they are proved wrong.

To avoid this, the Society of Biblical Literature passed a resolution in 1991 holding that, if the condition of the written manuscript requires that access be restricted, a facsimile reproduction should be the first order of business. It's a shame that National Geographic, and its group of scholars, did not follow this sensible injunction.

Now that full transcriptions and/or full-sized facsimiles are available, the normal situation in humanistic scholarship has been restored. And this is a situation that scientists, in general, can only dream about.

When scientists publish a new and controversial claim, they normally keep their basic data secret, publishing only a few illustrative examples, and some summaries in the form of tables, graphs and evaluative numbers from various statistical tests. This is supposed to present the material that is essential to the argument. But even if there has been no out-and-out faking of data, the path that leads to the illustrative examples and statistical summaries is usually full of choices that are not neutral ones.

There may be seriously confounding factors (in the selection of materials or subjects, or in the process of running experiments or gathering data), and the scientists may fail to notice these, or may notice them and choose not to document the problems.

The measurements, even if they're made from a scientific display like a spectrum or an MRI image, often involve some subjective judgment. Sometimes measurements are made "blind", i.e. by people who are unaware of the hypothesis to be tested and also of the category of each measured display. But often this kind of blind measurement isn't possible, or isn't done where it might have been possible.

The data used in the final argument is often selected and corrected, with outliers or other "bad" data detected and removed. This is a fine thing to do, in principle -- but if "outliers" are in effect defined as data points that disagree with the hypothesis, this cleaning process may bias the results.
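
To make the point about "outliers" concrete, here is a minimal sketch in Python. The numbers are invented for illustration and come from no real study; the function names are hypothetical. It compares a cleaning rule fixed in advance, which trims both tails, with one that discards only the points that contradict a hoped-for positive effect.

```python
from statistics import mean

# Invented measurements, symmetric around zero (i.e. no real effect).
data = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]

def trim_symmetric(values, limit):
    """Drop points whose absolute value exceeds a limit chosen in advance."""
    return [v for v in values if abs(v) <= limit]

def trim_against_hypothesis(values, limit):
    """Drop only the low points, i.e. those that contradict a hoped-for positive effect."""
    return [v for v in values if v >= -limit]

neutral = trim_symmetric(data, 2.0)
biased = trim_against_hypothesis(data, 2.0)

print(mean(data))     # 0.0
print(mean(neutral))  # 0.0
print(mean(biased))   # 0.5
```

The symmetric rule leaves the mean where it was; the one-sided rule manufactures a positive "effect" out of data that has none. A reader who sees only the cleaned summary has no way to tell which rule was applied.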

And once a data set is created and cleaned up, there are many different kinds of statistical models and tests that might in principle be applied to it. It's common for scientists to analyze their data in dozens of ways, and publish only one or two of these. This may be the result of an honest evaluation of the best way to show what's going on -- but it's usually at least an attempt to find the most "interesting" angle, one that makes a case in the strongest way. And sometimes it's a frankly partisan choice, with equivocal or contrary indications suppressed.

The area most open to abuse is the selection of "characteristic" or "typical" examples, where the goal is to illustrate the phenomena under discussion, but the result may be to leave readers with a very misleading idea of what things are like.

Disciplinary norms are supposed to prevent most of these problems, and referees and editors are supposed to catch the rest. But I can tell you that disciplinary norms are spotty in the fields that I'm familiar with, and even the most active referee can't easily penetrate the veil that separates scientific raw materials from the summary presentations in papers submitted for publication.

These are among the reasons that it's crucial for results to be replicated, especially by people who are not committed to any particular outcome. In effect, scientists who are replicating -- or failing to replicate -- someone else's results are going back to Nature's texts for an independent reading. (That's Mother Nature, not the Nature Publishing Group...). However, replications are slow, and in some cases they are dauntingly (or even prohibitively) expensive. And when the phenomena under investigation are diverse, as linguistic behavior certainly is, a failure to replicate can be just as misleading as an initial result.

So in general, research would progress a lot faster, and with fewer false starts and blind alleys, if scientists in most fields normally published their raw data, as well as a record of the crucial stages in cooking the final presentation. Once this was completely impractical, but cheap mass storage now makes it relatively easy. And in the fields where it has become normal for people to work with shared raw (and curated) data, the effect has been more cost-effective research and faster progress.

There are many reasons that people in less enlightened subdisciplines give for not wanting to do this. There's the extra work of documenting and organizing their raw and partly-cooked materials so as to make them coherently accessible to others. There are problems about data formats. There can be a problem of confidentiality of human subjects. There's yadda yadda yadda. These arguments have some force, but (in my opinion) most of them are suspiciously self-serving.

The continuing development of networked computing makes it inevitable, in my opinion, that scientific practice will change in the direction of fuller publication of experimental data. It'll be a slow process, especially in academic science, since modern academics are among the most conservative cultures in history. But eventually, the science of the future will be as empirically responsible as the humanism of the past.

Posted by Mark Liberman at 07:09 AM

Forwarding Implies Endorsement?

Controversy is growing over the firing by the Texas Education Agency of its Director of Science Curriculum, Chris Comer. Her crime? Forwarding to a local mailing list an announcement of an upcoming talk by a critic of the inclusion of "Intelligent Design" in the science curriculum. Official documents obtained by the press contain the following explanation:

Ms. Comer's e-mail implies endorsement of the speaker and implies that TEA endorses the speaker's position on a subject on which the agency must remain neutral.

Most of the criticism is directed at the idea that a state's education department should remain neutral on the content of the science curriculum, and I agree, but there is also a rather striking linguistic point here which no one seems to have picked up on. Since when does forwarding the announcement of a talk imply endorsement of its content? This is simply nonsense. Forwarding email is approximately like quotation; the only inference that can reliably be drawn is that the forwarder thinks that the recipient may be interested in the information. That may be because the forwarder endorses the content, but it may also be because she opposes it. For that matter she may have no opinion on the subject and have decided to forward the announcement to people she believes do. Indeed, her belief that recipients will find the announcement of interest might not even be based on the content of the talk: her purpose could be to draw attention to the venue or the date or the amusing title.

Posted by Bill Poser at 02:56 AM

Modal semantics in National Intelligence Estimate

The page I've copied below the cut (page 5 from the summary National Intelligence Estimate, Iran: Nuclear Intentions and Capabilities, as reproduced in NYTimes Dec 3, 2007) struck me as remarkable in showing how important modal semantics is in the real world -- the authors of the summary used a full page out of their 9 total to give an exegesis on their use of modal adverbials and modal verbs and on the distinction between circumstantial modals ("assessed likelihood") and epistemic modals ("level of confidence we ascribe to the judgment"). Given the historical-political context, it's clear that these modals really matter. The authors are not linguists, but it seems to me that they've done a very good job of explaining the language and the concepts behind it in clear non-technical terms - I doubt that a linguist could have done better.

What We Mean When We Say: An Explanation of Estimative Language

We use phrases such as we judge, we assess, and we estimate -- and probabilistic terms such as probably and likely -- to convey analytical assessments and judgments. Such statements are not facts, proof, or knowledge. These assessments and judgments generally are based on collected information, which often is incomplete or fragmentary. Some assessments are built on previous judgments. In all cases, assessments and judgments are not intended to imply that we have "proof" that shows something to be a fact or that definitively links two items or issues.

In addition to conveying judgments rather than certainty, our estimative language also often conveys 1) our assessed likelihood or probability of an event; and 2) the level of confidence we ascribe to the judgment.

Estimates of Likelihood. Because analytical judgments are not certain, we use probabilistic language to reflect the Community's estimates of the likelihood of developments or events. Terms such as probably, likely, very likely, or almost certainly indicate a greater than even chance. The terms unlikely and remote indicate a less then even chance that an event will occur; they do not imply that an event will not occur. Terms such as might or may reflect situations in which we are unable to assess the likelihood, generally because relevant information is unavailable, sketchy, or fragmented. Terms such as we cannot dismiss, we cannot rule out, or we cannot discount reflect an unlikely, improbable, or remote event whose consequences are such that it warrants mentioning. The chart provides a rough idea of the relationship of some of these terms to each other.

Remote < Very unlikely < Unlikely < Even chance < Probably (Likely) < Very likely < Almost certainly

Confidence in Assessments. Our assessments and estimates are supported by information that varies in scope, quality and sourcing. Consequently, we ascribe high, moderate, or low levels of confidence to our assessments, as follows:

High confidence generally indicates that our judgments are based on high-quality information, and/or that the nature of the issue makes it possible to render a solid judgment. A "high confidence" judgment is not a fact or a certainty, however, and such judgments still carry a risk of being wrong.

Moderate confidence generally means that the information is credibly sourced and plausible but not of sufficient quality or corroborated sufficiently to warrant a higher level of confidence.

Low confidence generally means that the information's credibility and/or plausibility is questionable, or that the information is too fragmented or poorly corroborated to make solid analytic inferences, or that we have significant concerns or problems with the sources.
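
The two dimensions described in the quoted page, assessed likelihood and level of confidence, can be sketched as a small data structure. This is only an illustration under stated assumptions: the ordering of terms is copied from the chart above, but the numeric ranks, the names `LIKELIHOOD_SCALE`, `rank`, `more_than_even` and `judgment`, and the treatment of "likely" as a synonym of "probably" are my own and are not part of the NIE.

```python
# Ordering copied from the chart in the quoted text; the ranks themselves are
# illustrative positions on an ordinal scale, not probabilities.
LIKELIHOOD_SCALE = [
    "remote", "very unlikely", "unlikely", "even chance",
    "probably", "very likely", "almost certainly",
]
# Assumption: the chart's "Probably (Likely)" is treated as a single point.
SYNONYMS = {"likely": "probably"}

CONFIDENCE_LEVELS = ("low", "moderate", "high")

def rank(term):
    """Return the ordinal position of a likelihood term on the scale."""
    term = term.lower()
    term = SYNONYMS.get(term, term)
    return LIKELIHOOD_SCALE.index(term)

def more_than_even(term):
    """True if the term indicates a greater than even chance."""
    return rank(term) > LIKELIHOOD_SCALE.index("even chance")

def judgment(likelihood, confidence):
    """Pair a likelihood rank with a confidence rank; the two vary independently."""
    if confidence not in CONFIDENCE_LEVELS:
        raise ValueError(f"unknown confidence level: {confidence}")
    return (rank(likelihood), CONFIDENCE_LEVELS.index(confidence))

print(more_than_even("very likely"))  # True
print(more_than_even("unlikely"))     # False
print(judgment("probably", "low"))    # (4, 0)
```

The last line is the point of the exercise: an event can be judged "probable" with low confidence, because the first coordinate describes the event and the second describes the evidence behind the judgment.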

Update from Kai von Fintel:

This kind of exercise in determining a graded list of expressions of probability/confidence is not uncommon among professionals who need to give such assessments. There is an interesting article in the February 1990 issue of the journal "Statistical Science" with peer commentary by Herb Clark among others. (The link works only if you have access to JSTOR).

Other articles in this vein here, here, and here.

I'm sure there's plenty more where these came from. These are just what I came across a while back (I don't remember how I got there -- probably had to do with Herb Clark somehow).

Oh, and while we're at it, here's a legal take on the meaning of 'must'.

Update from Roger Shuy:

I encountered a somewhat similar attempt to quantify uncertainty a while back. In 1990 Mosteller and Youtz wrote an article in Statistical Science, offering their word definitions about such things. If you're interested, you can find it linked in my LL post here.

Posted by Barbara Partee at 02:16 AM

December 03, 2007

Sally in Natural History

Sally is too modest to tell you herself, so I'll do it for her. The latest issue of Natural History has an article by Sally entitled "At a Loss for Words", about language endangerment generally but with a focus on Montana Salish. You can read it on their website. It includes some amazing Salish words, a map of the distribution of the Salishan languages, a picture of one of the elders, the late John Peter Paul, and a picture of Sally.

Posted by Bill Poser at 02:55 AM

December 02, 2007

Suggestive blending with Satchel and Bucky

This past week in the comic strip "Get Fuzzy," Satchel Pooch and Bucky Katt explored the pleasures and perils of neologization through blending, and they managed to get banned by the Chicago Tribune (and perhaps other newspapers) for their efforts.

First, the strips:

("Get Fuzzy," Nov. 28 - Dec. 1, 2007)

This is a fine demonstration of the output constraints on lexical blending, wherein the creation of new blends that coincide with pre-existing homonyms (especially taboo items) is generally avoided. (Compare similar constraints on syntactic blends analyzed by Elizabeth Coppock.) But this is really all just an elaborate setup for Satchel's aborted blend of shiny + itty-bitty, and Bucky's blend of dog + sick. In the latter case, Bucky's owner Rob interrupts him before he can finish saying sick, making the joke even more oblique. Still, it wasn't oblique enough for the editors of the Chicago Tribune. According to a participant in the rec.arts.comics.strips newsgroup, the Tribune substituted the Nov. 30 and Dec. 1 strips with reruns, declaring that they were "not up to our standards of taste."

The creator of "Get Fuzzy," Darby Conley, has become a master of obliqueness, since even indirect references to obscenity, drugs, or other "adult content" can get the strip banned from newspapers around the country. Most recently, "Get Fuzzy" was censored in September by the Tribune for a strip featuring a double entendre on the phrase "nut crunch." And last January the Washington Post and other papers pulled a series of strips in which Bucky came up with campaign slogans inadvertently referring to marijuana use. Conley obviously enjoys using wordplay to test the boundaries of what is considered acceptable in "family" newspapers, so the blending gambit looks like his latest attempt to toy with the sensibilities of local editors. In Chicago, at least, suggestive blends have been deemed off-limits, a decision I would consider rather Satchel + Bucky.

[Update: Laura Kalin and Tim McKenzie are both quick to point out that the firetruck in the Nov. 30 strip could very well be another veiled joke, alluding to the schoolyard classic, "What word starts with F and ends with U-C-K?" McKenzie adds:

Incidentally, last Wednesday's linguistics column in Wellington's Dominion Post newspaper here in New Zealand (which is written by real linguists) was called "The psychology of spoonerisms", and discussed the fact that, in an experimental situation, at least, people are less likely to accidentally produce spoonerisms from pairs like "hit shed" than from pairs like "hot shirt".

Here's a link to the Dominion Post article.]

Posted by Benjamin Zimmer at 03:33 PM

Death claims singular them

David Morgan-Mar's Irregular Webcomic! takes on "singular they" as the character Death rages at his incompetent minions, who keep doing things that cause the dead to be sent back to life:

(Hat tip to Bruce Webster.)

We've been posting about singular they since the earliest days of Language Log.  Here's a summary statement by Mark Liberman from last year:

The argument was settled long ago: singular they has routinely been used throughout the history of English, by all the best writers, until certain subcases were artificially turned into "errors" by self-appointed experts. Successively less discriminating pseudo-authorities then generalized the proscription in successively sillier ways, although they have largely been ignored by the users of the language.

Endless numbers of commentators have noted the usefulness of they (and its forms them, their, and themselves) in situations where the sex of a singular referent is not determinable, known, or relevant, as in Death's use of them to refer to the next person who dies.  Morgan-Mar addresses the issue directly:

The only thing wrong with using "they" as a singular third person pronoun is that some people consider it to be poor grammar. Compared to all the other issues with the alternatives, why is there even still a question about this?

The good thing is that common English usage seems to be heading in the direction towards full acceptance of "they" as a singular neutral pronoun. Lots of people use it this way already. More will do so over the next few decades. Everyone understands it. The trend is already here. Eventually the current generation of grammar prescriptivists will die out and we'll finally have the solution we can all live with.

It's not often that comic strips come with expositions on questions of usage.

Morgan-Mar asks why there is still an issue with singular they.  There are two standard objections to it, neither of them cogent:

"Logic": they with a singular antecedent is grammatically incorrect, because it's "illogical" for a plural pronoun to have a singular antecedent.

"Political correctness": singular they is being used only to avoid giving offense to women (as a replacement for the "grammatically correct" pronoun he); it represents an excess of feminist political correctness.

These objections are combined in a rant by Anatoly Liberman in the spring 2006 issue of Verbatim.  On p. 27:

We want to speak in fully inoffensive gender-neutral sentences, but neutering English is hard.  In the old days, no one objected to instructions like: "Every applicant should indicate his preference by checking one of the boxes."

On the next page Anatoly Liberman notes that

The plural is an ideal device for neutering, since English does not distinguish genders in the plural: "Applicants should indicate their preference by checking one of the boxes ..."

... The trouble starts when English is murdered in cold blood for the sake of a lofty idea.  "When a student comes to see me, I always answer their question," a proud counselor says.  This horror has been sanctioned by teachers, some editors, and by just about everyone who is responsible for the norms of modern American English.

PC and the "correct" he first.  If Anatoly Liberman had looked at the literature on gender-neutral pronouns (instead of just retailing his own beliefs, impressions, and prejudices), he would have discovered that (as Mark Liberman noted) singular they has a history dating back many centuries before feminism; that the choice of he as the "correct" pronoun was prescribed by grammarians only in the 18th century (and has been an uncomfortable choice for a great many speakers and writers); and that people looking for ungendered schemes of reference seized on an existing variant that fit the bill nicely, rather than inventing a new variant for their purposes.  MWDEU has a nice, reasonably compact account of this history in its entry on they, their, them, and it's entirely at variance with Anatoly Liberman's imagined history.

[Though Anatoly Liberman is a distinguished etymological scholar, he behaves like any uninformed but opinionated person when he wanders out of his specialty.  So instead of a linguist's scholarly reflections on the use of the plural, we get the sort of angry unloading of peeves about language that you can find on blogs by non-linguists all over the net.]

Now, the issue of grammaticality and "logic".  This is a bit more subtle.  There are two places where the reasoning goes off the rails. 

First, there's an unexamined identification of grammatical categories with meaning (something I've complained about in various contexts here on Language Log).  Singular and plural are just labels for grammatical categories, which could in principle be called, say, #1 and #2.  Singular and plural are not at all bad labels, since expressions in the former category mostly have the semantics of singularity or individuality, while expressions in the latter category mostly have the semantics of multiplicity or numerosity.  But the fit between categories and meaning is almost never perfect; a particular category can easily have conventionalized uses with non-default semantics.  There's nothing "illogical" about a grammatical system in which plural pronouns can sometimes refer to individuals.

Second, there are suppressed premises here about the nature of "agreement": (a) that all types of "agreement" work according to the same principles; and (b) that these principles require identical GRAMMATICAL CATEGORIES for pairs of expressions.  The first premise involves a kind of word magic: because the same label has been used for various phenomena, the phenomena are the same.  The second premise is a hypothesis about the way these phenomena work in English (and possibly other languages).  Neither premise is justified.

English has (at least) four types of "agreement" involving number:  NP-internal agreement, as in this dog vs. these dogs; subject-verb agreement, as in My dog bites vs. My dogs bite; subject-predicative agreement, as in Sandy could be a spy vs. Kim and Sandy could be spies; and anaphor-antecedent agreement, as in Mary thinks she is brilliant vs. Mary and Norma think they are brilliant.  There is plenty of literature about these different sorts of concordance between expressions, and about subtypes of each.  Nothing guarantees that the same principles will be at work in all these contexts; the extent to which they share some characteristics is an empirical question, to be answered by examining actual practice.  The short answer to this question is that somewhat different principles apply in different contexts.

Finally, these principles do not always turn on grammatical categories alone, but sometimes refer to semantics.  Again, there is plenty of literature here, having, for example, to do with circumstances in which "grammatical" agreement and "notional" agreement are in conflict; the conflicts are resolved in different ways in different cases.  For "singular they", generic uses of they are, by convention, linkable to singular antecedents, so that such uses of they are notionally singular but grammatically plural.  As a result, generic uses of they as a subject require plural verbs (Each child in the class will think they are the best in the class), though their singular antecedents require singular verbs (Each child in the class thinks they will be the only one to succeed).  In any case, the grammar of the language can't be deduced from an appeal to "logic", but must be discovered by examining practice.

A further complication -- here as everywhere -- is that different people have somewhat different systems.  There's a lot of variation, even within speakers of standard English, so that we can't actually talk about "the grammar of the language" in the abstract, but must note who uses which system on which occasions and for which purposes.  That's true for everything in language, but it's especially worth noting in a domain where there's been so much contention about how "the language" works.

Posted by Arnold Zwicky at 03:14 PM

The effects of global warming on vocabulary

It's been a while since we had a "Snowclone blindness" story. So to renew the appeal of this perennial favorite, we bring you Elizabeth Day, "My voyage north into a land of light", The Observer, 12/2/2007:

The Sami people are fascinating to talk to - like the Inuit, they have many words for 'snow', though with the damage wrought by global warming, they have lost quite a few of them.

But perhaps they now have a wider range of words for "slush"? A Finnish friend once told me, in a discussion of cross-country ski waxing, that Finnish snow is much more uniform than snow in the American northeast, since temperatures that swing above and below freezing have complicated effects on frozen precipitation and its residues.

It seems likely that Ms. Day's observation is a pure expression of a journalistic cliché, without any direct connection to facts of any kind, whether lexicographic or climatological. But if you have any actual information about the Sami snow vocabulary and its recent evolution, please let me know.

Rob Balder at PartiallyClips has a topical strip:

[Hat tip: Jeremy Hawker]

[Update -- Ray Girvan writes:

I don't know anything about Sami snow vocabulary, but Nils Jernsletten and Pekka Sammallahti seem to be the names to check out.

As the refs I can find are non-English and/or behind paywalls, I can't tell what their stance is, but I've seen them quoted in contexts supporting snowclone-style factbites about Sami vocabulary (whether concerning snow, landforms or reindeer).

I'm always willing to be convinced. But whatever the recent warming of their environment, it seems improbable that the Sami have already "lost quite a few" of their words for snow. ]

Posted by Mark Liberman at 06:48 AM

December 01, 2007

American Indian hyphens

A few days ago, Jeffrey Kallberg sent a note asking about the practice of writing American Indian words -- especially proper names -- with multiple internal hyphens.

I write with a query having to do with a current project of mine on Chopin's Berceuse, as it might relate to the composer's broader world view in the 1840s.

The essay deals in part with Chopin's interest in a group of Ioway (Baxoje) Indians "exhibited" in Paris by George Catlin; Chopin may have visited two of the Ioway himself, in the company of George Sand, who was particularly taken with the native Americans (she wrote a long essay about them). And he certainly read a number of the articles about them that appeared in the Parisian press.

In a letter to his family, echoing the practice found in all the press accounts I've seen, he gives the names of the husband and wife as "Shin-ta-yi-ga" and "Oké-wi-me." There are variant spellings of the individual syllables of these names, but the names themselves always appear in this hyphenated form.

My question has to do with the history of this hyphenating practice. I assume that it derives from the practices of the various Western missionaries and explorers who first documented their encounters with the native Americans. And I presume equally that the hyphens have little to do with Baxoje self-representation. Has anyone written about why hyphens came to be used so prominently in writing the names of native Americans? It obviously draws attention to the "unusual" (to Western eyes) syllables - but you don't see this (I think) when people write about "unusual" eastern European names (to take one area I know pretty well).

Prof. Kallberg relays a hypothesis:

A colleague who works on native American (and Canadian) music opined that European notions of "primitive" culture might lead them to think that the language they encountered had only monosyllabic words. (She noted that she didn't know of any Native American languages with short words since most are polysynthetic -- making whole sentences by adding syllables into the middle of words). Hence the hyphens may have been a means of marking what they thought were word divisions.

Gene Buckley commented:

I've always thought this was a holdover from the 19th-century style of phonetic transcription, such as that in John Wesley Powell's "Introduction to the Study of Indian Languages" from 1880. I have copies of a "schedule" from that book -- lists of vocabulary to elicit -- in Alsea (coastal Oregon), and hyphens separate nearly all syllables. I don't have the instructions for the schedule in front of me but I think this practice is specifically advocated. I do have the printed "Indian linguistic families of America north of Mexico" by Powell and he uses hyphens when he gives the native pronunciations of various language names.

Other sources also did this frequently, which is helpful when the phonetic alphabet was fairly primitive and sometimes vulnerable to regular orthographic interpretation. Early transcribers probably sounded things out syllable by syllable anyway. This was the way that many of the languages were first written down, so the hyphenated style at least has tradition behind it, and doesn't require the same linguistic training to interpret as the 20th century transcriptions do.

But there's more to it, because earlier sources don't use this style. Thus Benjamin Smith Barton's "New views of the origin of the tribes and nations of America", 1797, contains long vocabulary lists compiled from multiple sources, in which hyphens are quite rare. Here's p. 2 of his word lists, for example:

The same is true for the word lists in Alexander MacKenzie's 1793 journal, described in Bill Poser's post "Words from the West" (9/27/2004).

After a very short search, the earliest hyphenated examples that I've been able to find are in John Tanner's A Narrative of the Captivity and Adventures of John Tanner (U.S. Interpreter at the Saut de Ste. Marie) During Thirty Years Residence Among the Indians in the Interior of North America (1830). Thus the subheading to Chapter 1 reads:

Recollections of early life -- capture -- journey from the mouth of the Miami to Sa-gui-na -- ceremonies of adoption into the family of my foster parents -- harsh treatment -- transferred by purchase to the family of Net-no-kwa -- removal to Lake Michigan.

Describing his abduction by a Shawnee band at the age of nine, he writes:

The Indians who seized me were an old man and a young one; these were, as I learned subsequently, Manito-o-gheezhik, and his son Kish-kau-ko.

So apparently the hyphenation practice arose at some time between Barton's book in 1797 and Tanner's book in 1830.

As in the original examples "Shin-ta-yi-ga" and "Oké-wi-me", Tanner's hyphenation is usually syllable-by-syllable but sometimes not. (Another example from Tanner with a polysyllabic part: the name Taw-ga-we-ninne.) This seems most likely to be meant to represent morphological division; but perhaps it sometimes reflects some perceived difference in prosody.

Some words are left entirely unhyphenated. In some cases, this might be because they are treated as having been borrowed into English. Thus:

The kindness of this family of Tus-kwaw-go-mees continued as long as we remained near them. Their language is like that of the Ojibbeways, differing from it only as the Cree differs from that of the Musk-ke-goes.

But there are other unhyphenated examples that do not seem plausibly to be borrowings:

Lakes of the largest class are called by the Ottawwaws, Kitchegawme; of these they reckon five; one which they commonly call Ojibbeway Kitchegawme, Lake Superior, two Ottawwaw Kitchegawme, Huron and Michigan, and Erie and Ontario. Lake Winnipeg, and the countless lakes in the north-west, they call Sahkiegunnun.

Hyphenation is sometimes applied to common nouns as well as proper nouns:

...The sugar trees, called by the Indians she-she-ge-ma-winzh, are of the same kind as are commonly found in the bottom lands on the Upper Mississippi, and are called by the whites "river maple".

Whatever the explanation for Tanner's hyphenation choices, it does not seem likely to have been a difficulty in determining pronunciation, or a misapprehension about the nature of the morphemes involved, as he lived among Indians as one of them for more than 30 years, and then worked as an interpreter.

If you know anything more about the origins and spread of this orthographic practice, please tell me.

[Update -- David Eddyshaw writes:

Might this be connected with the practice one sees in old-fashioned King James Versions of the Bible, of similarly splitting up Hebrew (and even Greek) names, eg Sol-o-mon, Je-sus, Neb-u-chad-nez-zar, The-oph-i-lus?

I'm fairly sure this goes back to the nineteenth century, but I can't quote chapter and verse to prove it (uurgh).

The original KJV was not hyphenated this way. I'm not sure which later editions might have followed such a practice. ]

[A bit of Google Scholar search turns up Lonnie Underhill's "Indian Name Translation" (American Speech 43(2) 114-126 1968). Underhill cites a 1902 directive from the Commissioner of Indian Affairs including 10 rules about the use of Indian names, of which number 6 is:

Spell the names, whether Indian or translated, as one word, and do not use hyphens, as Onehatchet or Miahvis.

The same paper contains a number of interesting anecdotes about hyphenated phrasal names, e.g.

One case arose on the Pawnee Reservation, Oklahoma, where an Indian was named Coo-rux-rah-ruk-koo. Commonly he was known as Afraid-of-a-bear. A literal translation of his Indian name was "fearing a bear that is wild." From this translation the agent recorded him as Fearing B. Wilde.

However, Underhill doesn't tell us where or when the practice of hyphenation arose.

The hyphenation of phrasal names in translation can be seen in many late nineteenth and early twentieth century works, for instance Rudyard Kipling's "How the first letter was written":

ONCE upon a most early time was a Neolithic man. He was not a Jute or an Angle, or even a Dravidian, which he might well have been, Best Beloved, but never mind why. He was a Primitive, and he lived cavily in a Cave, and he wore very few clothes, and he couldn't read and he couldn't write and he didn't want to, and except when he was hungry he was quite happy. His name was Tegumai Bopsulai, and that means, 'Man-who-does-not-put-his-foot-forward-in-a-hurry'; but we, O Best Beloved, will call him Tegumai, for short. And his wife's name was Teshumai Tewindrow, and that means, 'Lady-who-asks-a-very-many-questions'; but we, O Best Beloved, will call her Teshumai, for short. And his little girl-daughter's name was Taffimai Metallumai, and that means, 'Small-person-without-any-manners-who-ought-to-be-spanked'; but I'm going to call her Taffy. And she was Tegumai Bopsulai's Best Beloved and her own Mummy's Best Beloved, and she was not spanked half as much as was good for her; and they were all three very happy. As soon as Taffy could run about she went everywhere with her Daddy Tegumai, and sometimes they would not come home to the Cave till they were hungry, and then Teshumai Tewindrow would say, 'Where in the world have you two been to, to get so shocking dirty? Really, my Tegumai, you're no better than my Taffy.'


[A bit more searching has antedated the use of hyphens in American Indian names by a bit, to Edwin James, "Account of An Expedition From Pittsburgh to the Rocky Mountains, Performed in the Years 1819, 1820 ... ", published in 1823. James occasionally uses hyphens, mostly (as far as I have seen) in contexts where he is apparently intending to give morphological divisions. At least, he seems to use hyphenated forms whenever he also gives a translation, and not otherwise. Thus from Chapter X ("Account of the Omawhaws. -- Their manners and customs, and religious rites. ..."):

When the guests are all arranged, the pipe is lighted, and the indispensable ceremony of smoking succeeds.

The principal chief, Ongpatonga, then rises, and extending his expanded hand towards each in succession, (see Language of Signs, no. 43. App. B.) gives thanks to them individually by name, for the honour of their company, and requests their patient attention to what he is about to say. He then proceeds somewhat in the following manner. "Friends and relatives: we are assembled here for the purpose of consulting respecting the proper course to pursue in our next hunting excursion, or whether the quantity of provisions at present on hand, will justify a determination to remain here to weed our maize. If it be decided to depart immediately, the subject to be then taken into view will be the direction, extent, and object of our route; whether it would be proper to ascend Running-Water creek, (Ne-bra-ra, or Spreading water), or the Platte, (Ne-bres-kuh, or Flat water), or hunt the bison between the sources of those two streams; [...]

When Ongpatonga is first introduced, in an earlier chapter, he too is hyphenated as well as translated (p. 155):

On the 14th of October, four hundred Omawhaw Indians assembled at Camp Missouri. Major O'Fallon addressed them in an appropriate speech, stating the reasons for their being called to council; upon which Ong-pa-ton-ga, the Big Elk, arose, and after shaking by the hand each of the whites present, placed his robe of otter skins, and his mockasins under the feet of the agent, whom he addressed to the following effect, as his language was interpreted by Mr. Dougherty.


[Barbara Zimmer writes to remind us of the connection to Hiawatha, which was written a bit later than Chopin's letter, but was based on sources published slightly earlier:

Obviously the romance of the Native Americans and their various languages influenced Longfellow who started his poem Hiawatha in 1854; it was completed and published in 1855.

Longfellow obtained much of his knowledge from legends and stories collected by Henry Rowe Schoolcraft who was superintendent for Indian Affairs for the state of Michigan from 1836 to 1841. From an introduction to Hiawatha available online:

Schoolcraft married Jane, O-bah-bahm-wawa-ge-zhe-go-qua (The Woman of the Sound Which the Stars Make Rushing Through the Sky) Johnston. Jane was a daughter of John Johnston, an early Irish fur trader, and O-shau-gus-coday-way-qua (The Woman of the Green Prairie), who was a daughter of Waub-o-jeeg (The White Fisher), who was Chief of the Ojibway tribe at La Pointe, Wisconsin.

Jane and her mother are credited with having researched, authenticated, and compiled much of the material Schoolcraft included in his Algic Researches (1839) and a revision published in 1856 as The Myth of Hiawatha. It was this latter revision that Longfellow used as the basis for The Song of Hiawatha.

From another online article about Longfellow's poem "Hiawatha" we can see a chart of many of the words that Longfellow used, and can learn that

Though Hiawatha is an Iroquois hero, Longfellow's poem is set in Minnesota, and most of the Native American words he uses in it come from the Minnesota Indian languages Ojibwe (Chippewa) and Dakota Sioux. The story Longfellow relates, too, is primarily based not on the Iroquois legend of Hiawatha but rather on the Chippewa legend of Nanabozho, a rabbit spirit who was the son of the west wind and raised by his grandmother.

Like James' Account of an Expedition..., is available for full download from Google Books. ]

[James Crippen writes:

The "dread hyphen disease", as I call it, affected even the Russians in Russian America (later Alaska). The American Russian Orthodox Church has been digitally retypesetting documents in several Alaska Native languages from the 19th century, including the language I study, Tlingit. The document below was purportedly published in 1901 but was probably composed much earlier given the low sophistication of the Tlingit transcription. Much better ones were published in 1846, so this is probably earlier than that.

For the 1846 document, see:

I'm still working out how the orthography in each of the Cyrillic Tlingit documents was structured.


Posted by Mark Liberman at 07:13 AM

Innovation in education

A news flash from The Onion: "Underfunded Schools Forced To Cut Past Tense From Language Programs":

Faced with ongoing budget crises, underfunded schools nationwide are increasingly left with no option but to cut the past tense—a grammatical construction traditionally used to relate all actions, and states that have transpired at an earlier point in time—from their standard English and language arts programs.

A part of American school curricula for more than 200 years, the past tense was deemed by school administrators to be too expensive to keep in primary and secondary education.

There's a nice comment by Senator Orrin Hatch:

Despite concerns that cutting the past-tense will prevent graduates from communicating effectively in the workplace, the home, the grocery store, church, and various other public spaces, a number of lawmakers, such as Utah Sen. Orrin Hatch, have welcomed the cuts as proof that the American school system is taking a more forward-thinking approach to education.

"Our tax dollars should be spent preparing our children for the future, not for what has already happened," Hatch said at a recent press conference. "It's about time we stopped wasting everyone's time with who 'did' what or 'went' where. The past tense is, by definition, outdated."

Said Hatch, "I can't even remember the last time I had to use it."

This led me to wonder how often, in fact, various politicians (or their speechwriters) use various tenses, aspects and moods.

Left out of the piece: the interview with Geoff Pullum, who asked why they didn't start by eliminating instruction about the "passive" tense.

[Update -- James Sinclair writes:

The recent article reminded me of one of my all-time favorites from The Onion: Rules Grammar Change. I don't have the time, energy, or talent to check the entire article for syntactic consistency, but I was impressed nonetheless. A few quotes:

U.S. Grammar Guild according to, the new structure loosely on an obscure 800-year-old, pre-medieval Anglo-Saxon syntax is based. The syntax primarily verbs, verb clauses and adjectives at the end of sentences placing involves. Results this often, to ears American, a sentence backward appearing.

The enthusiasm of government officials despite, many Americans about the new plan upset are. "Why in the world did they do this?" a New Canaan, CT, insurance salesman, said Brent Pryce. "There's absolutely no reason. It's utterly pointless and will cause total chaos throughout the country, not to mention the fact that it will cost billions of dollars to implement. And what's this U.S. Grammar Guild, anyway? I've never heard of it."

When of this complaint informed, government officials that they could not the man's words understand said, because of the strange, unintelligible way of speaking he was.


Posted by Mark Liberman at 07:11 AM