January 31, 2007

The Queen's English

Over at the blog of the Norman Lear Center, Leo Braudy makes a nice observation about the dialogue in The Queen, where Queen Elizabeth uses fulsome to describe praise that is merely abundant, rather than being oleaginous or smarmy. True, that usage is common enough among educated speakers, but was it a deliberate choice to have the Queen use the word that way, and if so, what might it signify?

As Braudy puts it:

. . . towards the end [of the movie] I was knocked right out of the suspension of disbelief by an odd ambiguity in the writing. Unfortunately enough it comes out of the mouth of the Queen herself. As I remember, it occurs when the Tony Blair character comes in to see Elizabeth II after she has finally made her speech about the death of Diana. He praises her for the speech and she replies (a paraphrase): "Some of your associates were not so fulsome." There it is: the hypercorrecting belief that "fulsome" is a fancy way of saying "copious" or "abundant" when its primary meaning is actually "gross, offensive." Certainly this is a slippage of meaning that is in pretty common usage, but in the mouth of the Queen? Is it the mistake of Peter Morgan the scriptwriter? Is it a sly bit of characterization in which the lapse in a precise use of the English language parallels the other problems of the royal family? Or is the Queen subtly attacking Blair's previous speech in a way he is not able to appreciate because of his own faulty background?

My guess is that the line in the movie was almost certainly inadvertent. For one thing, the use of fulsome to mean "gross" or "smarmy" is relatively rare nowadays, even in the reputable media. Merriam-Webster's Dictionary of English Usage traces the non-pejorative use of the word for praise back to the 1940s, citing examples from the New York Times and Washington Post, among others. And a look at the 50 most recent non-duplicate occurrences of the word in Nexis major papers turns up only four that clearly carry a pejorative meaning:

While caterers busied themselves filling bowls with cheese doodles and onion dip for the numerous receptions in the hallways outside the massive chamber doors, a string of speakers doodled its way through a series of fulsome platitudes about cooperation and civility. Steve Wiegand, Sacramento Bee, December 7, 2006

Most of the world knows the fulsome, pompadour-bearing developer Donald Trump as the king of New York property. South China Morning Post, Jan. 7, 2007

[E. B.] White disliked many things, but particularly that which degraded human intelligence. So he was no fan of advertising, overly fulsome service at hotels, gathering in groups, New York City taxi cabs, public speaking, noise, printed gossip and especially improper grammar. John Freeman, Denver Post, Dec. 24, 2006.

Is it really necessary to introduce each dance with fulsome prose, not to speak of endless encomiums to the company's own history and board, however generous? John Rockwell, New York Times, Dec. 1, 2006

Of the remainder, the majority use the word to describe praise that is enthusiastic or abundant:

Critical reaction to the ensemble's superb recording of Handel's Messiah. . . has been unanimous and fulsome. Kenneth Watson, The Scotsman, Jan. 24, 2007.

Ashton had not intended to take the plunge by starting Wilkinson, even though he was fulsome about the fly-half's performance in training last week which was described as "staggering''. Mick Cleary, Daily Telegraph, Jan. 30, 2007

Others use the word in more general ways, sometimes to mean simply "full" or "copious" (actually the earliest use of the word recorded in the OED, but marked there as obsolete since the 16th century), and on at least one occasion, to describe a full-cut shirt:

The Vancouver Sun, clearly concerned about reader revulsion, has taken the unusual step of providing a stripped-down account of the trial on Page 2 daily, an alternative to its more fulsome reportage elsewhere in the paper. Toronto Star, Jan. 27, 2007

Some of the new Reits have enjoyed fulsome rises in their share prices as investors size up the potential for higher dividend pay-outs. Jim Pickard, Financial Times, Jan. 3, 2007

You'll need a robust and fulsome shirt with a David Davies-style collar - ie, tough, resilient and not likely to get all sulky if you suddenly decide to wear an old school tie. Dylan Jones, Daily Telegraph Jan. 5, 2007

So it wouldn't be at all surprising if Peter Morgan was unaware that there was any problem in using fulsome in a non-pejorative way, especially since he doesn't have a particularly "literary" background (his degree was in fine arts, not English). Nor for that matter is there any reason to suppose that Elizabeth would have qualms about using it that way herself. People may call it the Queen's English, but in language, as in politics, the monarch reigns but doesn't rule.

But in that case, is there any justification for continuing to maintain that the non-pejorative uses of the word are erroneous, or that the change in meaning should be described as a "slippage" rather than simply as a shift? When the tide of educated usage is running 90 percent in favor of the non-pejorative meaning, isn't it time to throw in the towel? Or might it be more prudent simply to throw the word over the side in all its uses?

Yet even some assiduously descriptivist authorities have their reservations about the new use of fulsome. WDEU urges caution in using the word non-pejoratively, given the likelihood of unfortunate misinterpretation: "If you are tempted to use fulsome, remember that it is quite likely to be misunderstood by both the innocent reader and the gimlet-eyed purist unless your context makes the intended meaning abundantly clear." And in the other camp, William Safire suggests that the word is best avoided in both its meanings:

This commentator says: If you mean full, say full, or if you want to put your thumb on the upscale, copious. But if you mean gross, say gross or yucky, or try an expletive like feh!.. . . Indeed, cross fulsome off your list entirely. Phooey on ambiguity.

Of course fulsome isn't the only word that raises this sort of problem. I do a double-take when I see a reference on ESPN to "the enormity of Joe Dimaggio's impact on the game" -- that's something I wouldn't even say about George Steinbrenner. And when you use enormity in its "monstrousness" sense, you run the risk that a very large portion of your readers will miss the point. (In Google News stories, the word is used between 80 and 90 percent of the time to refer simply to great size.) When a story says, say, "the enormity of America's Twin Tower attacks is too fresh in the US psyche to explore," it's hard to know for certain what the writer meant to convey -- either meaning would make sense in the context. Should we say phooey on ambiguity with this one, too, substituting hugeness or monstrousness as circumstances require?

There are really two questions that writers have to resolve here. The easier one is whether to permit themselves to use these words in their non-pejorative senses. My guess is that most writers who are aware of the older (current) senses of fulsome and enormity tend not to use the words to mean simply "abundant" and "hugeness" respectively. For one thing, the ambiguities here are much more likely to lead to ironic or comical misreadings than the ambiguities created by other malaprops, like using fortuitous to mean "fortunate." And then too, people tend to attach themselves to these scraps of prescriptive lore, particularly when they're a bit recondite. Writers who are familiar with the prescriptive canons are more likely to flout the rules against splitting an infinitive or ending a sentence with a preposition than to deliberately use masterful in place of masterly.

The other question is a bit more difficult: should writers who know the "correct" meaning of a word like fulsome allow themselves to use it in that sense, pace Safire's objections? True, there's a good chance the usage will be misinterpreted by many readers. But the misreadings here are generally less pernicious than when the words are intended in a non-pejorative way. When you describe the mayor's remarks as fulsome with the intention of conveying that they were smarmy or excessive, and a reader takes you to mean simply that the remarks were effusive, the misreading isn't embarrassing or ironic in the way it would be if the intended and inferred meanings were reversed. And similarly when you refer to the enormity of the administration's Iraq undertaking and someone assumes you mean simply that it was very big -- that's less embarrassing than if you only mean it was big and somebody takes you to be saying it was monstrous.

Then too, fulsome is clearly a word that belongs to a literate register. Consider the ratios of anxious to fulsome in various publications:

New York Review of Books: 14 to 1

National Review: 15 to 1

Nexis Major Papers: 30 to 1

People: 41 to 1

USA Today: 131 to 1
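To make the comparison concrete, here is a rough sketch that rescales the ratios above against the New York Review of Books figure. The dictionary simply restates the numbers listed above; the function name `relative_rarity` and the choice of baseline are illustrative assumptions, not part of the original tally.

```python
# Ratios of "anxious" to "fulsome" as listed above (anxious occurrences per one fulsome).
ratios = {
    "New York Review of Books": 14,
    "National Review": 15,
    "Nexis Major Papers": 30,
    "People": 41,
    "USA Today": 131,
}

def relative_rarity(ratio: float, baseline: float) -> float:
    """How many times rarer fulsome is (relative to anxious) than in the baseline source."""
    return ratio / baseline

# Assumption: use the most "literate" publication in the list as the baseline.
baseline = ratios["New York Review of Books"]

for name, ratio in ratios.items():
    print(f"{name}: {relative_rarity(ratio, baseline):.1f}x the baseline ratio")
```

On these figures, fulsome is roughly nine times scarcer relative to anxious in USA Today than in the New York Review, which is the register gap the list is meant to show.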

The fact is that people who use fulsome in the popular media are generally reaching for a fine or literary word. (That isn't true of enormity.) It isn't quite accurate to describe the non-pejorative use of fulsome as a "hypercorrection," as Braudy does, nor is it really a folk etymology (the word is in fact ultimately derived from full, though possibly colored by foul). When it appears in the popular media, it might better be described as a "genteelism," the term H. W. Fowler used in Modern English Usage to describe the replacement of except by save, toothpowder by dentifrice, and about by anent. In those circumstances the critics who insist that the word be used properly are less liable to charges of pedantry than those who decry the "incorrect" use of an everyday item like anxious or nauseous.

My ruling: you're within your rights to toss out the odd fulsome in a sufficiently highbrow context. But while it's fair to presume that your readers know the proper meaning of the word when you're writing for the New York Review (or, needless to say, for Language Log), you have no call to complain when they misinterpret it if you use it in People. Why did you want to go there in the first place?

Posted by Geoff Nunberg at 07:41 PM

Helium thought balloons

Yesterday's Six Chix, by Margaret Shulock.

It was a shock to learn by googling her that she's now the writer for Apartment 3-G. I'll have to start reading it -- I guess while I wasn't looking, it became so lame that it's hip again.

Posted by Mark Liberman at 06:05 AM

Visualizations at "many eyes"

A couple of language-related visualizations from many eyes (click to get the interactive java versions):

Posted by Mark Liberman at 06:01 AM

Today's language knot: the stripped cleft sluice

In Tuesday's Guardian, I came across the following sentence:

1) This time it is no longer what brands say that is changing, or how they say it, but where. [source]

Where? Where what? The intended meaning's clear enough, though, like an Escher sentence (e.g. more people have analyzed it than I have), it may get less clear when you stare at it. Unlike an Escher sentence, if you stare at this one long enough, you can make sense of it again. It's roughly like this:

2) This time it is no longer what [blah blah] or how [blah blah] but where brands say whatever they say that is changing.

What we have here is, for language geeks only, a double language knot: a sluice followed by an unusual strip. What? You didn't cover sluicing OR stripping in sentence diagramming class? Jeez.

As you see if you stare into the cyberheavens, Geoff Pullum has kindly suspended me in a glass cube hanging by a 20 foot chain from one of the cannons protruding from the turret of his dreaming spire high above the main offices of Language Log Plaza, and with only 10 minutes supply of oxygen.  I will now attempt, before your very (I love that `very') eyes... to undo the language knot. Geoff tells me this is a standard qualifying exam to get a syntax license at LL plaza. So wish me luck! I think I can do it, if you'll just bear with me for these precious few minutes. (Though all alone up here I have a queasy feeling of deja vu.)

Here we go.

Both sluicing and stripping are types of ellipsis, constructions where stuff seems to have been left out, although we can reconstruct what that stuff would be by looking at the rest of the sentence or discourse.

Let's start with stripping. It's a general process for leaving stuff out, with only a small core remaining. Here's an example I just found (first rule of lingua-blogging... any point made with an example can be made with a scatological example):

3)  At the time the press were stating that lion poo keeps not only deer away but rabbits too. [source]

In this case, rabbits is what I'm calling the core, though you can count but and too in the core as well if you like. And the stuff that's missing, at least in the sense that it could have been there without changing the meaning, is all that lion poo. We can picture the process of reconstructing the missing lion poo as follows (I like to decorate the core in a lovely purple, and use green for bracketed material I've added):

4)  but rabbits too ----> but (lion poo keeps) rabbits (away) too

Now what we have in our original example (1) is slightly different: it involves a cleft sentence, in fact an it-cleft, a sentence of the form it is X that Y. And the example also involves coordination, where several pieces are glommed together using connectives like and, or or but. As far as I can tell, in a sentence with coordination and a cleft, it's common to strip out the last part of the cleft, the that clause. Here are some examples I found:

5) It's not just morning-after pills that are being denied, but also routine non-emergency birth control (that is being denied). [source]

6) It's not so much death that I hate but the thought of leaving people (that I hate). [source]

7) Of course, it is not just marriage that some people want restricted to the purpose of having children, but sex (that some people want restricted to the purpose of having children) too. [source]

OK, we've been denied birth control, hated the thought of leaving people, and stripped for sex. Time to move on to sluicing.

Sluicing involves reconstructing material after a wh-word. Jason Merchant (stage name: Mr. Ellipsis, read The Syntax of Silence to find out more) sent me the following delicious example, to which I've added the sluiced material in blue.

8) [The Smart Toilet] is a paperless device that not only accommodates calls of nature, but also 'knows' who's using it and how (they are using it). [San Jose Mercury News, 6 Aug 1996]

Jason, by the way, is concerned about privacy issues arising from smart toilets, but I'm more worried by the word paperless... not a feature I look forward to in the smallest room in the house. Anyhow, the point is all the missing blue stuff reconstructed after how.  That's sluicing.

So back to my question: where what? We can now rebuild the original example as involving, first, sluicing of the blue stuff, and, second, stripping of the green stuff. And to keep it interesting, the reconstructed sluiced material provides the core for the cleft strip.

1) This time it is no longer what brands say that is changing, or how they say it, but where (they say it) (that is changing).
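For the programmatically inclined, the two-step reconstruction can be mimicked with a toy sketch. This is only string-pasting under stated assumptions: the function names `sluice`, `strip_cleft`, and `untie` are invented here for illustration, and nothing in it detects ellipsis automatically; the missing material is supplied by hand, just as in the analysis above.

```python
def sluice(wh_remnant: str, missing_clause: str) -> str:
    """Restore the clause elided after a wh-word (shown in parentheses)."""
    return f"{wh_remnant} ({missing_clause})"

def strip_cleft(core: str, that_clause: str) -> str:
    """Restore the that-clause stripped from the last conjunct of an it-cleft."""
    return f"{core} ({that_clause})"

def untie(wh_remnant: str, missing_clause: str, that_clause: str) -> str:
    """Undo the sluice first; its result then serves as the core of the cleft strip."""
    core = sluice(wh_remnant, missing_clause)
    return strip_cleft(core, that_clause)

# The final conjunct of example (1), with the elided material supplied by hand.
print(untie("but where", "they say it", "that is changing"))
```

The ordering matters in the sketch for the same reason it does in the sentence: the sluiced material has to be put back before there is a complete core for the stripped that-clause to attach to.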

The funny thing about language knots is that any fool can tie them. The trick is to untie them. So I'm feeling rather pleased with myself. But... uh... how the hell do the newly licensed syntax bloggers get out of the glass cube and back down to the ground? Geoff? I think I passed the exam, Geoff! Can anyone hear me? Geoff, the air's getting rather uh, thin... up... here....

Posted by David Beaver at 02:26 AM

The astrophysical lexicon and liver shrinkage

On tonight's Daily Show, Jon's guest, astrophysicist Neil deGrasse Tyson[1] remarked that astrophysical terminology was kept simple because the universe is complicated enough as it is:

What is a black hole? Matter is so dense, and has such a high gravity, that light travelling even at its tremendous speeds cannot escape. So it's dark, it's a black hole, and in astrophysics, we call it like we see it. It's black, it's a hole, it's a black hole. [laughter] We are simple people in astrophysics. (Jon: You are not simple people!) We are! Our lexicon -- spots on the sun? 'Sunspots'! [laughter] No, I'm serious! The universe is complex enough! We don't want to lay down a lexicon to confuse the public, [who] try to follow what we do. The chemists do that! The medical doctors do that! Not in my field. (Jon: You are going to walk out of here tonight and get jumped by a gang of chemists.)

Of course, he's right that medical terminology, and most English scientific terminology in general, is Latin- (and Greek-)based, because Latin was the international language of learning when science was first getting going. (Modern astrophysics perhaps came along late enough in the game not to be bound by this convention.) Even in England, back in the day, English couldn't get no respect, at least as far as scientifically codifying the natural world went. As a consequence, English chemical, medical and biological terminology is generally opaque to nonspecialists, requiring particular effort or special training to understand natively.

This famously causes something of a distancing effect between patients and their problems: understanding the exact nature of our illnesses often involves an extended interview with the diagnostician, asking for precise explanations of what the diagnosis really means, in lay terms. It's not that English lacks native or common terms for most relevant body parts or conditions (it would be a very odd language indeed that did), but rather that science doesn't use those terms, instead employing parallel Latin-based ones.

However, I've often wondered what the patient-doctor relationship feels like in other languages. In Romance languages like Spanish or Italian, the Latinate terminology presumably seems at least somewhat familiar, being cognate with the everyday terms for the relevant body parts.

An English speaker might be able to get something of a feel for what that must be like by looking at some German disease names, where the terms are often cognate with familiar English terms. Although it does employ plenty of Latinate medical terminology, German seems not to employ as much as English. For example, cardiovascular disease translates as Herz-Kreislauf-Erkrankung, 'heart circulation illness', according to this online dictionary.[2] What if, instead of being told you had a fracture of the tibia, you instead heard you had a Schienbeinbruch, a 'shinbone break'? How would you feel if your doctor told you you had Lungenentzündung, 'lung inflammation', rather than pneumonia, or Leberschrumpfung 'liver shrinkage' rather than cirrhosis, or Sprachstörung, 'speech disorder', rather than aphasia? It feels different, doesn't it?

Does this sensation of semi-understanding mean that Germans and Italians have a less boggling experience when talking about their illnesses with their doctors? Perhaps patients are lulled into a false sense of understanding by the familiar terms, and skip the extended interview that would really let them know what's going on? Or perhaps they have a better understanding of what's going on, not needing an extended translation of even the most basic terminology? Or are doctors inscrutable the world over?[3]

[1] How's that for an anarthrous NP, eh?

[2] (Despite my given name) I don't actually speak any German, so I've relied entirely on this dictionary for the particular translations offered here.

[3] Comments can be made on the crosspost at Heideas, if you like.

Posted by Heidi Harley at 02:22 AM

"Democrat majority": offensive but not ungrammatical

Roger Shuy ("-ic") is not the only one who's been talking about the president's missing morpheme. At the start of Maura Reynolds' article "The 'Democrat majority' is still the talk of the capital" in the Los Angeles Times, 1/30/2007, she asks:

Will President Bush put the "-ic" back in "Democratic"?

That was the hot topic around Washington on Monday after the president was asked why, during his State of the Union address last week, he referred to Congress' new "Democrat majority."

"That was an oversight," Bush said in an interview Monday with National Public Radio. "I'm not trying to needle…. I didn't even know I did it."

Reynolds quotes various experts to establish that this is a deliberately offensive way to talk:

"It's a long-standing intentional partisan political slight," said Daniel Weiss, chief of staff to Rep. George Miller (D-Martinez). "It's kind of like flashing colors in a gang. It's code. It says, 'I'm one of you, I'm a right-wing conservative.' "

And experts on political locution say it's a deliberate, if ungrammatical, linguistic strategy.

"The word 'democratic' has such positive emotional valence … so they politicize it to use it as a term to describe a group of political rivals," said Roderick P. Hart, a professor of communications and government at the University of Texas in Austin.

"Democrat Party" is not common usage in Texas, Hart said, noting that the only people he had heard use it were "sitting Republican legislators." [emphasis added]

"Intentional partisan political slight", check. "Like flashing colors in a gang", check. As Geoff Nunberg explained a couple of years ago ("Making the world safe for 'democracy'", 10/16/2004), this usage was pioneered in national politics by Herbert Hoover in 1932, and "became a Republican tic" in the 1950s. But what's ungrammatical about it?

A few sentences later, Prof. Hart (who is dean of the College of Communication at the University of Texas at Austin) explains:

"It sounds illiterate to me," said the University of Texas' Hart. "It's a noun used to modify a noun, and everyone knows you use an adjective to modify a noun."

As for the "sounds illiterate" part, it's true that forms like Democrat Party have a vernacular vibe. Geoff Nunberg's post cited a poem, published in the New York Times in 1908, that used vernacular forms to accuse William Jennings Bryan of flip-flopping:

Nothin' at all to say, William; nothin' at all to say;
There ain't no Democrat Party, so go on and have your way.
Fix up th' platform to suit you; put in what planks you may choose;
You've been on all sides of everything, so you've got plenty to use.
The New York Times, July 29, 1908

Also, the only other case I can think of where it's offensive to use a noun as a modifier in place of an associated adjective is in forms like "jew doctor" -- and since this particular way of expressing prejudice is out of fashion these days, it's most often used by linguistically unsophisticated people. (Though the situation was very different in the 1920s and 30s.)

But what is this business about nouns not modifying nouns? You don't have to look very far to find nouns that are apparently doing exactly that, including in the names of political parties.

In fact, there are quite a few nouns acting as modifiers right in Reynolds' article, e.g. "right-wing conservative", "election years", "House Speaker Nancy Pelosi", "the U.S. government", and so on.

And if we go to Dean Roderick P. Hart's own web page at the University of Texas, we find plenty of other cases of apparent modification of nouns by nouns. For example, he teaches a course on "Political Language" whose syllabus asks "Is the language of the policy sphere different from the language of the public sphere? How does establishment politics differ from movement politics?" And even though Dean Hart seems to work mostly on American politics, I imagine that he's heard of Britain's Labour Party.

Mere empirical observation aside, you could consult any reasonable work on the grammar of English and learn that nouns are routinely used to modify nouns.

So why did Dean Hart say that "everyone knows you use an adjective to modify a noun"? His individual background and motivations in this case are unknown to me, and also none of my business. But it is part of my business to educate college students in linguistics, including the basic grammar of the English language, and Dean Hart's statement is one more piece of evidence that my profession has been falling down on the job over the past half century.

Karl Hagen, on his blog polysyllabic, attributes Dean Hart's "fairly boneheaded comment" to "the way traditional grammar is usually mis-taught. In that scheme, almost anything that modifies a noun is called an adjective." Perhaps so -- but even the traditional grammar of the 19th century recognized the existence of [noun noun] constructions, under a variety of names such as apposition or juxtaposition. If I pick a mouldy old grammar book off the shelf at random, I find that Samuel Kirkham's English Grammar in Familiar Lectures, published in Baltimore in 1837, explains (p. 70) that

Some consider the adjective, in its present application, exactly equivalent to a noun connected to another noun by means of juxtaposition, of a preposition, or of a corresponding flexion. "A golden cup," they say, "is the same as a gold cup, or a cup of gold."

Kirkham disagrees, observing sensibly that "a beer cask, and a cask of beer, are two different things", and similarly "a virtuous son" vs. "a son of virtue". But he doesn't try to rule out "a gold cup" or "a beer cask" on the grounds that "a noun used to modify a noun" is wrong.

English-language structures of the form [noun noun] are grammatically diverse -- the first noun might be a modifier ("cotton shirt", "first-year student"), or just the first part of a lexical compound ("ice cream", "spark plug"), or a complement of the following head ("dog trainer", "cup rack"), or a title or appositive ("President Bush", "the opera Fidelio"); etc. And people can and do disagree about the taxonomy of types and the analysis of specific cases.

What's clear, though, is that such structures are common, and furthermore that it's perfectly OK to use them in forming the name of a political party. Looking around the world, in addition to the Labour Party, we find the Hope Party, the Optimist Party, the Liberty Party, the Unity Party, the Reform Party, the Freedom Party, the Independence Party, and so on.

So when the dean of the College of Communication at one of America's best universities, a specialist in the language of politics, thinks that nominal modifiers are always ungrammatical or at least substandard, perhaps we've reached a historical low-water mark in the ability of intellectuals to analyze language.

Then again, maybe he was just misquoted. It happens all too often in the popular press -- and if that's what happened in this case, I apologize for suggesting that past generations of linguists have failed in their duty to teach Dean Hart how to think and talk about the structure of sentences.


[Update -- Karl Hagen argues that Kirkham is not relevant, because more recent school grammars are even worse. That's certainly true in the case that he cites. The main thing that's changed, I think, is that even Kirkham, as bad as he was, presented students with exercises in which they had to do a complete analysis of chunks of real text. And therefore they had to have something to say -- even if it was something stupid -- about most of the commonplace phenomena of the language. In contrast, current instruction seems to teach (false) general principles, and never asks students to apply them systematically to the analysis of any significant amount of actual English text.

Instead, the more modern exercises that I've seen involve classifying or otherwise commenting on very limited portions of a few artificial sentences.]


More on this:

Making the world safe for "Democracy" (10/16/2004)
-Ic (1/30/2007)
-Ic -y matters (2/11/2007)
Old habits die hard (2/15/2007)
Hatchet job on Hart? (2/18/2007)
More political morphology: Democrats, Great British, and geese (2/19/2007)
The perils of comic-strip lead time (2/25/2007)
The International Democrat Union (3/26/2007)
Third time's a charm (4/3/2007)

Posted by Mark Liberman at 12:12 AM

January 30, 2007

Pronouns: the early days

Back in the days when men hunted and women gathered, the parts of speech were thin on the ground.  Here's a moment of discovery from Bizarro:

Ok, Oogo is stuck using "woman", having no pronoun "you".  (On other fronts: he's fine on extraction, at least within a clause, though he still doesn't have auxiliary inversion down; but we're talking about pronouns here).

What's notable here is that Oogo DOES use a pronoun -- but it's an interrogative pronoun, "what" (which is grammatically indefinite), rather than an ordinary personal pronoun (all of which are grammatically definite), like Ooga's "you" and "I".  Apparently, "pronoun" here means '(definite) personal pronoun'.

In a tradition going back (at least) to the grammarians of Latin, Pronoun (the exemplars of which are definite personal pronouns) is a part of speech distinct from Noun.  Many modern scholars -- notably Huddleston and Pullum in the Cambridge Grammar of the English Language -- dispute this (correctly, in my view), maintaining that, as far as their syntax goes, pronouns are just one of a number of somewhat idiosyncratic subtypes of the category Noun (they're similar in a number of ways to proper names).  So maybe Ooga, with her adverbial subordinate clauses and progressive aspects and past/present tense distinctions and articles, not to mention discourse markers like "here", is being somewhat unfair to Oogo: he's clearly got nouns, but maybe he just hasn't worked out all the subtypes (proper nouns, count vs. mass, collectives, etc.).

(Surely you can supply here the obvious references back to Brizendine and others on male/female differences.)  Me Tarzan, you Jane.  Why, Tarzan, I'm so delighted that you and I could meet!

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 08:42 PM

Support our troop

Following up on my geeky posting on plural, mass, and collective, the Onion (of 1/29/07) has its own take on troop 'serviceperson'. 

The story begins:

Bush Commits One Additional Troop to Afghanistan

WASHINGTON, DC -- In an effort to display his administration's willingness to fight on all fronts in the War on Terror, President Bush said at a press conference Monday that American ground forces in Afghanistan will be aided by the immediate deployment of Marine Pfc. Tim Ekenberg of Camp Lejeune, NC.

and ends:

Although the 325th is forbidden from disclosing specific details of the upcoming assignment, his father spoke to reporters from the brigade's childhood home in North Carolina shortly after Bush's announcement.

"Even if you disagree with our commander in chief, I ask that your prayers go out to Tim and that we continue to remember the sacrifices that are being made out there," Dean Ekenberg said. "Please, support our troop."

(Hat tip to Haninah Levine.)

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 07:52 PM

-Ic


If you listened, you probably noticed it. The President dropped the "-ic" morpheme in his State of the Union address when he said, "I congratulate the Democrat majority." Many thought this was a calculated insult that otherwise belied his protestation of a new bipartisanship. They should be getting used to it by now, though, because it's been used over and over again in recent years.

What's more interesting is Bush's effort to defuse this criticism. The Washington Post picked up on it.

In the President's own words:

It's nothing more than an oversight

I didn't even know I did it

Gosh, it's probably Texas

I'm not good at pronouncing words anyway

There was no intentional slight of anyone

There's probably no way to determine for sure whether it was conscious, an oversight, or intentional. But regional dialect studies don't include -ic dropping as a common Texas speech pattern, eliminating that one. So what about his "I'm not good at pronouncing words anyway" excuse?

People who habitually mispronounce words usually do this rather consistently. So if Bush had some sort of personal speech problem with the morpheme, -ic, you'd expect him to drop it across the board. Let's look at his other uses of the -ic morpheme in that same address.

economic reform

public schools

basic private health insurance plan

basic health care insurance

domestic oil production

Strategic Petroleum Reserve

Atlantic Ocean

public servants

horrific scale

democratic legislature (of Afghanistan)

democratic constitution (of Afghanistan)

tragic escalation

democratic Iraq

diplomatic strategy

democratic Palestine

Of the 16 potential -ic words to which the morpheme applies, Bush had no trouble pronouncing it well -- except for this one time: "I congratulate the Democrat majority." And when he used the same word in reference to foreign countries, it also came out just fine. Afghanistan is said to have a "democratic legislature" and a "democratic constitution," while Iraq and Palestine are said to be "democratic."
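For readers who want to repeat this kind of consistency check on another transcript, here is a minimal sketch in Python. It is only an illustration, not the method used for the counts above: the function names `ic_words` and `bare_democrat_modifiers` and the sample text are hypothetical, and the matching is purely orthographic, so it would also pick up words like "music" where -ic is not a separable morpheme, and it would flag ordinary noun uses such as "a Democrat said". It also assumes the transcript reflects the speech as delivered rather than as prepared.

```python
import re
from collections import Counter

def ic_words(text):
    """Count lowercase word forms that end in the letters 'ic' (orthographic check only)."""
    tokens = re.findall(r"[A-Za-z]+", text)
    return Counter(t.lower() for t in tokens if t.lower().endswith("ic"))

def bare_democrat_modifiers(text):
    """Crude heuristic: find 'Democrat' directly followed by a lowercase word."""
    return re.findall(r"\bDemocrat\s+[a-z]+", text)

# Hypothetical sample text built from phrases quoted in the post.
sample = (
    "I congratulate the Democrat majority. "
    "We need economic reform and better public schools. "
    "A democratic Iraq needs a diplomatic strategy."
)

counts = ic_words(sample)
print(counts)                           # tallies of -ic forms found
print(bare_democrat_modifiers(sample))  # candidate cases of -ic dropping
```

Run over a full transcript, the first function gives the list of fully formed -ic words and the second gives candidate omissions, which still need to be checked by hand.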

Wondering if Bush has developed this affectation only recently, I also did an -ic check on his State of the Union speech of January 2003 and found:

domestic programs

historic education reform

economic growth

democratic Palestine

ballistic missiles

Atomic Energy Agency

catastrophic attacks

economic stagnation

economic sanctions

No -ic dropping here. All -ic words are fully formed. Must not be just recent.

Sorry. I'm afraid that it's not his Texas dialect and it's not just vernacular pronunciation--like his habitual way of pronouncing "nuclear."

It must be something else.

Posted by Roger Shuy at 07:10 PM

Using words you don't understand

Arnold's post about the danger of using the Hindi words encountered untranslated in a novel, some of which are obscene, reminds me of a prank played on me years ago when I visited Taiwan. I went out for dinner with a bunch of Taiwanese people. Knowing that I could speak a little Mandarin, they told me that I should advance my education and learn Taiwanese. As my first lesson, they taught me what they said meant "thank you" and encouraged me to use this in the restaurant. Whenever one of the waitresses, who were all teenage girls, brought me something I would say what I thought meant "thank you". This produced a great deal of giggling, which at first I attributed to the humor of the foreigner trying to speak Taiwanese.

The giggling went on though, past the point where it seemed justified. A foreigner using a Taiwanese expression isn't that funny. I finally realized that something was going on, and my hosts revealed that the expression they had taught me did not mean "thank you" at all. Rather, it is something a man says to a woman who is flirting with him. It means "Back off. You're being too forward. But don't think that I'm not interested." This of course explained the continued giggling.

The moral of both stories is that you shouldn't just try out bits of language that you don't understand. People may think that you mean what you are saying.

Posted by Bill Poser at 01:03 PM

Not objectionable

Without meaning to be at all picky, let me just enter an important demurral about Claire's remark (at Anggarrgoon): "I agree with Geoff Pullum at Language Log that the phrase “person of color” is objectionable." Please hear me: I never said anything of the sort.

The phrase is not objectionable, grammatically or politically. Linguistically, its widespread use by people who cannot possibly all be making sporadic coincidental errors shows us that it is irrevocably part of Standard English now, and certainly it must be regarded as grammatically well formed. Politically, the people who invented it were trying to show some kind of terminological respect for the oppressed peoples of the world who are commonly classified in terms of a lack or deficit (they aren't "white"), and also to unify a wider sense of solidarity among those regarded as non-Caucasian, and in principle their efforts were supposed to be a part of building an anti-racist political consciousness, and I approve of that.

(Whether it's actually sensible to classify humanity according to who has the "color" property and who doesn't is a different matter, not under discussion here. I'm talking about apparent original motives. I don't think those who like the phrase "person of color" are in general trying to plan a worldwide racial war against whites. But just in case they are, then let me just depart from my prepared remarks to go on record as taking the controversial stance of opposing any global movement to slaughter white people, O.K.? I think mass slaughter of white people would be objectionable.)

So, use the term person of color at will. Feel free. There is nothing wrong with it. What I said was merely that I hated it, and I won't be using it.

I did mention the oddness that it doesn't seem to follow a regular pattern (a suntanned person is not a "person of suntan"), but that was incidental; it's just part of what I think might have initially made the phrase irritating to me when I first heard it. Much more important is that I stressed that I was evincing a purely personal dispreference: a dislike comparable to the fact that I dislike "real ale" (the strongly hop-flavored British beers from breweries that true beer enthusiasts rave about are just not for me), or that I am repelled by Hawaiian pizza.

And that was my key point, the one that I related to an issue about attitudes to language. Not everything is objectionable just because I (or you) have a personal distaste for it. The solution for you if you hate the term person of color, or the taste of real ale, or the notion of pineapple on your pizza, or the use of they with a morphosyntactically singular antecedent, is essentially the same in each case: don't use it, drink it, eat it, or say it, respectively.

That's what the worst of the grammar grumblers and usage whiners consistently fail to see: that their personal dislike of (say) split infinitives does not determine automatically that split infinitives are incorrect in Standard English. Your dislike of split infinitives might instead simply mean that you hate them: they might be (and in fact are) fully grammatical at all stages of the history of English, and often recommended as the best choice on style grounds, and sometimes obligatory if you don't want to completely rephrase, and you still might hate them. In that case, don't use them. End of point.

Posted by Geoffrey K. Pullum at 12:41 PM

Mind your Hindi!

From the NYT Book Review, 1/28/07, p. 6:

Editors' Note

If readers of the Book Review have been considering picking up a little conversational Hindi, they would probably do well not to begin with the sample list of words in the Jan. 7 review of "Sacred Games," a novel by Vikram Chandra that sprinkles untranslated Hindi throughout its English text. Indian readers pointed out that while most of the Hindi terms in the review were innocuous, several were in fact obscene -- suitable for Chandra's tough-guy characters, no doubt, but not for the Book Review, where editors failed to check the meaning of the words in the novel's glossary.

Just another item in the annals of taboo avoidance, in the "Oops!" subcategory.  The price of modesty is eternal vigilance.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 10:10 AM

The sex

Claire Bowern at Anggarrgoon recently wrote ("person of color", 1/26/2007):

I agree with Geoff Pullum at Language Log that the phrase “person of color” is objectionable. But it does have one useful function: it highlights markedness relations. After all, white people have “colour” too [...] The only way people could have come up with a phrase “person of color” is by highlighting the marked relationship between minority and majority skin colour. [...]

Perhaps we should try to make this formulation a bit more widespread. If we’re stuck with the phrase “person of color”, how about adding to it person of gender? A computer geek might be a person of RAM? Any more?

Well, people who are not bald could be called persons of hair, for example.

But Claire's observation reminds me of something that struck me as very strange when I first encountered it -- the use of "the sex" to mean "the female sex", just as a "person of color" has come to mean a "person of darkish color". Claire's reference to markedness explains both usages; but the explanation has a subtle twist in it, I think, that may be worth exploring.

"Markedness" is a Prague School term from the middle of the 20th century. It's been given various technical meanings in various linguistic theories over the years -- originally, I think, in the phonological theories of Nikolai Trubetzkoy -- but these days, it's mainly used in a more informal way, to talk about (attributions of) naturalness in general. This typically involves a system of classification or description where some sub-groups are "marked" (and thus distinctive or noteworthy) while others are "unmarked" (and thus the normal or default case).

The gendered names of animals are a common example, as in the Wikipedia article:

A marked form is a non-basic or less natural form. An unmarked form is a basic, default form. For example, lion is the unmarked choice in English — it could refer to a male or female lion. But lioness is marked because it can only refer to females.

Now as I mentioned, for about 350 years, the phrase "the sex" could be used to mean "the female sex", i.e. "women". Here are the citations from the OED for this sense of the word sex:

1589 PUTTENHAM Eng. Poesie III. xix. (Arb.) 235 As he that had tolde a long tale before certaine noble women, of a matter somewhat in honour touching the Sex.
1608 D. T[UVILL] Ess. Pol. & Mor. 101b, Not yet weighing with himselfe, the weaknesse and imbecillitie of the sex.
1631 MASSINGER Emperor East I. ii, I am called The Squire of Dames, or Servant of the Sex.
1697 VANBRUGH Prov. Wife II. ii, He has a strange penchant to grow fond of me, in spite of his aversion to the sex.
1760-2 GOLDSM. Cit. W. xcix, The men of Asia behave with more deference to the sex than you seem to imagine.
1792 A. YOUNG Trav. France I. 220 The sex of Venice are undoubtedly of a distinguished beauty.
1823 BYRON Juan XIII. lxxix, We give the sex the pas.
1863 R. F. BURTON W. Africa I. 22 Going ‘up stairs’, as the sex says, at 5 a.m. on the day after arrival, I cast the first glance at Funchal.
1892 ‘MARK TWAIN’ Amer. Claim. xvii. 160 The customers applauded, the sex began to flock in.
1920 D. LINDSAY Voyage to Arcturus i. 2 He was used to such receptions at the hands of the sex.

Less often, simple "sex" without the article was used to mean "female":

a1700 DRYDEN Cymon & Iph. 368 She hugg'd th' Offender, and forgave th' Offence, Sex to the last!

But if "male" is unmarked and "female" is marked, why did "the sex" come to mean "the female sex"? You might think this is backwards -- if the default lion is a male lion, why wasn't "the (unspecified) sex" the male sex? Well, Claire's point is that we tend to put unmarked or default properties in the background, and bring just the marked properties -- or groups -- out to be noted or named. Thus to note the relevance of the category of sex is to imply that the sex in question is female, just as to note the relevance of the category of color is to imply that the color in question is "black" or "brown" or "yellow".

Except in ironic contexts, this usage of "the sex" seems to have died out in the 1930s or thereabouts.

There's some evidence, I think, that persons of gender found "the sex" offensive even in the 19th century. Elizabeth Barrett Browning, for example, used this phrase only once in her poetry. She puts it in the mouth of a sexist parodying someone even more sexist.

Here's the passage in question. In Aurora Leigh (second book), Aurora's cousin Romney, in the context of asking her to marry him, disparages her ambitions as a poet.

(Because the sequence of speakers can be hard to follow, I've put Romney in blue and Aurora in red).

220 ... Women as you are,
221 Mere women, personal and passionate,
222 You give us doating mothers, and perfect wives,
223 Sublime Madonnas, and enduring saints!
224 We get no Christ from you,---and verily
225 We shall not get a poet, in my mind."

226 "With which conclusion you conclude . . ."
226 "But this,
227 That you, Aurora, with the large live brow
228 And steady eyelids, cannot condescend
229 To play at art, as children play at swords,
230 To show a pretty spirit, chiefly admired
231 Because true action is impossible
232 You never can be satisfied with praise
233 Which men give women when they judge a book
234 Not as mere work but as mere woman's work,
235 Expressing the comparative respect
236 Which means the absolute scorn. 'Oh, excellent,
237 'What grace, what facile turns, what fluent sweeps,
238 'What delicate discernment . . . almost thought!
239 'The book does honour to the sex, we hold.
240 'Among our female authors we make room
241 'For this fair writer, and congratulate
242 'The country that produces in these times
243 'Such women, competent to . . . spell.'"

243 "Stop there,"
244 I answered, burning through his thread of talk
245 With a quick flame of emotion,---"You have read
246 My soul, if not my book, and argue well
247 I would not condescend . . . we will not say
248 To such a kind of praise (a worthless end
249 Is praise of all kinds), but to such a use
250 Of holy art and golden life. I am young,
251 And peradventure weak---you tell me so---
252 Through being a woman. And, for all the rest,
253 Take thanks for justice. I would rather dance
254 At fairs on tight-rope, till the babies dropped
255 Their gingerbread for joy,---than shift the types
256 For tolerable verse, intolerable
257 To men who act and suffer. Better far
258 Pursue a frivolous trade by serious means,
259 Than a sublime art frivolously."

It might be interesting to explore the death of this expression. Did it just gradually cease to be used? Was there an explicit campaign against it? (Perhaps someone has already studied this -- if you know of such work, please tell me.)

[Update -- Abraham John Rein writes:

Just a note on a phenomenon similar to referring to "the female sex" as simply "the sex": in the mid-twentieth century and beyond, it was quite common to refer to music traditionally performed by Blacks (or, I guess, persons of color) as "race music." The implication being, of course, that if there is a "race" to be noted, it's not white -- just as, when a "sex" is noted, it's not male.

I agree that this is exactly the same pattern, abstractly considered. But the sequence of citations for this sense of race in the OED suggests that it might also have had some function as a euphemism, which does not seem to have been true for the use of "the sex" to mean "the female sex":

1926 H. NILES in W. C. Handy Blues 31 Listen to the ‘race records’, for this craft is sui generis.
Jrnl. Abnormal & Social Psychol. Apr.-June 12 ‘Race blues’..are not always what they seem.
1935 Vanity Fair (N.Y.) Nov. 71/3 Negro bands play ‘race music’ (a curious euphemism spread by phonograph companies).
Collier's 30 Apr. 24/4 We were afraid to advertise Negro records. So I listed them in the catalogue as ‘race’ records and they are still known as that.


[Update #2 -- Coby Lubliner writes:

I don't think that the identity male=unmarked is always valid. Take 'goose,' for example. I also don't agree that the 'race' example is equivalent to 'sex'; the term 'race music' was mainly used by black people, just as many Hispanics refer to themselves as "la raza", while 'the sex' seemed to have been used primarily by men. My guess is that 'the sex' went out of use as the word 'sex' came to mean sexuality or sexiness. I wonder if this was around the same time as when the meaning of 'make love' shifted from 'woo' or 'court' to 'have sexual intercourse'.

The last point may well be valid -- certainly the early part of the 20th century, when "the sex" seems to have gone out of favor as a way to say "women", was also the time that Freudian talk about sex came into prominence.

(Dave Long adds: "...and around the time when Cambridge's "Sex Viri" had to add one to their number (becoming "Septem") after the publicity surrounding JBS Haldane's divorce correspondence". Dave's reference is explained (sans Haldane and divorce) here:

(1) a university court of first instance in disciplinary cases against senior members; (2) a court of appeal against decisions of the Court of Discipline in the case of junior members. There is reference to 'septemviri' in the Elizabethan statutes of 1570, but the court as it survives today was first embodied in the new statutes of 1857 as a disciplinary court for senior members. It comprised six persons with the Chancellor presiding and was normally referred to as the Sex Viri (and ever so wittily as the Sex Weary). The 1926 statutes substituted the Vice-Chancellor for the Chancellor, and allowed appeal to the Chancellor, or his deputy, and two assessors. In 1939, without any necessity to revise the composition of the court, it was re-christened the 'septemviri'.

Whatever their number, we can judge the attitude of these viri towards sex by the fact that they expelled William Empson in 1929 because a college servant found condoms in his room.]

Posted by Mark Liberman at 08:53 AM

People in Glass Houses

I'm always amazed when people try to make political points based not on events or people's explicit statements but on subtle inferences that they are unequipped to make. The most recent example to catch my eye is this post by Robert Spencer at Jihadwatch, a blog to which I referred the other day.

Spencer quotes the following passage from an AP news item:

Benkahla was one of only two defendants who were acquitted in the government's prosecution of a dozen Muslim men who participated in what the government called a "jihad network" that used paintball games in the Virginia woods in 2000 and 2001 as a means to train for holy war around the globe.

He criticizes the Associated Press for questioning the truth of the government's allegation that the men were training for jihad:

The government called it a "jihad network." That's AP for you. That they gained ten convictions on that basis might suggest that there was something accurate in this designation, but you wouldn't get that impression from this story.

Spencer seems to think that putting "jihad network" in quotes calls it into question. He's wrong. There are a number of reasons for putting a phrase into quotes. Doubt about the validity of the characterization is one of them, but it is by no means the only one, or even the most common or default. Another reason is that the phrase is someone else's and is not a standard term. That's almost certainly what the AP intended here.

What is especially strange here is that although the quoted phrase itself admits of ambiguity as to the AP's intended meaning, the passage as a whole does not. The relative clause "who participated in what the government called a 'jihad network' that used paintball games in the Virginia woods in 2000 and 2001 as a means to train for holy war around the globe." is factive, that is, it is a clause that may only be felicitously uttered if the speaker believes the proposition it expresses to be true. The AP article therefore asserts as true the proposition that the men trained for holy war, which refutes Spencer's claim that the AP is casting doubt on the government's position.

[Comment by Mark Liberman:

There's an ambiguity here that Bill may have missed.

When the AP wrote

Benkahla was one of only two defendants who were acquitted in the government's prosecution of a dozen Muslim men who participated in what the government called a "jihad network" that used paintball games in the Virginia woods in 2000 and 2001 as a means to train for holy war around the globe.

did they mean for everything after called to be part of its complement, and thus explicitly flagged as merely a government allegation? This would be an odd editorial choice, as Spencer says, since it suggests that the whole holy-war training activity might be a figment of the prosecutors' imagination, despite the convictions obtained in a number of other cases.

Or did they intend the complement of called to end after "jihad network", so that the rest of the sentence is in the AP's own voice? That would be more in line with (what I take to be) the normal journalistic practice.

Simplifying the sentence and drawing only the relevant structure, this is the difference between


Bill is relying on the second interpretation, I guess, but it seems to me that the first interpretation is the more natural one.

Also, the use of the term "factive" in this context may be confusing.

There's a 35-year-old neologism, due to Paul and Carol Kiparsky and widely adopted by linguists, that uses "factive" to describe verbs like know, regret, learn etc., which presuppose the truth of their complements, in contrast to verbs like believe, complain, hear, etc., which do not. This distinction came up in a couple of earlier LL posts ("Verb semantics and justifying war", 9/21/2003; "Bush's understanding of factive verbs", 9/23/2003). But there's no factive verb in the AP sentence under discussion. ]

[Response to Mark's comments:
With regard to "factive", I'm well aware that there is no factive verb here. That's why I talked about the clause being factive, not the verb. I don't agree that "factive" always suggests the use of a factive verb. Kiparsky and Kiparsky's "Fact" paper, while indeed a classic, and one whose authors are both good friends of mine, does not define the admissible uses of "factive".

As for the putative structural ambiguity, I don't think that it is really there. Not only do I consider the interpretation that I relied on, the second one, to be more natural; the first one, in which the relative clause is part of the complement of "call", is impossible as written. To get that interpretation the relative clause would have to be within the quotes, which it is not.

It is also important to note that even if one does get both readings of the relative clause, all that does for Spencer is to make his interpretation possible. His criticism is only valid if the interpretation on which the AP is doubting what happened is the only one. So even if Mark is right about the ambiguity, Spencer's criticism of the AP is unwarranted.]

Posted by Bill Poser at 03:51 AM

January 29, 2007

Snow-word progress: glacial at best

Geoff Pullum did his best to sound optimistic a few weeks ago when a reader sent in a reasonably well-informed treatment of the "Eskimo snow words" myth from the Holland Herald, the in-flight magazine of KLM Airlines. This respite from the usual drumbeat of media misinformation was notable enough to catch the attention of Michael Quinion at World Wide Words and Nathan Bierma at the Chicago Tribune, who both shared Geoff's sanguine sentiment that there was "progress at last" on the snow-word front. But the headline to Bierma's column is probably a more accurate assessment: "Hell will freeze over before Eskimo 'snow' myth melts." Just a week and a half after Bierma's piece appeared, this is how Jeff Lyon began an item in the Chicago Tribune's Sunday Magazine under the rubric "Cultural Riffs":

It's been said that Eskimos - known as the Inuit these days - have 40 words for snow, reflecting how profoundly connected their lives are with the white stuff.


The rest of the item is predictable enough for anyone who knows why we call snowclones "snowclones."

If so, what does the following huge vocabulary say about us?

Murder; kill; slay; assassinate; dispatch; hit; annihilate; eliminate; eradicate; rub out; liquidate; execute; ice; cool; do in; do away with; bump off; knock off; finish off; massacre; slaughter; waste; wipe out; zap; silence; cap; whack; snuff; extinguish; exterminate; decimate; shed blood; take for a ride; take out.

And, of course: Shoot; gun down; plug; fill full of lead; mow down; stab; knife; slash; flay, cut out the giblets of; eviscerate; garrote; hang; strangle; smother; suffocate; choke; asphyxiate; drown; defenestrate; bludgeon; crucify; poison; behead; guillotine; lynch; starve; gas; blow up; bomb; atomize; incinerate.

There's way more, but that's enough for a Sunday morning.

So what does that big list say about us? I think it says that we have newspaper staffers who are really good at using a thesaurus (and who don't bother to read the language column in their own paper).

For more coverage of the never-ending snow-word struggle, see the links in this post.

(Hat tip, Erin McKean.)

[Update: The Trib admirably issued a correction to the item. Details here.]

Posted by Benjamin Zimmer at 04:52 PM

Advances in cinematic xenolinguistics

Are you a fan of alien languages like Klingon but find that they somehow lack verisimilitude? Well, director James Cameron of Titanic fame would like you to know that he's making just the movie for you. In a recent article in Entertainment Weekly, Cameron boasted that his new film project Avatar will represent the gold standard in xenolinguistics:

It's the story of an ex-Marine named Jake who travels to the inhospitable planet of Pandora, where humans can survive only by — buckle up, kids — projecting their consciousness into genetically engineered bodies (a.k.a. "avatars"). Seems earthlings want to colonize Pandora in order to mine a valuable substance Cameron conspicuously dubs Unobtanium. Pandora's population — a fearsome alien race who lives in harmony with nature — isn't too keen on being exploited. Jake falls in love with a native, war ensues, and he must choose a side. Cameron is so committed to creating a fully formed Pandoran culture that he has linguistics professor Paul Frommer devising a new language: "[Paul] told me, 'We're going to out-Klingon Klingon!'"

Cameron offered further details in an online interview with EW:

Is it true you have developed a whole culture and even a whole language for the aliens in this movie?
Absolutely. We have this indigenous population of humanoid beings who are living at a relatively Neolithic level; they hunt with bows and arrows. They live very closely and harmoniously with their environment, but they are also quite threatening to the humans who are trying to colonize and mine and exploit this planet.

How long did it take to brainstorm the language? Did you work with people on that?
There's a guy named Paul Froemer [sic] who I was lucky enough to encounter a year ago. He's the head of the linguistics department at USC. I talked with a number of linguistics experts, but he was the one who kind of got the challenge. He said, "We're going to beat Klingon! We're going to out-Klingon Klingon! We're going to have a more detailed and well thought out language than Klingon!" He's been working on this for a year. It began by riffing off things in the treatment, but from there, it went to how sentences would be constructed, and what the sound system would be. It would have to be something that was pronounceable by the actors but sounded exotic and not specific to human languages. So he's mixing bits of Polynesian and some African languages, and all this together. It sounds great.

For the record, Paul Frommer is director of the Center for Management Communications and associate professor of clinical management communication at USC. (He has a PhD in linguistics and has taught courses in USC's linguistics department, but he certainly doesn't head the department — James Higginbotham is the current chair.) According to his personal page, Frommer's specialties include professional writing and editing, cross-cultural communication, and language development for non-native speakers of English. Whether this background has prepared him for the task of constructing a language to out-Klingon Klingon, only time will tell.

Klingon was devised by Berkeley-trained linguist Marc Okrand, who drew on his research in Native American languages for some of Klingon's more off-beat morphological and phonological features. (See one linguist's description here.) A notable phonetic touch in Klingon is the lateral-release voiceless alveolar affricate, as found in Nahuatl (and at the end of the word Nahuatl itself). Similarly, Star Wars sound designer Ben Burtt supposedly mined the Quechua phonetic inventory for the alien sounds of Huttese. The way that Cameron tells it, Frommer has embarked on a similar expedition for "exotic" sounds from the world's languages, "mixing bits of Polynesian and some African languages" to forge Pandoran.

It's telling that Cameron describes the resulting tongue as something that is "pronounceable" yet sounds "exotic and not specific to human languages." But if the phonetic elements of Pandoran are all derived from actual languages of the world, then how are they "not specific to human languages"? A charitable reading of Cameron's quote is that the sounds of Pandoran aren't specific to any single human language. Less charitably, one might wonder if Cameron thinks that the far-flung languages contributing to Pandoran don't quite sound "human" to him.

A possible giveaway is that the humanoid Pandorans are described by Cameron as "living at a relatively Neolithic level ... very closely and harmoniously with their environment" but are also "quite threatening" to their colonizers. In other words, they're noble savages. Regardless of the relative level of sophistication that Frommer might impart to Pandoran, I have a sneaking suspicion that this alien language will serve much the same cinematic function as the language of the Skull Islanders in the original King Kong: primitivizing and exoticizing the linguistic "other."

Posted by Benjamin Zimmer at 03:55 PM

Inferences in perjury cases

As those who keep up with the news are aware, Lewis "Scooter" Libby is currently on trial for five counts of perjury. This means that the prosecutor believes Libby lied under oath. Everyone lies at one time or other, but lying isn't a criminal charge unless it's done under oath about issues material to the case. It's generally agreed that lying involves a falsehood, deliberate concealment, or misrepresentation of truth with the intent to lead a listener or reader into error or to disadvantage. Merriam Webster's Dictionary of Law (1996) defines perjury:

Knowingly making a false statement (as about material matter) while under oath or bound by an affirmation or other officially prescribed declaration that what one says, writes, or claims is true.

The key words here are "intent" and "knowingly." I don't know what all the evidence is in Libby's case but I do know that trying to discover a person's intentions can be a complicated business. It will be interesting to watch how this plays out in Libby's case. I've seen prosecutorial efforts that have done a questionable job of it. The 1983 case of Steven Suyat, a carpenter's union business agent, comes to mind.

Suyat was not even a target in the original investigation in 1981, when the head of his carpenter's union was accused of unfair practices after he had his union members picket the site of a housing contractor who didn't pay union rates to his non-union workers. Two of the union's business representatives (Suyat wasn't one of them) were required to file affidavits to the National Labor Relations Board, which eventually settled the dispute.

But eighteen months later the District Attorney filed perjury charges against these two union business agents, based largely on secret tape recordings the contractor had made of his conversations with them. Suyat was called to testify before the Grand Jury and at their trial but he was not charged with anything directly related to this case. Then, shortly after the two business agents were convicted, Suyat was indicted for perjury that he allegedly committed during his testimony.

Four of the counts in his indictment, taken directly from his trial testimony, are the following:

Count 1

Prosecutor: And one of the jobs of the business agent is to organize non-union contractors. Is that right?

Suyat: No.

Count 2

Prosecutor: So no part of your job is to organize contractors?

Suyat: No.

Count 3

Prosecutor: And no part of Mr. Nishibayashi's job is to organize contractors?

Suyat: That's right.

Count 4

Prosecutor: And no part of Mr. Torres' job is to organize contractors?

Suyat: That's right.

Here's where intention comes in. What did the prosecutor intend in his questions and what did Suyat intend in his responses? Was Suyat lying? More to the point, what did he think the prosecutor was asking him? Maybe the average person would understand that when the prosecutor said, "organize contractors," he really meant, "organize people who work for contractors." But not Suyat. And maybe the average person would understand that Suyat's answers referred to the fact that unions organize workers, not their bosses, the contractors for whom they work. But not this prosecutor. At Suyat's trial the jury adopted the prosecutor's intention, not Suyat's, and convicted him of perjury.

These four counts illustrate the importance of context, something near and dear to the hearts of sociolinguists. For us, "context" refers to both linguistic and social context. The latter embraces the non-language factors surrounding a statement. It includes such things as where the statement took place, the educational and social status of the speakers, the conversational genre and routine in which the conversation took place, and other sociolinguistic factors.

So let's consider some of the social context. Suyat, a second-generation Filipino, was born and raised on the island of Molokai, which then had a population of about 12,000--a backwater area. Most of the inhabitants of the island worked in the cane fields or factories. Suyat was in a class of about a hundred students who graduated from the island's only high school. He then became a carpenter for seven years before taking the job as one of his union's business agents. Like his fellow Molokaians, he spoke Hawaiian Pidgin English. And like most of the rest of us, he little understood the discourse routines and genre found in a court trial. But he had enough schooling to know that he needed to listen carefully to what was asked of him and to answer carefully. So when the prosecutor put forth an inaccurate definition of what union business agents do, Suyat thought he was answering truthfully. Union business agents do NOT organize contractors. Most dictionaries, such as Merriam Webster's Collegiate Dictionary, agree with him:

contractor  1. one that contracts or is a party to a contract: as  a: one that contracts to perform work or supplies  b: one that contracts to erect buildings.

Whether or not the jurors realized it, this case presented the urban Honolulu jury with two different sets of intentions. The ultimate irony is that in his attempt to be accurate, Suyat was convicted of perjury.

The way words are used in both their linguistic and social context will be interesting to watch in the Libby trial.

[footnote] A fuller discussion of U.S. v Steven Suyat can be found in my 1993 book, Language Crimes: The Use and Abuse of Language Evidence in the Courtroom, published by Blackwell Publishers.

Posted by Roger Shuy at 11:34 AM

Monday morning mailbag

I've gotten a bunch of notes about missing prepositions.

From Jan Freeman: a literary reference.

So, Mark, here I am reading Trollope at bedtime, avoiding all thoughts of usage and grammar, and I find this sentence on page 88 of "The Warden" (Signet), where our hero is uncomfortably pondering his treatment by the local scandal sheet, the Jupiter:

"Was he to be looked on as the unjust griping priest he had been there described?"

That gets us to 1855, but no doubt someone will send you a respectable, scholarly 15th-century cite in a day or two.

The obvious search strategies on LION don't turn anything up for me, so I'll wait for people who are better at crafting such searches, or respectable scholars who have noted examples in their own bedtime reading. But meanwhile, it occurs to me that "as" is not the only thing that might have been elided in examples like this one -- Trollope's sentence might also be an elliptical form of something like:

"Was he to be looked on as the unjust griping priest he had been there described to be?"

Google gives 43,000 hits for the search {"described to be a|an"}. Among them is a news story on a "four course beer dinner", from which I can't resist quoting this sentence:

The beer lovers will then be treated to a Caribbean-infused duck, which will include ancho chili rubbed duck breast and coconut curry duck confit, which will be paired with Dogfish Head's Raison D' Etre, an ale that is self-described to be "a deep mahogany ale brewed with Belgian beer sugars, green raisons [sic] and a sense of purpose" and was voted American Beer of the Year by Malt Advocate Magazine in January 2000.

(That [sic] is mine -- the Dogfish Head folks spell it more conventionally as "green raisins" in their description of Raison d'Etre Ale. There's a joke in there somewhere about green raisons steeping purposefully, if I only had time to work it out.)

From Arnold Zwicky: an alternative term, an example from his own writing, and a list of examples from others:

Locally we call the phenomenon "absorption"; I'm not sure where the term came from.

A somewhat different case, which I think really *does* involve haplology:

It's fascinating as an art object as well as a presentation of large amounts of information. [AZ writing here]

I'm uncomfortable with "as well as as a presentation..." and so left the sentence as you see it, after some thought.

Back on the absorption front: the examples range from ones that are, for me, absolutely fine, through some grayish area, to truly wretched stuff. Here's a small collection of examples.

From Margaret Marks, a typically insightful observation:

I can't give an informed view of this, but when I was teaching English to Germans, I used to call 'as' in 'as X is known' a relative (Quirk does that in 15.55). I don't know if that's true or not, but there are definitely situations where 'as' does more than one would expect.

In the examples you mention, it seems to me not that the second 'as' is missing, but that the first 'as' is taking on a bigger function.

From Russell Lee-Goldman: some serious linguist-talk on the topic.

My ears prick up when anyone mentions AS, as you certainly did over the weekend.

The question of the missing AS in the sentences you described on the 27th is really interesting, and I actually gave a brief summary of that data in a presentation on so-called movement paradoxes involving AS ("Parenthetical as* (and movement paradoxes)", Berkeley Syntax and Semantics Circle, 9/29/2006). Other well-documented paradoxes include "that linguistics is going down the tubes, we could talk about for hours," where the fronted that-clause could not appear after ABOUT. But the AS case is slightly different, since people do say (or at least write) "..., as X is known as, ...".

It's also different in that the first AS (the one that seemingly introduces the finite clause) is plausibly analyzed as a relative proform, not as a preposition, though the second one clearly is one. In any case the two ASs are clearly different (though of course historically related), so this might really be more like haplology/RMC than whatever cannibalism might be.

AS also gives rise to other paradoxes in some of its other uses, like "as you may have heard (about) __" and "as you may be aware (of) __", where you can seemingly gap either a that-clause or (if you include those parenthetical prepositions) a noun phrase, though in both cases what is semantically missing is a state-of-affairs.

[As an aside, there are also some interesting semantic issues going on. The syntax and semantics of a use of a name and a mention of a name are different, but with this "as" (I call it Name-as, to contrast it with the uses in "as I said" and "as I can"), you can get a type-shifting effect. So for "Kcat, as she is known to her friends, reports...", Kcat is an entity for the matrix clause, but a name for the as-clause. [cf. Kcat as/which/*who I call her, reports...]]

[Update -- David Beaver comments on the last bit of Russell's note:

i) This has nothing to do with names per se. Parenthetical comments often address metalinguistic aspects of the rest of the utterance. E.g. I could have added "to use a technical term" anywhere in the phrase "often address metalinguistic aspects of the rest of the utterance" except right after "of" or "the". These comments may address naming conventions, as in your example, but may even address spelling. If I said "I'd like to introduce Mark, with a `k' not a `c', who will talk to us about...", would you want to say that "Mark" is type-shifted so that it has one contribution as a written form, and one as a referring expression? You could do, but in that sense, anything can have type-shifted meanings, including "metalinguistic".

ii) Chris Potts actually does analyze similar cases as involving type shifting, in his book "Conventional Implicature", and draws trees where names (etc.) first contribute something picked up by an appositive (like the "as" phrase), and then get shifted into their referential sense in order to combine with the rest of the main clause. For Potts the appositive contribution is a conventional implicature, and is kept separate from the "at-issue" meaning.

Russell's handout references Potts' work... But I have to apologize to non-linguist readers, for whom this is more "inside baseball" than we usually indulge in here.]

Posted by Mark Liberman at 08:22 AM

A lesson in the sweet science

Ladies and gentlemen, I give you the next ... Well, Dr. Louann Brizendine is selling books, not running for political office. But judging from a transcript that Geoff Nunberg just sent me, she could give even 21st-century politicians an advanced course in how to deal with a difficult interview.

Here's the background. One of the punchiest of the impressive factoids in Dr. Brizendine's book The Female Brain was this one: "A woman uses about 20,000 words per day while a man uses about 7,000." But Dr. B has never done any research herself on this topic, and none of her book's references backs her claim up with any research results. And in fact, the scientific literature on the subject of sex differences in communication finds that the differences between men and women are small compared to the variation among men and among women; and the small average differences between the sexes are as likely as not to favor men.

I went over this issue in an article in the Boston Globe ("Sex on the Brain", 9/24/2006), and in more detail in a series of weblog posts.

Since then, your more conscientious journalists have been questioning those numbers, or at least citing the controversy. So when Beverly Thomson interviewed Dr. Brizendine on Canadian television ("In utero brain alternation makes women more chatty than men", 11/29/2006) Dr. B was in a tough spot. The host apparently hadn't done any background research beyond reading the publisher's press kit, and what she wanted to talk about was exactly that problematic (because non-existent) "research" about how women really are gabbier than men.

The way that Dr. B handled this challenge is a classic of the rhetorical art.

THOMSON: Men and women alike think ladies have the gift of gab. And that theory is confirmed in Dr. Louann Brizendine's book, "The Female Brain". It turns out, women say about 20,000 words a day, while men only utter 7,000. To talk a little bit more about this, I am joined by Dr. Brizendine.
Good morning.
BRIZENDINE: Good morning, Beverly. How are you?
THOMSON: I'm well, thank you.
Tell me what made you decide to actually carry out this research. It's certainly something anecdotally that most people would say, oh yeah, that sounds right, women do talk a lot more than men. But what made you decide to actually go and research it and come up with the findings?

Now, at the moment that her book was published, back in August, the honest answer would have been "Actually, Beverly, I've never done any research myself. In this case, I'm relying on some numbers that I found in a pop-psychology book by Allan and Barbara Pease, 'Why Men Don't Listen & Women Can't Read Maps'. And frankly, I'm not sure where they got those numbers. They don't tell us, and for all I know, they just made them up."

And after her interview with Stephen Moss, a couple of days earlier, it would have been honest to add "A critic called these numbers into question, and I found that I was unable to substantiate them. So this claim will be removed from future editions of the book, as I recently explained to a reporter from the Guardian."

But actually answering the questions that an interviewer asks you is the route to oblivion, whether you're selling books or running for office. When direct and honest answers would be embarrassing -- or even boring -- you need to avoid the question and shift to less problematic and more entertaining talking-points. In this case, Dr. B handles the transition magnificently. Her response:

BRIZENDINE: I think, Beverly, one thing that's interesting to know, that's fascinating about the brain, is that when we are in utero the fetal brain is all female in both males and females, up until eight weeks old. And then the tiny testicles in the male fetus start pumping out testosterone. It goes up into the male brain, changes the female-type circuits of the brain, and then basically increases the size of the cells in the male brain for things like sexual pursuit and other male-brain circuits. And by the time we're all born we are born with a male-type brain or a female-type brain.

Beverly Thomson goes along with the neuroscience theme, while steering the conversation back to female gabbiness:

THOMSON: So, what is in the brain that, because I know in part of your research and your findings you say that women actually get a buzz, if you will, off talking a lot.

This one is easier -- Dr. B just needs to resist the temptation to explain that she hasn't done any research on how "women actually get a buzz ... off talking a lot", and no one else has either. And since the delicate issues are a bit out of focus at this point, she can edge in the direction of honesty by bringing up context and individual differences:

BRIZENDINE: Yes, it's interesting. Females like to do lots of what is called overlapping talk and will get together with each other. And in the social setting, in the home and places where women feel comfortable in a social setting with their friends, that is the context in which women tend to speak a lot more -- or speaking on the phone with their mothers or their girlfriends and lots of overlapping speech.
However, you've got to remember that if you are out on a date with a guy for the first time and he's trying to impress you, what will happen is that the guy will be talking, talking, talking and the girl may not get even a single word in edgewise. So, it's the context that counts. Sometimes then men will overtalk you completely.
THOMSON: Well, and certainly, as you mentioned, there's exceptions to every case. I mean, you might just get a chatty man.
BRIZENDINE: I think men in the media in particular, they've chosen that field because they tend to be more chatty than the average guy.

This kind of qualification isn't in her book, but putting it in interviews is both truthful and (given the challenge that's out there, even if Beverly Thomson hasn't gotten the message) self-protective. However, after a joke at the expense of one of her co-workers, Ms. Thomson heads right back to that troublesome word-count business:

THOMSON: Did you hear that, Seamus? [laughter] Sorry.
Tell me how you did the research, though. Because when you think about it, 20,000 words? Did you sit there and record women over a period of time?

Uh oh. This is the most direct test yet of Dr. B's interview-fu: a simple yes-no question, inviting elaboration on an extremely embarrassing topic. Alas, she never recorded anyone, and she never counted any words in anyone else's recordings, and neither (apparently) did the people she took those numbers from. If it had been me in that CTV studio, I'd have been done for.

But not Dr. B! With a speed and confidence reminiscent of Muhammad Ali in his early fights, she brushes aside the question, flicks a stinging little reference count to distract the interviewer, and does a quick shuffle-dance backwards into evolutionary psychology and spousal spats:

BRIZENDINE: Actually all of the studies -- I reviewed 1,008 studies for my book, "The Female Brain", that looks at all kinds of aspects of how we behave as females and think as females differently than the male.
And you've got to remember that only about one-percent differences turn up. Because the male and female brain are more alike than they are different. So, the interesting part of that is that for millions of years we have evolved in a slightly different niche. The female niche has been more being pregnant, having babies, raising what we call nonverbal infants. So, the female brain tends to be better at certain things like picking up nonverbal cues and social cues, emotional cues.
And the female brain actually remembers the details of emotional events much more than a guy. For example, I don't know about you but many, many women will describe an event where they know some big argument that they had with their husband 10 years ago he doesn't remember at all. And she remembers every single detail. [laughter]
THOMSON: Oh boy. And that's a whole other study I think, Doctor.
BRIZENDINE: Absolutely.
THOMSON: Thank you very much for your time. I'm not sure Seamus heard us because I think he was talking behind me back there. [laughter]

Wow. Talk about your Sweet Science. As Ali once said about himself, she's not the greatest -- she's the double greatest.

Truly, a worthy recipient of the prestigious Goropius Becanus Prize.

Posted by Mark Liberman at 06:30 AM

January 28, 2007

World's longest official country name

Those who enjoy winning at the geography category in trivia games might like to file away for future use what I'm pretty sure is the world's longest official country name, beating "The Former Yugoslav Republic of Macedonia", "The Democratic People's Republic of Korea", and even "The Great Socialist People's Libyan Arab Jamahiriya" and "The United Kingdom of Great Britain and Northern Ireland". The Economist happened to mention recently (1/20/07) that in order to get into the World Trade Organization without making the People's Republic of China hopping mad, Taiwan (oops! I said it!) had to be extremely careful about how it was officially known (forget all about that "Republic of China" stuff). It was admitted as "the Separate Customs Territory of Taiwan, Penghu, Kinmen, and Matsu (Chinese Taipei)". That's the name. Don't get it wrong.

Posted by Geoffrey K. Pullum at 05:08 PM

Color names and color perception: the cartoon version

Two fascinating recent papers on the back-and-forth between language and perception in the domain of color, involving sometime Language Log poster Paul Kay: G. V. Drivonikou, P. Kay, T. Regier, R. B. Ivry, A. L. Gilbert, A. Franklin, and I. R. L. Davies, "Further evidence that Whorfian effects are stronger in the right visual field than in the left", PNAS 104: 1097-1102 (2007); and Terry Regier, Paul Kay, and Naveen Khetarpal, "Color naming reflects optimal partitions of color space", PNAS 104: 1436-1441 (2007).

There's some background on earlier work by Gilbert, Regier, Kay and Ivry in some earlier LL posts -- "What would Whorf say?", 1/3/2006; "What Whorf would have said", 1/15/2006. The second one is by Paul Kay himself. A good summary of the recent work (as far as I understand things) can be found in "How grue is your valley?", The Economist, 1/18/2007.

More on this later. For today, I just wanted to note that today's Get Fuzzy strip is either a commentary on this research, or a big coincidence.

On reflection, I guess it's a coincidence.

But I gather it's true that dogs' color vision is based on two photopigments rather than the three that humans have, and also that the proportion of retinal cones is much lower than in humans. Thus according to Neitz et al., "Color Vision in the Dog", Vis. Neurosci. 3(2) 119-25 (1989):

Measurements of increment-threshold spectral sensitivity functions and direct tests of color matching indicate that the dog retina contains two classes of cone photopigment. These two pigments are computed to have spectral peaks of about 429 nm and 555 nm. The results of the color vision tests are all consistent with the conclusion that dogs have dichromatic color vision.

So if dogs could actually talk, their color language would be different. And it's plausible for Satchel to be confused about what blue means.

Of course, there are plenty of human dichromats. Has anyone looked into their color language, or more specifically the effects of linguistic categories on their color perception?

Posted by Mark Liberman at 03:06 PM

Relevance of a different kind

As I noted in my last posting, sometimes including information leads people to search for the relevance of this information, so that what you say will implicate more than its face value.  Other times, information that in most contexts would not seem relevant is included as a bow to the intended audience -- in what I think of (thanks to Monty Python) as the "news for cats" presentation.  Here, excerpted from a review of films in a San Francisco film noir festival, are three descriptions that might strike many readers as odd:

With John Ireland and gay actor Raymond Burr...

With Burr and handsome, rugged Jeff Chandler, whom Esther Williams revealed was a heterosexual cross-dresser with a fondness for polka-dot blouses.

Handsome, reportedly bisexual Franchot Tone came from a wealthy family...

Until, that is, you learn where this review appeared.

It's from the 1/25/07 issue (p. 3) of the Bay Area Reporter, a weekly paper for the local lgbt audience: Tavo Amador, "Return to Dark City: Annual 'Noir City' film fest comes back to the Castro".  For the most part, the review could have appeared anywhere, but in at least three places Amador threw in details that have nothing to do with the movies themselves but might be thought to be of special interest to the readers of B.A.R.  (I'm inclined to be annoyed by this sort of thing, but I do enjoy occasional dish, and I'm delighted by Jeff Chandler's reported "fondness for polka-dot blouses", even though it's entirely off the point and absurdly specific.)

The Chandler sentence also provides another example of whom in a nominative context (serving as an extracted subject of an object clause -- ESOC, as I put it in my extraordinarily geeky posting on who and whom):

... Jeff Chandler, [ whom Esther Williams revealed ___ was a heterosexual cross-dresser ...]

I didn't notice this until I typed the sentence in as an example of "gay relevance".  I'm beginning to think these things are fairly common and I miss a lot of them.  It might be worth searching through some corpora -- ah, but which ones? -- for all occurrences of whom, to see what the frequencies of the various types are.  Maybe -- I am ever hopeful -- someone's already done this.
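A rough sketch of what such a search might look like, in Python, assuming a plain-text corpus held in a string. Everything here is hypothetical: the function name `classify_whom`, the three category labels, and the short word lists are illustrative choices, not a published method. The heuristic is deliberately crude. It ignores sentence boundaries, and it only flags an ESOC candidate when whom is followed by a one-to-three-word subject, a reporting verb from the list, and then a finite verb, so its counts would need checking by hand.

```python
import re
from collections import Counter

# Small illustrative word lists; both are assumptions and far from exhaustive.
PREPOSITIONS = {"to", "for", "with", "of", "from", "by", "about", "on", "in"}
REPORT_VERBS = {"said", "says", "revealed", "thought", "thinks",
                "believed", "believes", "claimed", "claims"}
FINITE_VERBS = {"is", "was", "are", "were", "has", "had", "will", "would"}


def classify_whom(text):
    """Count occurrences of 'whom' in three rough buckets."""
    tokens = [t.lower() for t in re.findall(r"[A-Za-z']+", text)]
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != "whom":
            continue
        # Bucket 1: whom directly after a preposition ("to whom").
        if i > 0 and tokens[i - 1] in PREPOSITIONS:
            counts["prepositional_object"] += 1
            continue
        # Bucket 2: whom + short subject + reporting verb + finite verb,
        # e.g. "whom Esther Williams revealed was ...".
        is_esoc = False
        for j in range(i + 2, i + 5):
            if j + 1 < len(tokens) and tokens[j] in REPORT_VERBS \
                    and tokens[j + 1] in FINITE_VERBS:
                is_esoc = True
                break
        counts["esoc_candidate" if is_esoc else "other"] += 1
    return counts


sample = ("Jeff Chandler, whom Esther Williams revealed was a cross-dresser. "
          "The man to whom I spoke left. The woman whom I met smiled.")
print(classify_whom(sample))
```

On a real corpus the "other" bucket would be the large one, and the ESOC candidates would be the short list worth reading through one by one.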

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 01:03 PM

Prepositional cannibalism

In reference to my post on "More missing prepositions" (1/27/2007), Jan Freeman sent this PS:

Forgot to mention that Fowler 1926/1965 calls this phenomenon "cannibalism" ("That words should devour their own kind is a sad fact"), with examples of disappearing "to," "more," "in," etc. I don't think I've ever seen it mentioned elsewhere in the usage literature, though.

We're talking about examples like "Kcat, as she is known to her friends", which arguably should be "Kcat, as she is known to her friends as". And "cannibalism" is an infinitely better term for this phenomenon than "morphosyntactic haplology", which is what I called it in yesterday's post.

A bit of internet search uncovers the fact that this term was echoed by Sir Ernest Gowers, "The Complete Plain Words: A guide to the use of English" (1954), in a chapter entitled "Troubles with prepositions":

(ii) Cannibalism by prepositions.

Cannibalism is the name given by Fowler to a vice that prepositions are specially prone to, though it may infect any part of speech. One of a pair of words swallows the other:

Any articles for which export licences are held or for which licences have been applied.

The writer meant "or for which export licences have been applied for", but the first for has swallowed the second.

To see what Sir Ernest is getting at, we need to focus on the second of the disjunctive relative clauses:

articles [ for which export licences have been applied ]

and note that a fully explicit, main clause version of the relative clause would have to be something like

export licenses for [those articles] have been applied for

from which one of the two instances of for has been lost in transit. Or alternatively, as Fowler's term suggests, one for has eaten the other.

For lagniappe, Sir Ernest starts his section by quoting Dean Alford's (mildly sexist but still amusing) argument against the preposition-stranding "rule":

Nearly a hundred years ago Dean Alford protested against this so-called rule. "I know", he said, "that I am at variance with the rules taught at very respectable institutions for enabling young ladies to talk unlike their elders. But that I cannot help."

And he also quotes what he takes to be the winner of "the championship of the sport of preposition-piling" (Morris Bishop in the New Yorker, 27th September, 1947):

I lately lost a preposition
It hid, I thought, beneath my chair
And angrily I cried, "Perdition!
Up from out of in under there."

Correctness is my vade mecum,
And straggling phrases I abhor,
And yet I wondered, "What should he come
Up from out of in under for?"

Perhaps this will redeem him, in Geoff Pullum's eyes, for using the term "phrasal verb":

Sometimes, when the final word is really a verbal particle, and the verb's meaning depends on it, they form together a phrasal verb—put up with for instance—and to separate them makes nonsense.

(Geoff quite properly objects to this term -- see p. 234 of CGEL for the details -- on the grounds that the verb+preposition combinations do not form constituents in modern English.)

Posted by Mark Liberman at 12:47 PM

Implicature troubles

Sometimes you can get into trouble by providing Too Much Information, as this cartoon shows:

(Hat tip to Dave Borowitz.)

Where things started to go wrong here is where Jade refers to "two black guys".  We'll take her word for it that the guys in question were black, but why did she mention that?  Was it somehow relevant?  Important?

Not everything that's true is relevant (or important) in context.  She could have mentioned their ages, their heights, the country they grew up in, their sexuality, their marital status, the city they live in, their relationship to her, or a zillion other things.  (Even mentioning their sex could be problematic; the combination of black and male might raise a flag in our society.)  What if she had said one of the following?

two guys in their thirties
two guys of average height
two Americans
two straight guys
two single guys
two New Yorkers
two acquaintances from grade school

In each case, her audience would be sent on a hunt to discover what these properties might have to do with Jade and her Uno game.

The large principle at work here is part of H.P. Grice's account of "conversational implicature": the principle of RELEVANCE, that what you say should be relevant to the context, which means that if people are assuming you're behaving cooperatively, they'll assume that what you say is indeed relevant to the context.  Which means they'll read more into what you said than what you literally said; what you said will "implicate" more than its face value.

So the fact that Jade's fellow Uno-players were black (and also male) looms large.  Maybe she's telling us that she's cool, and hangs out comfortably with black men.  Whatever.  There's a message there, even if she didn't intend it.  (And nothing gets fixed at the end; the offered revision is even worse than her original.)

Some years ago, in a discussion of plant theft on a gardening newsgroup (plant theft is a distressingly common occurrence), one poster reported that a family of Laotian immigrants had come in the middle of the night, dug up her whole vegetable garden, and carted it away in their truck.  The poster was astonished at the sharp criticism she got from other people on the group, who perceived what she wrote as a slur on Laotian immigrants (or, possibly, Laotians in general or immigrants in general); but by providing these details about the thieves, right at the beginning of her account, she made them loom large in the discourse, overshadowing her intended main point, the monstrous effrontery of the theft.  (By the end of the discussion, I'd concluded that the "those people" tone of her original posting probably reflected her attitudes accurately, but that she wasn't consciously aware of those attitudes.)

This is a place where Strunk's advice to Omit Needless Words is (sort of) good advice -- except that what you should be omitting is not really needless specific words, but needless information.  And to do that well, you'll have to gauge your audience and the context pretty carefully (as well as examining your own intentions).  It's not at all like being careful to omit of with (certain uses of certain) prepositions -- "Kim walked out the door" instead of "Kim walked out of the door" -- which is a relatively mechanical adjustment in the use of very specific words.  (Of course, you might be inclined to just thumb your nose at this prepositional advice, even though it's in piles of handbooks and is often "justified" on the basis of ONW.)

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 11:55 AM

Cerebro de El Pais

¿Hay diferencias relevantes entre el cerebro de hombres y de mujeres? ("Are there significant differences between the brains of men and women?") That's the lead of Mónica Salomone's article in today's El País ("Cerebro de mujer", 1/28/2007). This is of course a review of "el libro de una prestigiosa neuropsiquiatra norteamericana" ("the book by a prestigious North American neuropsychiatrist"), Louann Brizendine's The Female Brain. And like several other articles on Brizendine's book, this one starts with a joke:

Un señor con una esposa muy habladora lee en el periódico un estudio científico que asegura que las mujeres usan cada día unas 20.000 palabras, mientras que a ellos les bastan 7.000; el hombre enseña la noticia, feliz de poder demostrar que ella es un loro. “¿Lo ves?”. “¿Y no será porque tenemos que repetir mucho lo que decimos?”, dice ella. “¿Cómo?”, responde él.

The version from mistupid.com that I reprinted back in August ("Sex-linked lexical budgets", 8/6/2006) goes like this:

A husband looking through the paper came upon a study that said women use more words than men.
Excited to prove to his wife that he had been right all along when he accused her of talking too much, he showed her the study results. It read "Men use about 15,000 words per day, but women use 30,000".
The wife thought for a while, then finally she said to her husband "It's because we have to repeat everything we say."
The husband said "What?"

So far, we're more or less on the same track as many of the other reviews in the popular press.

But things pick up from there. The article cites the work of Melissa Hines on toy preferences; it sketches the debate between Pinker and Spelke over the Larry Summers flap; it describes Ben/Barbara Barres' article in Nature. And near the end of this long (2,100-word) article, we get:

La obra ha sido superventas en Estados Unidos, pero varios científicos han puesto serias pegas. La autora ha tenido que admitir que algunos datos de la primera edición de El cerebro femenino no son correctos. En concreto, los relativos al lenguaje. Según Brizendine, ellas usan al día unas 20.000 palabras (y hablan el doble de rápido), y ellos, 7.000. Mark Liberman, especialista en fonética en la Universidad de Pensilvania, buscó las fuentes de tal afirmación “y simplemente no las encontré”. Sí halló, en cambio, varios trabajos que muestran que no hay diferencia alguna en aptitud lingüística. Brizendine aceptó la crítica y eliminó las cifras de ediciones posteriores. No obstante, Liberman –autor de un blog donde aparece el chiste del principio– teme que acabe siendo otro caso de desequilibrio informativo que ayuda a fortalecer un tópico: decenas de titulares han recogido el 20.000 vs 7.000 de Brizendine, pero no su rectificación.

The work has been a bestseller in the United States, but several scientists have raised serious questions. The author has had to admit that some data in the first edition of The Female Brain are not correct. Specifically, those related to language. According to Brizendine, women use about 20,000 words a day (and speak twice as fast), while men use 7,000. Mark Liberman, specialist in phonetics at the University of Pennsylvania, searched for the sources of this assertion "and simply did not find them". Instead he found various works that show that there is no difference in linguistic aptitude. Brizendine accepted the criticism and eliminated the numbers in later editions. However, Liberman -- author of a blog where the joke at the beginning [of the article] appeared -- fears that this will end up as another case of information imbalance that helps to strengthen a cliché: dozens of headlines have repeated her 20,000-vs.-7,000, but not her retraction.

I'm not sure whether Dr. Brizendine has really retracted the numbers. According to Stephen Moss in the Guardian ("Do women really talk more?", 11/27/2006),

When I reach Brizendine, just as she is crossing the Golden Gate bridge, she tells me that she has accepted the criticism of the numbers quoted in the book - on both volume of words and rate of speech - and will be deleting them from future editions. Nor will they appear in the UK edition, to be published by Bantam in April.

But in an interview with Deborah Solomon in the NYT Magazine ("He thought, she thought", 12/10/2006), Dr. Brizendine said something different:

Q: Your book cites a study claiming that women use about 20,000 words a day, while men use about 7,000.

A: The real phraseology of that should have been that a woman has many more communication events a day — gestures, words, raising of your eyebrows.

The "communication events" version seems to be equally unsupported -- see "Sex differences in 'communication events' per day?", 12/11/2006, for some discussion. In any case, the copies in the bookstores around here haven't changed, so whatever the change, it seems that Ms. Salomone should have written eliminará rather than eliminó.

And unfortunately, it's not just the language-related numbers in this book that are suspect -- see for example ""Every 52 seconds": wrong by 23,736 percent?" (10/13/2006). The review in Nature (Young and Balaban, "Psychoneuroindoctrinology", Nature 443(7112), p. 634, October 2006) says that the book "fails to meet even the most basic standards of scientific accuracy and balance", "is riddled with scientific errors", and "is misleading about the processes of brain development, the neuroendocrine system, and the nature of sex differences in general".

But overall, the El País article strikes me as an excellent piece of science journalism. I don't just say this because it cites me -- being mentioned in the press, even favorably, can be a trying experience if the writer gets things mixed up, as happens all too often. Unlike some of the journalists who've written about this book, Mónica Salomone is not just re-wording half-understood jacket blurbs and press releases. She's obviously done a good deal of independent research -- reading as well as talking with experts -- and tried to integrate the results in a thoughtful way. It's a bit more even-handed than the topic perhaps deserves, but that's a common journalistic stance with respect to controversies where both sides are viewed as socially licensed.

Ms. Salomone writes frequently about scientific topics for El País, and I'll look for her byline in the future.

[Tip of the hat to Martin G.].

[Update -- Anatol Stefanowitsch writes:

For a while I was on the lookout for Brizendine-related German press reporting. Most of the stories repeated her figures without giving them a second thought (like Thomas Klebl, "Frauen reden mehr", Hamburger Abendblatt, 12/03/2006).

I stopped paying attention after a while, but this month ZEITWISSEN, a popular science magazine published by the staff of Die ZEIT, ran a story about male-female differences that mentions her figures in passing and also mentions that they are wrong ("Frauen sind auch nur Männer", Zeitwissen, 01/2007).

Your posting today also reminded me of a much earlier story from Die Welt, published in October, which presents her claims at length but then mentions that these claims are "contested"; they even cite you. In case no one has told you about this story yet, it is Axel Bojanowski, "Der feine Unterschied", Die Welt, 10/16/2006.

Keeping up with the international Brizendine industry would be a full-time job, so I'm afraid that the only German-language story that I've read was the later -- but much more credulous -- one by Heike Stüvel, "Das Schweigen der Männer", 12/22/2006, which I discussed in an earlier post ("The silence of the men", 12/29/2006). But I'm interested to learn that Stüvel's failure to do any fact-checking extended to failure to check the recent archives of Die Welt itself. ]

Posted by Mark Liberman at 09:36 AM

January 27, 2007

Scrambling in internet folklore

Back at the dawn of modern time, when LLP still had that new plaza smell, Mark Liberman examined a widely circulated item about what was said to be research at Cambridge University on the comprehensibility of text in which letters inside words had been scrambled, leaving the first and last letters in place.  The claim was that such scrambled text was astonishingly comprehensible: TIHS IS AZANMIG!

Apparently, nothing dies on the internet.  Things propagate and then retreat, but are always ready to revive in force.  We might be entering a resurgent phase of the Cambridge Scrambling Tale; my e-mail suggests that after three or four years in hibernation it's awake and abroad again.

So if you've recently gotten one of the versions of this internet folktale, go back and look at Mark's 2003 posting and at the piece by Matt Davis (which Mark cited), especially at its "update 2" section, where Davis looks at relevant psycholinguistic research and notes that the material in the mailings seems to have been carefully chosen to be comprehensible.

I have little to add to Davis's discussion, except to note that (as I wrote to a friend at the time):

... lots of the versions have typos in them!  For example, the second word in your version [which began: "The phaomnnehil pweor of the hmuan mnid"] is "phaomnnehil", which lacks one "e" and has an extra "h".  It looks like someone was doing the letter transpositions by hand, rather than using a random-transposition scheme, which is what any actual researcher would do.

The second sentence had "rscheearch" (with an extra "ch") and "iprmoetnt" (with "e" instead of "a").  That's three erroneous words in the first 28 words, and most of those 28 words were little ones.  At that point I gave up checking the text.
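The letter-by-letter check described above is easy to mechanize. What follows is only a minimal sketch, not anything used by Davis or by the original mailers: the function names (scramble_word, is_valid_scramble, letter_discrepancies) are invented here for illustration, and words are assumed to be plain lowercase strings. It scrambles a word's interior at random, which is what an actual researcher would presumably do, and reports which letters a hand-made "scramble" has lost or gained.

```python
import random
from collections import Counter


def scramble_word(word, rng):
    """Shuffle the interior letters, leaving the first and last in place."""
    if len(word) <= 3:
        return word  # nothing to shuffle in very short words
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]


def is_valid_scramble(original, scrambled):
    """True if `scrambled` keeps both end letters and only permutes the rest."""
    return (
        len(original) == len(scrambled)
        and original[0] == scrambled[0]
        and original[-1] == scrambled[-1]
        and Counter(original[1:-1]) == Counter(scrambled[1:-1])
    )


def letter_discrepancies(original, scrambled):
    """Return (missing, extra): letters dropped from / added to the original."""
    want, got = Counter(original), Counter(scrambled)
    return dict(want - got), dict(got - want)


if __name__ == "__main__":
    rng = random.Random(2007)  # fixed seed so repeated runs agree
    print(scramble_word("comprehensible", rng))

    # The two suspect words from the second sentence of the mailing.
    for original, scrambled in [("research", "rscheearch"),
                                ("important", "iprmoetnt")]:
        missing, extra = letter_discrepancies(original, scrambled)
        print(scrambled, is_valid_scramble(original, scrambled), missing, extra)
```

A multiset comparison (Counter) is used rather than a sorted-string comparison because it says directly which letters are missing or extra, which is the point of the exercise.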

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 02:08 PM

More missing prepositions?

In response to our recent discussion of missing prepositions in quotations derived from Luke 12:48, John Cowan writes:

Well, now that "from to whom much is given much is expected" has been beaten to death with a clue stick, can something be done about the bizarre missing "as" in "John-john, as he was frequently known, [...]". That unpacks to "He was frequently known John-john", which nobody would swallow.

Let's clarify what's bothering John. In a sentence like

I wanted to give you a little update on Lil´Bit, or Tritill as we call him in Icelandic.

the as-clause has a structure that we can suggest with a pairing like this:

we call him Tritill in Icelandic ⇔ Tritill as we call him __ in Icelandic

However, in a sentence like

"Wulfie" as he is known at home is currently Barry’s favorite jumping horse.

the comparison

he is known "Wulfie" at home ⇔ "Wulfie" as he is known __ at home

is problematic: the left-hand side is defective. On the other hand,

he is known as "Wulfie" at home ⇔ "Wulfie" as he is known as __ at home

has an equally defective right-hand side. What's going on here?

In the first place, it's not obvious that anything needs to be done about this, in the sense of mounting a campaign to correct people's usage. The construction that bothers John is very common, even in well written and well edited English, as the examples below attest.

"Weighing the universe", The Economist, 1/25/2007:

Another aspect of Einstein's work to be tested is the existence of gravitational waves. General relativity views gravity not as a force but as a consequence of the curved geometry of space and time. Space-time, as it is known, has four dimensions: the three familiar spatial ones of length, breadth and height, and time. It can be distorted or curved by massive objects, such as stars.

Sonia Kolesnikov-Jessop, "Spotlight: Julian Metcalfe, founder of Pret A Manger", International Herald Tribune, 1/26/2007:

A self-described perfectionist with a vision "for what is right and what is wrong," Metcalfe, 47, has turned his fixation on the small stuff into a large influence on the British lunch hour. Pret, as it is known, specializes in freshly made sandwiches, served up in cheerful surroundings in high-traffic areas. Since its founding in 1986, the privately owned chain has grown to 180 outlets, mostly in Britain but also in the United States and Asia.

Jonathan Fildes, "Mobiles navigate the future", BBC News, 1/24/2007:

Countries like Japan are well known for their early adoption of technology, while in the US, the mass up-take of GPS was down to legislation.

In 1999, the Federal Communications Commission pushed through an act that requires all handsets to incorporate the technology. The E911 system, as it is known, allows the emergency services to pinpoint the exact location of a mobile phone caller.

Arnold Zwicky, "A recipe for WTF coordination", Language Log, 6/21/2005:

Kcat, as she is known to her friends, reports:

Furthermore, the same construction seems to arise with (all?) the other verbs that take a similar complement: not just "know X as Y", but also "refer to X as Y", "describe X as Y", even "denominate X as Y"... Here are some examples from reputable sources indexed on the web:

Anna Mae He, or AMH as she is referred to in court documents, has been living with Jerry and Louise Baker in suburban Memphis since she was three weeks old.
The issue isn't his religious beliefs, but the fact that at some point this very gentle soul, as they describe him, decided to take up arms.
Rousseau's Bomston, "or the Englishman," as he is denominated in the instructions for the engravings for Julie (1761), suggests a sentimental update of this figure, a melancholic source of stoic advice to the young impetuous lovers.

This reminds me of the case of still unpacked used to mean "not yet unpacked" ("'Still un-X-ed is not yet unspreading", 6/14/2006; "The condescension of descriptivism", 5/21/2005; etc.). We have a widely-used construction, sanctioned by excellent writers and careful editors, which on analytic reflection seems incoherent, at least to many people. Should our response be to persuade people to reflect and sin no more? or should we work harder to justify the ways of norma loquendi, illogical as they may at first seem to us?

Other examples of the same sort are overnegations and could care less.

The case of the missing as needs more analysis, it seems to me, not a hasty decision by the virtual Linguistic Academy. For example, does the fact that the clause starts with as make this a sort of morphosyntactic haplology? Is this case connected to the apparently missing prepositions in phrases like "the place that I went (to)"?

(It's likely that there's some literature on this topic -- if you know of any, tell me.)

Note that John salted the mine by including the adverb frequently, which collocates with "called" rather than "known", as these Google web hits suggest:

  [nil] frequently widely
he|she|it is __ called
he|she|it is __ known

This pattern (which may have something to do with the idea that called evokes a set of events, while known describes a more or less general state) is still present in the as-clauses seen in the current Google News index, though in an attenuated form:

  [nil] sometimes often widely
he|she|it is __ called
he|she|it is __ known

[Update -- Jan Freeman writes:

Last spring, I included it on a reader poll made up of actual quotations each of which included a mistake, or "mistake," in English. Even readers looking for an error were not very sensitive to this one, I found. Here's my tiny contribution to the research file:

In the April 30, 2006, reader poll, I offered this choice:

4. "I was told in interviews with American and European intelligence officials, however, that the laptop was more suspect, and less revelatory, than it had been _____."

A. depicted.
B. described as.
C. rumored to be.

May 6, I reported the results:

4. "[T]he laptop was more suspect, and less revelatory, than it had been depicted" (Seymour Hersh, The New Yorker).
If Hersh had written "the laptop was depicted revelatory," his editor would have inserted an "as." When that "as" would fall at the end of the clause, though, it's often dropped, and many readers don't mind at all: Fifty-two percent of you chose his version. But 41 percent preferred "had been rumored to be," and 7 percent put the "as" back where it belonged, despite the awkwardness of ending with "described as."

OK, so add the New Yorker to the list of publications that sometimes pass a missing "as": The Economist, the International Herald Tribune, and Language Log. And just to round out the set...

Warren St. John, "Refugees Find Hostility and Hope on Soccer Field", NYT, 1/21/2007:

The mayor’s soccer ban has everything to do with why, on a scorching August afternoon, Ms. Mufleh — or Coach Luma, as she is known in the refugee community — is holding tryouts for her under-13 team on a rutted, sand-scarred field behind an elementary school.

"The Code of the Street: Hustling for Status", WaPo, 12/20/2006:

James, or "A.J." as he is known on the street, has spent his adult life "in the game," earning many scars and building a long court record.


Posted by Mark Liberman at 10:04 AM

Rice v. Mair

Geoff Pullum has been making some headway against the Eskimo snow-words trope, and now the WaPo has quoted Victor Mair about how crisis is not really danger + opportunity in Chinese (Glenn Kessler, "Rice Highlights Opportunities After Setbacks On Mideast Trip", Washington Post, 1/19/2007):

At one point, Rice said that the difficult circumstances in the Middle East could represent opportunity. "I don't read Chinese but I am told that the Chinese character for crisis is wei-ji, which means both danger and opportunity," she said in Riyadh. "And I think that states it very well. We'll try to maximize the opportunity."

But Victor H. Mair, a professor of Chinese at the University of Pennsylvania, has written on the Web site http://pinyin.info, a guide to the Chinese language, that "a whole industry of pundits and therapists has grown up around this one grossly inaccurate formulation." He said the character "ji" actually means "incipient moment" or a "crucial point." Thus, he said, a wei-ji "is indeed a genuine crisis, a dangerous moment, a time when things start to go awry."

The WaPo doesn't give you a direct link to Victor's discussion on the pinyin.info site, but we will: "danger + opportunity ≠ crisis: How a misunderstanding about Chinese characters has led many astray". And we'll also reveal a speculation about where Condi got this comforting bit of rhetoric. This comes from a friend of Victor's, in response to the Kessler reference:

"I'll bet Condi got WEI JI from former U.S. Secty. of State George Shultz at the same seminar I sat in about 17 years ago at the Hoover Inst. This was also the first time I heard the "Crisis-Opportunity" version of WEIJI -- from George Shultz himself. Condi was then a Hoover fellow, and she was in the room."

(If George Shultz's talk was in 1990, he might well have been talking about Saddam Hussein's invasion of Kuwait.)

We've blogged about this rhetorical point a couple of times, both on the specific issue ("Crisis ≠ Danger + Opportunity", 4/29/2005; "Hollywood glamour, activist passion, false rhetoric", 4/24/2006) and on the more general theme of (often false) etymology as argument:

"Etymology as argument", 6/18/2005
"Etymology as argument again", 6/19/2005
"(Hallucinatory) etymology as argument", 7/11/2005
"Minorities as legal minors?", 7/19/2005
"Ayn Rand, linguist?", 3/15/2006

In the case under discussion, though, Glenn Kessler didn't bring Victor into the article just to make a linguistic point by debunking the false analysis of 危機 (which of course is two characters, not one as Condi is quoted as suggesting). Instead, Kessler uses Victor's linguistic explanation to make a political point: perhaps the current Middle East situation is not a dangerous opportunity, but simply "a dangerous moment, a time when things start to go awry".

Condi's interpretation is the sentimental favorite: let's hope that her analysis of diplomatic opportunities is better than her analysis of Chinese compound words.

[Update -- David Denison writes:

Does the stuff about the Chinese word for crisis have anything to do with a very noticeable semantic change *purely within English* of the word crisis? (I've got nothing at all to add about the Chinese.) My point is merely that crisis used to mean "A vitally important or decisive stage in the progress of anything; a turning-point" (part of OED definition in sense 3), a figurative extension of the medical sense (sense 1), "the turning-point of a disease for better or worse". A crisis was therefore something momentary, almost punctual in linguistic terms, and it carried no value judgement. You could argue that a crisis (in this sense) represents both an opportunity and a danger, but it doesn't actually MEAN either of those things.

Many people now use the word in a sense that has been further extended -- as the OED puts it: "now applied esp. to times of difficulty, insecurity, and suspense in politics or commerce" (the continuation of the definition under sense 3). In this usage, it's not punctual at all and it is value-laden: crises are bad.

Your quotation from Professor Mair suggests that somebody might have consulted a Chinese-English dictionary which used the English word crisis in its older sense, and that at some stage, then or subsequently, the entry was interpreted wrongly with the current sense in mind.

Perhaps the historical development of the negative affect associated with English crisis is part of the story, but the central linguistic issue is that the association of the positive-affect English translation opportunity with Chinese JI -- in the context of the compound WEIJI -- is false.]

Posted by Mark Liberman at 08:53 AM

January 26, 2007

The tangled history of a mangled maxim

A couple of days ago, I noted a noble but unparsable sentiment in President Bush's State of the Union speech ("Our work in the world is also based on a timeless truth: To whom much is given, much is required"), and I reprinted some discussion with Barbara Partee and Geoff Pullum about an earlier and similarly mistaken reproduction of Luke 12:48 as one of two "simple values" listed by the Gates Foundation ("To whom much has been given, much is expected"). Jan Freeman referred me by email to her comments on an equally incoherent 1997 version by JFK Jr. ("To whom much is given, much is expected, right?")

This made me curious about the social and linguistic history of the many versions of this quotation.

Let's work backwards from the summer of 1997. John F. Kennedy Jr., then the publisher of George magazine, published an editorial with a personal sting in the tail:

"Two members of my family chased an idealised alternative to their life. One [Rep. Joe Kennedy] left behind an embittered wife, and another [Michael Kennedy], in what looked to be a hedge against mortality, fell in love with youth and surrendered his judgment in the process. Both became poster boys for bad behaviour. Perhaps they deserved it. Perhaps they should have known better. To whom much is given, much is expected, right?" [emphasis added here and throughout]

The family criticism caused an enormous fuss. Nadine Brozan ("Chronicle", NYT 8/12/1997) picked up some juicy backbiting, including a nice variation on the "Ask not..." trope:

JOHN F. KENNEDY JR. has broken the Kennedy family tradition of solidarity no matter what, by openly criticizing two of his cousins, Representative JOSEPH P. KENNEDY 2D and his brother MICHAEL KENNEDY, in the September issue of George, the political magazine that he edits. Yesterday, Representative Kennedy struck back.

''I guess my first reaction was 'Ask not what you can do for your cousin but what you can do for his magazine,' '' Representative Kennedy said in Chelsea, Mass., according to The Associated Press.

And MoDo impersonated John-John to pieces, in her all-too-imitable fashion (Maureen Dowd, "Letter from the Hunk", NYT 8/13/1997). But as far as I can tell, it was only Jan Freeman who pointed out that his envoi was ungrammatical ("Hunk flunks a writing assignment", Boston Globe, 8/17/1997).

... a couple of particular blunders emerge from the haze with hideous clarity.

First, there's the Quayle-like mangling of a familiar quotation: "To whom much is given, much is expected, right?" asks JFK Jr. Aiming higher than Quayle did, John-John is misquoting Jesus; but you don't have to know the New Testament, or be a pedant, to notice that something is wrong here. "To whom much is given, from him much is expected" is the least this sentence needs to stand on its own.

(The Bible does it better, of course: "For unto whomsoever much is given, of him shall be much required," reads Luke 12:48 in the King James Version.)

William Safire was among the missing, instead (8/17/1997) taking "responsibility for the first [grammatical] mistake made by an earthling on an extraterrestrial body" ["1969 AD" instead of "AD 1969" on the Apollo 11 plaque], and telling us (9/3/1997) that the word paparazzi "perhaps meaning 'waste paper,' was formed from Signore Paparazzo, a sidewalk photographer in Federico Fellini's 1960 'La Dolce Vita'".

What Bill might have pointed out -- as Jan did -- is that the expression of this sentiment in English needs two pairs of verbs and prepositions. However you decide to connect everything up, somewhere in there you need to tell us that much is expected from people, when much is given to them.

Now, you can tell us instead that much is expected of them, or that much shall verily be required from them, or that much has been given unto them. But you wouldn't want to leave out the preposition connecting expect or require with those from whom things are expected or required, and tell us (as President Bush in effect did) that much is required people when much is given to them; or (as the Gates Foundation in effect does) that much is expected people when much has been given to them; or (as JFK Jr. in effect did) that much is expected people when much is given to them.

None of the (ghost-) writers of the mangled maxims would have been fooled for an instant by those straightforward versions. So why did they compose and accept the equally faulty (if more biblically resonant) versions that they used? There are some speculations in my earlier post, but frankly, I don't know.

However, one possibility is that they didn't compose the mangled versions at all, but just accepted them from another source. And a possible source is suggested by a news story from 1994, where a similarly-mangled version of Luke was attributed to Ethel Kennedy, via Robert F. Kennedy Jr. (Bob Morris, "The Night; When the Stars Come Out", NYT, 5/29/1994):

There was even Robert F. Kennedy Jr., who received the Cool Water Environmental Achievement Award and invoked Saint Paul's admonition, as taught to him by his mother, Ethel Kennedy, who was at his table. "To whom much is given, much is expected," he said in support of actors getting into politics, Ronald Reagan notwithstanding.

And the blame for (a differently mangled version of) the quote was assigned to Rose Kennedy, in a (2005?) story about Christopher Kennedy Lawford by Ronald Sklar:

The family matriarch, Rose Kennedy, had once said, “To those to whom much is given, much is expected.”

Cherchez la mom. Or maybe not -- President John F. Kennedy used a coherent version of the quote in a speech in 1961: "Of those to whom much is given, much is required". Then again, he had Pierre Salinger to check his speeches.

[By the way, on July 20, 1999, on the occasion of JFK Jr.'s death, Senator Tom Daschle attributed yet another mangling of this quote to President Kennedy (according to the Congressional Record):

I do know that John F. Kennedy, Jr. believed deeply in public service. He believed what his father had said: "to those whom much is given, much is required.''

This corresponds to saying "much is required to those whom much is given", which I doubt that Senator Daschle would have approved.]

So it's clear that this quote has circulated actively through at least three generations of Kennedys, perhaps in both coherent and incoherent forms. We can't tell whether Ethel and Rose actually used the confused versions, or whether they've been betrayed by the faulty memory of their offspring, more recent politicians, and/or the fourth estate.

But even if Rose Kennedy and Ethel Kennedy inculcated an incoherent version of this quotation -- which I doubt, though I have no evidence either way -- they were surely not its (only) originators.

Back in 1986, the version used by President Bush was printed in a quote from a non-Kennedy (Chrystal Nix, "The Shuttle Inquiry: National Mourning Continues; New York Honors the Challenger's 7", 2/3/1986):

Comdr. Hugh Wolcott of the Navy, who knew Commander Smith for 23 years, said: ''Great challenges require great risks. Great advances must almost unavoidably spring from setbacks.

''Mike knew this and lived quietly with it each day. He knew that to whom much is given, much is required.''

And similarly in 1984 (Joyce Purnick, "Alvarado violated ethics rules, City Investigation Dept. charges", 3/23/1984):

The Commissioner, in remarks he made before taking questions, also invoked the Bible. ''As the Scriptures tell us, 'To whom much is given much is expected.' The Chancellor of the New York City Board of Education is one to whom much is given, as are others in high public office,'' he said, in a crowded room at 130 John Street.

As usual, we don't know whether this is what Wolcott and the Commissioner actually said, or whether it's just what the reporter misquoted them as saying -- I reckon the chances are about even either way -- but in any case, it's starting to look like variably-mangled versions of this maxim spring up spontaneously all over the place.

A look at the ProQuest American Periodicals Series database confirms that this has been happening for almost 200 years. The APS has 315 hits for "whom much is given". I only had to look at four of these, working backwards from 1915, to find a botched version (The Rev. William Barnes Lower, "The Suburban Church Problem", New York Observer and Chronicle, Mar. 2, 1911):

The wealthy, naturally, take life easy, hence take religion easy. They follow the line of least resistance in religious matters. Said a banker to me but a short time ago, when I inquired as to his absence from church: "I have nothing against the church; my only excuse is, I am following the line of least resistance." The wealthy Christian so often forgets the oft-repeated words of the Master, "To whom much is given, much shall be required."

The Rev. Lower repeats this version tirelessly, for example "Bible Study for School and Home", New York Observer and Chronicle, Aug. 11, 1910:

Greatness is to be measured by service. Florence Nightingale moved other women most when she herself went to minister on battlefields. To whom much is given much shall be required. The greater men are in intellect and culture the more imperative it is that they become helpers.

And "The Sin of Nadab and Abihu", New York Observer and Chronicle, Aug. 1, 1907:

The higher the position you occupy the more grievous is the sin and more extensive will be the mischief wrought by it. [...] To whom much is given much shall be required.

Several other incoherent variants are found in even earlier articles by other authors, for example "To whom much is given, much will be required"; and "Unto whom much is given, much will be required". The last one can be found in an article by JC Brigham, "Mr. Brigham's Report Respecting the Religious State of Spanish America", The Missionary Herald, Nov. 1826.

And the LION database turns up George Henry Boker's The Lesson of Life, 1848:

517 "But woe to you who love the gilded cage,
518 Who pander basely to the present hour,
519 Who build not on that firm foundation, Truth!
526 Who seek, with untaught power of mighty verse,
527 To lure their weaker brothers far astray;
528 Or praise their blinded errings. Each one knows,
529 Within his heart, himself a hypocrite;
530 Sees the sad tears the ravished muses shed
531 O'er their undoing; hears a potent voice
532 Thunder within his hollow soul---"Thou Traitor!
533 Unto whom much is given, much is required."
534 How back in horror draws the shuddering mind
535 When pondering the fate of erring genius!

It's also worth noting that the coherent versions are extremely diverse. A few relatively recent ones from the APS:

Henry Sloane Coffin, "Present-Day Life as Live Preachers See it", New York Observer and Chronicle, May 26, 1910:

... while God's Christlike purpose omits no one, it is His method to use one individual to reach others, and one nation to fulfill His world-plan. He admits Abraham to His friendship that He may make him a blessing to many, the first of a nation of friends, He endows Israel with special religious instincts that it may be His servant people in spreading faith, Greece with esthetic faculties that it may be His minister of beauty, Nineteenth Century Germany with the philosophic spirit that it may lead the world into larger thought; Nineteenth Century America with fuller freedom that it may be a refuge for the oppressed and afford richer life to millions from Europe. God's gifts are never meant to be monopolized by their recipients, nor His choices to be considered as privileges to be selfishly enjoyed. To whom much is given of him shall much be required.

"Thanksgiving, 1914", The Youth's Companion, Nov. 26, 1914.

We have a warrant, it seems, to be thankful that this year we are the most fortunate people in the world, the least endangered, the least distressed.

And that position implies a very serious responsibility. To whom much is given, from them much shall be required.

Miss Lillian Johnson, "The Broken Things in Life", Herald of Gospel Liberty, Apr. 29, 1915:

Our natures are different. To some their moral career seems almost an even tenor of goodness; its fair Elysian fields are never stained with the blood of battle; its quiet peace is hardly broken by the noise of tumult or rebellion. Such even-tempered natures have the more energy to spare for positive virtue -- for he to whom much is given, of him is much required.

I read 25 of the 315 examples, about half from the beginning and half from the end of the time period covered by the archive. There were 5 incoherent versions:

To whom much is given, much shall be required. (3)
To whom much is given, much will be required.
Unto whom much is given, much will be required.

20 versions were coherent, at least in the sense that each verb has the preposition it needs:

To whom much is given, of him much will be exacted.
To whom much is given, of him much will be required.
To whom much is given, of him much is required.
To whom much is given, of him indeed much is required.
To whom much is given, of him shall much be required. (4)
To whom much is given, of them much is required.
To whom much is given, of them much will be required.
To whom much is given, from him much will be required.
To whom much is given, from them much shall be required.
Unto whom much is given, from him shall much be required.
Unto whom much is given, of them there will be much required.
From him to whom much is given, much is due.
From him to whom much is given, much will be expected.
Of him to whom much is given, much will be required.
Of them, to whom much is given, much shall be required.
Of those, to whom much is given, much will be required.
He to whom much is given, of him is much required.

Exercise for the reader: construct a simple automaton for generating the rest of the set implied by this sample (ignoring interpolated verily and so forth). What is the size of the implicit set of variants?
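One slice of the exercise can be sketched in Python. This is a minimal sketch under stated assumptions, not a full answer: it covers only the "To/Unto whom much is given, PREP PRONOUN PREDICATE" family, and the slot inventories in `SLOTS` (a name invented here) are simply read off the sample above. The fronted types ("Of him to whom ...") and the "exacted", "due" and "expected" predicates are left out.

```python
from itertools import product

# Each tuple is one state of a linear automaton; every accepted path
# picks exactly one alternative per state. The inventories are an
# illustrative assumption taken from the sample, not a complete grammar.
SLOTS = [
    ("To", "Unto"),
    ("whom much is given,",),
    ("of", "from"),
    ("him", "them"),
    (
        "much is required",
        "much will be required",
        "much shall be required",
        "shall much be required",
    ),
]


def generate_variants(slots=SLOTS):
    """Return every sentence reachable by walking the slots in order."""
    return [" ".join(path) + "." for path in product(*slots)]


variants = generate_variants()
print(len(variants))
for sentence in variants[:4]:
    print(sentence)
```

Because the automaton is linear, the size of the generated set is just the product of the slot sizes (here 2 x 1 x 2 x 2 x 4), so each added alternative multiplies the total rather than adding to it.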

If you want to spread the net a bit wider, a search for "to whom much has been given" returned 46 hits, equally diverse and, in many cases, equally tangled (though parsable), for example:

Much is required from those to whom much has been given.
We Presbyterians are of those servants of God to whom much has been given -- numbers, social standing, intelligence, means and money. Of such much is required.
Much will yet be demanded from those to whom much has been given.
Mrs. Gurley is one of those to whom much has been given and from whom much is received.
...to whom much has been given, of him also will be much required...
To whom much has been given, from him much is rightfully expected.

Some of this variation comes from variation in bible translations, but much of it seems to come from creative memory or intentional rephrasing.

The sentiment has played an important role, over the past 200 years, in the development of a sense of social conscience among American elites.

For comparison, here are some of the versions of Luke 12:48 from various bible translations:

From everyone who has been given much, much will be demanded
From everyone who has been given much, much will be required
To each man to whom much is given, much shall be asked of him
To every one to whom much was given, much shall be required from him
To whomsoever much is given, of him shall much be required
To whomsoever much has been given, from him much will be required
Unto whomsoever much is given, of him shall be much required
Everyone to whom much is given, from him much will be required
Everyone to whom much is given, of him shall much be required
When someone has been given much, much will be required in return
Much is required from those to whom much is given
If God has been generous with you, he will expect you to serve him well
Great gifts mean great responsibilities

Any way you say it, it's true. Well, at least it ought to be.

Posted by Mark Liberman at 08:24 AM

ONW and Legalese

The other day Arnold Zwicky told us about Strunk's ONW ("omit needless words") injunction to writers. That reminded me of the late great David Mellinkoff, who wrote eloquently about legal language (in The Language of the Law, 1963; Legal Writing: Sense and Nonsense, 1983; and elsewhere)---observing, for instance, that the law thrives on gobbledygook.

In Legal Writing: Sense and Nonsense he pleaded for clear and unadorned language with no superfluous verbiage. My favorite example was his analysis of Richard Nixon's resignation letter, August 9, 1974. It's only one sentence long: "I hereby resign the Office of President of the United States." But, as Mellinkoff points out, it's still much longer than it needs to be: all he needed to say was "I resign."

(I don't know why I started this: I have a lot of work to do this morning, and I just noticed a lot of unnecessary words above. I am going to get a lot of sneers around Language Log Plaza today.)

Posted by Sally Thomason at 07:52 AM

Army scammer? Let's check the grammar

I've received my first Nigerian-style dubious-money-transfer-scheme-assistance solicitation email purporting to come not from an African widow but from a fellow countryman: a soldier serving as an attaché in Iraq. "Hello Pal," he says very informally (too much so?): "My name is Sgt. Jarvis Reeves Jr. I am a military attache with the Engineering unit here in Ba'qubah Iraq for the United States, we have about $14 Million dollars that we want to move out of the country." Always ready to help our brave men and women in uniform, Sgt. Reeves; but first we're going to do a little grammatical analysis on your message to make sure you're not just another illiterate foreign scammer.

Here's the full text of the message:

From jmrjr009@bellsouth.net Tue Jan 23 11:27:55 2007
Delivery-date: Tue, 23 Jan 2007 11:27:55 -0800
X-Mailer: Openwave WebEngine, version (webedge20-101-1106-101-20040924)
X-Originating-IP: []
From: Jarvis Reeves Jr
Reply-To: jmrjr008@yahoo.dk
Subject: can I trust you?
Date: Tue, 23 Jan 2007 14:25:34 -0500
Message-Id: <20070123192534.KLBQ26200.ibm60aec.bellsouth.net@mail.bellsouth.net>

Hello Pal,

I need of your assistance. My name is Sgt. Jarvis Reeves Jr. I am a military attache with the Engineering unit here in Ba'qubah Iraq for the United States, we have about $14 Million dollars that we want to move out of the country.

My partners and I need a good partner someone we can trust to actualize this venture.The money is from oil proceeds and legal.But we are moving it through diplomatic means to your house directly or a safe and secured location of your choice using diplomatic courier services.

But can we trust you? Once the funds get to you, you take your 30% out and keep our own 70%. Your own part of this deal is to find a safe place where the funds can be sent to. Our own part is sending it to you.

If you are interested I will furnish you with more details. Awaiting your urgent response.

Your Buddy.
Sgt Jarvis Reeves

God Bless America!!!!!!

God bless America indeed, with six exclamation marks to add to her 50 stars and 13 bars. But, Jarvis (may I call you Jarvis?), I think we have a problem. Well, quite a few problems, actually. Let's count them up (I won't bother to try and segregate the grammatical points from lexical or punctuational or other suspicious errors):

  1. An attaché is an officer working in an embassy, not a sergeant serving in an Army unit.
  2. You're with "the engineering unit"? That is not the way Army engineers talk. They'd name a battalion or a company or something. There isn't just one engineering unit.
  3. Why are you mailing from Yahoo in Denmark (jmrjr008@yahoo.dk) if you're a sergeant serving with my country's military in Iraq? R&R in Copenhagen? Who's looking after the money?
  4. Pal should not be capitalized.
  5. I need of your assistance is ungrammatical.
  6. we have about $14 Million dollars is the start of a bad run-on sentence.
  7. Million should not be capitalized.
  8. The word dollars is redundant since you have "$".
  9. The phrase need a good partner someone we can trust is an even worse run-on (not even a comma).
  10. The sequence "legal.But" should have a space. Omitting space at places where there is a parenthesis or a punctuation mark is the hallmark of a person with too little education to rise above the rank of private. You have several other occurrences of the same thing.
  11. The money is "from oil proceeds and legal"? Even allowing that oddly disjointed coordination of source and status descriptions, how did a sergeant come to have access to legally earned money from an oil company's bank account?
  12. Moving the money "through diplomatic means"? There is something slightly wrong about through (with which one expects a path) and means (which connotes an instrument): by would be more normal here. And are you suggesting that as an Army sergeant you get access to the diplomatic bag? Wouldn't your normal means be something more like an Army post office?
  13. The phrase to your house directly or a safe and secured location of your choice is not grammatical (the adjunct directly modifies the verb moving but illicitly interrupts the coordinate structure to your house or a safe and secured location of your choice).
  14. The phrase you take your 30% out and keep our own 70% is ill-phrased: it sounds like I keep all the money. You mean "hold for our collection" rather than "keep".
  15. Awaiting your urgent response is an ugly ill-placed modifier: I think you mean Urgently awaiting your response.
  16. The period after Your Buddy is wrong: you don't use a period after the "Yours" line before the signature.
  17. Buddy should not be capitalized.
  18. Both Hello Pal and Your Buddy sound like Army usage as imagined by someone who has seen Second World War movies and Hogan's Heroes episodes but has not interacted with the military. Real soldiers tend to be rather formal when addressing people they don't know; they say "Sir" rather than "Pal".

So, let's face it: you're not really U.S. Army at all, are you, "Sergeant Jarvis Reeves Jr."? In the original version of this post I suggested that perhaps you were a Danish hacker who didn't learn his grammar well enough to be a convincing spam-scammer, but then someone showed me* that showmyip.com would permit me to check on you by the very simple device of pointing my browser at http://www.showmyip.com/?ip= And guess what: you're actually in Nigeria, Jarvis! With a Danish zombie address (John Cowan reports to me that it only took him a few tries and he had one too; it is free, it is easy to sign up, and to start sending an email from the new account you hit the button marked Skriv).

Well, Jarvis, now your email address has been shown on Language Log, and every Language Log reader in the world is going to email you and pretend they want to help you transfer the money. Only Language Log readers are smart, and they won't send the advance fees you will tell them they need to pay. They will just play you like a fish on a line and waste your time.

*Thanks to Hao Wang for the tip. Thanks also to John Cowan, Michael Andresen, and Jim Gordon.

Posted by Geoffrey K. Pullum at 01:10 AM

January 25, 2007

The importance of scoring

Dr. Holbrook Gerund, Nike Professor of Linguistics at Vine-Covered University, explains an obscure scholarly point about Sports Swedish to Tank McNamara:

Comics don't have footnotes, so I'll step in to provide the key reference to the seminal work of Yalt (1970), originally published in Cleese et al., Episode 25.

I'll add a slightly more serious observation. When a coach describes his game plan by saying "We're going to try to outscore them", or when an analyst says about (say) the Chicago Bears that "They win by outscoring their opponent", this is not as witless as it might seem to some. (These are both direct quotes from articles currently indexed by Google News.)

Unlike Professor Gerund, I'm ignorant of Sports Swedish, but I know that in Sports English, the verb outscore often means "defeat <someone> by focusing on the offensive rather than defensive aspects of the game".

Some examples from the current Google News index:

"We never try to outscore anybody," Collegiate coach Chance Lindley said. "We always try to outdefend them."
“We don’t try to outscore people, that’s not what we’re about. We won 17 games off of defense, and we lost one game off trying to score more points.”
"I don't think you can win in this league thinking you can outscore everybody," Barone said. "You have to guard."
“When you think of good teams, you think of good defenses,” Gates said. “It's very seldom you have a St. Louis where you just try to outscore everybody."

In general, Sports English is under-represented in dictionaries. The word outscore is missing altogether from the AHD and Encarta, and this sense is missing from the OED and M-W's 3rd, which give only the glosses "to surpass (another) in scoring points; to score more than (another player or team)" and "to score more points than", respectively.

Note, though, that this meaning of "winning by focus on offense" is not just a lexical fact about outscore -- it's a general (and natural) extension of the concept of registering a higher score than the opponent, however that might be expressed. Thus

When it came to discussion about the actual game, Grossman similarly disliked a suggestion that the Bears might employ a ball-control, clock-eating attack to keep the high-octane Saints offense sidelined.
"I'm looking to score more points than they do," Grossman said. "I'm not looking to hold the ball. That's just my perspective right now on Wednesday. That just seems to make sense to me."

But this was the first shootout, marching up and down the field, the first time USC had to win just by putting more points on the board than the other guy.

It is admirable that coach Scott has the ability to motivate and teach his players to play great team defense. However, the object of the game is to put more points on the board than your opponent!

[Update -- Daniel Ezra Johnson points out that some comics do have footnotes...]

Posted by Mark Liberman at 07:43 AM

January 24, 2007

ONW

That's short for "omit needless words", Will Strunk's famous injunction to writers.  Until today, I hadn't realized there was an ONW poem, but there is: the preface to Maurice Sagoff's book ShrinkLits is the following "condensed" version of The Elements of Style:

"Omit needless words!"
Said Strunk to White.
"You're right,"
Said White,
"That's nice
But Strunk,
You're drunk
With words --
Of those
You chose
For that
Would fill
The bill!

Would not
The thought
-- The core --
Be more
If shrinked
(Or shrunk)?"

Said Strunk:
"Good grief!
I'm brief
(I thought)
P'raps not ...
Dear me!
Let's see ...
Just say
'Write tight!'
No fat
in that!"

"Quite right!"
Said White,
"Er -- I mean 'Quite!'
Or, simply, 'Right!' "

(Thanks to Rowyn McDonald for supplying me with the poem.)

A somewhat different, but also entertaining, take on ONW came from Geoff Pullum last July, when he wrote me:

I offer you a Goedelian paradox... Consider the advice, "Omit needless words".  Why not omit the word "needless" here?  You should, if the word "needless" is needless in this context.  Is it?  Well, if it is needless, you should omit it, and say simply, "Omit words".  But then you have not given the right advice about which words to omit.  So the word "needless" must be needed.  Why is it needed?  The sentence "Omit words" does make sense.  The problem is that it clearly gives bad advice unless it is understood in terms of omitting words that are NOT NEEDED.  But if it HAS TO be understood thus, then "needless" is predictable in this context, just from common sense.  Therefore "needless" should be omitted.  But then you have not given the right advice about which words to omit.  So the word "needless" must be needed...  Do you get a sense of where I'm going with this?

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 04:25 PM

Tired food

I was puzzled about why Babel Fish translated the Dutch phrase "Harm Beukers, hoogleraar geschiedenis van de geneeskunde" ("Harm Beukers, professor of the history of medicine") as "Harm tired cherry, hoogleraar history of medicine". So was Ruud Visser, and so he (and others commenting on his blog) investigated and solved the problem.

It seems that the Fish is confident enough to split Beukers into

beu (ik ben het beu) = tired
kers = cherry

which (I gather) Dutch speakers do not find to be a plausible decomposition. On the other hand, hoogleraar (= "professor") is not in the Fish's lexicon, although it's quite a common Dutch word, with 2.8M Google hits; and the Fish is also unwilling to split it into the parts hoog (= "high") and leraar (= "teacher"), though these are commoner in Dutch than beu and kers.

Ruud observes that

The “tired cherry” pattern also holds for other fruits, including those with more than one syllable: beupeer (pear), beuappel (apple), beubanaan (banana), beumandarijn (mandarin) and even beusinaasappel (orange) are all translated as tired X. Don’t like fruits? Babel Fish provides tired vegetables as well, like beusla (lettuce) and beuwortel (carrot). That goes with a beubiefstuk (steak) and some beuaardappelen (potatoes); beupatat (fries/chips) is not on the menu, unfortunately. All of this is served by be(a)utiful, though somewhat weary, beumannen (men) and beuvrouwen (women) in your local beurestaurant.

The fact that hoogleraar is missing, and that Beukers is (unwisely) split while hoogleraar is not, means that the Babel Fish Dutch/English system was not constructed with adequate attention to lexeme frequency, not even in the obvious first-order sense of checking the translation dictionary against a frequency-ordered list of word forms.

A Dutch-language news search at news.google.nl gives these counts:


I mean, even without going all the way to a statistical MT system (which would require more bilingual text than might easily be available), at least you could make common-sense use of first-order word frequencies in populating your lexicon. Digital texts and word-frequency lists in Dutch have been available for a long time.
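The kind of first-order check described here can be sketched in a few lines of Python. This is a hedged illustration only: `missing_frequent_words`, `toy_counts` and `toy_lexicon` are hypothetical names, and the counts are invented placeholders, not the news.google.nl figures or any real Dutch frequency data.

```python
def missing_frequent_words(word_counts, lexicon, top_n):
    """Return the top_n most frequent word forms that the lexicon lacks."""
    ranked = sorted(word_counts, key=word_counts.get, reverse=True)
    return [word for word in ranked[:top_n] if word not in lexicon]


# Placeholder counts, invented purely for illustration; they say
# nothing about real Dutch word frequencies.
toy_counts = {
    "hoogleraar": 500,
    "hoog": 400,
    "leraar": 300,
    "kers": 20,
    "beu": 10,
}

# Hypothetical translation lexicon that, like the one described above,
# has the rarer words but lacks "hoogleraar".
toy_lexicon = {"hoog", "leraar", "kers", "beu"}

gaps = missing_frequent_words(toy_counts, toy_lexicon, top_n=3)
print(gaps)
```

Run against a real frequency-ordered word list, a report like `gaps` would flag common words that are missing from the lexicon before any user trips over them.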

[Update -- Bertil, commenting on Ruud's site, explains the real etymology of Beukers -- "cod-beater", roughly:

The site of the Meertens Instituut says that the name Beukers is related to a profession: http://www.meertens.knaw.nl/nfd/detail_naam.php?naam=Beuker.

My 1970 Dikke Van Dale writes that "beuken" means to hit the stockfish until it becomes soft. Apparently, it used to be someone's profession to do this all day.]


[Update -- Tako Schotanus wrote to clarify that the decomposition of beukers into beu+kers is not just semantically implausible in Dutch, but also seems to him to be morphologically impossible as well, roughly as if Winston Churchill were to be translated as "victorious weight chapel sick".]

Posted by Mark Liberman at 03:39 PM

Get your knock-out germs here

In the aftermath of my recent post on how to thaw a jacket, Michael Andresen sent me his own candidate for the mysterious commercial output of the month. He's a satisfied customer at his local JambaJuice outlet but he also looks carefully at the menu. The JambaJuice menu he sent me is headed by an offering called Coldbuster®:

knock-out germs with a tangy mix of fresh squeezed oj, peaches, bananas, orange sherbet, vibrant c boost, ice

Now it's unclear why JambaJuice would want to add a dollop of powerful germs to an otherwise nourishing drink. And of course they wouldn't. The problem is that the menu writers somehow got torpedoed by their mistaken notion about the function and use of a hyphen. They probably wanted to relate the parts of the compound word to each other, the way we distinguish "new-car salesman" from "new car-salesman." And they did this pretty well in the rest of the menu:

as adjective modifiers such as creamsicle-like mix, protein-packed delight, protein-rich mix, apple-strawberry juice, passionfruit-mango juice, all-natural orange juice, nutrient-rich orange juice, and as proper nouns such as Orange-A-Peel and Mango-A-Go-Go.

The only problem is with those nasty little "knock-out germs" that they seem to have added to the Coldbuster®.

Posted by Roger Shuy at 12:59 PM

State of the Union word count tools

See this New York Times page for tools to analyze last night's State of the Union address (and previous ones by this president) on the basis of the frequency with which particular words are used. From the examples given on the page one cannot tell whether it is lexemes or word forms that they count, but a small amount of testing with the search facility soon reveals that it is word forms, indeed, simply character strings, because those are easy to count; so there are separate counts for hero and heroes, for example, rather than a figure for the lexeme hero that embraces both. But if you want to study (say) insure, you can get some of the way by looking for insur, which is matched by insures, insuring, etc., though it will also be matched by insurance (a different lexeme).
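The difference between counting word forms and matching a fragment can be made concrete with a small Python sketch. It is not the Times tool: the function names are hypothetical, the sample sentence is made up, and it assumes that a search fragment matches any word form containing it.

```python
import re
from collections import Counter


def word_form_counts(text):
    """Count each distinct lower-cased word form as its own string."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def fragment_total(counts, fragment):
    """Sum the counts of every word form that contains the fragment."""
    return sum(n for form, n in counts.items() if fragment in form)


# Made-up sample sentence, not taken from any speech.
sample = "A hero insures heroes; insuring a hero needs insurance."
counts = word_form_counts(sample)

print(counts["hero"], counts["heroes"])   # separate word-form counts
print(fragment_total(counts, "insur"))    # also picks up "insurance"
```

The fragment search gets closer to a lexeme count for insure, but only by over-counting: separating insurance from the verb forms would need a lemmatizer or a hand-built list of forms.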

Posted by Geoffrey K. Pullum at 11:50 AM

Ungrammatical timeless truths

Pat Schwieterman wrote:

While listening to the President's State of the Union Address tonight, I was struck by this sentence: "Our work in the world is also based on a timeless truth: To whom much is given, much is required." I instinctively want to expand that timeless truth to something like this: "Much is required to those to whom much is given." But that sounds ungrammatical. The obvious problem here is that "required" isn't accompanied by "of" or "from," so my native speaker instinct starts looking for a parallel to the "to whom much is given" construction – and that doesn't fly as far as I can tell.

At first I thought that a speech-writer had decided to signal Bush's commitment to bipartisanship by quoting a speech by John F. Kennedy; under great time pressure, they hadn't checked to make sure that the allusion was grammatically compatible with its new environment. Here's what Kennedy said in a speech before the Joint Convention of the General Court of the Commonwealth of Massachusetts on January 9, 1961: "For of those to whom much is given, much is required."

But Kennedy needn't have been the source; a bit more Googling showed that he was apparently alluding to Luke 12.48: "From everyone to whom much has been given, much will be required; and from the one to whom much has been entrusted, even more will be demanded." (Oxford NRSV Bible)

I believe that the proximate source is probably the Gates Foundation, which presents a similarly abbreviated version of the same quotation on its web site:

There are two simple values that lie at the core of the foundation’s work:
* All lives—no matter where they are being led—have equal value.
* To whom much has been given, much is expected.

Barbara Partee, Geoff Pullum and I discussed this in email about a month ago. Barbara wrote:

I've just learned from the Sunday NYT Magazine (the cover article) that one of the "simple values" listed on the Gates Foundation website is the following:

"To whom much has been given, much is expected."

I'm writing not to knock it -- I think it's nice, it's certainly quotable and "sounds like" a good aphorism, and I don't know any equally nice way to say it that I would consider actually grammatical.

I'm writing more as a challenge -- how do we explain it to ourselves, and how might we defend it to the prescriptivists who will surely notice it too?

My own first reactions were just funny: My first impulse, quickly rejected, was "No, it's 'FROM whom much has been given, much is expected.'" But while that's fine for the 'expected' part, it's no good for the 'given' part -- no one preposition is good for both. So that led to my second impulse: "From to whom much has been given, much is expected." -- no, garbage! Maybe "From whom much has been given to, much is expected"? Conceivably that one is possible, but it could hardly be uglier, and I'm not sure I can even parse it. Of course you could make the last two clearly grammatical with an added "those" after "from", but that's not anything anyone would want on their website, any more than Winston was inclined to substitute a pedantic "as" for their incorrect but catchy "like".

But how did they get to that formulation, and why, in spite of its evident impossibility, does it "sound" just fine to me, and in fact pleasing (notwithstanding the fact that it did immediately make me sit up and then come running to my computer!)?

My response:

About the quotation -- the original is Luke 12:48, which in the KJV is

But he that knew not, and did commit things worthy of stripes, shall be beaten with few stripes. For unto whomsoever much is given, of him shall be much required: and to whom men have committed much, of him they will ask the more.

The Vulgate has

qui autem non cognovit et fecit digna plagis vapulabit paucis omni autem cui multum datum est multum quaeretur ab eo et cui commendaverunt multum plus petent ab eo

So the critical section would be

cui multum datum est multum quaeretur ab eo
"to whom much has been given, much will be asked of him"

[Update -- Martin Hardcastle supplies the Greek, which is the original version and the source of most English translations:

panti de hôi edothê polu, polu zêtêthêsetai par' autou
"to each one (but) to whom much has been given, much will be required from him"

Martin also observes that "The panti here (which appears as omni in the Latin) probably helps to explain why so many of the English versions you quote end up being in the plural". I guess that means I should have started the Latin version with omni autem cui "but to all whom"...]

There's a version attributed to Oliver Wendell Holmes that reads

"Of those to whom much has been given, much is expected."

Those are all grammatical. The Gates Foundation version is the Holmes version with the initial "of those" deleted, which makes it ungrammatical.

As to why it nevertheless works well enough to fool the Gates people and the NYT, I guess it's an Escher sentence.

Of course, that's supposing that there isn't some clever parse that is escaping me.

Barbara answered:

...it looks like the KJV is using something like a correlative relative construction, which we don't usually have in English (are you sure you find it grammatical in English, and not by using your knowledge of languages in which it would be?), and I don't know whether to call the Latin Vulgate version a correlative construction or something like a donkey pronoun -- your translation doesn't sound quite like normal English to me, though; and the Holmes one is what to me sounds perfectly grammatical and ordinary but pedantic, so I support the Gateses' choice in rendering it ungrammatical but pleasant-sounding the way they did.

And Geoff commented:

Lovely observation. They are of course caught in a trap trying to avoid preposition stranding (your "From whom much has been given to" is the only roughly grammatical version). But there's one more thing about it that makes it difficult and clumsy: at root, they're trying to do a fused relative construction (absurdly called a "headless relative clause" by some, who have not noticed that it is neither headless nor a clause, and obscurely called a "free relative" by others who are STILL trying to make the decision to get themselves a copy of The Cambridge Grammar of the English Language) with "who" as the head.

Fused relatives with "who" as head are now very rare, almost extinct. Shakespeare's Iago says "Who steals my purse steals trash", but that's Elizabethan; Jespersen cites "Who decides is the wife", but it has a real 100-years-old feel to it; and E. E. Cummings wrote "Who pays any attention to the syntax of things will never wholly kiss you", and also "Always the more beautiful answer who asks the more beautiful question" (both quoted by Ross), but Cummings was a bit weird (he certainly provides examples that those who pay attention to the syntax of things will want to pay attention to). But in general, this has gone: you don't say things like

?*Who Mike spent last night with stole his wallet.

So clauses with "who" as subject are not fused relatives anymore. You have to rise above that to get to this:

?*Who much has been given to will sometimes not admit it.

And fronting the preposition is outright forbidden:

**To whom much has been given will sometimes not admit it.

(It gives you a PP subject, for one thing.) Now try and put the same kind of phrase in another PP:

***We should expect generous charitable giving from whom much has been given to.

And now do a repeat occurrence of the forbidden fronting:

****We should expect generous charitable giving from to whom much has been given.

It's certainly not surprising that the effort to construct the slogan crashed and burned. What's surprising is that the writer didn't actually get sent to jail.

What the writer ended up doing was resorting to a sort of pidgin version, with a whole preposition left out, and a sort of Cummingsy feel to the thing; he just blushed and struggled and didn't pay attention to the syntax of things, and hoped no one who knew anyone at Language Log Plaza would notice. I see he lost his wager with fate.

Geoff promised to blog this, and eventually he probably will, but for now you'll have to make do with the strictly-among-us-linguists version that he sent in email.

Let me just add here that President Bush appeared to recognize the problem in real time, because he stumbled on the key phrase, just after "to whom much is":

There's still a psycholinguistic mystery: why does a plainly and straightforwardly ungrammatical sentence go down so easy for so many people, including Barbara Partee, one of the greatest linguists of the age? I don't think this is a matter of artificial prescriptive grammar differing from the true grammar implicit in general usage (whether formal or vernacular). Calling it an "Escher sentence" at best gives it a name, not an explanation. (And I'm not sure that the same things are going on here as in sentences like "More people have written about this than I have".)

My current guess is that we encounter fused relatives in historical sources -- Shakespeare, some bible translations, and so on -- and we grasp the intended meaning without being able to process the form (in the unreflective psycholinguistic sense, not the draw-the-tree-on-the-blackboard sense). From this experience, we learn that there's a sort of grammatical get-out-of-jail-free card given to high-sounding old-fashioned sentences in which relative clauses serve as noun phrases. Thus if you come across such a sentence, you should figure out what it ought to mean, and not worry too much about how it gets there.

[Turning briefly from form to content, I don't think there's any political -- or theological -- issue lurking in the distinctions between "expected" and "required", or "much has been given" vs. "much is given". But I could be wrong.]

[Update -- according to Jan Freeman in the Boston Globe today ("The Bush-Kennedy Bible", 1/24/2007), the original quote-mangler may have been JFK Jr.:

Maybe President Bush thought he was quoting the Bible (or maybe not) in last night's State of the Union. But actually, he was quoting John F. Kennedy Jr. Back in 1997, in a Word column, I rapped an editor's letter in JFK Jr.'s magazine, George, for its execrable prose, including the syntactically defective misquotation "To whom much is given, much is expected, right?"]


[Update 1/26/2007 -- no, it seems (see here) that incoherent versions of this maxim have been appearing in print for almost 200 years...]

Posted by Mark Liberman at 07:19 AM

January 23, 2007

All proverbs are better with lions in them

It's sort of like adding "in bed" to a Chinese fortune cookie fortune...

From "Questionable Content" by Jeph Jacques (#792: "Poor Lion's Almanac"); thanks to Rowyn McDonald for the pointer:

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 11:30 PM


That is, a Sign (or Signs) of the Apocalypse.

Here's Ted Widmer on the 75th anniversary of The American Scholar (published by Phi Beta Kappa), in that magazine's Winter 2007 issue, p. 33:

Will there be a 100th anniversary?  All indications from the publishing industry suggest that content is declining, paper is vanishing, and erudite sexless commentary is especially suicidal.  Sound bites are shrinking, attention spans narrowing, and public language is degraded 24/7, from the vapid ad slogan to the lying speech to the vowelless text message.  The ultimate paradox of our instantaneous, borderless world may be that we have achieved a perfect system of communication only to discover that we have nothing to say.

Supply here wailing, gnashing of teeth, rending of garments, and clanging of the bells of the Church of the Trope of Decline.  If only people would supply evidence; I truly doubt that ad slogans were less vapid or lying speech less prevalent fifty or a hundred years ago.  Vowelless (or vowel-scanty) text messages I'll grant -- but is this actually a degradation?

In any case, we here at Language Log Plaza are committed to continuing our erudite sexless commentary.  And some other stuff too.  Call us suicidal.

Posted by Arnold Zwicky at 08:24 PM

A collaborative in 1945

Arnold Zwicky has pointed out to me that the Oxford English Dictionary does indeed miss collaborative as a noun, and an anonymous correspondent (who cannot be named because he monkeys with Language Log projects while ostensibly at work) has alerted me to the fact that the word goes back at least to 1945. There is even a Wikipedia article on the Architects' Collaborative:

The Architects' Collaborative (TAC) was an American architectural firm formed by Walter Gropius and seven younger architects in 1945 in Cambridge, Massachusetts. The other partners were Norman C. Fletcher (b. December 8, 1917), Jean B. Fletcher (1915—September 13, 1965), John C. Harkness (b. November 30, 1916), Sarah P. Harkness (b. July 8, 1914), Robert S. McMillan (April 3, 1916—March 14, 2001), Louis A. McMillen (October 21, 1916—May 8, 1998) and Benjamin C. Thompson (July 3, 1918—August 21, 2002). TAC have created many successful projects, and have been well-respected for its broad range of designs. One of TAC's specialties was designing public school buildings.

This should surely have made it into the OED by now. But it hasn't. C'mon, you Oxonian wordanistas! Get on it! Webster's is ahead of you: the entry for collaborative at least says "adjective or noun".

Syntactic note: no, it is not grammatical for Wikipedia to say *TAC have ... been well-respected for its broad range of designs: even in those (mainly British) dialects where TAC have is grammatical (plural agreement for nouns denoting collective groups of humans and similar institutions), you can't then switch to its broad range of designs: either it is a group of people, hence pronominalizable as they, or it is a school in a more abstract sense, pronominalizable as it, but not both in the same sentence. For example, (1) and (2) are both OK, not (3) or (4):

  1. The boat righted itself after the storm, and it was soon back in port.
  2. The boat righted herself after the storm, and she was soon back in port.
  3. *The boat righted itself after the storm, and she was soon back in port.
  4. *The boat righted herself after the storm, and it was soon back in port.

But that's OK; Wikipedia is a voluntary association, and we should cut it some grammatical slack.

Posted by Geoffrey K. Pullum at 07:14 PM

Het Woordenboek der Nederlandsche Taal op internet

You read it here first, via a note from Ruud Visser in Leiden:

Let me start with a word of thanks to you and all the other writers at Language Log. I discovered LL in November of last year and I was hooked right away.

I saw a news report just now that I thought might interest some people at Language Log Plaza. The "Woordenboek der Nederlandsche Taal" (WNT), or the "Dictionary of the Dutch Language", will become freely available on the internet on January 27. The WNT is the Dutch equivalent of the Oxford English Dictionary, containing some 400,000 words on almost 50,000 pages of print. The first part was published in 1864, the last one in 1998. Three appendices were added in 2001, mostly covering words originated in the 20th century.

The WNT is based on sources dating back to 1500. The online edition will contain all words in two spelling schemes: the original one from 1863 and the current, modern one. The 1.7 million source quotes used in the printed version will also be available and searchable.

The WNT can be found at http://wnt.inl.nl/.

Ruud continues:

The news report in question is in this week's newsletter of the Leiden University "Grootste woordenboek ter wereld gaat online", 1/23/2007 [I think this means "Biggest dictionary in the world goes online" -- myl]

Unfortunately I haven't been able to find an English story on this anywhere, but if you are interested, I'd be happy to translate. (There is an English Wikipedia page on the WNT, but it says little more than what I just did.) The article ends with a size comparison between the WNT, the OED, the Deutsches Wörterbuch and the Dai Kan-Wa jiten (a Chinese-Japanese dictionary). For such a relatively small language, Dutch has a pretty big dictionary.

I have no doubt that you Language Log readers can all read Dutch, by Roman Jakobson's method if in no other way. If you'd like to try it, here's how the Leiden University newsletter starts:

Met ingang van zaterdag 27 januari is het Woordenboek der Nederlandsche Taal (WNT) voor iedereen gratis op het internet te raadplegen. Is dit nieuws alleen van belang voor neerlandici, filologen en taalkundigen? ‘Magnifiek’, reageert Harm Beukers, hoogleraar geschiedenis van de geneeskunde.

You could try Babel Fish, which renders the lede as

As of Saturday 27 January the dictionary is for free consult language (WNT) for everyone of the Nederlandsche on the Internet. Is this news important only for neerlandici, philologists and linguists? ` magnificently, react Harm tired cherry, hoogleraar history of medicine.

Then again, maybe Jakobson's method is better. At least, Professor "Harm tired cherry" is likely to think so.

Roman Jakobson divulged his method, legend has it, to a member of an audience (in New York City) who objected to his proposal to deliver a lecture on Bulgarian poetry in the Bulgarian language.

"But Professor Jakobson, none of us know Bulgarian!"
"You are linguist, no? So listen, and try to understand."

Still, perhaps some would find a translation helpful in focusing their linguistic attention...

[Update -- Jesse Sheidlower writes:

True story: I arrived at the University of Chicago having studied Latin and Greek in high school, but no modern languages. Nonetheless, I was very interested in historical linguistics, and I signed up for Eric Hamp's seminar in Indo-European. The text (such as it was, this being a class with Eric Hamp) was Meillet's _Introduction a l'Etude comparative des langues indo-europeenes_.

After class I nervously approached Prof. Hamp, and said, "Uh, about the Meillet book--I don't actually know French yet." He replied, "Oh, just learn it. It's easy."

How true.]

[Update #2 -- Ruud Visser wrote:

Thanks for sharing my news of the Woordenboek der Nederlandsche Taal going online with the Language Log audience. I see Babelfish is still not quite fluent in Dutch, so I took an extended coffee break this morning and translated the Leiden University newsletter article. I posted it to my weblog (which you already found):


By the way, your translation of the article title, "Grootste woordenboek ter wereld gaat online", was perfect. Dutch isn't as difficult as some people say it is!

Let me also draw our readers' attention to Ruud's investigation of the vagaries of Babel Fish in this case, including especially the mysterious translation of "Beukers" (a name that apparently means "Bashers" if it means anything) as "tired cherry".]

Posted by Mark Liberman at 05:12 PM

Vaux and Cooper, 2nd edition

A while back when we were talking about books on field work I mentioned that a new edition of Bert Vaux and Justin Cooper's Introduction to Linguistic Field Methods was in the works. With the addition of a third co-author, Emily Tucker, it has now appeared under the title Linguistic Field Methods and, amazingly, costs less than the first edition. It is available from Wipf and Stock. ISBN: 1-59752-764-5.

Posted by Bill Poser at 02:28 PM

On the offensive language beat: use vs. mention, avoidance

Yesterday the New York Times reported on the "Grey's Anatomy" flap, in which an actor got into hot water for what he said during a backstage meeting with the press after the Golden Globe award ceremony.  The story begins:

LOS ANGELES, Jan. 21 -- Executives at ABC and its parent, Disney, are mulling the future of the actor Isaiah Washington, a star of the hit series "Grey's Anatomy," after Mr. Washington last week publicly used an anti-gay slur for the second time in roughly three months, a Disney executive said Friday.

Two remarkable things here. 

First, Washington didn't actually use the slur; he mentioned it, in denying (on Monday, January 15) that he had used it on the previous occasion (back in October).  Despite that, some people (including the president of GLAAD, the Gay & Lesbian Alliance Against Defamation) were deeply offended that he had uttered the word at all, and on Thursday Washington issued an apology asserting that the word is so toxic that it shouldn't ever be uttered:

I  apologize to [co-star] T.R. [Knight, the object of the October slur], my colleagues, the fans of the show and especially the lesbian and gay community for using a word that is unacceptable in any context or circumstance.

Second, the Times, in its usual modest fashion, managed to print a story of about 24 column-inches about this word without telling its readers what it was.  Instead, it's referred to as "the remark" and "the slur".

What Washington said last Monday was, according to a (plain-speaking) AP article on Thursday:

"No, I did not call (co-star) T.R. (Knight) a faggot," Washington told reporters. "Never happened, never happened."

The Times version has no direct quotes:

Mr. Washington moved to the microphone and denied that he had ever used the slur to describe Mr. Knight, at the same time repeating the word.  Fellow cast members who were with Mr. Washington appeared shaken, quickly going from jubilant to solemn.

On Wednesday, GLAAD president Neil Giuliano issued a statement in which (according to the AP) he

said he had contacted Washington's representatives in hopes of meeting the actor to discuss "the destructive impact of these kinds of anti-gay slurs."

"Washington's repeated use of it on-set and in the media is simply inexcusable," Giuliano said in the statement.
(GLAAD has a very, very thin skin.  They do not speak for me.)

ABC followed the next day with its own statement, calling Washington's behavior "unacceptable", and Washington issued his apology.

The Times's modesty isn't news, though in this case it's particularly annoying.  What is notable, though, is the assumption that some words are so bad that they can't even be discussed (even to be repudiated), and the claim that faggot is one of those words.

Here on Language Log, where we're willing to discuss anything having to do with language, no words are off-limits.  I've talked about fag (as wielded by Ozzie Guillen), for instance.  I'm even on record (in Out magazine, June 2003) as believing that there's nothing wrong with faggot and fag, in the right contexts, though I'm sure that insulting T.R. Knight is not such a context.

Surely Washington was wrong to let himself be drawn into talking about the October incident (which he apologized for at the time; Knight's response to the incident was to come out of the closet), and I can't imagine what possessed him to deny having insulted Knight then.  But if he was going to issue a denial, the natural way to do it would be to specify the alleged offense.  (Ok, he could have issued a blanket denial, like "I never insulted T.R.")

Still, I can't see that mentioning the word faggot (as I did myself just above, and as the AP story did in quoting Washington) is offensive in itself.

Believing that some words are so intrinsically offensive that they should never be uttered, even to describe their offensiveness or to report on offensive uses, is believing in verbal magic.  We try to steer clear of verbal magic here on Language Log, so we're willing to discuss uses of any word, right up to and including nigger, as in Geoff Pullum's provocatively titled posting, "Nigger, nigger, on the wall".

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 02:19 PM

Too hip for the room

Tank McNamara continues his interview with Dr. Holbrook Gerund, Nike Professor of Linguistics at Vine-Covered University:

My own favorite Joe Morgan quote: “No, I don't want you to draw any conclusion. I want you to listen to what I just said.” But Joe is still far behind Yogi Berra in terms of career (linguistic) hits.

[Update -- Don Porges writes:

I spent quite a few seconds expecting the strip to attack "less than two outs" on the grounds that it should be "fewer than two outs". But not in Sports English, it shouldn't.

Nor in most other sorts.]

Posted by Mark Liberman at 01:30 PM

Whom shall I say [ ___ is calling ]?

Commenting that "a little knowledge is a dangerous thing", Gene Buckley offered me the following example from the New York Times of 1/15/07:

(1) The answer, shaped in the National Security Council, is for the American military to make targets of Iranians whom they believe are fueling attacks, a decision that Mr. Bush made months ago that was disclosed only last week.

It's been a while since we looked at the who/whom thing here at Language Log Plaza, and Buckley's example happens to be #25 on a list I've been adding to over the years, so this might be a good time to dip once again into the murky waters of who and whom.

Here's a version of what I've said to my senior seminar at Stanford (on innovation and spread in the lexicon and syntax) this quarter.  (That's "senior" in the sense of 'fourth-year undergraduate', not in the sense of 'senior citizen'.)

1.  We're interested here in interrogative and relative WHO, with non-possessive forms who and whom; and similarly with WHOEVER, with forms whoever and whomever (references to the former should be taken as covering the latter).

2.  I'll call the two forms Form1 and Form2, respectively.  The ordinary personal pronouns have two corresponding (and roughly comparable) forms: I/me, she/her, he/him, we/us, they/them.  (For YOU and IT, Form1 and Form2 are identical.)

3.  The languages of the world have several strategies for constituent interrogative clauses and for relative clauses (sometimes more than one strategy in the same language).  Two really common ones involve special interrogative and relative pro-elements, either (a) "extracted" from the usual locations of their phrase types (in English, they appear at the beginning of their clauses, and nothing appears in the usual location; sometimes the missing constituent is referred to as a "gap"), or (b) left "in situ", in their usual locations.

English uses extraction almost entirely (extracted material is in bold face; the position of the gap is indicated by an underline; and square brackets set off embedded clauses):

main question: What did you see ___?
embedded question:
  object of V: I wonder [ what they saw ___ ].
  object of P: I wondered about [ what they saw ___ ].
  subject in pseudocleft: [ What they saw ___ ] was a rhinoceros.
ordinary (headed) restrictive relative:
  WH relative: The rhinoceros [ which they saw ___ ] was angry.
  (gap without extraction: that relative: The rhinoceros [ that they saw ___ ] was angry.)
  (gap without extraction: zero relative: The rhinoceros [ they saw ___ ] was angry.)
non-restrictive relative: The rhinoceros, [ which they'd just noticed ___ ], charged them.
"free relative" (headless):
  object of V: I noticed [ what(ever) she had ___ in her hand ].
  object of P: I looked at [ what(ever) she had ___ in her hand ].
  subject: [ What(ever) she had ___ in her hand ] sparkled.

But English also has a few special types of in-situ interrogative clauses, notably in "reclamatory" and "incredulity" questions, where you're asking about the words someone just used (because you didn't quite hear them) or the substance of what they just said (because you can't believe it):
You saw WHAT?!

4.  For interrogative and relative WHO, there is a special relationship between Form2 and the syntactic function of object (the object of a "governor", V or P); for most speakers, Form2 rarely, or never, occurs as the (complete) subject of a finite clause.  On the other hand, for the ordinary personal pronouns, there is a special relationship between Form1 and the syntactic function of subject (of a finite clause); for most speakers, Form1 rarely, or never, occurs as the (complete) object of V or P.

This asymmetry shows up in another place.  For WHO, Form1 is the all-purpose form, used when the conditions for Form2 are not satisfied.  So if someone says
I saw someone interesting yesterday.
you can ask
Who?
but even if you regularly say
Whom did you see?
you can't respond to the other person's assertion with
*Whom?

But for the ordinary personal pronouns, it goes the other way; you get Form2 all over the place.  So, for instance, if someone asks
Who did it, Brad or Janet?
you can reply with
Him.
but not with
*He.
even though you'd never say
*Him did it.
(There are many more intriguing facts like this.)

This is just a fact of life.  Though they are to some degree parallel, WHO and the ordinary personal pronouns differ in the way Form1 and Form2 are distributed.

5.  Languages that use extraction in interrogatives and relatives can differ in the details.  In particular, they can carry over case-marking from the gap to the extracted position -- the WH pronoun "inherits" the case appropriate to the gap -- or, since questioned and relativized constituents bear a kind of focus, the WH pronouns can bear a case appropriate to this focussing -- either a special case, or a defaulting to an all-purpose case.

English has (had) both systems.  Pure inheritance produces the Prescriptive System:

extracted subj: Form1
Who did you say [ ___ stole the tarts ]?
A extracted obj of V: Form2
Whom did you say [ they saw ___ ]?
B extracted obj of P, P stranded: Form2
Whom did you say [ they went to ___ ]?
C extracted obj of P, P fronted with obj: of course, Form2
To whom did you say [ they went ___ ]?

And defaulting to the all-purpose Form1 produces the Standard System:

extracted subj: Form1
Who did you say [ ___ stole the tarts ]?
A extracted obj of V: Form1
Who did you say [ they saw ___ ]?
B extracted obj of P, P stranded: Form1
Who did you say [ they went to ___ ]?
C extracted obj of P, P fronted with obj: Form2 (it's the whole PP that gets focus)
To whom did you say [ they went ___ ]?
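
For readers who think more easily in code, the two systems just listed can be sketched as a small decision function. This is only an illustration: the function name wh_form, its parameters, and the role labels are hypothetical shorthand invented for this sketch, and it covers only the four configurations in the lists above (extracted subject, and cases A, B, and C), not predicatives, in-situ pronouns, or hypercorrect uses.

```python
def wh_form(system, gap_role, p_fronted=False):
    """Pick 'who' (Form1) or 'whom' (Form2) for an extracted WH pronoun.

    system    -- 'prescriptive' or 'standard' (labels used in the text)
    gap_role  -- 'subject', 'object_of_V', or 'object_of_P' (hypothetical labels)
    p_fronted -- True only for case C, where P is fronted with its object
    """
    if gap_role not in ("subject", "object_of_V", "object_of_P"):
        raise ValueError("unknown gap role: %r" % (gap_role,))
    if p_fronted and gap_role != "object_of_P":
        raise ValueError("only an object of P can have its P fronted")

    if system == "prescriptive":
        # Pure inheritance: the pronoun takes the form appropriate to the gap.
        return "who" if gap_role == "subject" else "whom"
    if system == "standard":
        # Default to the all-purpose Form1, except when the whole PP is fronted.
        return "whom" if p_fronted else "who"
    raise ValueError("unknown system: %r" % (system,))


# The four configurations from the lists above, in both systems.
cases = [
    ("subject", False),      # Who did you say [ ___ stole the tarts ]?
    ("object_of_V", False),  # case A
    ("object_of_P", False),  # case B, P stranded
    ("object_of_P", True),   # case C, P fronted
]
for role, fronted in cases:
    print(role, fronted,
          wh_form("prescriptive", role, fronted),
          wh_form("standard", role, fronted))
```

On this sketch the two systems agree on the extracted subject and on case C, and differ only in cases A and B.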

6.  Assuming that the Prescriptive System was predominant for some time, what would move people to innovate the (now) Standard System?

Speculative answer: it has to do with the much greater frequency of main-clause subject questions (and relatives), versus all other types.  That would lead people to see Form1 as the one appropriate for questions and relatives and so to (mistakenly) extend Form1 to cases A and B (but not C).

7.  Two consequences:

First, once the Standard System had spread some, the Prescriptive System would increasingly be seen by many speakers as old-fashioned, formal, even pretentious -- and on the positive side, as serious and emphatic.

Second, in the Standard System, whom now occurs with any frequency only with fronted Ps -- a construction that is itself associated with a high level of formality.

Put these two together and you get whom itself seen as old-fashioned, very formal, serious, and emphatic.  And so available for situations in which you want those connotations.

(Note again the contrast between WHO and the ordinary personal pronouns.  For the ordinary personal pronouns, Form1 has, for many speakers, come to be seen as formal, serious, and emphatic -- a development that leads some of these speakers to prefer "between you and I" and the like in serious contexts.)

When you add in prescriptive injunctions against who in cases A and B (which began in the 18th century), which instructed writers and speakers to replace (some occurrences of) who by whom, the way was open for hypercorrection, which would produce whom in contexts not available for it in the Prescriptive System.

And so it has come to be.  See Mark Liberman's Language Log piece on

(2) [ Whomever controls language ] controls politics.

Mark refers there to a tongue-in-cheek piece by James Thurber on whom, in which Thurber recommends: "'Whom' should be used in the nominative case [i.e., for a subject -- AMZ] only when a note of dignity or austerity is desired."

Earlier, and more stunningly, Geoff Pullum displayed a photo of an American flag with this protestors' legend printed on it:

(3) Cheats Murderers Rapists Thiefs Terrorists [ Whom Captured Killed Enslaved Millions Of Africans ] [ Whom Killed More Natives Than Nazis Did Jews ] ...

Meanwhile, in e-mail from John Singler, 11/1/05, this item from a Brooklyn neighborhood website (punctuation as in the original; I forbear to indicate embedded clauses):

(4) Can anyone help me with information about Lillian Krum whom may have married a "Cunningham" ,whom had a son named Albert James Cunningham that married a Lillian Smith ,with whom they had 4 children and moved to Florida . . .

Finally, one I collected myself:

(5) Key theorists, [ whom are almost all white men ], control...
  (Stanford Humanities Center fellow, in lunchtime conversation, 10/31/05 -- possibly reaching for more formality and seriousness, in conversation with Lani Guinier).

8.  These involve straightforward subjects with no obvious factors favoring whom.  From them we shade into some cases involving predicatives, a case not in the lists above:

(6) Thank you so much [ whomever you are ___ ].
  (letter to Palo Alto Daily News, 9/17/03, p. 10, thanking a good samaritan)

(7) Who I am today is [ whom I've always wanted to be ___ ].
  (cited by Wilson Gray on ADS-L, 6/8/06)

  (cartoon cited by Geoff Pullum here)

For predicatives, the Prescriptive System insists on Form1 for ordinary personal pronouns ("It was I"), on analogy with the system of Latin, while the Standard System is very strongly in favor of Form2 ("It was me"), as the default form, and that might influence the choice of forms for WHO.

9.  Now we come to two cases where the appearance of whom for a subject has some structural motivation.  People have been noticing examples like these for a hundred years at least (there is some discussion in MWDEU of these precedents), and it's fairly easy to find new ones.

There are two main cases. 

9.1.  In the first, we have an object clause (usually the object of a P) with WHO as its subject.  The pronoun then immediately follows the governor, and could easily be mistaken for its object (even though it's the whole clause that's the object).  In fact, I believe there are languages in which a WH pronoun in this position regularly (or optionally) has its case determined by the governor.

I'll call this case "in-situ subject of an object clause (ISOC)".  Examples:
(9) This is not a picture of a political tide running in one direction. It is a picture of voters venting their frustration on [ whomever [sic] happens to be in power ].
  (Wall Street Journal quoting USA Today, reported by Ron Hardin in sci.lang, 11/6/03; that's the WSJ's "[sic]", by the way)

(10) ... and works to ascertain God's leading as to [ whom should fill certain positions within our congregation ], the full congregation radifies these appointments in ...
  (here, cited in eggcorn database, 8/29/05, for "radify")

(11) This month's social has an Academy Awards theme and two prizes will be given away.  One prize will be awarded to [ whomever successfully predicts the most winners for this year ].
  (e-mail to the QUEST (Queer University Employees at Stanford) mailing list, 2/22/06)

(12) ATHENS, Ga. - Authorities are searching for [ whomever posted a long list and description of supposed sexual encounters between dozens of high school students on the online networking site MySpace.com ].
  (AP story reported by Ron Hardin on sci.lang, 10/2/06; the Washington Post version had whoever)

9.2.  The second case is a bit subtler.  Again there's an object clause, but this time its subject has been extracted and now appears at the front of a higher clause.  Still, the gap of extraction immediately follows the governor (most often, a V), so it's in a position where some languages (I believe) allow the governor to determine the case on this element; if this case is inherited by the extracted element, whom would be predicted.

I'll call this case "extracted subject of an object clause (ESOC)".  Examples, beginning with the Buckley example, repeated here:

Restrictive relative:
(1) The answer, shaped in the National Security Council, is for the American military to make targets of Iranians [ whom they believe [ ___ are fueling attacks ] ], a decision that Mr. Bush made months ago that was disclosed only last week.
  (from the New York Times of 1/15/07, here)

Main question:
(13)   [Robert Coren:] Well, I think what works best for Steph is what works best, but, much as I'd dearly love to have Sim there, I think cons work best in pleasant outdoor weather.
         [Sim Aberson, a meteorologist:] And whom do you think [ ___ would be responsible for the POW ]?  I hereby declare that if I can't go, only UOW will occur.
  (exchange on soc.motss, 1/18/05)

Non-restrictive relative (twice):
(14) Now there's antiwar Connecticut Senate candidate Ned Lamont, [ whom Moulitsas predicts [ ___ will defeat Joe Lieberman in the party primary ] ]. He'll lose. And there's Montana's senatorial candidate Jon Tester, [ whom Moulitsas predicts [ ___ will beat incumbent Senator Conrad Burns in November ] ].
  (Ben Shapiro column, reported by Ron Hardin on sci.lang, 6/14/06; see here)

Restrictive relative:
(15) Bobby Hodges, a former Texas Air National Guard general [ whom "60 Minutes" claimed  [___ had authenticated the memos ] ], says that when he was read them over the phone he assumed they were handwritten and wasn't told that CBS didn't have the originals.
  (Wall Street Journal, reported by Paul Kriha in sci.lang, 9/13/06; see here)

Non-restrictive relative:
(16) The 77-year-old Chomsky, [ whom Chavez mistakenly thought [ ___ was dead ] ], is famous as a linguist and as an opponent of U.S. foreign policy.
 ("Chomsky still best seller", Mercury News "Celebrities" section, from Chris Waigl in e-mail 9/27/06; see here)

(MWDEU has similar examples from Shakespeare -- note, from well before the age of the prescriptive grammarians.)

10.  Still more subtle (and much less frequent) examples involve subjects that are understood as denoting affected participants in some event -- that is, subjects that have some of the semantics of objects.  I have one case with the verb GET taking recipient subjects, and one with the subject in a passive:

(17) ... Hillary, or [ whomever gets the nomination ], gets a shot.
  (John Meacham, senior editor of Newsweek, on the Imus radio show, reported by Ron Hardin in sci.lang on 6/13/03)

(18) The employee, [ whom has been fired ], did not have the authority to take the equipment or the data home.
  (News editor on the Imus show, reported by Ron Hardin on sci.lang, 7/1/06)

11.  Finally, combinations of these factors. 

First, an affected subject (subject in a passive) in combination with ESOC:

(19) Currently I am reading Barry Strauss's The Trojan War.  Strauss is the type of classicist [ whom in Who Killed Homer? we once thought [ ___ were desperately needed for a dying profession ] ].
  (Victor Davis Hanson, quoted by Ron Hardin in sci.lang, 12/3/06; see here.  The wonky number agreement is a bonus.)

Then, a predicative counterpart to ISOC -- a predicative fronted in an embedded clause, where it immediately follows a governor (in this case, P):

(20) ... the courage to be open about [ whom I was ___ ].
  (James McGreevey, cited by Mark Liberman here)

12.  Some lessons.

12.1.  People struggle to discern system and meaning, on very imperfect evidence.  Yes, they thrash about and make mistakes, but mostly what we see is an attempt to find a system in what they're confronted with.  People come up with systems that are possible as languages -- they are attested in other languages -- but are not, in fact, necessarily the predominant systems of other speakers around them.

Then those systems can spread.

12.2.  The "same" grammatical category in different word classes might have quite different principles of distribution.

12.3. "Nominative" and "accusative" (or "subject case" and "object case") aren't bad names, but the labels aren't definitions, and they aren't descriptions.  We choose the labels because of the ways the forms are frequently used, not the other way around.  That's why I insisted on making the case names arbitrary and strange -- "Form1" and "Form2" -- so that you wouldn't import expectations about what "nominative" and "accusative" forms SHOULD do.

We inventory the distinct forms and list the ways they're used.  Then we pick names.  But the names are just expository icing, not anything of significance in the description.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 10:31 AM

Milton, Star Trek, and Romulean rules

A few days ago, I made fun of Christopher Orlet for arguing that "a letter written 367 years ago by John Milton to Benedetto Bonomatthai reads much like one composed by a good writer today", when in fact Milton wrote the letter in Latin, and Orlet's quotation was from an 1847 translation by Robert Fellowes ("Scholarship is hard, let's go drinking", 1/19/2007).

But Milton -- at least the 29-year-old Milton who wrote that letter -- did indeed strongly support Orlet's beloved linguistic prescriptivism. His letter to Bonomatthai touches on several of the standard (though internally inconsistent) prescriptivist themes, such as the importance of upholding the standards of an admired past, and the value of establishing arbitrary rules. And Milton suggests strong punishment -- even a metaphoric death penalty -- for disobedience.

In the 1638 Latin original, and Fellowes' 1847 translation:

Nam qui in civitate mores hominum sapienter norit formare, domique & belli praeclaris institutis regere, illum ego prae caeteris omni honore apprime dignum esse existimem. Proximum huic tamen, qui loquendi scribendique rationem & normam probo gentis saeculo receptam, praeceptis regulisque sancire adnititur, & veluti quodam vallo circummunire, quod quidem ne quis transire ausit, tantum non Romulea lege sit cautum.

For I hold him to deserve the highest praise who fixes the principles and forms the manners of a state, and makes the wisdom of his administration conspicuous both at home and abroad. But I assign the second place to him, who endeavours by precepts and by rules to perpetuate that style and idiom of speech and composition, which have flourished in the purest periods of the language, and who, as it were, throws up such a trench around it, that people may be prevented from going beyond the boundary almost by the terrors of a Romulean prohibition.

If you're like most modern readers, the phrase "a Romulean prohibition" will not mean much to you. Asking around at Language Log Plaza, I got responses like "um, something about Star Trek cloaking devices?" and "what, did the FDA crack down on robo-tripping?"

But given the source, we know that this must have something to do with Romulus, the legendary co-founder of Rome -- an example of communication by classical allusion, common in classical Greek and Roman writings, and imitated in the vulgar languages of Europe into the 20th century. The idea of communication by allusion was developed at charming length in an episode of Star Trek, explained in a marvelous post over at TstT ("Darmok", 12/11/2006; more information here). In this episode, the Enterprise encounters an alien species known as the Children of Tama, who seem to communicate by "stating the proper names of individuals and locations". As Data explains,

The Tamarian ego structure does not seem to allow what we normally think of as self-identity. Their ability to abstract is highly unusual. They seem to communicate through narrative imagery, a reference to the individuals and places which occur in their mytho-historical accounts.

To understand the Tamarians -- or John Milton -- you need to know the "mytho-historical accounts" that are referenced. Thus "Darmok and Jalad at Tanagra" means "friendship as a result of shared struggle". In Milton's letter, the "trench" and the "Romulean prohibition" refer to an episode from the legend of the founding of Rome, and to understand what they mean, we have to tell the tale.

We pick the story up at the point where the twins Romulus and Remus, with their followers, have decided to build a city. According to Plutarch -- as presented in THE LIFE OF ROMULUS . English'd from the Greek, By Mr. James Smalwood, Fel. of Trin. Col. in Cambridge. (in Plutarch's lives Translated from the Greek by several hands. In five volumes. Vol. I. To which is prefixt The life of Plutarch., London : printed by T. Hodgkin for Jacob Tonson, at the Judges-Head in Chancery-lane, near Fleet-street, 1688.):

Their minds being fully bent upon Building, there arose presently a difference about the Place where. Romulus he built a Square of Houses, which he call'd Rome, and would have the City be there; Remus laid out a piece of Ground on the Aventine Mount, well fortifi'd by nature, which was from him call'd Remonius, but now Rignarius; concluding at last to decide the Contest by a Divination from a flight of Birds, and placing themselves apart at some distance, to Remus they say, appear'd six Vultures, to Romulus double the number; others say, Remus did truly see his number, and that Romulus feign'd his [...]

When Remus knew the Cheat, he was much displeas'd; and as Romulus was casting up a Ditch where he design'd the Foundation of the City-Wall, some pieces of the Work he turn'd to ridicule, others he trampled on and spurn'd at; at last as he was in contempt skipping over the Work, some say, Romulus himself stroke him; others, that Celer, one of his Companions; however there fell Remus ...

So when Milton wrote (with a closer translation than Fellowes'):

& veluti quodam vallo circummunire, quod quidem ne quis transire ausit, tantum non Romulea lege sit cautum.

...and surround it with such a ditch that anyone who would dare to cross it will be warned off by an all but Romulean law.

what he was saying, in the language of classical allusions, was this: grammarians should assume by fraud the authority to draw an arbitrary linguistic line in the ground, and respond with murderous rage if anyone should cross it. Let this be a warning to us all.

Now in fairness to Milton, there's a tension between what he says and the implication of the legend he alludes to. What he says we should do is

...loquendi scribendique rationem & normam probo gentis saeculo receptam, praeceptis regulisque sancire adnititur,

...strive to fix unalterably with rules and regulations the principles and norms of speaking and writing that were accepted in the superior age of our nation

Depending on your notions of which saeculum of our gens was most probus, this might mean trying to make everyone speak and write like William Shakespeare, or Jane Austen, or Thomas Jefferson, or Mark Twain, or Virginia Woolf. A wide choice of worthy goals, in principle. But you're not, I'm afraid, likely to succeed in many cases. And this is probably a good thing, because if your chosen saeculum is more than a century or so in the past, your pupils are going to have a hard time of it in later life.

Imagine, for example, offering to modern readers a plan for reforming high school and college written in the style of John Milton's Of Education (1644):

I call therefore a compleate and generous Education that which fits a man to perform justly, skilfully and magnanimously all the offices both private and publike of peace and war. And how all this may be done between twelve, and one and twenty, lesse time then is now bestow'd in pure trifling at Grammar and Sophistry, is to be thus order'd.

First to finde out a spatious house and ground about it fit for an Academy, and big enough to lodge a hundred and fifty persons, whereof twenty or thereabout may be attendants, all under the government of one, who shall be thought of desert sufficient, and ability either to doe all, or wisely to direct, and oversee it done. This place should be at once both School and University, not needing a remove to any other house of Schollership, except it be some peculiar Colledge of Law, or Physick, where they mean to be practitioners; but as for those generall studies which take up all our time from Lilly to the commencing, as they term it, Master of Art, it should be absolute. After this pattern, as many edifices may be converted to this use, as shall be needfull in every City throughout this land, which would tend much to the encrease of learning and civility every where. This number, lesse or more thus collected, to the convenience of a foot company, or interchangeably two troops of cavalry, should divide their daies work into three parts, as it lies orderly. Their studies, their exercise, and their diet.

Although the content of the plan might be excellent, no modern audience would take it seriously if a contemporary writer were to present it in a form like that.

But whether or not upholding the linguistic standards of the distant past is a good thing to do, the fact is that real-world prescriptivists don't try to do it. Sometimes they claim to be preserving the language against modern corruption, but they rarely even bother to try to figure out what the standards of the past really were. And a majority of the commonest prescriptive bugbears are modern innovations: singular "they", split infinitives, stranded prepositions, less with countables, which hunting, the morphology of comparatives, many alleged word-sense changes; and so on.

Although Milton invokes the authority of the admired past, his "narrative imagery" communicates, in a Tamarian mode, the arbitrariness of prescriptive grammar. Remus argued on pragmatic grounds for a city on the Aventine hill; but the boundary of his twin's "square of houses" was an arbitrary choice, with no particular historical or practical justification. Nevertheless, Romulus took the view that once he had gained -- through fraud -- the right to determine that boundary, it should be treated as sacred and inviolate. And he was willing to murder his twin brother to enforce it.

The supposed value of linguistic arbitrariness was argued eloquently by Mark Halpern almost a decade ago ("A war that never ends", The Atlantic, March 1997), responding to an earlier article by Geoff Nunberg ("The decline of grammar", The Atlantic, December 1983):

One of the points that Nunberg, like all of his school, was most eager to make is that the "rules" of grammar, and of good usage generally, have no scientific basis; they are just someone's idea of what is proper, and that idea changes from generation to generation. The descriptivists are so eager, indeed, to make sure this point has registered that they seldom stop making it long enough to hear the reply: "Yes, we know this; we do not contend that the rules we propose for the sake of clarity and richness of communication were handed down from on high. They are ordinary man-made rules, not divine commandments or scientific laws (although many have support from historical scholarship), and we agree that they, like all man-made things, will need continual review and revision. But these facts are no more arguments against laws governing language usage than they are against laws governing vehicular traffic. Arbitrary laws -- conventions -- are just the ones that need enforcement, not the natural laws. The law of gravity can take care of itself; the law that you go on green and stop on red needs all the help it can get."

Despite the way they like to label themselves, most prescriptivists are really doctrinaire cultural radicals, not conservatives. And their harshness and intolerance are typical of true believers committed to instituting what Friedrich Hayek called "made orders".

[Some more comments on the role of "grown orders" vs. "made orders", in linguistic matters, are here.]

[Update -- Craig Russell writes:

I am sure that the reference is the incident you mention, where Romulus and Remus are founding the city of Rome. The Roman historian Livy, however, tells it with the following detail:

Volgatior fama est ludibrio fratris Remum novos transiluisse muros; inde ab irato Romulo, cum verbis quoque increpitans adiecisset, 'Sic deinde, quicumque alius transiliet moenia mea', interfectum.

The more common story is that, for a joke, Remus had jumped over his brother's new walls; then, by an angered Romulus, after he had said (also scolding him with words), "This same thing to anyone else who jumps over my walls," Remus was killed.

So I imagine it is this actual prohibition that Milton was making reference to.]


Posted by Mark Liberman at 06:04 AM

Text laundering

Yesterday I asked:

Is there a word for thesaurus-driven mis-substitution to disguise authorship? I've used the neologism "thesaurusizing" to describe the process of replacing words with fancier equivalents in order to impress readers. You could use the same word here, but the motivation is different, and it would be nice to have a word that expressed more directly the dishonesty involved.

I've gotten several excellent suggestions, including some words and phrases that I plan to start using right away.

Jim Roberts suggested:

Would "sinonymizing" be too puritanical? I suppose it's a bit cludgy as well and could be taken for an alternate spelling of "synonymizing," the term used to describe the practice by plagiarists and anti-plagiarists alike. (http://www.plagiarismtoday.com/?p=137)

Jim's link describes "synonymized plagiarism", an obvious and useful phrase which was new to me (though I'm familiar with the phenomenon it describes):

... the plagiarism war is entering a new, and frightening, territory as thieves discover its usefulness in gaining search engine ranking.

One of the critical tools in this new war is synonymizing software, which is software that takes a work and modifies it using synonyms of key words, producing a work that says practically the same thing but in a way that can’t be easily detected by search engines. This aids the plagiarist by greatly reducing the odds of their copyright infringement being discovered and prevents them from absorbing the “duplicate content” penalty some believe search engines apply.

Could such synonymizing software have played a role in developing the mysteriously botched presidential biographies at goppresident.org? A key aspect of the mystery is why a multi-million-dollar political fund-raising effort would have hired such an incompetent writer to create the copy for their site. But maybe they hired a competent spammer, not an incompetent writer.

(Well, a semi-competent spammer, anyhow. There are obvious algorithms that would do a lot better than thesaurus-driven substitutions, and I imagine that the more advanced designers of splogs and so on have thought of them.)
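
To make the mechanism concrete, here is a minimal sketch in Python of the crudest kind of synonymizer: whole-word lookup in a thesaurus, with no attention to sense or context. It is an illustration only, not a reconstruction of whatever was actually used on the goppresident.org text. The names `TOY_THESAURUS` and `synonymize` are hypothetical, and the four entries are simply substitutions that show up in the site's Bush biography.

```python
import re

# Hypothetical four-entry thesaurus (an assumption for illustration).
# Each pair mirrors a substitution observed in the goppresident.org Bush bio.
TOY_THESAURUS = {
    "improve": "ascend",
    "benefits": "betterments",
    "attacks": "irruptions",
    "seek": "beseech",
}

def synonymize(text, thesaurus=TOY_THESAURUS):
    """Swap every whole word found in the thesaurus, ignoring sense and context.

    Words not in the thesaurus pass through unchanged. The capitalization
    of a replaced word is not preserved.
    """
    def swap(match):
        word = match.group(0)
        return thesaurus.get(word.lower(), word)
    return re.sub(r"[A-Za-z]+", swap, text)

print(synonymize("works hard to improve Medicare and Social Security"))
# prints: works hard to ascend Medicare and Social Security
```

Because the lookup never asks which sense of a word is meant, "improve" comes out as "ascend" whether or not that makes any sense in context, which is just the kind of result the biographies display.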

Brett Spyker suggested:

How about "camouplage"? Camouflage + plagiarism

That's a nice blend, with echoes of "plague" as well, but I don't think it's going to make it.

Michael Covarrubias wrote:

Of course every teacher of introductory composition has come across this doubly deceptive plagiarising technique. Not only trying to take credit for someone's work without attribution, but also covering up the tracks in case someone tries to find the source.

My wife, who teaches Spanish, has noticed a related technique when grading writing assignments. Students apparently love Babel Fish. So a story about "hanging out" with friends becomes "colgando afuera" in Spanish: hanging/suspended outside. To "drop the book" is to "gota el libro": drop(noun) the book.

I would guess that "Babel Fishing" has been used for this translating method from language A to language B. If there's to be a counterpart when "translating" from language A into language *A, might we call it synonym fishing? Or synfishing if we want to highlight the damnability of such fraud.

Synfishing is a promising combination -- perhaps too promising. It could also refer to virtual fishing games, for example; and as Michael observes, "syn" might also be "sin".

Susan Harrelson:

I would go with "thesaurism." While thesaurisizing  conjures up harmless self-aggrandizement, and even fantasizing about how impressed people are going to be, thesaurism, besides having the right ending, describes something a little reptilian and nasty.

I like Susan's suggestion as a term for thesaurus-driven misuse of words, like "works hard to ascend Medicare and Social Security" in place of "works hard to improve Medicare and Social Security".

But my favorite neologism comes from Ran Ari-Gur: text laundering. He observes that it can be applied more generally to the practice of "superficially modifying a text so as to obscure its source". Beyond thesaurisms and other word substitutions, this would also cover minor rephrasing, such as changing "the end of the draft" to "the draft's end", or "strengthening local control" to "make local control stronger"; and other forms of disguise as well. And you could do your text laundering using synonymizing software, or with old-fashioned human labor.

Is the phrase already in use, in this or some other meaning? Not to any significant extent. Google has only 124 hits for {"text laundering"}. One source uses the phrase to mean "copy as plain text":

You've heard of "money laundering" — how about "text laundering"!

How many times has this happened to you? On your Windows computer, you copy some text from one document into another; and when the text appears in the new document, the fonts are all wrong. Or, you copy some information from a Microsoft Excel spreadsheet into an e-mail message, and everything is enclosed in little boxes.

There’s an easy way to fix problems like these. Just paste the text from the original document into a Notepad document; then copy the text from Notepad into the document where you want it. It will show up as plain, unformatted text, and you can then apply any formatting that you want.

But I think that "copy as plain text" expresses this concept pretty well already, and Jeremy Gillick's "copy as plain text" add-on for Firefox, which I use frequently and enthusiastically, is much more convenient than copying through Notepad or other plain-text editor.

Others appear to use text laundering to mean things like "removing errors from the text of non-native writers", "removing possibly-offensive material", or "removing possible privacy violations". These are all plausible meanings, but none of them are very common. And none of them make as nice an analogy to "money laundering", it seems to me, as Ran Ari-Gur's suggestion: disguising the source of plagiarized text by removing the string-identity that would permit easy discovery by web search.
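
The idea of "string-identity" can be made concrete with a small sketch, again in Python and again only illustrative. It compares a clause from the whitehouse.gov Nixon biography with its goppresident.org counterpart. An exact-substring test, a crude stand-in for quoted-phrase web search, finds nothing, while a simple word-set overlap still registers the shared vocabulary. The function names, the choice of Jaccard overlap, and the unrelated comparison sentence are assumptions made for the example, not a description of how any search engine or plagiarism detector actually works.

```python
import re

def words(text):
    """Lowercase alphabetic tokens; punctuation (including apostrophes) splits words."""
    return re.findall(r"[a-z]+", text.lower())

def word_overlap(a, b):
    """Jaccard overlap of the two texts' word sets: 0.0 (disjoint) to 1.0 (identical)."""
    wa, wb = set(words(a)), set(words(b))
    return len(wa & wb) / len(wa | wb)

original = ("revenue sharing, the end of the draft, new anticrime laws, "
            "and a broad environmental program")
laundered = ("revenue sharing, the draft's end, modern laws against crime, "
             "and a large environment agenda")
# Unrelated sentence, included only as a baseline for comparison.
unrelated = "Remus had jumped over his brother's new walls"

print(original in laundered)                # exact string match: False
print(word_overlap(original, laundered))    # overlap with the laundered clause
print(word_overlap(original, unrelated))    # overlap with the baseline sentence
```

On these inputs the laundered clause shares far more of its vocabulary with the original than the unrelated sentence does, even though no exact match exists. That is the sense in which laundering removes string-identity without removing the evidence.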

Posted by Mark Liberman at 05:53 AM

January 22, 2007

When languages die

That's the title of this book, to be published this month by Oxford University Press -- a hardcover at the rare price of £17.99 (about $35.50 at today's exchange rate). [Update: you can order the book for $29.95 from Amazon.com -- free shipping!] The author is K. David Harrison, a phonologist and field linguist at Swarthmore College who in his (so far) brief but remarkable career has conducted extensive fieldwork and research "on endangered and little-documented Turkic languages of Inner Asia (Central Siberia and Western Mongolia)" and more recently on languages in Northeast India and Oregon. Some of Harrison's work is featured in the documentary The Last Speakers from Ironbound Films.

Harrison was recently interviewed on National Geographic World Talk. You can hear that interview here. (When I loaded the page today, Harrison's interview was third in the list of those available.) A special treat for Language Log readers: at about the 12 minute mark, host Patty Kim asks Harrison about the "20-some-odd words for snow" in "the Inuit language".

Posted by Eric Bakovic at 08:51 PM

X ist das neue Y

An exercise in meta- (or trans- ?) snowclonology, in the Sueddeutsche Zeitung ("Erschütternd ist das neue geil", 1/22/2007). Julia Hockenmaier sent in the link, and pointed out that Benjamin Zimmer's 12/28/2006 post "On the trail of 'the new black' (and 'the navy blue')" is translated as "Zur Genese -- des Satzes" (i.e. "to the genesis -- of the sentence"). The whole linkage from the bottom of the article:

Zur Genese - des Satzes
Der Blogger Matthew Paul Thomas - hat auch eine verdienstvolle Liste
Die Seite: "x is the new y" - der unsere Grafik nachempfunden ist
Und noch ein Archiv - "X=new Y"

In fact there have been an astonishingly large number of Language Log posts about this phenomenon, including Geoff Pullum's original "Snowclones: lexicographical dating to the second", 1/16/2004.

Posted by Mark Liberman at 03:43 PM

Breakthrough Collaborative

The Breakthrough Collaborative is a worthy educational cause that sponsors public radio segments so I hear about them occasionally in station breaks during NPR's morning news. And you know, not only is it unclear whether "The" is part of their name (some of their web pages have "The" in the banner, some don't), the rest of the syntax of their name was for some time a mystery to me. Was collaborative (which I knew as an adjective) now a noun, the head noun of this noun phrase, with breakthrough (also a noun) being used as an attributive modifier of it? Or was breakthrough the head noun with the adjective collaborative being used as a post-head modifier, as in rare cases (usually calques of French) like court martial?

The answer to the puzzle does not come from reflecting on the meaning (that's what non-linguists and beginning students of syntax always think). The key is to investigate using morphosyntactic criteria: see how the words inflect. If collaborative is being used as a noun (this usage may be well established, but it apparently had passed me by), then, I reasoned, it will inflect for plural number. So I googled collaboratives.

The number of hits approaches a million now. Sentences like "Collaboratives are designed to achieve dramatic improvements in the quality and outcomes of care in a short period of time...", and also "A collaborative focuses on a single technical area (for example, prevention of mother-to-child transmission of HIV)...", with the indefinite article and no modifier, make it crystal clear. The noun-adjective pattern (like court martial) is very rare and not really productive of new phrases, whereas collaborative as a noun is clearly well established (I didn't know it), and nouns can always be used to modify other nouns (as in London fog or April showers, which show that even proper nouns can be modifiers).

So with overwhelmingly greater than chance probability, I have decided, Breakthrough Collaborative is a noun phrase with the relatively new noun collaborative (formed from an original adjective, which itself is formed by adding -ive to the stem of a verb) as its head, and the noun breakthrough (formed from a verb-preposition compound!) as modifier.

And I didn't even need to look up collaborative in the Oxford English Dictionary as many Language Log colleagues would have urged me to do (I'm sure the new use is probably covered in the Second Edition): you can figure some of these things out from the data on your own these days, which means you can often solve the cases of analysis where the usage is too new to be in the OED yet. Linguistics is sometimes hard, but it's not so hard that you have to give up and go shopping. You can do linguistics together with intelligent friends, and sometimes have a collaborative breakthrough.

Update: Lots of people are mailing me to point out that collective, also an adjective that turned into a noun, is probably the model on which collaborative is based, and some of them report knowing of collaboratives from way back in their murky activist pasts.

Posted by Geoffrey K. Pullum at 11:47 AM

Native language? Plagiarism

Keith Handley used empirical methods to determine the origins of the mysterious texts at The Presidential Coalition's web site goppresident.org, and found... plagiarism!

From Keith via email:

I looked at goppresident.org and my first impression was that the biographies were done by a 6th- or 7th-grader, using an encyclopedia and a thesaurus. Then I remembered that the Internet has replaced the encyclopedia, so I did a little Googling.

I figured that there couldn't be too many sites with much good to say about Nixon, so I started with him. And here's what I found:

From http://www.whitehouse.gov/history/presidents/rn37.html:

His accomplishments while in office included revenue sharing, the end of the draft, new anticrime laws, and a broad environmental program. As he had promised, he appointed Justices of conservative philosophy to the Supreme Court. One of the most dramatic events of his first term occurred in 1969, when American astronauts made the first moon landing.

From goppresident.org:

Nixon carried out revenue sharing, the draft’s end, modern laws against crime, and a large environment agenda as well as the appointment of protective, philosophic Judges to the Supreme Court. He was also in office when American astronauts landed on the moon for the first time.

The Nixon entries have more similarities than this paragraph.

But I took a look at Hayes's biographies from the two sources, and found no such similarities. Same with Coolidge.

But... Reagan, from whitehouse.gov:

Dealing skillfully with Congress, Reagan obtained legislation to stimulate economic growth, curb inflation, increase employment, and strengthen national defense. He embarked upon a course of cutting taxes and Government expenditures, refusing to deviate from it when the strengthening of defense forces led to a large deficit.

and from goppresident.org:

During his first term, Reagan activated the development of the economy, curbed inflation, upgraded employment, and enhanced defense spending as well as made tax cuts, decreased executive spending, and reduced the federal budget.

The Eisenhower biography is similar to the whitehouse.gov version, but not so blatantly so. You'll notice that the Eisenhower biography is quite a bit better than the typical biography at goppresident.org. I see a much better plagiarist's hand at work here.

So my vote is for a native speaker of English in 6th grade. I'll guess that he or she used an old print encyclopedia from about 1958, which, of course didn't cover the entire Eisenhower administration. So then he or she had to go to whitehouse.gov to get the rest.

Finally, at least some of the portrait images at goppresident.org are the same and have the same names as the ones at whitehouse.gov, also.


[Above is a commentary by Keith Handley.]

[I'll note that people who are too lazy to rewrite copied material often disguise their copying by replacing words, here and there, with thesaurus-derived semi-synonyms. Other methods for disguised copying are also in evidence in the passages that Keith cites, for example changing "the end of the draft" to "the draft's end".]

[Stephen C. Carlson wrote to supply another source (or an independent echo of some third text), this time for the Bush bio itself:

I suspect I've found the source text of the bio for George W. Bush on the Republican Presidential Coalition site.

Compare this:


In his first term, Bush made it his duty to advance the works of public schools, beseech accountability, and make local control stronger, and also signed tax assistance, improved betterments and income for the U.S. military, and works hard to ascend Medicare and Social Security.

with this:

Since taking office, President Bush has signed into law bold initiatives to improve public schools by raising standards, requiring accountability, and strengthening local control. He has signed tax relief that provided rebate checks and lower tax rates for everyone who pays income taxes in America. He has increased pay and benefits for America's military and is working to save and strengthen Social Security and Medicare.

Note how both texts list the same accomplishments of Bush in the same order. It looks like the goppresident.org's biography of Bush was adapted using a lazily executed thesaurus look up technique to make it appear more original.


Is there a word for thesaurus-driven mis-substitution to disguise authorship? I've used the neologism "thesaurusizing" to describe the process of replacing words with fancier equivalents in order to impress readers. You could use the same word here, but the motivation is different, and it would be nice to have a word that expressed more directly the dishonesty involved.]

[And see this post for a note about another possibility -- maybe the text was processed by the type of "synonymizing" program used by sploggers and others to try to prevent search engines from detecting and discounting duplicate pages.]

Posted by Mark Liberman at 11:41 AM

More on the Cartoon Beat

Today's Tank McNamara cartoon shows that linguistics has made it to the world of sports:

Posted by Sally Thomason at 07:32 AM

January 21, 2007

What's the (native) language?

Here's a puzzle for Language Log readers, from Zeno at Halfway There. He quotes the capsule biography of George W. Bush at the website of the "Presidential Coalition" (http://www.goppresident.org/history.html),

George W. Bush followed in his father’s footsteps to Presidency and was elected in 2000. In his first term, Bush made it his duty to advance the works of public schools, beseech accountability, and make local control stronger, and also signed tax assistance, improved betterments and income for the U.S. military, and works hard to ascend Medicare and Social Security. Then, after the September 11th irruptions, he announced war on terrorism, and with that made victory and proceeded individual opportunity and his Administration’s precedence. We look forward to four more years with Bush in the White House, and no one will ever forget “Dubya” or his greatest achievements.

and asks by email: "I can't tell what language the writer grew up speaking, but I'm pretty certain it's not English. Can you tell?"

I'm not sure either, though my working hypothesis is "Martian". Some of the signs and symptoms:

  1. Presidency with no article;
  2. beseech to mean seek;
  3. betterments to mean benefits;
  4. ascend to mean improve;
  5. irruptions to mean attacks;
  6. war with no article;

Well, you get the idea. The phrase "with that made victory and proceeded individual opportunity and his Administration's precedence" seems designed to prove Quine's idea about the impossibility of radical translation -- presumably the original meant "temporal stage of undetached presidential parts", or something like that.

If you have some well-founded (or sufficiently amusing) conjectures about the linguistic background (or neurological state) of the author, please tell me and I'll summarize for the blog.

[Well, the results so far are mostly pretty much what I expected -- except for one, which suggests that Zeno asked the wrong question.

To start it off, here's what I wrote to Zeno when I posted the query:

I don't get a clear impression, though the lack of articles might suggest someone with a slavic background. The misuse of words might even be a native speaker with a poor vocabulary over-using the thesaurus -- I see that from time to time in student essays. But I agree that it's probably someone who grew up speaking another language.

Anyhow, I blogged your question and we'll see if anything interesting comes back.

Several readers proposed "Slavic"; for example, Bill Scott:

I'm guessing Russian because of the lack of "the" before many nouns.

As for looking forward to four more years, the writer is (a) two years behind the times and (b) out of his gourd.

(Well, being merely two years out of date would make this one of the less stale pages on the web. And two years ago, more than half of the American electorate chose W, so the position was clearly not insane by the social norms of the time.)

Several others proposed "native speaker armed with a thesaurus", for example Kyle Gorman, who also suggested a Romance source:

i think it's written with a thesaurus.

the two missing articles don't strike me as that strange in a textual source. "presidency" is weird, but more so because it's capitalized without any real reason. many, many literate english speakers think that content words can be capitalized indiscriminately, a fact which is often a source of parody. here's an example from a parodic website called "The Philadelphia Diaries":

"My dad is literally, getting a new Wife, and she is a blatant Bitch. I just learned of this new News when I was just trying to drink my Lotte and like get Jake a new pare of alternative style Sneekers at Ubik, like Marc Jacobs Vans with the valkrow straps? I mean I was having a very Positive day and then there is Dad and the Blonde Slut Lady walking down Walnut St. with there arms intanglid together, like their in LUV."

i used thesaurus.com and looked up the low-frequency words you noted. each one of them linked directly to your glosses. only "beseech" does not pick out a thesaurus entry named after the gloss you give (i.e. "seek", though it's listed lower).

the one thing that strikes me as L2 is "made victory". obviously, that phrase would work as a calque from french or any language which uses a form like "fait victoire" has 754 attestations on google (low, but many more if you include intervening words). i'm sure there are many other languages which use constructions like this though i'm blanking on which ones.

Jay McCurdy thinks he knows the author:

I'd almost forgotten this particular student.  I believe I taught her many years ago in Comp.   She was a shy, sweet, girl and was absolutely mortified to be doing badly in my course, as she had consistently received ‘A’ grades in English at the high school level. 

She worked at it, but there was just too much of this stuff in her head for her to simply forget it and try to learn the English language in a semester.  In many ways, she was a more difficult student to teach than many of the other less-privileged kids I taught. Many of those other kids recognized that they did not think, or write, in the ‘dominant’ code.  It was clear to them that what I was looking for was essentially a translation, and they felt more comfortable trying to provide that than she did, because she had been misled into believing that her thinking and writing was better than, or equal to, the dominant code.

I would not be surprised if it is the same woman.  I think I’ve seen that beseech before.

Me too.

Suzette Haden Elgin gives a more precise account of how Thesaurian is generated:

I suspect that the author's native language _is_ English, and could provide you with additional examples in the same register, collected from local newspapers. Sequences like these come from people who are not academics but have as their goal the production of sizable stretches of Academic Regalian all the same. Their technique is simple but effective: Write the stuff in ordinary human-being English; take a thesaurus and choose a more AR-sounding word to replace as many of the ordinary human-being English words as possible; make the substitutions and send the result to the "Letters to the Editor" section of a newspaper. Matters such as meaning and appropriateness are not part of the process, and the authors are _very_ proud of the results.

I agree that this is the most plausible source for the strange word choice -- though non-native speakers also sometimes misuse the thesaurus. And Marie-Lucie Tarpent points out that the rest of the presidential biographies are rather, well, odd -- and in ways that transcend merely Thesaurian norms:

Thank you for passing along the interesting paragraph about the current president.  I looked up the site listing all the Republican presidents and discovered that a) the bio for G.H. Bush is actually Reagan's, and b) most of the other bios are barely better written than that for GW, which seems to be the worst one.  Not only the grammar and vocabulary of those texts but also the paragraph structure are unusual.  In one case the final sentence of Chester Arthur's bio is tacked on at the end of another president's.  etc, etc. Who hired the person(s) who produced those texts?????

She feels that there was good internal evidence against a Romance language, and she argues against Germanic on the grounds of national character:

I quite agree that these were not written, let alone reviewed, by native speakers of English (only one seemed to be relatively free of linguistic errors or incongruities).  I don't have precise suggestions but the lack of articles in many cases and the often aberrant vocabulary should narrow the search somewhat - Slavic, Oriental?.  It can't be Spanish or French, which would have more articles and where the abstract vocabulary would be more similar to that of English.  And a Germanic-speaking writer would probably have double-checked the vocabulary.

But Jim Gordon detects a Romance and even more specifically Castilian vibe:

To me seems English traduced for a person of speaking Castillian. I can't tell you why, but it has a certain flavor. Many of the goppresident.org web pages have the same flavor and type of errors. I suppose it could be influenced by any of the Romance languages.

Paul Bickart suggested South or East Asian connections:

My guess is that the management of the web site was outsourced to some Mumbai jobber, although the syntax reminds me rather more of Chinese-produced VCR manuals...

Richard Parker attempted an analysis in terms of national political preferences:

Try getting a native English-speaker to translate it into Hebrew, then a native Hebrew-speaker to translate it back again.

I can't think of any other country where the man might be popular.

I can't back this up with a citation to opinion polls, but my impression is that W has been no more popular in Israel than in the U.S., and that the recent world maximum for his popularity would be found in Iraqi Kurdistan.

The key insight, in my opinion, came from Mae Sander:

I'm not a linguist at all -- though I'm a fan of Language Log. But I think the most revealing page of GOPpresident.org is the donation request page.

This page wants you to fill in all your possible credit card info, your employer, etc. So I think the native language of the GOPpresident.org is SPAM. They just tried a little harder than some spammers -- maybe. Some of their bios of presidents verge on the incoherent or irrelevant, as well as revealing clues of nonnativeness.

I had noticed the similarity of the biographical pages to some spam text, but dismissed the thought immediately without adequate reflection. It makes sense -- during the last election, perhaps some Russian spammers decided to set up a phony PAC to target unwary wingers? Quite apart from the weird language, the site seems to be perfunctory at best -- there's basically nothing there but the front page, the botched presidential biographies, and the donations page. For all we know, the same crew set up dozens of other sites out there, targeting political segments from wingnut to moonbat...

It's the best hypothesis yet. After all, we know that spammers and phishers are notoriously careless about getting their stuff proofread.

The site responseunlimited.com ("Mailing Lists and Creative Services for Evangelical and Conservative Mailers") has a page for The Presidential Coalition, but this only tells us that they're willing to sell their mailing list (which is said to number 122,054 donors). At campaignmoney.com, they think that the "Republican Presidential Coalition" is a duly registered "527" Political Organization, with a contact person named "Matthew J. Palumbo".

I doubt that this was the "Matthew J. Palumbo" who gave a talk on "Ilex vomitoria: An overlooked North American caffeine source" at the 47th Annual Meeting of the Society for Economic Botany in Chiang Mai, Thailand. It seems more likely to be the "Matthew J. Palumbo" referenced on the web site for ALP Digital Media Services:

Digital Media Services is a fully integrated web services firm. We specialize in web design, marketing for the web, email marketing campaigns, banner advertising, e-commerce, hosting services and database design.
Established in 1997, we have been developing websites and marketing plans for our customers for many years. President and CEO, Matthew J Palumbo, has over 12 years experience in marketing and web development.
Headquartered in Garden City, Long Island, NY we service a broad range of industries including financial services, education, nonprofit, healthcare, retail, software and consumer products.

and presumably this just means that "The (Republican) Presidential Coalition" hired ALP to set up its web site. (Or maybe not -- see Stephen C. Carlson's comment below.)

Back at campaignmoney.com's page for "The Republican Presidential Coalition", the listings of contributions and of expenditures are both empty. So could it be that some spammers set this site up, registered a 527 -- and then took the money and ran? [Update: apparently not -- in fact they seem to have spent about ten times more than they took in, according to politicalmoneyline.com. See below for details.]

[Information about the laws governing "Section 527 Organizations" is here, and the Wikipedia page is here.]

The site politicalmoneyline.com has several mentions of the string {"Presidential Coalition"}, in particular this item from 10/16/2006:

David Bossie's Presidential Coalition, a Section 527 group, reported raising $175,945 and spending $1,194,051 during the third quarter. They paid Issue Advocacy Group (WV) $493,200 for survey/polling work. They paid Infocision Management Corp (OH) $157,108 for fundraising.

and this item from 7/18/2006:

David Bossie's Presidential Coalition reported raising $147,390 and spending $1,210,129. They paid IAG (WV) $385,375 for polling; Uniontown Fullfillment Services (OH) $296,972 for mail and postage; Infocision Management Corp (OH) $154,464 for telemarketing. They also paid $5,000 to the Libby Legal Defense Trust.

and this item from 4/19/2006:

The Presidential Coalition LLC reported raising $108,147 and spending $1,588,073 in the first quarter. The group paid IAG (WV) $650,974 for surveys and polling. They paid Infocision $258,551 for fundraising. They paid UFS (OH) $362,981 for postage and direct mail.

and this item from 1/27/2006:

The Presidential Coalition LLC (DC) reported raising $63,700 and spending $935,231 in the last six months of 2005. Major expenses included $492,790 to IAG (WV) for survey/polling; $174,073 to Infocision (OH) for telemarketing; and $210,323 to UFS (OH) for postage/direct mail. David N. Bossie is president and manager of the 527 group. The group registered on 6/30/05, but does not indicate where it got the other $871,153 used to pay expenses. Citizens United is listed as an affiliated organization.

Wikipedia believes that there is a real political operative named "David Bossie" -- whether this is the same "Presidential Coalition" isn't clear to me. If it is, then the web site's management is American -- but of course they might have contracted the site creation out to anyone at all. Still, with expenditures of $4,927,484 over 5 quarters -- almost a million bucks a quarter -- you'd think they could afford better writers and editors. If this is really their site, it suggests a casual and undisguised contempt for the suckers who give them money.

(And the fact that they've only raised $495,182 -- roughly a tenth of what they've spent -- might suggest a similar contempt for creditors. Or something -- perhaps this kind of bookkeeping is normal in the world of political fundraising.)
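
The totals above can be rechecked with a few lines of Python. This is only a sketch: the dollar figures are copied from the four politicalmoneyline.com items quoted above, the names (`reports`, `raised`, `spent`, `quarters`) are invented for illustration, and treating the six-month report as two quarters is an assumption.

```python
# Figures as quoted from politicalmoneyline.com above.
# Each entry: (item date, amount raised, amount spent), in dollars.
reports = [
    ("10/16/2006", 175_945, 1_194_051),
    ("7/18/2006", 147_390, 1_210_129),
    ("4/19/2006", 108_147, 1_588_073),
    ("1/27/2006", 63_700, 935_231),  # covers the last six months of 2005
]

raised = sum(r for _, r, _ in reports)
spent = sum(s for _, _, s in reports)
quarters = 5  # assumption: three single quarters plus one six-month report

print(f"raised: ${raised:,}")
print(f"spent: ${spent:,}")
print(f"spent per quarter: ${spent / quarters:,.0f}")
print(f"raised / spent: {raised / spent:.3f}")
```

The sums agree with the totals given in the text, and the ratio of money raised to money spent comes out very close to one tenth.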


[More on this here.]

[Stephen C. Carlson wrote:

Thank you for your puzzle of a post.

I doubt that either of the "Matthew J. Palumbo"s you have found (the Florida biologist or the New Jersey webdesigner) is the same person who is listed as the contact information for the Washington DC based 527 group.

Here's some more information I've been able to glean:

According to Bob Novak, the (Republican) Presidential Coalition is an organization by a former congressional staffer David N. Bossie that is intended to go after Hillary ("Pelosi goes over the heads of energy, environmental committee chiefs", Chicago Sun-Times, 1/21/2007):

Dick Morris' appeal

People on Republican mailing lists this week received an appeal for funds from Dick Morris, President Bill Clinton's political strategist in 1995-1996, asking for a contribution of between $25 and $100 or more to finance a film documentary critical of Sen. Hillary Clinton.

Signing the letter as ''Former Clinton Adviser,'' Morris wrote: ''If you liked how the Swift Boat Veterans turned the tide against John Kerry, you understand how a top Clinton aide can turn the tables and stop a Clinton-style liberal from becoming the next president of the United States.''

Morris' appeal was made through the Presidential Coalition, run by conservative activist Dave Bossie. The letter described Morris as dedicated to electing presidents like Ronald Reagan and George W. Bush. Since 1996, Morris has been an author, columnist and television commentator.

In this connection, there is a documentary film on the illegal alien issue, "Border War," which not only was produced by Bossie but also credits a "Matt Palumbo" for research. I suspect that this Palumbo is the same one who is listed on the contact information.

This does not get us any closer to the curious text of the presidential biographies, I'm afraid, but it might tell us a little more about who is behind them.

But I suspect that it's not Matt Palumbo the actor, either. I think it's probably the Matt Palumbo who was appointed vice president of marketing in the political division of InfoCision Management Corporation of Akron, OH, in June of 2003. ]

Posted by Mark Liberman at 05:34 PM

Error message iced on cake

Not much about the way we are victims to the disaster of special character encoding failures really amuses me (it wastes too much of my time and ruins too many attempts at submitting manuscripts for publication), but even I find the picture below hilarious. Courtesy of Boing Boing, it shows the handiwork of some trusting cake manufacturers who took the multilingual birthday cake message text (English and some kind of mixture of Italian and Friulian) that someone had attempted to email in, and simply reproduced in icing the garbage that spewed off the printer.

The customer is always right... I cannot say whether proprietary Microsoft HTML extensions are implicated, as Boing Boing alleges. Who am I to criticize the Enron of software? But that's a heck of a birthday cake message. I hope Aunt Elsa has a sense of humor.

[Acknowledgments: Special thanks to John McChesney-Young for alerting me to this and to Stefano Bertolo for pointing out that there was some attempted Friulian in there along with the rather idiosyncratic Italian.]

Posted by Geoffrey K. Pullum at 05:22 PM

Sentence with no subject?

The blogger Callimachus at Winds of Change quotes the Los Angeles Times as saying:

Using the word "surge" to describe President Bush's forthcoming plan for reshaping U.S. efforts in Iraq has ignited a fiery political brouhaha.

and comments scathingly that this attempt to smear the Bush regime for use of the term "surge" overlooks the fact that the media use it more than Bush does; and it fails (Callimachus says) for grammatical reasons: "What's missing there? The sentence has no subject."

Callimachus is wrong. The incident provides an interesting little example of how people who attempt to use grammatical analysis in the service of their rhetorical analysis often lack the necessary grammatical knowledge. The sentence Callimachus quotes (shown above) does have a subject. The sentence as a whole has the form of a declarative tensed clause, and thus must have a subject to be grammatical. And it has one: the subject is a non-finite (gerund-participial) clause, namely using the word "surge" to describe President Bush's forthcoming plan for reshaping U.S. efforts in Iraq. (The predicate following that subject is the verb phrase has ignited a fiery political brouhaha.)

What Callimachus appears to mean is that the non-finite subject clause does not itself have a subject (this is grammatically permissible, of course, where omitting the subject in a declarative main clause would not be). Use of the word "surge" is referred to in a way that does not mention whose use of it we are talking about. It's actually a valid point. Later Callimachus points out that the very same L.A. Times article admits that the Bush regime have been avoiding use of the verb surge in the relevant context, which really does look like a contradiction in the article's argument. But Callimachus's grammar has let him down.

The fact is that the teaching of grammar in our culture has fallen to such a low ebb that in general even people who write about language, in blogs or for print media, simply cannot tell what the subject of a sentence is, or what a verb is (see here and here), or what a tense is (see here and here), or what a noun is (see here), or whether a clause is in the passive or not (see here and here and here), or whether something is an adjective or an adverb (see here), or whether subjects are being confused with objects (see here)...

Mark usually says when pointing out such things that we linguists ought to blame our discipline. We teach about language, but we do it so ineffectually that hardly any of the general public or the journalistic profession can deploy even the most elementary analytical terms for talking about language.

Well, maybe the blame lies with us. I'm not sure. I'm a part of the discipline, and I'm doing what I can: this quarter I'm teaching a big lower-division undergraduate course on English grammar with no prerequisites. I'll be trying to teach the students to identify subjects of clauses and tell actives from passives and so on. If I fail, we'll have to decide whether the fault lies with me or with them. We'll see how it goes. I'll try to keep you informed.

[Hat tip to Linda Seebach of Rocky Mountain News for a pointer to the Callimachus post.]

Posted by Geoffrey K. Pullum at 01:39 PM

Doing Meta: from meta-language to meta-clippy

The theme of the recent January/February issue of Technology Review is "software", and the cover story is "Anything You Can Do, I Can Do Meta", by Scott Rosenberg. The subhead tells us that "The space tourist and billionaire programmer Charles Simonyi designed Microsoft Office. Now he wants to reprogram software."

Rosenberg starts the article by describing Simonyi's planned visit to the International Space Station (as a tourist, for a $20M fare), and explains that:

This has always been Simonyi's preferred vantage. In a career spanning four decades, every time he has confronted some intractable problem in software or life, he has tried to solve it by stepping outside or above it. He even has a name for his favorite gambit: he calls it "going meta."

According to this page:

"Anything you can do, I can do META!" was a quip that Samuel Hahn, then of ESL, tossed out once at a dinner (ca. 1991, San Jose, California). In the conversation Sam was responding humorously to some rather philosophical musings about programming higher-order functions (functions that build other functions) in LISP or Scheme. He said it as a cute play on the words "Anything you can do, I can do better," and claimed to have heard it from someone else, probably in the Stanford computer science department.

But I'm pretty sure that I heard this phrase in the late 1970s -- can anyone give me a pre-1991 citation?

The pun is a natural one, which might well have been invented several times. The original phrase is from a 1946 lyric by Irving Berlin, "Anything You Can Do, I Can Do Better" (from Annie Get Your Gun); and a quick web search reveals that Anything you can do, I can do X has become one of the phrasal templates that we call snowclones. Among the many attested substitutions for X are: perky, beta, bigger, later, badder, maybe, while looking cooler, lower, vegan, badass, cheaper, louder, in microsoft paint, smarter, cuter, veto.

For members of the generation that used meta in everyday conversation ("you're getting bogged down in the details here, let's take this meta"), and even had meta keys on their keyboards (giving us jokes about "control meta cokebottle"), it would be particularly hard to resist the quasi-rhyming substitution "meta" for "better".

I guess that the source must have been the language/metalanguage distinction in logic, though exactly how this usage came into proto-computer science in the 1950s and 1960s is not clear to me. The OED has an 1890 dictionary citation for metamathematics:

1890 Cent. Dict., Metamathematics, the metaphysics of mathematics; the philosophy of non-Euclidean geometry and the like.

Metalanguage is cited in a logical context from 1936, and in a linguistic one from 1948:

1936 K. GRELLING in Mind 45 486 The concepts analytic and contradictory in the language L, for instance, cannot be defined in L, as Carnap has shown. In order to escape from these restrictions one must build up a new language (a so-called meta-language) disposing of more means of expressing thoughts than the former.
1948 L. HJELMSLEV in Studia Linguistica 1 75 This would mean, in logistic terms, that linguistics is a metalanguage of the first degree, whereas phonetics and semantics are metalanguages of the second degree.

In any case, by the mid-1970s, meta certainly had reached among hackers the status for which the OED's earliest mundane citation is 1993:

1993 Boston Globe 8 Aug. (Electronic ed.), When anchorwoman Connie Chung made a guest appearance on sitcom Murphy Brown to advise anchorwoman Murphy not to sacrifice her journalistic integrity by making a guest appearance on a sitcom, that was just plain meta.

As for where Simonyi is going with this meta business, Rosenberg recounts a telling (though probably unfair) anecdote:

On a gray afternoon last October, I sat down with Simonyi in Bellevue, WA, in front of two adjacent screens in his office at Intentional Software, the company that he founded after he left Microsoft in 2002 to develop and commercialize his big idea. Simonyi was racing me through a presentation he was preparing for an upcoming conference; he used Microsoft Office PowerPoint slides to outline his vision for the proposed great leap forward in programming. He was in the middle of moving one slide around when the application just stopped responding.

In the corner of the left-hand screen, a goggle-eyed paper clip popped up: the widely reviled "Office Assistant" that Microsoft introduced in 1997. Simonyi tried to ignore the cartoon aide's antic fidgeting, but he was stymied. "Nothing is working," he sighed. "That's because Clippy is giving me some help."

I was puzzled. "You mean you haven't turned Clippy off?" Long ago, I'd hunted through Office's menus and checked whichever box was required to throttle the annoying anthropomorph once and for all.

"I don't know how," Simonyi admitted, with a little laugh that seemed to say, Yes, I know, isn't it ironic?

It was. Simonyi spent years leading the applications teams at Microsoft, the developers of Word and Excel, whose products are used every day by tens of millions of people. He is widely regarded as the father of Microsoft Word. (I am, of course, using Word to write these sentences.) Could Charles Simonyi have met his match in Clippy?

Simonyi stared at his adversary, as if locked in telepathic combat. Then he turned to me, blue eyes shining. "I need a helper: a Super-Clippy to show me where to turn him off!" Simonyi was hankering for a meta-Clippy.

Words, as the expression goes, fail me. Even hyperlinks are inadequate.

[Update -- Fernando Pereira dates meta-language in CS to 1962:

The earliest use of meta-language I can remember in computer science is in John McCarthy's 1962 LISP 1.5 Programmer's Manual:

The second important part of the LISP language is the source language itself which specifies in what way the S-expressions are to be processed. This consists of recursive functions of S-expressions. Since the notation for the writing of recursive functions of S-expressions is itself outside the S-expression notation, it will be called the meta language. These expressions will therefore be called M-expressions.

This approach to programming language specification should be contrasted with that of the equally famous Revised Report on the Algorithmic Language Algol 60 by Backus et al., 1962, who introduce the well-known BNF syntactic meta language (although they don't call it a meta language) but use informal natural language to describe the semantics of Algol 60. The notions of abstract machine and formal operational semantics needed still a few years to develop beyond the relatively simple recursive definitions needed for LISP 1.5.]


[Update #2 -- Mike McMahon writes:

Then there's, "Every day, and in every way, we're getting meta and meta," usually credited to John Wisdom. E.g. here [Paul Greenberg, "The semio-grads: How an obscure Brown concentration trained graduates to crack the codes of American culture -- and infiltrated the mainstream", Boston Globe, May 16, 2004].

A comment by Leigh Klotz from a while ago on Slashdot recalls that the Lisp Machine editor had a special check in the error routine that adjusted the message when the undefined command Meta-Beta was entered. The team was split about half and half on whether that was funny. I believe I have the only space-cadet keyboard with a working USB interface. In addition to showing that message in the CADR emulator, it really will enter an uppercase lambda (into XEmacs) like the jargon file implies it should. The Lisp Machine character set had the SAIL characters, but was still only eight bits, so there wasn't room for everything.]


[And Jesse Sheidlower writes:

Slightly antedating 1962 for "metalanguage" in a computing context (well, maybe it's not exactly in a computing context, but it's in a book about machine translation, so I figured it's worth the few electrons to send it to you):

1960 E. Delavenay Introduction to Machine Translation vii. 110 Between metalanguage and pure poetry, from the clear and distinct expression of a scientific representation to the synthetic expression of the vibrations of the poet's ego at the centre of his individual universe, there exists a whole vast range of untranslatables.

That's one of those interesting sentences that I believe I could translate, but don't believe I can understand.]

[Fernando adds by email:

I'm about to leave for a trip so I can't check Harris 1951 in the library, but I suspect the term might be used there. But my guess is that the term got into CS from logic, esp. Tarski, not from linguistics. Kleene's 1952 "Introduction to Metamathematics" puts in textbook form much of the machinery that was developed in mathematical logic by Goedel and Tarski. The formalization of programming language semantics starts with the same kind of machinery.

I'm now curious about how McCarthy came upon the M-expression presentation of LISP. I suspect there's evidence for the sources out there, but I don't have time to track it down until next week.]


[Update -- Keith Ivey writes:

From a 21 Jan 1986 message by Walter Hamscher in AIList Digest:

The name means `Metalevel Reasoning System' because you can write meta-level axioms, axioms about the base level knowledge -- usually these meta axioms are used to guide the search-based inference procedures. I hear the latest version lets one write meta-meta- axioms, meta-meta-meta-axioms, etc ("Anything you can do, I can do Meta," as Brachman says).

That would be Ron Brachman, I guess, my former colleague at Bell Labs, who recently moved from DARPA to Yahoo. Ron might be the author, but I believe that I heard the same line from people at PARC in 1977 and MIT in 1978, used as if it were already a proverbial expression.]

Posted by Mark Liberman at 08:30 AM

Lumps in the melting pot

Racist language is always news. Right now, the big international incident is some vapid Big Brother contestant calling a Bollywood star ``Shilpa Popadum''. Ho hum. I mean, yes, it's disgusting, but do we really need to watch TV to find out we're all racists? I'd been trying so hard to forget the Michael `Kramer' Richards ``nigger'' incident, and then it all came flooding back. And sandwiched in between was the big ``ching chong'' ding dong, which UT Austin grad student Elaine Chun was just discussing with me last week. She convinced me that some really interesting pragmatics was involved, and has rekindled my childhood interest in insulting people. Well, not in insulting people per se, which is child's play. But in understanding why they get so mad. That's a tough one.

So here's the ``Ching Chong'' incident. Rosie O'Donnell is on ``The View'' discussing an earlier well-publicized show in which Danny DeVito was drunk, how it was big news just about everywhere. She looks straight into the camera with painfully staged timing like she's been practicing this in front of her huge dressing room mirror a few too many times and says what in the media is cited using variants of:

``you can imagine in China it's like: Ching chong, ching chong chong, Danny DeVito, ching chong chong chong, drunk, `The View,' ching chong.''

I'll leave narrow phonetic transcription of Rosie's utterance as this week's homework exercise:

She doesn't pull her eyes sideways in an unforgivable stereotyped slant, but you can imagine it. Public outcry follows broadcast, though not as big as for the ``popadum'' and ``nigger'' incidents. Rosie gives half-assed apology. Asian American community is deeply unimpressed. The show goes on.

So what is it that makes Rosie's ``ching chong'' so offensive?

Answer A: Imitation is the sincerest form of mockery. "Ching chong" is offensive in the same way that any linguistic or dialectal stereotyping is.

No, that can't be it. Sure, humor often relies on laughing at the other guy, even if the other guy's feelings are hurt. If there were a First Law of Comic Dynamics, it would say: you can't make one person laugh without making another one cry. (A Second Law? All comedy eventually descends to slapstick.) But isn't it worse to ching chong an Asian American than to bork bork a Swedish American, a la the Muppets' Swedish Chef? Yes! But why?

Answer B: Painful weight of history. ``Ching chong'' has been used by bigots for a long time (C19 Australian gold rush?), and is still used when taunting those of East Asian descent. There's no doubt that hearing that specific phrase, as opposed to arbitrary poor imitation of Chinese, causes pain to many hearers.

But hey, sticks and stones, right? Doubtless, some Asian Americans feel that pain. But most are surely made of sterner stuff. There's more to it.

Answer C: "Nigger" jealousy. Every contemporary English speaker knows the word ``nigger'' can hurt, so why do many people not recognize the offensive potential of ``ching chong''? In a curious way, O'Donnell's (claimed) lack of awareness sharpens the ``ching chong'' needle, making her insensitivity all the more unbearable. She said in a later semi-apology: ``Some people have told me it's as bad as the N-word. I was like, really? I didn't know that.'' And Asian Americans are like, really? You call that an apology?

OK, hypothesis C may or may not be right, but it can't account for the extreme anti-Rosie reaction. Surely people wouldn't be quite so angry at her just for being ignorant? (Headline news: ROSIE IGNORANT.) Unless, of course, they think she's lying.

Answer D: Lumps in the melting pot. Elaine Chun points out to me that many Chinese Americans (and East Asian Americans more generally) are particularly sensitive to being marginalized, to not being seen to be integrated as core members of American society in the way that say Italian Americans or Jewish Americans are. Now Rosie's attempt at humor makes essential use of the crudity of ``ching chong'' to suggest extreme otherness. Indeed, if she'd instead used plausibly Chinese phonology, that would have blunted part of her comic intent, since it would have implied familiarity with the Chinese language, and so reduced distance. When Rosie said ``in China it's like: Ching chong...'', the use of ``like'' implied: this is what a Chinese talkshow would seem like to you or me. But who are you and me? By implication, we're people for whom the Chinese language is completely unfamiliar, lacks any character. It may as well be children's alliterative nonsense. So, if The View's audience, by design, is mainstream America, then Rosie's implicit message is: you Chinese Americans, you ain't part of it. You're just lumps floating about in the scum at the top of the melting pot.

Now that's a message I haven't seen discussed anywhere. Rosie and her team of handlers didn't even think of apologizing for anything like it. Lumps in the melting pot. Better a lump than a ``chink'', perhaps. But not by much.

Posted by David Beaver at 01:35 AM

January 20, 2007

Zippy retrospective 2

Here's a Zippy miscellaneous retrospective, following up on my last Zippy posting.

Punctuation: the apostrophe

Animal language: talking dogs



Playful morphology: -orama


Proper names: Ned

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 03:08 PM

Zippy on formulaic language

Here's a Zippy retrospective on formulaic language: cartoons on snowclones (in addition to the three listed in my last Zippy posting), clichés, catchphrases, and idioms.

Snowclones: X3 ("Location, location, location")

Snowclones: Proportional Analogy (X is to Y as Z is to W)

Clichés: a four-pack

Clichés: play on "Misery loves company"

Catchphrases: "Can we talk?" and "Are we having fun yet?"

Idioms: "know the score"

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 02:51 PM

The evolution of Zippytainment

Today's Zippy strip takes us from entertainment to infotainment to confrontainment to laundrotainment.  The words are all attested, and in rapidly descending frequency in the order Bill Griffith gives them.

First the cartoon:

I get almost 200 million raw Google webhits for "entertainment"; over a million hits for "infotainment", including a Wikipedia page (there are also over a million for "edutainment" and again a Wikipedia page); 61 hits for "confrontainment" (two below); and only 2 for "laundrotainment" (both below).  The rate of descent in frequency ratios isn't constant -- the ratios are 200:1, 16,393:1, and 30.5:1 -- but Griffith certainly has the words in the right order.
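The ratios are easy to check. Here is a minimal Python sketch, assuming the rounded counts implied by the text (200 million, 1 million, 61, and 2); the dictionary and the helper name `successive_ratios` are illustrative, not anything Google reports.

```python
# Rounded raw hit counts, as implied by the ratios quoted in the post.
# These are approximations ("almost 200 million", "over a million").
hits = {
    "entertainment": 200_000_000,
    "infotainment": 1_000_000,
    "confrontainment": 61,
    "laundrotainment": 2,
}

def successive_ratios(counts):
    """Return (more_frequent, less_frequent, ratio) for each adjacent pair."""
    words = list(counts)  # dicts preserve insertion order
    return [(a, b, counts[a] / counts[b]) for a, b in zip(words, words[1:])]

ratios = successive_ratios(hits)
for a, b, r in ratios:
    print(f"{a} : {b} = {r:,.1f}:1")
```

With these inputs the three ratios come out far from constant, which is the point: the ordering is right even though the drop-off is uneven.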


The second, of course, is the growth of so-called "trash TV" - or, as one of its principals calls it, "confrontainment." (link)

[about Andy Kaufman] A pioneer of the "confrontainment" style of comedy, his stand-up routines and professional wrestling appearances often transformed audiences into riotous, ... (link)


Over the last several years, this "bigtainment bang" has had a profound impact on virtually every one of our society's business and social institutions. Consumers increasingly expect and are drawn toward infotainment, edutainment, eatertainment and retailtainment. When a coin-operated Laundromat installs big-screen TVs, a bar and a cool kids play area, it too becomes an entertainment attraction.
Laundrotainment. (link)

I threw out my TV....
and now I watch my front-loading washer.
Now that's laundrotainment!
by Don Quixote on Wed Aug 17, 2005 at 08:02:29 PM PST (link)

When I searched for "laundrotainment", Google asked if I meant "wondertainment" -- 109 raw hits for that one, many for the first below:

Wondertainment: Bird Adoption and Placement Center [for exotic birds] (link)

Master illusionists, Skibber & Zip present Wondertainment every evening and Sunday afternoon on the Fountain Plaza Stage. (link)

Then I started finding oddities:

That's Entrail-Tainment!
What's worse -- getting tortured at Abu Ghraib, or volunteering for Fear Factor?
by George Smith
August 3rd, 2004 9:50 AM (Village Voice)

eater-tainment noun. A restaurant that also offers entertainment such as wall-mounted memorabilia, video displays, or live music.

Example Citation:
"Eater-tainment has become the industry buzzword for restaurants such as Hard Rock Cafe, Dave & Buster's and Jillian's, which bring together dining and play under the same roof."
--Lornet Turnbull, "Theme Restaurants Looking for a Hook," The Columbus Dispatch, August 21, 1999 (WordSpy)

WordSpy supplied links to "irritainment" and "promo-tainment".  There was obviously a big "tainment" iceberg here.  Googling on "tainment" pulled up the items above, and also (in the first hundred hits, in alphabetical order):

archi-tainment [architecture]

auto-tainment [automated robots; automobiles]

Centertainment [a booking office]

China-tainment [entertainment in China]

clutter-tainment [having fun de-cluttering]



Doy-tainment [guy called Doy on movies]

econo-tainment [entertainment on a budget]


etainment, eTainment, E-tainment

extra-tainment [a variety of uses]

histo-tainment [history]

Inter-tainment [internet]

itainment, iTainment

M-tainment, m-tainment, mtainment [mobile entertainment; MTV]

m-tertainment [mobile entertainment]



Politi-tainment [(black) politics]

promo-tainment [a promotional ad presented as a form of entertainment]

psycho-tainment [psychology]





(My usual caution: I'm not proposing to inventory all occurrences of the phenomenon.  The point is only to demonstrate that -tainment is versatile and popular.  People love to play with language.)

In any case, Zippy has, once again, touched on matters of linguistic interest, in this case, the spread of -tainment (from entertainment) as a compound-forming element.  The strips often take off on linguistics and linguistic phenomena, as we've noted here six times so far:


X's World (It's X's world; we just live in it): here

The New Y: here and here

other topics

non-standard pronunciations: here

Chomsky; language as music: here

odd words from old comic strips: here

I have a pile of other linguistics-related Zippy cartoons stashed in my iPhoto files (along with some art-related ones that I love but aren't really relevant to Language Log [yes, I know that's a somewhat odd coordination, of a type we've looked at here before, but on re-reading it I've decided I like it and I'm sticking with it]), and now that I have a lot of fresh storage space for publicly accessible files, I'll assemble some Zippy retrospectives on language.  Stay tuned.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 01:57 PM

And sometimes a cigar...

I'd been meaning to mention (and now I will) that I very much liked the anecdote that actress Laura Dern told on NPR's Weekend Edition Sunday a few weeks ago, about working on Inland Empire with director David Lynch. One of the producers reported having received a terse instruction from Mr Lynch: "Get me a one-legged woman, a monkey, and a lumberjack, by 3:15 this afternoon." Was he joking? Ms Dern says she told him: "Yeah, you're on a David Lynch movie, dude. Sit back and enjoy the ride."

"But what did it mean?", the man wanted to know. She told him: "It means you have to find a one-legged woman, a monkey, and a lumberjack, by 3:15 this afternoon."

Language use is often obscure, or indirect, or metaphorical, or hyperbolic, or linguificatory. But sometimes a prefix is just a prefix, and a cigar is just a cigar; sometimes "Snow is white" is simply an assertion that is true if and only if snow is in fact white; and sometimes a man who says "Get me a one-legged woman, a monkey, and a lumberjack" means exactly that and nothing else. (Ms Dern says, by the way, that the request was complied with, and by 4pm they were actually on the set filming, with a one-legged woman, a monkey, and a lumberjack.)

Posted by Geoffrey K. Pullum at 11:22 AM

Sometimes a prefix is just a cigar

Barbara Partee wrote to suggest that Verlyn Klinkenborg's NYT Op-Ed essay "Post-", 1/20/2007, "would be nice to share". Since Barbara is one of the few living linguists to have a university named after her, I pay close attention to her suggestions. Klinkenborg's lede:

It may seem odd to celebrate a prefix. After all, it isn’t even a whole part of speech. It does for a word, visually, what a big cigar does for a small man.

And his ending:

The beautiful irony of it all is that there is nothing remotely post-political in being post-partisan. Even the most political candidates may come to think of post-partisanship as a necessary precursor to being pre-presidential.

Posted by Mark Liberman at 07:22 AM


Time Magazine recently revamped its online presence, relaunching Time.com with a host of new features. One of them is a daily news aggregator summarizing top stories from newspapers and blogs. But since "aggregator" doesn't sound particularly hip, they're calling it "The Ag." So far, there are only five categories for items in "The Ag": "National," "World," "Politics," "Iraq," and, curiously, "Celeb-u-Gossip." Why not a more straightforward blend like, say, "Celebri-Gossip"? Where did that -u- come from? The likely story is that this is a blend upon a blend, grafting gossip to the end of the combining form celebu-, which owes its current popularity to celebutante and its many spinoffs.

Let's start with celebutante, a blend of celebrity and debutante. (As in many successful blends, there happens to be some phonological and graphological overlap, in this case -eb-, to help cement the connection between the two base words. See my post "Blawgs, phonolawgically speaking" for more on overlapping blends.) The original celebutante was Brenda Frazier, whose debut into New York high society on December 27, 1938 was accompanied by unprecedented hype. The columnist Walter Winchell coined the term celebutante in Frazier's honor, though it didn't appear in his widely read "On Broadway" column until the following April. In a list of "Faces About Town" he included:

Brenda Frazier, who inspires a new 1-word description: Celebutante.

(Newspaperarchive currently turns up at least three iterations of Winchell's syndicated column: Charleston [W. Va.] Daily Mail, Apr. 6, 1939; [Burlington, N.C.] Daily Times-News, Apr. 7, 1939; and [Reno] Nevada State Journal, Apr. 11, 1939. The OED entry for celebutante cites the last of these.)

It would take nearly half a century for celebutante to become repopularized in the American popular press. The second time around it cropped up in a June 3, 1985 Newsweek article about New York's flamboyant "club kids," who shared only the vaguest lineage back to Brenda Frazier. (Club kid James St. James, who was featured in the Newsweek article, wrote a memoir called Disco Bloodbath, which later served as the basis for a documentary and a feature film, both under the name Party Monster.) Newsweek wrote:

Young 'celebutants' hope to win the approval of full-fledged downtown celebrities like Dianne Brill, a voluptuous 26-year-old menswear designer who reigns over the New York night in red vinyl evening gowns.

By using celebutants instead of celebutantes, the writer was apparently trying to uphold a gendered distinction, as between male debutants and female debutantes. These days, however, one rarely sees the e-less version of celebutante, regardless of the person's gender (though celebutantes tend to be young women anyway).

Celebutante caught on in the wake of the Newsweek article, eventually getting yet another boost with the rise of such "famous for being famous" celebs as Paris Hilton and Nicole Richie. And in the hyperactive celebrity-watching of the Internet age, celebutante has spawned a multitude of second-order blends with celebu- as the first element. Some of these new blends are innocuous enough, such as celebumom (noted on the American Dialect Society mailing list in Sep. 2004). More often than not, however, the new celebu-blends have been intended as snarky put-downs of the Hilton-Richie set. One prominent example is celebutard, which uses the unpleasant -tard element of retard. The same element has also shown up recently in debutard, fucktard, and lactard. (The last of these, winner in the "Most Creative" category in the 2006 ADS WOTY voting, is a self-effacing term for someone who is lactose-intolerant.)

The gossip-bloggers at Gawker enjoy using another colorful put-down: celebuskank. (Skank, of course, is meant to be interpreted in the sense given by American Heritage as: "One who is disgustingly foul or filthy and often considered sexually promiscuous. Used especially of a woman or girl.") As part of a faux-lexicographical roundup last July, Time Out New York supplied a helpful definition (complete with a silly etymology and inaccurate pronunciation):

Gawker hasn't stopped with celebuskank, however, also introducing celebu-architect, celebu-cuisine, and celebu-lurker, among others. But another gossipy blog, Steve Hall's Adrants.com, has really gone celebu-crazy. Adrants has featured celebu-bash, celebu-billionaire, celebu-campaign, celebu-face, celebu-fashion, celebu-fragrance, celebu-lusting, celebu-media, celebu-model (or celebu-spokesmodel), celebu-obsessive, celebu-publishing, celebu-rag, celebu-sneaker, and last but not least, celebu-shit. Hall has evidently been spreading the celebu-gospel for quite a while now. Back in Sep. 2003, when Gawker founding editor Elizabeth Spiers was fielding names for a new Manhattan-based gossip blog, Hall's suggestions included "Celebu-blog" and "Celebu-snark." (Spiers eventually went with "The Kicker," now defunct.)

So a celebu-blend seems to have a certain trendy cachet that you wouldn't find in a more mundane celebri-blend. And celebu- is close enough to celebri- to maintain a resemblance, with only a change of the unstressed syllable that serves as a bridge to the second blend element. (The first two syllables carry the semantic weight, especially since celeb has been a clipped form of celebrity since the early 20th century.) New celebu-formations may also be helped along by the fact that the medial syllable -u- [ju] seems particularly blend-friendly, whether it's part of the first element or the second: think of docudrama, rockumentary, edutainment, Blacula, and AccuWeather.

Time.com further highlights the -u- syllable by setting it off with hyphens. "Celeb-u-Gossip" has a bit of a retro feel, reminiscent of "While-U-Wait" signs on old storefronts. At the same time, it suggests the many websites prefixed by You- or U-, from Youtube to Uclick. Unfortunately for Time.com, it also brings to mind Time's much derided selection of "You" as Person of the Year. Leave it to a Gawker commenter to drive the point home:

The U stands for "You," as in Time's Persons of the Year -- as well as in "You report gossip and we'll repeat it in our own words, just as if we had reported it ourselves."

Posted by Benjamin Zimmer at 12:59 AM

Standardizing away the world's languages

The transmutation of a prime into a skull and crossbones reported by Geoff is an example of the all too familiar incompatibility of files produced by different word processors and different versions of the same word processor, especially Microsoft Word. These incompatibilities are not only annoying and time-consuming, but where the file formats are secret, as are Microsoft's, they make it nearly impossible for competitors to inter-operate completely, make access to archives difficult, and lock users into the same product line. Among other things, this reduces competition and therefore increases costs for consumers.

In response to this problem some years ago a consortium was formed to create an open standard for exchange of documents. The standards group began work in 2002 and completed its work in 2005. The result was the Open Document standard, which you can read for yourself here. The official version of the standard is that produced by the International Organization for Standardization (ISO), available here, but they'll charge you 342 Swiss francs ($274). Open Document is open in the sense that anyone may read it and anyone may use it without obtaining a license or paying royalties. Part of being open in this sense is being sufficiently specific that someone wishing to implement the standard has all of the information he or she needs. If you write a word processor that exports in ODF, I can, using only the specification, without any other information about your program, write a word processor that will import your document perfectly, and of course, conversely.

ODF is a very good thing for just about everybody, from Geoff to the Commonwealth of Massachusetts, the National Archives of Australia, the Allahabad High Court, and Belgium, all of which have adopted it. One entity that is not too keen on ODF is Microsoft. In an effort to prevent ODF from becoming the universal standard, Microsoft suddenly came up with its own "open standard" [49 MB PDF document] known as Open XML. Open XML is not actually an open standard because it leaves some elements publicly undefined. Some elements, for example, are defined only by reference to secret Microsoft specifications. In any case, it isn't really a standard of the usual sort because, in its attempt to enshrine every detail of the formats used by Microsoft products it is much more specific than a normal standard. That is why it runs over 6,000 pages while ODF runs a mere 737. Rob Weir at An Antic Disposition has a hysterically funny blog post about the Open XML specification. I especially like his observation that: "This is not a specification; this is a DNA sequence."

Microsoft is now trying to get Open XML approved by the ISO. The current phase of the process is what is called the "contradictions" stage, in which contradictions between the proposal and existing standards are investigated. The process is described by Pamela Jones in this Groklaw article. Some of the contradictions that have already been pointed out are discussed by Andy Updegrove in this Standards Blog post. For example, Open XML does not follow ISO 8601, the standard for representation of dates and times. Why? Because whoever wrote the code for computing dates in a Microsoft product long ago did not know that 1900 was not a leap year. (Years divisible by 100 are not leap years unless they are divisible by 400.) Open XML requires conforming implementations to replicate this Microsoft bug forever.
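The leap-year rule in the parenthesis can be written out in a few lines. This is a minimal Python sketch; `is_leap_naive` is a hypothetical illustration of the kind of mistake described (treating every fourth year as a leap year), not a reconstruction of any actual Microsoft code.

```python
def is_leap_gregorian(year: int) -> bool:
    """Gregorian rule: divisible by 4, except century years,
    which are leap years only if divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

def is_leap_naive(year: int) -> bool:
    """Hypothetical buggy rule: every year divisible by 4 is a leap year."""
    return year % 4 == 0

# The two rules disagree on 1900 but agree on 2000.
for year in (1900, 2000, 2004, 2007):
    print(year, is_leap_gregorian(year), is_leap_naive(year))
```

Under the naive rule 1900 gets a February 29 that never existed, so any format required to stay compatible with such a rule has to carry the phantom day along.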

Now, you might be wondering what this all has to do with linguistics. Well, one of the things that document metadata specify is the language of the document. The Open Document standard does this correctly. It uses (p. 61) the three-letter language codes of ISO 639, followed by a two-letter country code following ISO 3166. This allows for the specification of any of the world's languages. A three-letter code allows for as many as 17,576 languages. ISO 639-3 in fact already encodes most of the world's approximately 6,700 languages. Open XML, on the other hand, does not follow ISO 639-3. Instead (section 2.18.52), it requires that languages be specified by means of two hexadecimal digits, e.g. 0x09 for English. That means that no more than 256 languages can be accommodated. The list of languages available is in the document referenced above on pp. 2531-2537, but for the two-digit hex codes you'll have to look elsewhere because Microsoft doesn't list them together with the languages. For some reason it gives a completely different set of non-hexadecimal codes ranging from 1025 to 58,380. The hex codes can be found in the fourth column of this table, the one labelled "Win Code".
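The capacity figures follow from simple counting. A minimal Python sketch, assuming a 26-letter alphabet for the three-letter codes and taking the figure of roughly 6,700 languages from the text above (the variable names are illustrative):

```python
import string

# Three positions, 26 possible letters each.
three_letter_codes = len(string.ascii_lowercase) ** 3

# Two positions, 16 possible hexadecimal digits each.
two_hex_digit_codes = 16 ** 2

# Approximate number of the world's languages, as cited in the post.
WORLD_LANGUAGES = 6_700

print(three_letter_codes)    # size of the three-letter code space
print(two_hex_digit_codes)   # size of the two-hex-digit code space
print(two_hex_digit_codes < WORLD_LANGUAGES < three_letter_codes)

# The example code for English, 0x09, is simply the integer 9.
english = int("09", 16)
```

The three-letter space has room to spare; the two-digit hexadecimal space falls short of the language count by more than an order of magnitude.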

In short, the Open Document standard provides for all the languages in the world, while Open XML excludes the great majority. This isn't a matter of ignorance. Microsoft has employees like Michael Kaplan who are quite knowledgeable about the world's languages and the technical issues that they raise, but business strategy comes first.

Posted by Bill Poser at 12:30 AM

January 19, 2007

Skull and crossbones OK?

I swear I just had a Word document sent back to me from England by a copy editor who, in a truly admirable display of willingness to assume author eccentricity rather than error, appended a Track Changes comment to one of the characters in the document (which I had been required to submit in Word form): Comment: This displays as a skull and crossbones. OK?

No! No! No! The paper in question is a perfectly ordinary review article about the history of linguistics by Barbara and me, and contains nothing about pirates. Tell me whether you detect a difference between the two characters seen here:

′     ☠

If you can't see a difference, or can't see any characters at all on the line above, your browser is not displaying Unicode characters, and the best thing for you to do would be to switch your computer off and sit down and cry. The character in the formula in our paper was supposed to be PRIME, Unicode 2032, not SKULL AND CROSSBONES, Unicode 2620. Indeed, it even displayed as a prime on the Windows machine where I was doing the final corrections, but apparently not on the copy-editor's machine in England.
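Anyone who wants to confirm which character a code point denotes can ask the Unicode character database directly. A minimal sketch using Python's standard `unicodedata` module; the helper name `describe` is just for illustration.

```python
import unicodedata

def describe(code_point: int) -> str:
    """Return a 'U+XXXX NAME' label for a Unicode code point."""
    return f"U+{code_point:04X} {unicodedata.name(chr(code_point))}"

# The intended character and the one the copy editor saw.
for cp in (0x2032, 0x2620):
    print(describe(cp), chr(cp))
```

This only shows what the code points mean in Unicode; it says nothing about how a given word processor stores or maps them internally.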

I am not a dingbat-crazed lunatic notation inventor, I told her. I'm just one of the millions of victims of the phenomenon of word processor file transfers always screwing up all the special characters. All our arrows had turned into stars and daggers and things; and all the bullets were exclamation marks by the time they reached the UK. Please God, make the Unicode revolution come soon before I lose it and fire a bullet at a word processor programmer. (It wouldn't kill him, of course; it would just turn into an exclamation mark and drop harmlessly to the floor.)

Update: To be specific about the changes: my PRIME turned into a SKULL AND CROSSBONES; my RIGHT ARROW (→) turned into some kind of a star (remember, I haven't seen it); my LEFT-RIGHT ARROW (↔) turned into a dummy symbol box; and BULLET (•) turned into exclamation mark throughout.

For the Word defenders who want to say perhaps there is some other explanation than a bug in the character code pages in some version of Word, let me add that I have now confirmed that the copy editor in England used Word 2003 running on Windows XP Professional on a Dell desktop. I have viewed the file I sent (though it is true that it was originally an RTF created by WordPerfect 11, it was then opened in Word to make sure things were correct, and was re-saved), using Word 2003 running on Windows XP on an HP. On my system the prime is a prime and the arrows and bullets are arrows and bullets. So we are not talking about a transfer between architectures (like from PC to Mac), and we are not talking about a failure of WordPerfect to write legal RTF, and we are not talking about me using Word and the copy editor viewing the file with some other program.

Sorry, Microsoft apologists, but the finger of suspicion here still points toward Redmond, Washington, and the morass created by years of ad hoc proprietary decisions about how to encode characters. We poor end users deserve better.

Posted by Geoffrey K. Pullum at 03:22 PM

If we could just talk

The New Yorker film critic Anthony Lane adds a linguistic note to the end of his review of The Abduction: The Megumi Yokota Story (1/22/07:91), about the agony of the parents of a young Japanese woman kidnapped in the 1970s by the insane DPRK regime (Megumi was never seen again; her poor parents never even got verified remains after her death): "The worst aspect is, as so often, the most prosaic: the main reason that North Korea stole human beings was because it needed language teachers for its spies. All that grief, just for a chance to talk." Indeed. Total isolation of a nation is not compatible with good access to language-learning opportunities. What the DPRK needed was not warm bodies but native speakers — something other countries can get via normal communications, and open borders, and immigration, and contact with foreigners. What a tragedy.

Posted by Geoffrey K. Pullum at 12:51 PM

Academically powerful words

While Geoff and Eric are listing the words they find most distasteful in titles -- "revisited", "redux", and "whither" for Geoff, "status", "nature", and "role"/"rôle" for Eric, though Eric promises more to come -- Stanford Daily columnist Katie Taylor has been cataloguing the "catchwords of the literati" in a 1/17/07 humor piece.  Taylor thoughtfully provides a Top Ten list of words that you can wield to better your academic life.  There's no overlap (yet) with Geoff's and Eric's lists.

Taylor uses stereotypes about discourse markers to set things up:

Valley girls insert "like" into the holes of their oral communication.  Teenagers include "you know" into [AMZ: "into" probably persevering from "insert into"] many of their dialogues.  Stanford students, however, fill the gaps of their in-class comments with "juxtaposition."

On to some quantitative claims:

Although "juxtaposition" is far and away the most frequently-heard word in fuzzy lectures, discussion sections and all other professor-student interactions, it is just one of the Top 10 Catchwords of the Literati.  The repeated usage of these chosen terms by the upper echelons of intelligentsia transforms them into, essentially, the "you knows" of the tenured track.  To sound smart, you need not learn hundreds of GRE words, or switch your internet homepage to wordaday.com.  In fact, all that is required to impress your peers, professors and oftentimes even yourself is to master the Top 10 list.

Written in increasing order of frequency, the Top Ten Catchwords of the Literati are as follows:
10) Iconoclasm
9) Ubiquitous
8) Paradoxically
7) Subjective/objective
6) Duality
5) Feminist
4) Ironic
3) Dichotomy
2) Race/ethnicity
1) Juxtaposition

... a solid understanding of these 10 words guarantees any student an opportunity to climb up the vocabulary-slinging, multi-syllabic word-dropping, intellectual ladder.  In fact, every title of every book that every Stanford professor has ever published contains one of these words.  Additionally, the statistics further demonstrate that every A paper includes on average six words from the Top 10 list, while the average B paper contains merely three to four.

(You might have wondered about those fuzzy lectures and so on.  This is not a reference to furriness or to fuzzy logic or woolly thinking.  Here on The Farm there's a distinction between fuzzy subjects, students, jobs, etc. and techie -- sometimes "techy" -- ones.  It's, roughly, humanities, arts, and the social sciences vs. the natural sciences, mathematics, and engineering.  We take these things seriously here.  Why, a while back our Anthropology department split into a fuzzier department, Cultural and Social Anthropology, and a techier one, Anthropological Sciences.)

Some of these words people have been criticizing for decades; "dichotomy" and "ubiquitous", in particular, are often seen as words for pseuds (people "with pretensions to cultural or intellectual sophistication", as Wordspy puts it).  Somewhat surprisingly, not one of the ten, even these two or "juxtaposition", is in Robert Hartwell Fiske's Dimwit's Dictionary, a collection of 5,000 words Fiske deems to be "overused".

Though it's probably a mistake to take Taylor's list even semi-seriously, I note that two of the ten -- "race/ethnicity" and "feminist" -- are there because of the topics of some of the classes Taylor is reporting on, rather than because of the vocabulary people use in talking about intellectual matters.  Any of the following might have gotten on the list for this reason: "gender", "heteronormativity", "homophobia", "misogyny", "racism", "sexuality", "social class", "stereotype".

Two more of the items -- "paradoxically" and "ironic" -- are often targets for criticism because they are frequently used loosely, expressing mere surprise on the speaker's or writer's part, rather than actual paradox or irony.

And then there are other words that could have been contenders, for instance: "antithesis", "bifurcate", "conflate", "counterpose", "prolegomenon", "reiterate", "synthesis".  You can probably think of some others to juxtapose to these.

[Addendum: Martyn Cornell has managed to jam all ten words into a single sentence: "Paradoxically, the ubiquitous juxtaposition of an ironic feminist subjective/objective dichotomy alongside the duality of race/ethnicity is not iconoclasm."  It sort of flirts with meaning at several points, without actually achieving meaningfulness.  He is hoping for an A+ for this submission, but I told him that Katie Taylor is doing the grading on this one.]

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 09:48 AM

Scholarship is hard, let's go drinking

As Geoff Pullum just pointed out, Christopher Orlet's recent American Spectator piece "Among the intellectualoids" is a "standard-issue English-going-to-hell rant". Orlet himself, a columnist for Vocabula Review, turns this stuff out by the yard. For a lovely example of the genre's low intellectual standards, consider this passage from his "Grammar for Smarties", The American Spectator, 1/11/2005:

Indeed, the battle cry of the language liberal might be, "Languages change. Get over it." Most linguists judge that language change is neither good nor bad, and, anyway, resistance is futile. Languages, like hemlines, will change whether we want them to or not. This indifference to standards is reflected in the latest editions of our popular dictionaries in which words that are commonly misspelled (alright) or misused (disinterested) have been given the lexicographer's stamp of approval.

Yet despite all this talk of transformation the mother tongue has gone remarkably unchanged since the King James Version of the Bible began to stabilize the language in the mid-seventeenth century. Words come and go, yes, but a letter written 367 years ago by John Milton to Benedetto Bonomatthai reads much like one composed by a good writer today:

I am inclined to believe that when the language in common use in any country becomes irregular and depraved, it is followed by their ruin or their degradation.

But what John Milton actually wrote in Florence on September 10, 1638, was

... equidem potius collabente in vitium atque errorem loquendi usu, occasum ejus Urbis, remque humilem & obscuram subsequi crediderim ...

At least, that's how it was published in Joannis Miltonii Angli, Epistolarum familiarium liber unus quibus accesserunt, ejusdem, jam olim in collegio adolescentis, prolusiones quædam oratoriae, Londini : Impensis Brabazoni Aylmeri, 1674, which is available in digital form from Early English Books Online. While we're at it, let's reproduce the full and more eloquent sentence from which this fragment was taken by a 19th-century translator mindful of his audience's reduced attention span:

Neque enim qui sermo, purusne an corruptus, quaeve loquendi proprietas quotidiana populo sit, parvi interesse arbitrandum est, quae res Athenis non semel saluti fuit: immo vero, quod Platonis sententia est, immutato vestiendi more habituque graves in Republica motus, mutationesque portendi, equidem potius collabente in vitium atque errorem loquendi usu, occasum ejus Urbis, remque humilem & obscuram subsequi crediderim: verba enim partim inscita & putida, partim mendosa, & perperam prolata, quid nisi ignavos, & oscitantes, & ad servile quidvis jam olim paratos involarum animos haud levi indicio declarant?

Here's an image that includes the crucial fragment:

(The whole letter in .pdf form is here.)

The version that Chris Orlet quoted was composed in the middle of the 19th century, and published in The Prose Works of John Milton: With a Biographical Introduction by Rufus Wilmot Griswold. In Two Volumes (Philadelphia: John W. Moore, 1847). Volume II: Familiar Epistles, Translated from the Latin, by Robert Fellowes, A.M. Oxon. (Fellowes' text is reproduced here.)

Chris continues:

Often there is good reason to be skeptical of change, particularly when it comes about out of laziness and the dumbing-down of grammar rules. Again, compare Fowler's inflexible 1926 Dictionary of Modern English Usage to current grammars like Woe is I, in which rules that are troublesome or too difficult to remember are pronounced outdated or dead. (Rats, if I had known this was possible in my college days I would have pronounced Algebra outdated and dead and gotten on with my binge drinking.)

Stop by and see me in Philly any time, Chris, and I'll buy you a round.

[Note that this is not the first example of questionable linguistic scholarship that we've observed in The American Spectator, a publication generally better known for fervor than for footnotes. See "Slurry" (11/24/2006) and "Slurry Accent II" (11/27/2006).]

Posted by Mark Liberman at 08:23 AM

Stupid wild over-the-top anti-linguist rant

Another standard-issue English-going-to-hell rant, if you can bear it, was published today in The American Spectator under the heading "Among the intellectualoids: Our inarticulate future." The author is Christopher Orlet. Naturally, linguists get a drubbing. Orlet's gloss on the meaning of "prescriptivists" is "those who would uphold grammatical standards". Linguist David Crystal's reference to prescriptivists as "linguistic Stalinists" (a nice analogy, actually, and historically justified) evidently marks him out as a creature of evil. The coming total destruction of all ability to communicate in English "suits some linguists fine", says Orlet. Linguists don't care even if change leads to "a future where communication resembles the squeals and grunts of . . . Neanderthals . . . , even if it leaves mankind with the inability not only to examine and express complex ideas and concepts, but to express anything beyond mere animalistic impulses." Totally over the top. Where does he get these ideas? Is there a single linguist who has ever said anything to support this raving? I glory in the English language, study its complex syntax with great interest, write books on it, and insist on my students writing it accurately and correctly. What am I, chopped liver? At which kinds of parties does Orlet meet the "linguists" of which he speaks, and what do they smoke there? Not that I would want to attend, I suspect.

Posted by Geoffrey K. Pullum at 01:57 AM

January 18, 2007

A full year of The New Y

I know I promised a while ago that I wouldn't be posting further sightings of the snowclone The New Y.  But now Randall Szott at Leisure Arts has offered me a diagram summarizing a whole year's sightings:

The project documents every instance of the phrase "is the new" encountered from various sources in 2005. It is intended to map the iterations of a peculiarly common marketing and literary device.

Here's a copy for you to look at right here on Language Log:

It's fascinating as an art object as well as a presentation of large amounts of information.  And the centrality of black is very clear.

Since I'm already posting, I might as well include the latest sightings at Language Log Plaza:
  • On New Year's Eve Eve Eve, Chris Worth wrote to quote a friend who asked: "Didn't you know?  New Year's Eve Eve is the new New Year's Eve".
  • Then the Zippy cartoon of 1/6/07 had this exchange:

    Zippy: Yes.  Adam Sandler is th' Jerry Lewis of our time -- a true auteur!
    Griffy: Don't hold your breath.
    Zippy: Will Nancy Pelosi listen to Adam Sandler's plan for withdrawal from Iraq?  Or are we all doomed?
    Griffy: You're wrong.  Donald Rumsfeld is th' new Jerry Lewis.
  • On 1/9/07, Jason Grafmiller noticed, as he was driving down the freeway, a sign for a hardware store that declared: "Grass is the new concrete."  He and I found this one mildly puzzling.
  • Then on the front page of the New York Times of 1/15/07, Louise Story reported in "Anywhere the Eye Can See, It's Now Likely to See an Ad":

    "We never know where the consumer is going to be at any point in time, so we have to find a way to be everywhere," said Linda Kaplan Thaler, chief executive at the Kaplan Thaler Group, a New York ad agency.  "Ubiquity is the new exclusivity."

    Another one that takes a moment to piece out.

[Addendum 1/19/07: Ben Zimmer relays a report from Bob Yates on a recent article in Salon Magazine, which begins with a great wave of The New Y: "Now that ugly is the new pretty, white is the new black, and black is the new pink, back fat is the new six-pack, crotch shot is the new boob flash, Eastwood is the new Spielberg, babies are the new handbag dogs, handbag dogs are the new shelter dogs, and George Clooney is the new George Clooney, I guess it's about time that midseason became the new fall season."]

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 06:18 PM

Equal Opportunity Idiocy

It turns out that Saudi Arabia is not the only country with an aversion to symbols reminiscent of the cross. In some circles in Israel, the plus-sign is avoided due to its resemblance to the cross and is replaced with a version that looks like this: ﬩. It is actually in Unicode, at codepoint U+FB29, dubbed HEBREW LETTER ALTERNATIVE PLUS SIGN.
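For readers who want to check the code point themselves, here is a minimal sketch using Python's standard unicodedata module. The variable names are chosen purely for illustration; the lookups rely only on the Unicode Character Database that ships with Python.

```python
import unicodedata

# U+FB29, the "truncated" plus sign discussed above.
alt_plus = "\ufb29"

# Official character name from the Unicode Character Database.
name = unicodedata.name(alt_plus)

# Compatibility normalization (NFKC) folds the character
# back to an ordinary plus sign.
folded = unicodedata.normalize("NFKC", alt_plus)

print(name)
print(folded)
```

Unicode treats the character as a compatibility variant of the ordinary plus sign, which is why NFKC normalization turns it into a plain +.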

On the basis of information from a few correspondents, it seems that the truncated plus sign turns up in some ulpan (Hebrew language classes for non-Israelis) and in the lower grades of primary school. It seems to be unheard of at the university level. A friend reports that her daughter was taught arithmetic with both the truncated plus sign and with a division sign ÷ with the horizontal removed, so that it looks like a colon (:). Now in grade 5 her class is using the usual plus-sign but is still using the modified division symbol. I'm guessing that the distribution of usage has to do with variation in the influence of the religious right.

I suppose that this is marginally saner than the Saudi objection to X since a plus sign more closely resembles a cross, but however you cut it, it is still pretty stupid. I'm a Jewish atheist who has read and written many a plus-sign and not once have I felt oppressed by the Christian overtones of mathematical notation.

Addendum: Several people have written to tell me that the colon-like division sign is used in elementary education in their country. This seems to be the case in France, Germany, the Netherlands and Norway at least. So there seem to be two different things going on in Israel - cross-avoidance with the truncated plus-sign, but ideology-free regional variation with the division-sign.

Posted by Bill Poser at 03:57 PM

Words in titles redux

Now that Geoff has opened the door on hated words (words of hate?), I have a list of my own that I'd like to get off my chest. It's a really long list, though, so I'll have to do this in installments.

This week's installment -- and I'm not hereby suggesting that I'll have a new installment every week -- covers words that I'll never write in the title of an academic paper (or book, or whatever). This follows up on a side comment Geoff makes in his post:

It's like the fact that I will never (I hereby pledge) write an academic paper with "revisited" (or its Latin version "redux") or "whither" in the title, or write a blues song that begins with the words "Woke up this morning".

And you may have already figured out that the list of words I love to hate in titles does not necessarily overlap with Geoff's.

And the words are: status, nature, and role (esp. when written rôle).

Yes, the list is short, but the words in it share certain crucial things in common that are at the root of what I hate about them, so I can imagine expanding the list to include other, similar words. What I hate about these words is perhaps specific to works that I've read with these words in their titles. Call me crazy, but I think that when I pick up a paper or book with a title like "The status of such-and-such" or "The nature of such-and-such" or "The rôle of such-and-such", it is reasonable for me to expect to learn something about the status, nature, or rôle, respectively, of such-and-such -- namely, what the author(s) think(s) the status, nature, or rôle of such-and-such is, and what arguments they will put forth to support their thinking. Invariably, though, I come away from reading works with these kinds of titles feeling quite unsure of what I've learned about the status, nature, or rôle of such-and-such. I think this is because there's a sense in which anything you say about such-and-such says something about the status, nature, or rôle of such-and-such, and authors who use these kinds of titles know this. But in that case I wish they'd use more specific words in their titles: if you're only going to say one thing about such-and-such, say what that one thing is in your title. Don't give me this promise that you're going to say everything about such-and-such when you're really only saying one thing.

At this year's secret annual cabal of linguists, I told a few friends about my strong distaste for these words in titles. One of these friends expressed concern that they may have to look over their old titles to make sure they hadn't committed an offense. But they shouldn't have worried: it's just me who hates these words in titles. (In fact, some of my favorite linguists have used them.) But if I've succeeded in convincing my friends (and I include all of you Language Log readers here) to hate these words in titles, too, then by all means: commence not using them.

I can't leave without saying something about Geoff's pledge to never "write a blues song that begins with the words 'Woke up this morning'". I would ordinarily agree with such a sentiment, but in this particular case I happen to wish it had been me rather than my good friend Derek Gross who had come up with the following lyrics to the world's shortest blues song:

Woke up this morning / went back to bed

Posted by Eric Bakovic at 03:53 PM

2500 words for cursing the weather

According to Nathan Bierma (or the editor who wrote his column's headline), "Hell will freeze over before the Eskimo snow myth melts" (Chicago Tribune, 1/17/2007). Certainly the cartoonist Martin Kellerman, in his Rocky strip for 1/18/2007, is doing his part to restrain the demand for solid-phase water terminology in the nether regions (though not, apparently, in Sweden):


Rocky: Fuck snow, man! Fuck... snow...
Bird: Did you know that the Eskimos have 2,500 words for snow?
Rocky: How many words do they have for fuck...snow?

I don't know the Rocky culture well enough to understand why Rocky is using English to curse the weather -- you'd think that the Swedes would have developed an adequate native vocabulary for this purpose, over the centuries.

[Hat tip to Ed Keer at Watch Me Sleep]

[Update -- Ed Keer wrote to draw my attention to one of his earlier blog posts, which helps to make the role of English cuss words in Swedish a bit less obscure:

Taboo swear words are probably among the first thing a second language learner learns if they have a teenage mentality. But while it's easy to master swear words, I don't think you ever really internalise the depth of feeling associated with the taboo.

As a case in point, a few years ago, a movie about a teenage lesbian stuck in a stifling small town was all the rage in Sweden. The Swedish title of the movie is Fucking Åmål! Since Åmål is just the name of the small town in question, this Swedish movie title is really in English -- and probably offensive to some.

I have heard that the original title of the movie was Fucking Jävla Skit Åmål! mixing some Swedish curse words in with the English. That title literally means fucking devil shit ... and can be loosely translated as Goddamn fucking shit ... Now if we could somehow objectively measure the degree of offensiveness of words across languages, I would bet that jävla skit is less offensive than fucking. But apparently the Swedish film board did not see it that way. They balked at jävla skit but allowed fucking!

Most Swedes know English. But surely, they don't really know the power of fucking.

When Fucking Åmål! was released in America, the title was changed to Show me love. Ugh! Talk about a tin ear translation. They could at least have gone with Damn Jävla Skit Åmål! I wouldn't have been offended.

Eric Vinyl also wrote to draw my attention to the same movie title shift.]

[And Axel Theorin writes from Osaka, Japan:

The simple (or rather quite complicated) fact is that Rocky's choice of expressions accurately matches that of his real life counterparts in Stockholm and most parts of Sweden.

Immersed in English language cultural expressions from a very early age, Swedes use, adapt and create entirely new English words and expressions in all manners of conversations (and even columns in the national press). Many lament this and call it a sign of attrition in these speakers' Swedish ability, which is what the joke of the strip rides on.

In Rocky's Swedish, the meaning "snow" is signed by the symbol "snow", while his friend uses the (more) Swedish "snö" (did that o with umlauts come out right?). Accordingly he notes what his friend has to say about the Eskimos and their astounding number of words for this "snö"-phenomenon, and then proceeds to connect this to what is on his mind: "snow".


[Axel adds:

After having read the update with the extra citations from Ed Keer's blog I cannot help myself but comment in turn.

Ed Keer's feeling on the subject obviously differs, but I would say that English swear words retain most of their feeling of taboo. Especially a word like fuck, since swear words on the sexual theme are virtually nonexistent in "native" Swedish, except for expressions casting doubt on the virtue of a woman (I know only two which are felt as swear words: kuk - cock and fitta - cunt). Thus "fuck" is a very strong swear word in the Swedish context and is felt even more strongly in (real life) English contexts, where many Swedes will refrain from using it altogether, even though this in turn becomes unnatural (if the context is England, according to reports). On the other hand, the prolific use of "fuck" seems to have opened some doors, and lately you will hear, although infrequently, Swedes swear by the act of sexual intercourse using the native word "knulla".


[Victor Steinbok forwarded these observations "from a friend with some knowledge of Scandinavian and other Baltic languages":

Unlike in Russian, the word for "fuck", "knulla", can't be adjectivized and is not usually used as an expletive. In other words, Swedish for "Fuck you!" is "Fuck you!" or "Ta dig i röven" "Take yourself in the ass". It's just not one of the great cussing languages. Even Finnish is better. When the best you've got to come up with when you hit your thumb with a hammer is, "Fie, Devil!", you're pretty sad.

Some of the Finnish options were discussed here.

I take it that the Swedes are code-switching, not actually borrowing English curses into Swedish, either as calques or as fully nativized loans. Perhaps this doesn't work with taboo words. I'm glad to see, in any case, that we English speakers have been able to make a contribution to Swedish culture.]

[Update 1/27/2007 -- Lars Svensson writes:

A friend just pointed me to the Language Log. Great stuff! Hope I found a good email address.

The Rocky story is a few days old already, so this comment may be woefully late; but to appreciate the dialogue and the rôle of the language mix, I believe it helps to know that Rocky and his friends are rap and hip-hop enthusiasts, who frequently quote gangsta lyrics more or less accurately (or rather modified to fit the occasion). I'd not be surprised to find a line similar to "fuck...snow" on a CD liner.

Btw, it's easy to adjectivize "knulla" as "förknullad", in loose correspondence to "fucking", "fucked-up", and German "verfickt". The Swedish word is not in common use but readily understandable, will not be mistaken as a reference to intercourse, and (by virtue of not being common) makes the listener take notice, which is the whole point of thoughtful (as opposed to routine) swearing.

Ah well. Back to grading engineering exams. Sigh.


Posted by Mark Liberman at 02:40 PM

People of color

I want to confess to a straightforward idiosyncratic personal linguistic preference — an aesthetic judgment, if you want to call it that. At the end I draw out a lesson from it. The confession is this: I simply hate the term person of color (along with its standard pluralization, people of color). I have never used it, and I never will. They can't make me. Fifteen years ago I was a graduate dean, running an affirmative action program and making speeches on the topic, and I didn't use the phrase then. I happen to be a firm defender of affirmative action legislation; and back then all administrators used the phrase as a badge of having the right attitudes concerning ethnic diversity, even if they were privately skeptical about affirmative action. But although I walked the walk, I wouldn't talk the talk. I didn't refer to African Americans and Chicanos and Latinos and Asian Americans and Pacific Islanders as "people of color", and I still won't.

This is a linguistic dislike I'm expressing, and it's not much easier to convincingly justify (though I will supply a little reasoning below) than any taste judgment in music or art. You may dislike split infinitives, in which case you will have winced at the foregoing sentence; I dislike person of color and people of color.

I speculate (unverified empirical claim coming up) that most other people who dislike or avoid using the phrase person of color are negatively disposed toward the aims and ideals of political movements focused on rights of ethnic minorities. They refuse to go along with terminological innovations that they scornfully associate with "political correctness". That, as it happens, is not the case with me. I have exactly the sort of attitudes that the haters of "political correctness" generally despise. I not only believe in legislative and social action to ensure rights for ethnic and racial minorities, I also fully accept the right of ethnic and racial groups to decide how they would like to be referred to. Except in the case of this particular phrase, which I happen to hate.

Perhaps this is as good a place as any to note that if my life involved interaction with Yup'ik or Inuit people, I would probably use those terms when talking to or about them and their people, rather than calling them "Eskimos", if they didn't like the latter word. ("Eskimo" has often been said to have an insulting etymology involving a word meaning "eater of raw flesh" in some Athabaskan Indian language. That turns out to be a myth: the real source word means something like "snowshoe weaver" [thanks to Jesse Sheidlower on this point].) I do in fact use the word Eskimo (and occasionally get criticized for it), but my uses of it are either references to the Eskimo branch of the Eskimo-Aleut language family, where the word is a technical term in English as used in comparative linguistics, or else they are repetitions of those hundreds or thousands of people who have asserted that the "Eskimos" have many / dozens / scores / hundreds of words for snow. They never know which exact language they are talking about, so in that context we are talking about all the indigenous Greenlandic, Arctic Quebecois Inuktitut, Central Canadian Inuit, Alaskan Inuit, Alaskan Yup'ik, and Siberian Yup'ik peoples; restricting to one of those would make no sense. But I definitely accept that if the indigenous people of Greenland want to be called Greenland Inuit rather than Greenland Eskimos, that is their right, just as it is their right to have Greenland called Kalaallit Nunaat, and to have the former Godthåb called Nuuk. My point is that I don't use Eskimo because of a dislike of the words Yup'ik or Inuit; I use it as a technical term in language family classification.

In the past four decades I have willingly shifted from using "negro" to using "colored" to using "black" (or later "Black") to using "Afro-American" to using "African American", all for the single ethnic group that the conservative British linguist Geoffrey Sampson has (astonishingly) carried on calling "negroes" throughout four decades.

Generally people with political views like mine (positive about people of other ethnicities and strongly negative about racism) unhesitatingly use the term people of color, especially in public pronouncements. But I don't and won't.

To the extent that I can do anything to provide rational argumentation to support my dislike of the term, I would say that its semantic looseness is one problem. The phrase seems to function more as a badge of political progressiveness and racial tolerance than anything else; but to me it seems like an unwholesome capitulation to the old apartheid idea that there really is some meaningful division between people who are white and people who are not — it seems to presuppose and endorse the stupid idea that there really is some way of determining whether some random Armenian or Azerbaijani or Albanian or Afghan or Argentinian or Ainu or part-Aboriginal Australian is or is not a legitimate claimant to the label "person of color". I genuinely think it is nonsense to try and draw such a line. At one time under apartheid, South African law treated Chinese (who didn't have a lot of economic clout back then) as non-white, but Japanese (increasingly important to the economy) as white. Gives the whole game away, doesn't it? I say that hunting for the line between those who are white and those who are not is a fool's game.

Even setting this aside (after all, there are many other highly vague predicates), the quasi-archaic syntactic weirdness of the phrase makes my teeth itch. The phrase seems mincingly awkward to me in syntactic terms. The idea is to have a syntactic work-around so that the notion of not having pale pink-colored skin can be expressed without any appearance of going back to the 1960s uses of "colored". So colored person is replaced by person of color. But there is no regular process that yields the pattern person of X for X-ed person: if you try the same thing with other -ed adjectives it sounds utterly insane: you can't refer to someone who is freckled as a person of freckles, or a person who is dazed as a person of daze. Batman, the caped crusader, is not a person of cape. [Update: Coby Lubliner tells me: "the process by which 'colored people' (which in US English has historically referred only to black people) became 'people of color' is through a retranslation from the French gens de couleur, at a time when the writings of Frantz Fanon et al. were popular." This is interesting. So is the fact that Jesse Sheidlower tells me the Oxford English Dictionary people have found a use of the phrase from as early as 1781. But of course these facts do not alter my aesthetic dispreference at all. That's what aesthetic dispreferences tend to be like.]

Even with adjectives of the form X-ored where X is a noun stem, it doesn't work: a person who is censored is not a person of censor; a person who is visored is not a person of visor; a person who is monitored is not a person of monitor; a person who is sponsored is not a person of sponsor.

This argument from failure of pattern is not adequate to say that there is really something wrong with person of color. There are, of course, such things as entirely idiosyncratic one-off constructions. But it really doesn't matter, because my confession is not that I have discerned a syntactic failing; it is simply that I hate the phrase. It irritates me.

So let me now stop with the whining and make the spinoff point about this confession (I did say at the start that there would be one, and there is).

I do not believe that there is anything wrong with having personal dislikes here and there, whether in the linguistic sphere or in fashion or in any other domain. I have them too. When we argue against prescriptive grammar here on Language Log, we are not saying people shouldn't have strong likes or dislikes on matters of English usage. We are saying those likes and dislikes should not be confused with objective facts of correctness, and they should not be taught to schoolchildren as if they were facts.

When I say that the phrase person of color just irks me, and I refuse to use it, that's a fact about me. It's like the fact that I will never (I hereby pledge) write an academic paper with "revisited" (or its Latin version "redux") or "whither" in the title, or write a blues song that begins with the words "Woke up this morning".

So when I say of the phrase people of color that I dislike it, I'm not saying anything about what is linguistically wrong. I'm not going to publish a usage book in which I assert that person of color is a vulgar error and not good English and you shouldn't use it, because none of those things are true. You can use it all you want, and you're right to do so. I'm just saying it annoys me and I won't say it. See the difference?

Because if you can, then obnoxiously opinionated compendiums of subjective edicts like Strunk & White's The Elements of Style will not do you as much harm as they do to the millions of poor abused Americans who believe those edicts spring from something real and important — something independent of some ill-tempered individual's personal preferences. Although I will sometimes (because of my academic specialism in English) have something factual to explain to you that you might do well to listen to, you don't have to follow me in any of my personal or aesthetic preferences. And you don't have to follow the dotty old Will Strunk or the hypocritically grouchy E. B. White in their unreasoned preferences either.

To apply this to a particular recent case: if you hate the phrase "less than three years", then don't use it. Use "fewer than three years" instead if you like. Don't try to tell other people the former is "wrong", because that's not true. Just avoid it yourself if you dislike it. That's all.

Posted by Geoffrey K. Pullum at 11:39 AM

The future of linguistics

I spent this morning's blogging hour writing letters of recommendation, so I'll just point readers to the slides for a talk I gave on January 6, at the 2007 Annual Meeting of the Linguistic Society of America. My title was "The future of linguistics", and a pdf of the abstract is here.

Posted by Mark Liberman at 08:37 AM

January 17, 2007

KishKish bangbang

It looks like BBC News scored a big scoop on December 14: "Skype users to get lie detectors", 12/14/2006:

Callers using internet phone system Skype who might be tempted to tell a few porkies should beware - the user on the other end may have a lie detector.

Skype is to offer the KishKish Lie Detector, which is made by BATM, as an add-on for customers.

It analyses audio streams over a Skype call in real time and illustrates the stress levels of the other person.

But experienced Language Log readers will be able to guess what's coming -- could this be another credulous BBC reproduction of a press release?

BBC news seems to have been the first to carry this story -- but more than a month later, no other major news outlet seems to have gotten onto it. Google News has 28 hits this morning for {Skype lie detector}, but they're all from outfits like TG Daily, Israel 21C and the Sofia Echo:

Wolfgang Gruener, "Lie detector for Skype watches over voice communications", TG Daily, 12/21/2006;
Nicky Blackburn, "Forget the tall tales - Israeli lie detector keeps Skype users honest", Israel 21C, 12/31/2006;
"Bulgarians, Israelis Develop Lie Detector for Internet Calls", Sofia Echo, 1/8/2007.

Although Wired didn't carry this as a news story, Ryan Singel at Wired Blogs had a sensible response -- he tried the plug-in out a bit, and asked for help in testing it further ("Help 27B Test Skype Lie Detector", Wired Blogs, 12/18/2006):

How well does it work? I don't know. A test call to "Clinton Denial" seems to show that Clinton was lying when he said he never asked anyone to lie for him. Like I needed software to know that. But another call to the Skype Test Call seems to show that the woman thanking me for using Skype is lying. Does she really not care?

So, help me out folks. Make my internet phone jingle (my handle is Ryan Singel) and I'll ask you a series of questions after we chat for 15 seconds or so, so a baseline can be set. I'll ask 5 to 10 questions and you should lie or not lie accordingly.

I look forward to the report of the results -- but the information on the KishKish Lie Detector™ site indicates that this is an implementation of the idea of microtremor-based "voice stress analysis"; and thus a test of its effectiveness is likely to be a replication of a long series of (mostly negative) results that began several decades ago. KishKish tells us that

Voice Stress Analysis (VSA) is a type of lie detector which measures stress in a person's voice. The use of Voice Stress Analysis (VSA) as a lie detector became popular in the late 1970s and 80s. In the 90s the first Computerized VSA (CVSA) systems came out to the market. The CVSAT is now the truth verification device of choice in the law enforcement community as the number of law enforcement agencies utilizing the CVSAT continues to grow dramatically, proving the viability of the system for twenty-first century crime detection. The CVSAT is also being utilized by the US Military in the global war on terrorism.

Now KishKish Lie detector offers you a tool to detect the stress level of the person you communicate with over Skype. With the use of KishKish Lie detector you can monitor in real-time the stress level of the person you talked with. This allows you to gage the level of stress and modify your questions in real time. You could also use our KishKish SAM VSA that allows you to record the call and analyze the stress level off-line.

I discussed "voice stress analysis" in a blog post a couple of years ago ("Analyzing voice stress", 7/2/2004), and I won't go over the same ground in detail again here. Some highlights:

I've been amazed by this work for almost three decades. What amazes me is that research (of a sort) and commerce (at a low level) and law-enforcement applications (here and there) keep on keepin' on, decade after decade, in the absence of any algorithmically well defined, reproducible effect that an ordinary working speech researcher like me can go to the lab, implement and test.

Well, these days there's no need to go to the lab for this stuff -- you just write and run some programs on your laptop. But that makes the whole thing all the more amazing, because after 50 years, it's still not clear what those programs should do. I'm not complaining that it's unclear whether the methods work -- that's true too, but the real scandal is that it's still unclear what the methods are supposed to be.

Specifically, the laryngeal microtremors that these techniques depend on haven't ever been shown clearly to exist, as far as I know. No one has ever shown that if these microtremors exist, it's possible to measure them in the pitch of the voice, in a way that separates them from all the other phenomena that modulate the pitch at similar rates. And that's before we get to the question of how such undefined measurements might be related to truth-telling. Or not.

In the absence of any clear recipe for replication, all that people can do is test the commercial devices, just as Ryan Singel at Wired Blogs set out to do. When the tests are carefully done, they generally appear to give negative results -- for example, Troy E. Brown et al., "Ability of the Vericator™ to Detect Smugglers at a Mock Security Checkpoint", DoDPI03-R-002.
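To make concrete what a well-defined, reproducible measurement might even look like, here is a minimal sketch in Python. It is not the algorithm of any commercial product (none has been published, which is the whole problem). It simply computes what fraction of the variation in a pitch track falls within a given band of modulation frequencies. The function name, the 8-12 Hz default band, and the premise that a pitch track in Hz at a fixed frame rate is already available are all assumptions made for illustration.

```python
import math

def band_energy_fraction(f0_track, frame_rate_hz, lo_hz=8.0, hi_hz=12.0):
    """Fraction of pitch-modulation energy between lo_hz and hi_hz.

    f0_track: pitch estimates in Hz, one per analysis frame (hypothetical input).
    frame_rate_hz: number of pitch estimates per second.
    The 8-12 Hz default band is an illustrative assumption only.
    """
    n = len(f0_track)
    if n < 2:
        return 0.0
    # Remove the mean so that only the variation in pitch is analyzed.
    mean = sum(f0_track) / n
    centered = [x - mean for x in f0_track]
    total = 0.0
    in_band = 0.0
    # Plain discrete Fourier transform over the positive frequencies.
    for k in range(1, n // 2 + 1):
        freq = k * frame_rate_hz / n
        re = sum(c * math.cos(2 * math.pi * k * t / n)
                 for t, c in enumerate(centered))
        im = sum(c * math.sin(2 * math.pi * k * t / n)
                 for t, c in enumerate(centered))
        power = re * re + im * im
        total += power
        if lo_hz <= freq <= hi_hz:
            in_band += power
    return in_band / total if total > 0 else 0.0

# Synthetic one-second pitch track at 100 frames per second:
# a 120 Hz voice with a 10 Hz wobble of +/- 2 Hz.
wobbly = [120 + 2 * math.sin(2 * math.pi * 10 * t / 100) for t in range(100)]
print(band_energy_fraction(wobbly, 100.0))
```

Even if such a number could be computed reliably from real speech, it would lump together every source of pitch modulation in that band, which is exactly the separation problem described above, and it says nothing by itself about truth-telling.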

As I observed,

How can I make you see how amazing this is? Suppose that in 1957 some physiologist had hypothesized that cancer cells have different membrane potentials from normal cells -- well, not different potentials, exactly, but a sort of a different mix of modulation frequencies in the variation of electrical potentials between the inside of the cell and the outside. And further suppose that some engineer cooked up a proprietary circuit to measure and display these alleged variations in "cellular stress" (to the eyes of a trained cellular stress expert, of course), and thereby to diagnose cancer, and started selling such devices to hospitals, and selling training courses in how to use them. And suppose that now, almost half a century later, there is still no documented, well-defined procedure for ordinary biomedical researchers to use to measure and quantify these alleged cell-membrane "tremors" -- but companies are still making and selling devices using proprietary methods for diagnosing cancer by detecting "cellular stress" -- computer systems now, of course -- while well-intentioned hospital administrators and doctors are occasionally organizing little tests of the effectiveness of these devices. These tests sometimes work and sometimes don't, partly because the cellular stress displays need to be interpreted by trained experts, who are typically participating in a diagnostic team or at least given access to lots of other information about the patients being diagnosed.

This couldn't happen. If someone tried to sell cancer-detection devices on this basis, they'd get put in jail.

But first, BBC News would publish a story about each new "cellular stress analysis" product, without any indication that there might be any history of concern about what it does and whether it works.

Let me emphasize that I'm open to being persuaded on all these points. As I wrote back in June of 2004:

I'm not prejudiced against the "microtremor" theory -- I'd love to have another measurement dimension for speech analysis. I'm not prejudiced against "lie detector" technology -- if there's a way to get some useful information by such techniques, I'm for it. I'm not even opposed to using the pretense that such technology exists to scare people into not lying, which seems to me to be its main application these days. But when a theory about quantitative measurements of frequency-domain effects in speech has been around for half a century, and no one has ever published an equation, an algorithm or a piece of code for making these measurements, and willing and competent speech researchers (like me) can't create reliable methods for making such measurements from the descriptions we find in the literature... something is wrong.

If someone at KishKish will describe the algorithm to me -- I wouldn't say no to some Matlab code, but I'd be happy with a clear description of the analysis algorithm -- I'd love to try it out, and I'll praise it to the skies if it works, even at the level of giving reliable measurements of something you could call a "laryngeal microtremor".

And if someone at BBC News will explain to me how what they did on December 14 wasn't just parroting a press release without any attempt to analyze its credibility, in a way that suggests that their standards are lower than those of every other major news organization in the world, I'll apologize for calling them a disgrace to journalism.

[Update: checking Yahoo News shows that Agence France Presse ran this story on January 8 in an even more credulous form (Sophie Nicholson, "Nothing but the truth with Israeli Internet lie detector"). AFP interviewed Zvi Marom, the CEO of BATM, the company responsible for the KishKish product, and Paul Amery, director of Skype Development; but again, there was no attempt to look into the background of this non-innovation. So to be fair, I guess that I have to admit that the BBC is not uniquely untrustworthy as a source of information about things like this, though they were still first out of the misinformation gate by more than three weeks. I didn't learn about the AFP story from Google News, by the way, because AFP sued Google last year to prevent their stories from being indexed: "France defies Google", 3/19/2005.

Yahoo News also informed me that Nate Anderson at Ars Technica found that the KishKish product accused his mother of lying about Christmas dinner: "Skype stress detector calls my mother a liar", 12/22/2006. His sensible response: "This might not be the sort of program you want to base major life decisions upon".

Isn't Skype at some legal risk for disrupting friendships and family relationships, if trusting users actually take this stuff seriously? ]

[Update -- Michael Katsevman writes:

Love the blog, loyal reader, etc. :)

I find it very amusing that the source of the company's name KishKish seems to me to come from an Aramaic proverb that's fairly common in Israel: "Istra balagina kish kish karya" which translates to "A coin in a clay vessel makes the sound 'kish kish'". In meaning, it is roughly equivalent to "a storm in a teacup" and "much ado about nothing".

If that's not delightfully appropriate, my sense of irony must be broken. ]


Posted by Mark Liberman at 07:15 AM

January 16, 2007

Cautious reporting

The trail of reporting on Louann Brizendine's assertion that women talk (much) more than men has now led to Harper's Magazine, where the "Findings" page for February 2007 mentions it.  ("Findings" is a string of very brief summaries -- usually just single sentences -- of research reports.)  For a change, the version in Harper's is suitably cautious.

From "Findings":

A giant tsunami was observed passing across the face of the sun.  Yet another black hole was observed eating a star.  New studies found that the brains of psychopaths are abnormal; that new mothers are more likely to go crazy; that left-handed people are better at multitasking.  A female psychiatrist claimed that women talk more than men.  Researchers at Los Alamos National Laboratory taught bees to sniff out explosives, and computer scientists claimed to have developed a self-aware, curious robot that can diagnose its own problems and take concrete steps to heal itself.

Note that "Findings" reports some items as observations or results, but others as claims: the LANL folks taught bees to sniff out explosives, while the computer scientists claimed to have developed a self-aware robot.  Brizendine gets the more cautious treatment.  And her claim that women talk three times as much as men has been toned down to merely "more than".

Probably Harper's found the item worth reporting because the claim came from a woman; "A psychiatrist claimed that women talk more than men" wouldn't have been nearly as newsworthy.

Annoyingly, Harper's doesn't provide sources for the items in "Findings", though some of them are easy to search for.  Googling on {self-aware robot} gets you to news reports on the Cornell research.  And {"female psychiatrist" "women talk more than men"} gets you to stories about Brizendine.

By the way, Brizendine's web site now subjects you to a welcome message from Brizendine herself.  If you go there, be prepared to turn off the sound file.  Then you can, among other things, read comments (all very positive, of course) from readers of The Female Brain, order a copy of the book, and sign in to receive a "FREE Gift!"

[Addendum: Bob Hay notes that this is not the first time that Language Log fodder has appeared in "Findings".  From the November 2006 column: "Scientists concluded that teenagers are physically incapable of being considerate, British cattle have regional accents, elephants mourn their dead, and nicotine sobers drunk rats."  Hey, they just report what they find in the media.  Still, they deserve a hearty chorus of moos (boos in Cattlese).]

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 10:47 AM


Well, I stepped on a few corns yesterday. Rather than add updates to the posts in question, as I usually do, I'll collect the email and my answers in one place here, and link forward from the earlier posts.

Neil Golightly writes:

Love the blog, very interesting stuff even to a non-linguist like myself.

However, I would like to make a comment on your entry "BBC's duplicity stuns Language Loggers".

Duplicity? Remember Hanlon's Razor - "Never attribute to malice that which can be attributed to stupidity". Although here I don't think it's even stupidity, but more likely to be a limitation of their content management system and processes: the point has been addressed here: http://www.bbc.co.uk/blogs/theeditors/2006/10/sniffing_out_edits.html (see also comments 39 and 40)
Quote: "But lesser changes - including minor factual errors, corrected spellings and reworded paragraphs - go through with no new timestamp because in substance the story has not actually progressed any further."

Perhaps you ought to send them a comment at the.editors@bbc.co.uk ? Or there is a formal complaints site here:
http://www.bbc.co.uk/complaints/ . Don't just expect them to be checking Language Log every day, excellent site though it is.

Telepathic parrots? A "minor factual error"? If African Greys were really telepathic, it would be the story of the century.

As for complaining, fat lot of good it did John Wells. Though they did change "this phenomena" to "this phenomenon" in the quote attributed to him, which I guess is something. But the inflation of a cheese-company public-relations stunt into a fake science story was left in place. And the result of multiple complaints about the chatnannies story was that BBC News simply removed it from their web site and pretended that it never happened. Complaints about most other things, such as the frog sex flapdoodle, have simply been ignored.

Given that sort of response, and the frequency with which BBC News continues to produce prodigious inanities, it seems better to try to alert the public than to yell into the deaf ears of BBC News about the most recent specimen of hooey.

All the same, I hope it was clear that the post's title was tongue-in-cheek. What I think about the BBC News is not that they're duplicitous, but that they simply don't care, one way or the other, about whether the things they write are true or not. Since I obviously can't know their motives, and read only a small fraction of what they write, let me state that more carefully: they often act as if they don't care about the truth of what they write, either before or after its initial publication. At least, that's the impression that I get from the fraction of what I read on their site that deals with topics that I know something about.

Given that judgment, it would be hypocritical not to give a prominent place to readers' criticism of our criticism, and we'll continue to do that.


What got the most mail was our criticism of Robert Fisk's screed about how "This jargon disease is choking language": Geoff Pullum's post "Fisk on downsizing", and my follow-up "Downsizing Fisk's bile".

Rob Sears wrote:

Right, fine, "downsizing" in the mouths of managerial types is not something that's applied directly to employees, and the managers themselves certainly wouldn't consider the word to be a synonym or euphemism for "fired". I agree with the letter of all your criticisms of Fisk. But both of your posts on this topic seem naive to the idea that "downsizing" belongs to a set of business words and concepts that are regularly used to obscure and paper over embarrassing topics. Of course, downsizing a company involves a lot more than firing people. But when managers want to talk specifically about firing people they will talk about downsizing the company in an ostensibly general way, anyway. This is their means of broaching the topic. In my line of work as a business copywriter I sometimes draw up communications for or concerning people losing their jobs. The topic is absolutely this -- the loss of jobs. But if I started using direct language, you can be damn sure I would get the document back with Track Changes all over it. Managers want to keep the ostensible message exclusively on the positive aspects of job cuts -- e.g. sleeker, more agile companies with restructured workforces -- while conveying something rather different. You can blame them for that or you can consider it part of responsible management to avoid scary language. But when CEOs talk about job losses they often use words like downsizing, and rarely words like "lay off" or "fire". That's a usage fact. So I'd say you can flip a coin to decide whether Fisk's sense of downsizing is a new sense, or whether it's just a "weapons of the weak" spin on a sense that managers already employ, albeit duplicitously.

That is all. Keep up the good work, Rob

PS If you've not come across it I think "promoted to customer" takes some beating as a joke euphemism for getting fired.

I'm sure Rob is right. I spent 15 years working in an industrial research lab, and was fed my share of unwelcome biztalk. (And universities are increasingly part of the same universe, especially on the financial, operations and facilities side.) But some of this stuff, though easy to ridicule, is not entirely blameworthy. Most organizations are basically worthwhile -- universities are certainly among these -- and it's better to run them well than to run them badly. It's a good thing for people to try to develop and communicate insights about how to do this, and it's hard to develop ideas about a specialized topic like this without developing some specialized vocabulary, i.e. jargon.

Since Rob is a business copywriter, I presume that he understands all of this better than I do.

As for the "set of business words and concepts that are regularly used to obscure and paper over embarrassing topics", this process exists in every area of human life. You can see it as covering up embarrassing topics or as using diplomatic language; its opposite might be called "refreshing candor" or "gross insensitivity".

I'm generally in favor of plain speaking, but I'm also in favor of kindness. I'm generally in favor of realism, but I also prefer to look on the bright side, and to choose optimistic turns of phrase when they're a realistic choice.

And "promoted to customer" is a great phrase, which was new to me. Thanks!


Jamie Hopkings wrote:

I'm amazed that you would publicly slander someone with the phrase "Fisk's well-known anti-Americanism", especially when a simple look at the facts immediately suggest otherwise. Stating that an individual has a broad bias against 300 million people due to the country of their origin/adoption is an astonishing broadside, worthy of the worst political rhetorician.

I enjoy the enlightening writing LL immensely; but I enjoy it for its discussion of language, not your personal political opinions. I would simple ask that if you want to rant politically, start a political blog and leave LL free of it.

On reflection, I conclude that I should have written "Fisk's well-known dislike of American corporations and the American government". I don't think that his animus in that area is disguised or in any way controversial. (Not that he is much fonder of the British government or European corporations, but he has stated that journalists must "challenge authority -- all authority", and the American authorities are currently the biggest ones around.) I don't read everything that Fisk writes, but I've read a lot of his stuff, and I don't recall having seen anything that contradicts this evaluation. (Though he has more than one set of peeves -- for example, he clearly has an intense dislike of football, that is, what Americans would call "soccer".)

If you look at the words and phrases that Fisk objects to, many of them -- excellence, interact, impact, outsource, downsize, feedback, input, big picture, no-brainer, outside the box, mission statement, push the envelope, work space, key players, tipping point, and so on -- strike me as having come to prominence via American business and government, whether directly or through popular or academic writing about related topics. (Though I freely admit that I haven't traced the history, except in the case of downsize, the word specifically treated in our posts, which is definitely of American corporate origin). Another large subset of the words he cites -- conflicted, stressed (out), bonding, cope, seeking closure, personal space, quality time, dysfunctional -- comes from the language of pop psychology and self-help books, what he calls "the language of therapy, in which frauds, liars and cheats are always trying to escape". I think of (this style of) therapy-talk as being basically American in origin, though it's certainly now a world-wide phenomenon.

It's true that a few of Fisk's lexical complaints -- about author being used in place of authoress, for example -- seem simply to be bizarre bits of idiosyncratic conservative crankiness, without any particular national or social connection. But there are several veins of jargon, or at least "words and concepts that are regularly used to obscure and paper over embarrassing topics", that he chose to stay away from, such as the one that includes militant and martyrdom operation. So I thought it was plausible to suggest that his take on the "jargon disease that is choking language", like everything else he writes, is influenced by his notion of who the bad guys are.


And Steve Jones wrote:

I find it rather sad that a purported intellectual such as yourself should fall into the basic trap of confusing opposition to the US's foreign policy as being opposition to all things anti-American. [sic]

As to Fisk not being interested in political facts perhaps you could explain how it is his analyses that have proven time and time again to be correct and not those of his know-it-all opponents (normally acting from knee-jerk pro-Israeli bias) in the blogosphere - hint; the fact that he is actually in the area as opposed to in his bedroom thousands of miles away might have something to do with the matter).

I can see that Steve feels strongly about this, or he wouldn't have written "opposition to all things anti-American" when he meant "opposition to all things American". I concede this point, though I'd supplement "opposition to the US's foreign policy" with "dislike of American corporations and American corporate culture" and "dislike of the American military and all its activities".

On this last issue, there's a striking contrast with the way that Fisk treats the officers and men of non-western armies -- for example in his (very interesting) reporting from the Iran-Iraq war. The same thing goes for his descriptions of the partisans of various irregular fighting groups that he has met and described, such as the member of Islamic Jihad, one of the captors of Terry Anderson, whom he describes at length near the end of Pity the Nation:

Here was a man, I thought as I watched him, who had travelled far from our world, had sought and found a determination that suppressed any apprehension or disquiet or fear. Because of the suffering he had caused Terry, I should have hated him. He himself called the taking of innocent hostages an 'evil'. But I did not hate him. In the course of our conversation he would become visibly angry, stabbing his right fist -- forefinger extended -- in fury as he condemned America for its support for Israel and for shooting down the Iranian civil airliner over the Gulf in 1988. I had so often seen this fury, in the aftermath of air raids or artillery bombardments, at cemeteries and mass graves. If he had allied himself with others -- and few in Lebanon doubted that an Iranian faction controlled Islamic Jihad -- his passion was genuine.

So could he not in his heart, I asked him, feel any compassion for Terry? Still those large eyes never left me. 'Of course,' he replied, 'it would be very easy to find the answer to this question if you had been the mother or the wife of one of the hostages in Khiam -- or the mother or wife of Terry Anderson. My feelings towards the mental pain of Terry Anderson are the same as my feelings towards the Lebanese hostages in Khiam -- with the exception that the Lebanese hostages have gone through, and are going through, both mental and physical torture.' [...]

What was he seeking, I wondered? Comprehension? Forgiveness? Did he want to show a westerner that he was a human being rather than the 'terrorist' portrayed by his American and Israeli enemies? I rather think he did.

Has Fisk ever written with this degree of sympathy about an American soldier? Not in anything that I've read, though I'm open to correction on this point.

I'm not sure what Steve means about the "analyses that have proven time and time again to be correct". My impression is that Robert Fisk takes a strong political stance in every area where he works, and writes in support of his beliefs without any pretence of doing otherwise. Sometimes his analyses have been prescient, and sometimes they've been flat wrong. This is not just the judgment of "knee-jerk pro-Israeli" bloggers. His (self-identified) friend Simon Hoggart wrote in the Guardian ("A war cry from the pulpit", 11/17/2001) that

At the time of the Gulf war [Fisk] wrote incredibly despondent articles predicting the annihilation of the western powers. He found a group of British soldiers lost in the desert and extrapolated defeat for the whole of Desert Storm. At the time of the Kosovo crisis he reported that the bombing would only make things worse. [...] In short, he is that most valuable resource, a journalist whose judgments are not just mistaken, but reliably mistaken.

This seems excessive, but it also seems excessive to say that Fisk has "proven time and time again to be correct". As I read Fisk, he makes it clear that he cares about facts only insofar as they support his causes. Has he ever discovered and reported a fact against interest? Not very often, I think, if ever.

We're getting pretty far afield from matters of language, and into an area where we don't often stray. But I felt that Robert Fisk's attitude towards the "jargon disease" that "is choking language" was very much of a piece with his attitude towards political reporting. He expressed strong feelings about a long list of specific words and phrases, arguing that this vocabulary is "repulsive", "an aggressive language of superiority", a set of "lies and obfuscations" that are "infuriating", a "disease" spread by "tiresome" people, and so on. You could infer many of his political and cultural opinions from the specific list of "repulsive" words and phrases he gives, just as you could infer a very different set of views from a list of allegedly "repulsive" jargon that included terms like transgressive art or hegemonic discourse.

Geoff and I looked at one example, downsize, where Fisk didn't just make a face, but made a sort of an argument about why the word is one of those "repulsive ... lies and obfuscations"; and we concluded that he was wrong on the facts. I'm sure that Fisk wasn't trying to mislead his readers; but I'm equally sure that he made no attempt to check his interpretation. This combination of bias and carelessness -- or, you could say, conviction and boldness -- characterizes most of his writing, it seems to me.

[Update: Ed Lass writes:

You concede too quickly to Jamie Hopkings and Steve Jones! The Wikipedia article on anti-Americanism suggests that your usage -- as a term for opposition to United States policy, multinational corporations, and possibly the cultural influence of both -- is a widespread one. The article does reflect the beliefs of those who object to the term, but then is it you who are bringing politics to LL, or your critics?

If anything, I thought the term helped to clarify Geoff Pullum's post, where Fisk's rant is labeled "conservative" (a label that you repeat in the Mailbag). I know what you mean, but I find plenty of room to pick the wrong definition of this word while talking about a political figure.

Thanks for the blog, I certainly enjoy it.

I agree that "anti-Americanism" has long been used as shorthand for a general sort of cultural animus. I used it that way a couple of years ago in discussing the section on "Americanisms" in H.W. Fowler's 1908 The King's English ("Stuck inside of Fowler with the Memphis blues again", 2/16/2004). Fowler objected to the "remorseless and scientific efficiency" of Rudyard Kipling's "americanizing" style, somehow connected this to the "barbaric taste illustrated by such town names as Memphis", and concluded that "a very firm stand ought to be made against placate, transpire, and antagonize." I described this as "a sort of cubist collage of the classic themes of European anti-Americanism". Nobody objected then -- though Robert Fisk clearly has more partisans among our readers than Fowler does!

The attitudes in question have been around, in one form or another, since the 18th century. Philippe Roger's L'ennemi Américain traces the peculiarly French versions from the Comte de Buffon onwards. Charles Dickens expressed a 19th-century English form in Martin Chuzzlewit, about which an American reviewer wrote "There is no picture of English life in Dickens in which there are not lovable English men and women. But there is no lovable American man or woman in Martin Chuzzlewit."

It seems clear to me that Robert Fisk has his place in this distinguished tradition, whatever name we use for it.

Lameen Souag, along with some correspondence on other matters, wrote:

As for Robert Fisk, I have only read one of his books - Pity the Nation, his account of the Lebanese War (which he lived through) - but I found it rather impressive, providing a detailed first-hand account and regularly lambasting the atrocities, the cynicism, and indeed the stupidity of every major organisation involved, Palestinian or Israeli, Christian or Muslim or even Druze, UN or Syria or US. If he "cares about facts only insofar as they support his causes", he didn't do a very good job there of hiding the inconvenient ones. My overall impression is that his forecasts are unreliable for precisely the same reason that his reporting is great: he immerses himself in the feelings and experience of the inhabitants of the place he's in, whereas far too many reporters in countries with a "difficult" language seem to completely ignore such things.

This corresponds pretty well with my own impression of that book, though I wouldn't say that his empathy is as evenly distributed as his scorn. ]

Posted by Mark Liberman at 07:07 AM

Banning the Letter X

We've previously discussed the fact that in Turkey the use of the letters Q, W, and X is illegal since they are not found in the Turkish alphabet. It turns out that Saudi Arabia too has a problem with the letter X, for different, and even dumber, reasons. According to this article by Amr Mohammed Al-Faisal in the Arab News, his company tried to register the trademark Explorer with the Saudi government as the English counterpart to the existing Arabic trademark for a product. The Ministry of Commerce turned down his application due to the objections of the هيئه الأمر بالمعروف و النهي عن المنكر the "Commission for the Promotion of Virtue and the Prevention of Vice," popularly known as the "religious police", in Arabic, the mutawwa'in. Why? Because the letter X resembles the cross, the symbol of a popular non-Islamic religion.

I encountered this story today in this article on Dhimmi Watch, a web site devoted to criticism of Islamism and especially of non-Muslims who in its opinion submit to being dhimmis, the subordinate status assigned to non-Muslims in Muslim countries under Muslim law. Dhimmi Watch picked this up from NewsMax, which seems to be a right-wing news site. NewsMax doesn't cite its source, but I found it easily enough. The item is legitimate (though of course I can't vouch for the veracity of the original writer), but you may already have noticed, if you read the linked article, that it appeared on November 2nd, 2003, over three years ago. That doesn't make the Commission's position any less bizarre, but it is curious that a news site would reproduce a three-year-old news item as if it were current.

P.S.: For those readers who have written me recently and haven't had a response, I've had a bad cold, together with a grant application deadline and some other pressing matters, so I'm behind. I'll get to you.

Posted by Bill Poser at 01:11 AM

January 15, 2007

How to thaw a jacket

A number of Language Log posts have been about the dangers and humor of translating Chinese into English menus and commercial instructions. For example, see here and here. It's easy to understand why overseas manufacturers with limited ability in English might have problems translating from one language to another but I can't figure out what happened with the new jacket I purchased a few days ago. It was made by Columbia Sportswear, a company in Portland, Oregon, which seems to specialize in winter wear. One might expect the instructions for use by an American company, in Oregon no less, to make sense. But what I found mystifies me. In the zipper pocket, I found a three by four inch card saying the following:


I like my jacket a lot but I can't figure out why I should need to thaw it. It wasn't frozen when I bought it and I can't imagine falling into an icy river or spending a night locked in a giant food locker. Hoping for further enlightenment, I followed the instructions to the "see below" part of the card. Here's what it says:

Soft sueded fabrication on the outside with fleece on the inside
Heavy weight cotton/polyester
Easy access hidden pockets for music players
Lined hood with drawcord

As far as I can tell, these instructions weren't the least bit relevant for helping me thaw my jacket, even if I wanted to. In fact, they weren't even instructions. Apparently Chinese to English translations aren't the only problem with instructions on commercial products.

Posted by Roger Shuy at 06:19 PM

BBC's duplicity stuns Language Loggers

A story on the BBC News website, "Parrot's oratory stuns scientists", by Alex Kirby, "BBC News Online environment correspondent", claims that it was "Last Updated: Monday, 26 January 2004, 15:27 GMT". The version of this story captured by the Wayback Machine on 4/24/2006 is also labelled "Last Updated: Monday, 26 January, 2004, 15:27 GMT", but it's somewhat different in content. This is a matter of some personal interest to me, because on January 28, 2004, I wrote a blog post complaining about "Parrot telepathy at the BBC", which began:

I hate to pile on, what with the "sense of shell shock" and "a bit of a meltdown" at the BBC. However, I have to take BBC News Online environment correspondent Alex Kirby to task, for sexing up this story about N'kisi the African grey parrot.

I yield to no one in my admiration for parrots' communicative efforts, and N'kisi does sound like a remarkable fellow, with a vocabulary said to number 950 words, but you have to wonder what is happening at the BBC when Mr. Kirby writes that:

N'kisi's remarkable abilities, which are said to include telepathy, feature in the latest BBC Wildlife Magazine.

However, in the version that's now on the BBC website, the sentence in question reads

N'kisi's remarkable abilities feature in the latest BBC Wildlife Magazine.

There's no correction at the bottom, and no indication that an imputation of telepathy has been retracted, so I'm glad that the Wayback Machine shows that I didn't just make the whole thing up.

[A skeptical take on N'kisi, including the telepathy experiments, can be found here. I'm skeptical in principle about the telepathy part, but I'm agnostic about the communicative potential of parrots; at least, I enjoy a good story as much as anyone.]

At the water cooler here at Language Log Plaza, Geoff Pullum commented "Wow! The BBC are not just science idiots; they actually fake the record later, and delete things from published material!" I reminded him of the infamous chatnannies affair, but I agree that silent removal of an embarrassing phrase, while retaining a false "Last Updated" banner, is worse. I'm sure that if a government ministry did the same thing with embarrassing predictions about the war in Iraq, or something of the sort, BBC News would (quite properly) be all over the story.

Arnold Zwicky suggested that perhaps we ought to just relax and enjoy it, quoting email from Alessandro J.:

Why do you skeptics have to make the world so boring?  I for one do believe in telepathic parrots. I also believe in gorillas who develop nipple-fixations because they were not properly breast-fed as infants.

This is a reference to Patricia Yollin, "Gorilla Foundation rocked by breast display lawsuit", San Francisco Chronicle, 2/18/2005.

[The discovery of the telepathy-free BBC article was made by David Beaver, now iced up in Austin.]

Posted by Mark Liberman at 03:10 PM

Downsizing Robert Fisk's bile

In reference to Robert Fisk's screed about "jargon", Geoff Pullum speculated ("Fisk on downsizing", 1/14/2007) that

... in his hasty condemnation of the verb downsize Fisk slid from the original use to a popular extension of it, without realizing that it made what he said false: I suspect that in the "repulsive" usage of the managerial classes who talk about downsizing companies, downsize does not mean "fire". I suspect it is precisely the people who fell victim to the process by being laid off during a downsizing who coined the second meaning of the verb.

A bit of further research (it's a holiday, so I can spend my lunch as well as my breakfast on small blog-sized research projects) confirms and extends Geoff's intuition. Specifically, the use of downsize to mean "fire" is the third or fourth stage of the verb's semantic development, depending on how you count; and the available evidence supports his belief that it was the laid-off victims who originally were (and largely remain) responsible for the extension.

The OED entry for downsize starts with a note that may offer some additional insight, given Fisk's well-known anti-Americanism: "orig. and chiefly U.S.". The historical array of definitions and citations indicates that the word was first used in the automobile industry, to refer to the effects in 1974 and 1975 of the Oil Shock of 1973:

a. trans. To design or build (a car) of smaller overall dimensions, esp. without reducing interior and boot capacity. Also absol.

1975 Automotive Industries 15 Oct. 10/1 The auto companies and their suppliers are turning to the job of ‘downsizing’ most of their cars to meet government and market demands for cars that are lighter and more economical.
1976 Time 13 Sept. 47 All the automakers are already at work down-sizing their cars for 1978 and later years.

I've found a couple of slightly earlier citations, taking the use in U.S. newspapers back to June of 1975. Thus "Wall Street sees more losses for car industry", Long Beach California Independent, 6/16/1975:

GM and Ford have said they will redesign many of their large cars to make them smaller and more fuel efficient. Chrysler officials have said they, too, plan to downsize their models.

But there seems to be nothing earlier, and it's plausible that the word was invented by some unknown auto industry executive or analyst, probably in late 1974 or early 1975, once it became clear that the oil price increase was serious and more or less permanent. During the period from 1975 to 1980, there are thousands of examples of this use in American newspapers and magazines, corresponding to the reality represented by this chronology of crude oil prices:

The next step began in 1979 or so, four or five years after the auto-industry origin of the term, and involved the natural generalization of the idea of "downsizing" to other objects and products. The OED has:

b. gen. To reduce the size of. Also intr. for pass., to be reduced in size.

1979 Newsweek 19 Nov. 79 His formal announcement in Washington was similarly down-sized.

I've turned up a slightly earlier UPI citation, from The Galveston Daily News 7/4/1979:

A new line of eyeglasses has been produced especially designed for children 6 to 12 years of age to fit their smaller facial contours, rather than simply "downsized" adult models.

Other examples become commonplace from 1980 onwards, applied to hats, bags of cranberries, and construction plans, among many other things:

"Westward, ho! Hats on for the latest fashion look" Daily Intelligencer, 8/10/1980

With 3-inch brims and 5 1/2-inch crowns, the hats are a downsized version of the traditional hats with 4-inch brims and 7-inch crowns.

Deborah Hartz, "Advocate's book helps consumers buy wisely", Chicago Daily Herald, 8/13/1981

You're sure to think back to 1980, when the standard 16-ounce plastic bag of fresh cranberries was downsized to 12 ounces and sold at the same or a higher price than the traditionally sized package.

Lloyd Batzler, "'Significant problems' with jail design resolved", The Frederick Post, 8/18/1981

Mrs. Williams said the country would downsize the planned gymnasium to 3,000 square feet.

In this context, it's inevitable that people would begin to use the same term for decreasing the size of companies. You could see this as a new meaning, since decreasing the size of a company is a much more complicated thing than decreasing the size of a hat, but really, it seems to me to be just another application of the same basic sense previously used for eyeglasses, hats, bags of cranberries, and gymnasiums.

The OED agrees, treating this usage as the same sense b given above. The earliest citations of downsizing companies or businesses are from 1982-83:

1982 Fortune 25 Jan. 7/1 Right now he's ‘downsizing’ the company, and hopes to achieve 1982 cost savings of about $600 million.
1983 Washington Post 10 June D8/4 Decline in demand for certain products and other factors ‘make it imperative to downsize the business’.

In a few minutes' searching, I haven't turned up anything earlier.

The OED treats the use that bothers Robert Fisk so much in a "Draft Addition" dated June 2006:

trans. euphem. or humorous. To dismiss (a person) from employment. Freq. in pass.

with the earliest citation given being from 1990, fully 15 years after the word came into common usage as a term for making American automobiles smaller in response to rising oil prices:

1990 Communication World May-June 40/3 Communicators were facing tough times on their jobs. Many were getting downsized and outplaced.
1997 GQ Sept. 96/3 Just been downsized? Worried that someone may discover the bodies?

Both of these examples use the term as if from the perspective of people laid off, not from the perspective of those making the cuts.

A bit of web search turns up a slightly earlier use that is clearly in a quotation from someone who had been laid off -- James Barron, "Learning to Hang More than a Diploma", NYT 2/1/1990:

Nancy Wickstrom, an investment banker from Sunnyvale, S.I., who said her job was "downsized" in the aftermath of the 1987 stock market collapse, decided to go into paperhanging, because "I didn't want to go into Manhattan. I didn't want to have to wear dresses or skirts. And I wanted to be my own boss."

Ms. Wickstrom is referring to the Black Monday stock market crash of October 19, 1987. And this cartoon, from a display ad in the Wall Street Journal a year and a half earlier (8/15/1988), talks about an "operation" being downsized, but clearly plays to the fears of the individual "mammouths" whose jobs are in jeopardy in the wake of that same event, which loomed large in the consciousness of New Yorkers:


I'm sure that Geoff was right to predict that Robert Fisk "will not be very interested in the distinction between word senses that I just drew", and I expect that he will be even less interested in the further elaboration of facts in this post, despite the demonstrated connection between watershed geopolitical events and the stages in the semantic development of the verb downsize.

This is partly because Fisk, as Geoff observes, is not interested either in language or in the internet, except to complain about them. But it's also because of the same characteristic that has enabled him to give his name to a verb for providing "detailed point-by-point criticism that highlights errors, disputes the analysis of presented facts, or highlights other problems in a statement, article, or essay". The fact is that Robert Fisk appears to lack even the slightest interest in any historical, social or political facts at all. His writings are pure attitude, and the things in them that look like facts are actually examples of a completely different rhetorical category, whose definition, due to Harry Frankfurt, is worth repeating here:

What bullshit essentially misrepresents is neither the state of affairs to which it refers nor the beliefs of the speaker concerning that state of affairs. Those are what lies misrepresent, by virtue of being false. Since bullshit need not be false, it differs from lies in its misrepresentational intent. The bullshitter may not deceive us, or even intend to do so, either about the facts or about what he takes the facts to be. What he does necessarily attempt to deceive us about is his enterprise. His only indispensably distinctive characteristic is that in a certain way he misrepresents what he is up to.

This is the crux of the distinction between him and the liar. Both he and the liar represent themselves falsely as endeavoring to communicate the truth. The success of each depends upon deceiving us about that. But the fact about himself that the liar hides is that he is attempting to lead us away from a correct apprehension of reality; we are not to know that he wants us to believe something he supposes to be false. The fact about himself that the bullshitter hides, on the other hand, is that the truth-values of his statements are of no central interest to him; what we are not to understand is that his intention is neither to report the truth nor to conceal it. This does not mean that his speech is anarchically impulsive, but that the motive guiding and controlling it is unconcerned with how the things about which he speaks truly are.

It's this characteristic of Fisk's work that makes it such a juicy target for fisking, although fiskers often mistakenly assume that everything he writes is false. It's not -- but like the blind dog in Gamble Rogers's story, he just don't care.

[Update: more here.]

Posted by Mark Liberman at 12:53 PM

Let's discover F words

Elizabeth Daingerfield Zwicky has alerted me to the fact that Book Closeouts, here, is offering the volume "Let's Discover F Words", a Troll Picture Dictionary (for kids) by Robyn Supraner.  This is the only volume in the series offered on closeout, but a glance at amazon.com shows that the series covers the entire alphabet ("two volumes" means that there's a volume on "Let's Discover x Words" and another on "Let's Discover More New x Words"):

A (two volumes), B (two volumes), C, DE, F, GH (two volumes), IJK, L (two volumes), M (two volumes), NO, PQ, R, S (two volumes), T (two volumes), UV, WXYZ

I'd guess that the title of the F volume put a damper on sales.

Meanwhile, back in the adult world, new deployments of "the F word" continue to come in.  From the Economist, 1/13/07, a leader on p. 13, "Clones to the right of me, jokers to the left", begins:

There is something about the study of embryonic stem cells that brings out the "F" word.  That word, unlike the embryonic cells themselves, is nothing to do with reproduction.  It is "Frankenstein".

What next: ferrets, fisticuffs, fandango, Firbank, furlong, Fu Manchu,...?

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 12:45 PM

Martin Luther King's rhetorical phonetics

In the early 1960s, millions of Americans were ready to listen to Martin Luther King's message, and the way that he delivered that message helped us to hear it.

Listen to these two phrases from the famous "I have a dream" speech, delivered on August 28, 1963 at the Lincoln Memorial. The first phrase, "I am happy to join with you today", is his opening. The second, "when we let it ring from every village and every hamlet", is from his peroration, just before the immortal ending "free at last".

His timing is eloquent: he speeds up and slows down in a way that conveys how his sentences are put together. Every fluent speaker does this to some extent, and he does it abundantly and at the same time precisely. But within most phrases in this speech, his pitch is relatively level, almost as if he were chanting or singing rather than speaking.

In particular, his phrases often end with a sustained or slightly falling pitch, instead of the steeper relaxation to low pitch that English phrases usually have. Because the expected falls are missing, some of his sustained final syllables (e.g. "today" in the opening phrase) may sound to some people as if they go up. But listen carefully, and look at the pitch contours:

Of course, King's individual phrases in this speech do have a melody -- though sometimes a subtle one -- that helps convey his message. And he varied the overall pitch range much more widely from section to section of the speech, as effective speakers since time immemorial have done to embody the ebb and flow of ideas and emotions. But there was something about the way that he chanted each phrase, like a song or a prayer, that commanded attention and memory.

For an example of a contrasting rhetorical style -- one that was more familiar to most white Americans, at least in the north -- listen to the opening phrase and part of the peroration of another justly famous early-60s speech, John F. Kennedy's Inaugural Address, delivered January 20, 1961. (The two MLK phrases are numbers 1 and 2 in the panel below, and the two JFK phrases are numbers 3 and 4.)

Kennedy also uses pitch and time effectively, but in a different way. As the pitch tracks below illustrate, JFK tends to use more within-phrase pitch modulation, and his non-final phrases are more likely to end in a fall-rise or a fall, without the singing or chanting quality of MLK's public rhetoric:

This little breakfast-time exercise in rhetorical phonetics is anecdotal and allusive at best, so I put it forward only tentatively, as an invitation to someone to do better. It's a curious fact about modern intellectual life, though, that such analysis is not commonly done in a more systematic and scientific fashion. The people who are interested in rhetoric don't (as far as I can tell) know how to use the methods of modern instrumental phonetics and statistical modeling, while the phoneticians don't see rhetoric as within their purview. I doubt that this disconnection would have happened in any earlier era.

One more small point. It's an obvious point, but too often forgotten. The way that someone speaks in a given context -- including the context of the phonetics lab -- is not a fixed and invariant property of their individual essence. It's a way of behaving that depends not only on who they are, but also on what they're saying, where they're saying it, why they're saying it, and who the audience is. Among other things.

We can see one small example by comparing the two phrases from MLK's "Dream" speech with two phrases from his reading of the "Letter from Birmingham Jail", sent April 16, 1963.

It's the same man, and the same voice, but a lot of things are different.

For one thing, he uses within-phrase pitch modulation to a greater (proportional) extent, and a larger fraction of his phrases end in English-typical final falls, as the pitch contours for these two phrases suggest:

For another thing, his overall pitch range is radically lower. Within each recording, his pitch range expands and contracts, and goes up and down, in the usual rhetorical parallelism of sound and sentiment. But his performance of the letter -- a sober and intimate communication, read as if to a small nearby audience -- is pitched at a whole different level from his performance at the Lincoln Memorial.

This boxplot of pitch values from the four sample phrases illustrates the point:

I said that this kind of variation is obvious, and also that it's too often forgotten. Here's an example of how phoneticians often forget it, or more precisely, pretend that it's not true. The graph below is from a meta-analysis that combines data from many studies of male and female fundamental frequency in speaking:

The basic interpretation is clear and also true:

  • Children speak with higher pitch than adults, due to smaller body size and smaller larynx dimensions;
  • As children pass puberty, average male and female pitch values diverge, with male pitches being lower due to the effects of testosterone on larynx growth during puberty (the "voice changing" effect);
  • The voices of older people tend to get slightly higher, probably due to decreased tissue elasticity.

However, something important is left out here. The implication of the plot is that a person of a given sex and age has a specific and predictable F0, characteristic of them as a member of the group. This leaves out the wide range of variation in characteristic speaking pitch among (say) 20-year-old males -- but it's normal, if sometimes misleading, to simplify such plots by showing only the means and ignoring the distribution of values. [Actually, the plot does show error bars, but they're misleadingly small, in my judgment.] The same thing might happen if we were plotting height as a function of sex and age -- and it might well be the most appropriate way to present the information.

But in this case, something else is left out as well. Someone's height doesn't vary much from context to context within a short period of time, the way that their pitch does. As the plot of MLK's "Dream" and "Letter" performances shows, you can't characterize someone's pitch -- even as an average -- without considering their utterance's context, purpose, style, audience and so on. Note in particular that the average pitches of MLK's "Dream" speech are well above the line of pitches given as characteristic for female speakers.
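For anyone who wants to try this kind of comparison at home, here is a minimal sketch of the summary behind a boxplot of pitch by context. It assumes that you already have F0 estimates in Hz from some pitch tracker; the numbers below are invented placeholders, not measurements from the recordings discussed in this post, and the function names and the 100 Hz reference are my own choices for illustration.

```python
import math
import statistics


def hz_to_semitones(f0_hz, ref_hz=100.0):
    """Convert a frequency in Hz to semitones relative to ref_hz.

    The 100 Hz reference is an arbitrary choice for this sketch.
    """
    return 12.0 * math.log2(f0_hz / ref_hz)


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the numbers a boxplot displays."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, med, q3, max(values)


# HYPOTHETICAL F0 samples in Hz for one speaker in two contexts.
# These are made-up values for illustration only.
samples = {
    "public speech": [250, 265, 270, 280, 290, 300, 310],
    "quiet reading": [95, 100, 110, 115, 120, 130, 140],
}

for context, f0_values in samples.items():
    lo, q1, med, q3, hi = five_number_summary(f0_values)
    print(f"{context}: median {med:.0f} Hz "
          f"({hz_to_semitones(med):.1f} semitones re 100 Hz), "
          f"interquartile range {q1:.0f}-{q3:.0f} Hz, "
          f"full range {lo}-{hi} Hz")
```

Summarizing each context separately, rather than pooling everything into one per-speaker average, is the whole point: a single mean would hide exactly the context-dependent variation at issue here.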

It's certainly true that there's an underlying laryngeal biology of age and sex that underlies, at least qualitatively, the last plot above. That's presumably because the measurements were made in roughly similar settings, to which the speakers of different ages and sexes reacted in roughly similar ways.

But in general, in order to understand behavioral measurements from human groups, whether biologically or culturally defined, we need to think about what the contexts were, and how the people interpreted and responded to them. And whatever we learn about the group effects, we need to remember one of Martin Luther King's other memorable phrases:

I have a dream that my four little children will one day live in a nation where they will not be judged by the color of their skin but by the content of their character.

Posted by Mark Liberman at 11:15 AM

A Guardian editor's bitterest embarrassment

Back in November, I noted a Guardian column by reader's editor Ian Mayes, in which Mayes unquestioningly accepted a reader's absurd assertion that the "correct" superlative form of bitter is most bitter and never bitterest. Mayes even bolstered the reader's point with an equally erroneous claim of his own, writing that the Oxford English Dictionary has no recorded use of bitterest. I wasn't the only one to call foul: in today's column, Mayes abashedly responds to the stream of reader responses pointing out that bitterest does indeed appear in the online OED (41 times in a full-text search of citations), and that the word has a long history of literary usage. Appropriately, he concludes with a line from John Stainer's The Crucifixion (libretto by W.J. Sparrow Simpson): "Back to mine agony I must go, lonely to pray in bitterest pain."

[Note added by Geoff Pullum: And nowhere does Mayes mention Language Log. Clearly it is not just one but two crucial tools for the modern language scholar that he does not know how to use.]

Posted by Benjamin Zimmer at 10:22 AM

January 14, 2007

Fisk on downsizing

Robert Fisk is extremely well known as a foreign correspondent for The Independent who has lent his name to the English language as a verb. I don't know if he likes that neologism, but he apparently hates a lot of others. His column "This jargon disease is choking language" is a tired piece of same-old-same-old conservative rant about modern expressions. The words and phrases he hates seem to be from all over the map: psychotherapy and psychology ("bonding", "closure", "conflicted", "dysfunctional", "healing", "move on", "quality time", "stressed"), business management ("downsize", "excellence", "feedback", "input", "outside the box", "outsource"), military affairs ("spike", "surge"), feminism ("author" for "authoress", "actor" for "actress"), perceived political correctness ("Happy Holiday", "Estuary English"), or just general contemporary colloquial speech ("no-brainer", "cope"). It is hard to know what to say about this poor man who seems to hate a large number of the words and phrases that must surround him every day of his life. But let me make one small observation — just a speculation, really — about one of them: the verb downsize.

Fisk is careless about the crucial point: first he says that he finds it "repulsive" that people nowadays "downsize" the number of their employees; but then he says that "Downsizing" employees means firing them. But what is it that one downsizes: numbers, or people? Notice that he slides from one sense to the other.

Here's what I suspect may be the case: the verb downsize was originally created by management people as a one-word way of making reference to the complex matter of making a company smaller without (of course) intending to make its profits smaller. Downsizing involves cutting the number of sites to be maintained, vehicles to be run, divisions to be operated, departments to be staffed, salaries to be paid, and thousands of other things. A word like "shrinking" would not really capture it: you can shrink a company in many ways. A worker who goes berserk and shoots half a dozen workmates on the factory floor is not downsizing the operation. Downsizing is a controlled operation undertaken for the benefit of a business, and capricious firing of people you don't like, neglect of the company so that it withers, or committing multiple murder of workmates does not count. The verb in this original sense definitely does not mean "fire" (British "sack"), because that does not even make sense when applied to a company or a set of numbers.

Downsize with a noun phrase denoting a human being as its logical object, as in I used to work for IBM but they downsized me or I was downsized, may well be a colloquial extension of the term created by employees rather than managers. I'm not at all sure that managers would ever talk about downsizing an idle secretary, or that top executives would discuss downsizing the chief financial officer.

The claim here is basically an empirical one: as I've said, I am offering a speculation, not the result of an inquiry. But I think that in his hasty condemnation of the verb downsize, Fisk slid from the original use to a popular extension of it, without realizing that it made what he said false: I suspect that in the "repulsive" usage of the managerial classes who talk about downsizing companies, downsize does not mean "fire". I suspect it is precisely the people who fell victim to the process by being laid off during a downsizing who coined the second meaning of the verb. Fisk has carelessly wandered from one lexicographical topic (managerial euphemisms) to another (semi-jocular popular appropriations of technical vocabulary).

Fisk exhibits a characteristic behavior of people who rant about language but are not really genuinely interested in it. Those of us who write for Language Log typically are interested, so we pay attention to the details. Yes, I've written rants aplenty myself; but my rants (here's a random example) usually involve fairly close attention to the details of the language I'm talking about, unless I'm just engaging in self-parodic frothing for humorous effect, which admittedly does sometimes happen. I don't think Fisk intends humorous self-parodic frothing. I think he regards his grumbles as serious. But I'm fairly sure that he will not be very interested in the distinction between word senses that I just drew. (For one thing, he has stated: "I don't use the Internet. I've never seen a blog in my life. I don't even use email"; so he has never really seen any serious fisking, and he will never even know that I wrote this.)

Update: Joseph Ruby tells me I am definitely wrong about business people's usage. He says: "Employers and managerial consultants do use the verb downsize as a euphemism for the verb fire — more accurately, for "lay off," as "fire" strictly speaking implies ending an employment relationship for cause." And he cites as an example this source for the following:

  • Downsizing or doing layoffs is a toxic solution. Used sparingly and with planning downsizing can be an organizational lifesaver, but when layoffs are used repeatedly without a thoughtful strategy, downsizing can destroy an organization's effectiveness. How you treat people really matters - to the people who leave and the people who remain.
  • One outcome of downsizing must be to preserve the organization's intellectual capital.
  • How downsized employees are treated directly affects the morale and retention of valued, high-performing employees who are not downsized.

I certainly agree: that does look like a clear managerial use of the new sense. However, see the further discussion, which tends to support what I said, by Mark Liberman in this post.

[Thanks to my dad for pointing out the Fisk column.]

Posted by Geoffrey K. Pullum at 06:21 PM

Separating fiction from real life

A review of Elif Shafak's The Bastard of Istanbul in the 1/13/07 Economist reports the trial of Shafak in the Turkish courts for insulting the Turkish identity (a criminal offence) -- on the basis of what one of her characters says in the book.  From pp. 76-7:

... Ms Shafak's crime was to have drawn attention to the Armenian genocide.

Setting a bizarre precedent, prosecutors rested their case on the words of one of the fictional Armenian characters in her book, which was originally written in English, but which is only now coming out in America.  The offending phrase talked of "genocide survivors, who lost all their relatives at the hands of Turkish butchers in 1915".  That phrase and other unflattering references to Turkish behaviour were deemed to have violated the penal code under which insulting "the Turkish identity" is a criminal offence.

Ms Shafak was eventually acquitted after the court agreed that she could not be convicted on the comments made by a fictional character.

Nobelist Orhan Pamuk got into similar hot water a while back, but for comments he made in his own voice, in a newspaper interview; the charges against him were eventually dropped.  Shafak actually went to trial, but she wasn't held accountable for the thoughts and words of her characters.  In many other times and places, the distinction between author and character would have counted for nothing in such a case.  No doubt Turkey's interest in joining the European Union and the bad publicity surrounding the Pamuk case worked in Shafak's favor.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 02:59 PM

Bush presidency plutoed

A recent Andy Borowitz humor column (which I caught in the January 2007 Funny Times) reports that the Bush presidency has been plutoed:

An international group of scientists who demoted the planet Pluto to dwarf status three months ago met in Oslo, Norway today and reclassified the Bush White House as a dwarf Presidency.

... with the President's approval rating in a free fall, it became clear even before the scientists convened that some sort of reclassification along the lines of the Pluto demotion was in order.

I hear rumors that the President plans to surge supporters to improve his approval rating.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 01:48 PM

Natural number

In response to my posts yesterday and today on the historical tendency to shift from "there is a number of" to "there are a number of", I got email from David Denison, an expert on the history of English syntax. His comments are reproduced in full beyond the jump.

I think there are three partially independent things going on here. One is the reanalysis of partitives generally, so that headship of a phrase like a majority of Americans shifts from the first noun to the second.

Another is that there's gets reanalysed as an invariant form and so can be used readily with plural NPs -- much more readily than the uncontracted form in speech. (Brief comments in my 1998: 213-14.) You mentioned this in yesterday's post. I agree with you about the incipient reanalysis. Trouble is, written English (especially American written English) is so heavily prescripted that a writer thinking of monosyllabic contracted there's may still get subedited (or self-edited) on the page to there is. I wonder whether fear of the contraction outranks fear of a concord mismatch -- indeed there might well have been a rule re-ordering over time (if prescriptive rules are subject to ordering!).

Anyway, it's an interesting question. For some earlier careful counting of 19C data, plus comparisons between what contemporary grammarians said should happen and what actually happened, see

Dekeyser, Xavier. 1975. Number and case relations in 19th century British English: A comparative study of grammar and usage. (Bibliotheca Linguistica, Series Theoretica.) Antwerp and Amsterdam: De Nederlandsche Boekhandel.

And the third tendency is towards what I once fancifully called 'natural number':

Remembering how natural gender supplanted grammatical gender over the course of the OE and eME periods (CHEL II: 105-8), we might see a tendency towards 'natural number' in such developments as plural none, plural government, public, etc., and likewise singular themself. (1998: 123)

I guess your number of phenomenon -- or rather the more general reanalysis of partitives -- could be referred to this tendency too.


Denison, David. 1998. Syntax. The Cambridge history of the English language, vol. 4, 1776-1997, ed. Suzanne Romaine, 92-329. Cambridge: Cambridge University Press.


I guess the full six-volume set will go on my birthday list for the next few years.

Posted by Mark Liberman at 01:22 PM

Wonders of scholarship

Every once in a while I'm struck by a report on scholarship that makes me marvel about the world.  Last week (1/9/07) it was a NYT Science Times article about restoring an ancient Egyptian mud fortress.  Two things.  First, a casual reference to the enormous time depth of Egyptian civilization:

It was in the 26th century B.C., a few generations later, that even more powerful kings erected the majestic pyramids at Giza, the last surviving of the so-called seven wonders of the ancient world.

Then a list of the experts on the project:

William Remsen, a preservation architect; Anthony Crosby, a specialist in mud-brick and earthen architecture; and Conor Power, a structural engineer

It's a wonderful world that has specialists in mud-brick and earthen architecture in it.  (Structural engineers I knew about, of course, and I have a friend who's a preservation architect, but the mud-brick and earth specialization was news to me.)

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 01:11 PM

"A number of" has been getting more plural since 1800

Yesterday I suggested, without much evidence, that noun phrases like "a number of Xs" have become increasingly plural over the past couple of hundred years, at least with respect to their treatment in the frame "there is/are __". For this morning's Breakfast Experiment™, I checked a couple of web-searchable sources, and the results tend to confirm my suggestion.

Between 1800 and 1840, American magazines and other periodicals used is about 14% of the time in the frame "there is/are a number of". (See the end of this post for some examples.) By the time we get to 1970 or so, publications like the New York Times and the Wall Street Journal use is less than 1% of the time in the same context. And most of the remaining uses of "is" are in quotations, like this fine example from a 1/9/2007 story by Jeff Zeleny and Carl Hulse, "Democrats Plan Symbolic Votes Against Iraq Plan":

“We believe that there is a number of Republicans who will join with us to say no to escalation,” said the Senate majority leader, Harry Reid of Nevada.

Here are the counts from the ProQuest American Periodicals Series:

  are is Percent is

(I started in 1780 because there were not enough examples of either sort before that.)

Here are the counts from the New York Times archive -- the archive starts on 9/18/1851, so I ran each 20-year period from 9/18 to 9/17.

  are is Percent is

And here are the counts from the Wall Street Journal. The WSJ archive starts on July 8 1889, so I ran each 20-year period from 7/8 to 7/7.

  are is Percent is

If we plot all three on the same scale, putting each point at the center of its 20-year span, and using A for the American Periodicals Series, N for the New York Times, and W for the Wall Street Journal, it looks like this:


I guess that I should plot the error bars as well, but it's kind of a pain to do that in R's plot() function, so I'll use the excuse that breakfast time is almost over to leave the error bars as an exercise for the reader.
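For readers who take up that exercise, here is one possible way to get the numbers for the error bars, sketched in Python rather than R. It uses the Wilson score interval for a binomial proportion, which is my choice for the sketch and not something used in the analysis above. The counts are invented placeholders standing in for the "are" and "is" columns of the tables, and the function names are hypothetical.

```python
import math


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for the proportion k/n.

    z=1.96 corresponds to roughly 95% coverage.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def summarize(spans):
    """For each span, return (label, percent 'is', lower %, upper %)."""
    rows = []
    for label, (n_are, n_is) in spans.items():
        total = n_are + n_is
        lo, hi = wilson_interval(n_is, total)
        rows.append((label, 100 * n_is / total, 100 * lo, 100 * hi))
    return rows


# HYPOTHETICAL (are, is) counts per 20-year span, for illustration only;
# substitute the real counts from each archive search.
spans = {
    "1800-1819": (86, 14),
    "1960-1979": (990, 10),
}

for label, pct, lo, hi in summarize(spans):
    print(f"{label}: {pct:.1f}% is (approx. 95% interval {lo:.1f}-{hi:.1f}%)")
```

The lower and upper values can then be drawn as vertical segments at the center of each 20-year span. The Wilson interval behaves sensibly when the count of "is" is close to zero, which is the situation in the later decades.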

This evidence raises some additional questions, whose answers are also outside the bounds of breakfast, though some of the answers may be known.

1. Was there an earlier time when singular agreement was the norm in such cases? If so, when was it? Or is singular agreement in "there is/are a number of Xs" a fashion that grew to a point and then receded?

2. There's a more general tendency to use singular agreement with there, even when the noun phrase is unambiguously plural. Thus this wise counsel from Jeff Schneider, the coach of a high school basketball team in Virginia:

“Some kids try to blame themselves, saying, ‘I should have done this or that.’ I try to tell them that there is no one thing that lost the game, rather there is many things that could have won it,” said Schneider, whose team has now lost to top rivals Potomac Falls and Broad Run in consecutive games.

Or this quotation from Richard Campbell, of Lindale, Texas, who turns out to own an illegal dumpsite ("Trashing East Texas: Land Owner Visits Acre of Dead Animals", 12/19/2006):

"There is several different tracks of property that is probably 300 acres plus," says Campbell.

Or this observation by Bobby "Blitz" Ellsworth, frontman for the thrash metal band Overkill:

That is always a blast to go over there. Every weekend there is two or three festivals.

When people like Harry Reid use singular agreement with "a number of Xs", are they treating "a number of Xs" as singular, or are they treating it as plural but using is anyhow? (Of course, there's always the possibility that he was misquoted... And see this earlier post for a discussion of a really easy way for a copy editor to change informal speech into non-standard speech in such sentences -- just "correct" there's to there is!)

3. What's the mechanism that creates the (ubiquitous) pattern of gradual change in relative frequency over the course of centuries, well documented elsewhere and exemplified in a small way by the counts cited here?


The first five hits for "there is a number of", APS Online, searching from 1/1/1800 onward...

"Special Manifestations of Divine Power and Grace", The New-York Missionary Magazine, 1802(3) p. 116: In another town, there is a number of open, avowed infidels.

"Amsterdam", The Boston Weekly Magazine, Nov. 20, 1802, p. 14: I went, between eleven and twelve last night, to a singular institution; by an ordinance of the government, there is a number of Bagnio houses in this town, to which women, who have no better means, are invited on the terms of a support for life.

Balance and Columbian Repository [Albany], June 14, 1803, p. 190: There is a number of brigand-barges out about different parts of the Island -- the ship May Flower, Capt. Logan, was chased close in with this port.

"Missionary Intelligence", The Massachusetts Baptist Missionary Magazine, May, 1804, p. 33: Here the cause of Zion seems to languish; yet there is a number of faithful brethren, who, I trust, will be able to stand as a bulwark against opposition.

"To The Gossip", The Boston Weekly Magazine, May 19, 1804, p. 117:

[Update: also see Sally Thomason's 9/6/2006 post "18th-Century Grammarians vs. Shakespeare et al.", which I had forgotten about.]

Posted by Mark Liberman at 08:21 AM

Are we ready yet to let a historian claim that English is a Celtic language with Germanic words?

There has long been a kind of secret society of scholars who consider English grammar to be deeply imprinted by Celtic language's structure. The idea is that the reason that English is, in terms of its grammar, a kind of twisted sister in the Germanic family (an Anglophone doesn't precisely feel "at home" when learning German, whereas a Swede, Frisian or even Afrikaans speaker does) is because the Celts who learned the language of the Angles, Saxons and Jutes learned it in a Celtic-infused way, and their way gradually became THE way English was spoken by everybody.

The more extreme advocates even claim that English is Celtic grammar with Germanic words. As you might expect, this idea has never penetrated mainstream work on English in any serious way (although none other than J.R.R. Tolkien was a fellow traveller). As such, I was fascinated to run across a casual espousal of the Celtic Hypothesis by a nonspecialist.

It's in Roger Osborne's oddish and fine new Civilization: A New History of the Western World, in which he describes post-fifth century Britain as "a combination of British and Germanic cultures -- the resulting language, for instance, was Germanic in vocabulary, but Celtic in construction." (p. 41)

Where Osborne picked this up is unclear: he happens to give only a very broad "Useful Sources"-type list of bibliography for his chapters. However, he certainly didn't get this from any standard source on the history of English, in which it is regularly stated that the Celts were overwhelmed by Germanic speakers and left no imprint on English beyond place names.

Yet I am not taking this as an occasion to do a grand old Language Log-style post highlighting Osborne as one more non-linguist disseminating linguistic falsehoods to the general public. That's because I have come to think that the Celtic Hypothesis crowd is, in fact, on to something.

Take the way English uses DO in negative sentences like HE DOESN'T KNOW, or interrogative ones like DO YOU LIKE CHEESE? It seems so ordinary to an Anglophone, but what language have you ever learned that used DO like that?

It wasn't a Germanic one, for example. German and the rest do use DO in a fashion related to this (nonstandard German ER TUT DAS SCHREIBEN "He writes that"). But those who have had German classes may well not have known that -- it's strictly nonstandard, rather than perfectly formal as it is in English. And no Germanic language uses DO obligatorily, rather than optionally, in ALL negative and interrogative sentences as English does.

And forget Romance, Arabic, Chinese. In fact, some of the only languages on earth (among 6000) that use DO in this way are Celtic ones spoken in Britain (as well as Celtic prodigal son Breton over in France, which was brought from Britain in the fifth century A.D.). In the late, great Cornish, for example, "Do you love?" was GWRA CARA?, in which GWRA was the DO word.

There are a passel of English grammar features like this, weird as Germanic goes but perfectly ordinary as Celtic. And on top of that, the wonders of the ongoing project tracing the migrations and blendings of humans since the emergence of our species via variations in DNA are currently deep-sixing the old idea that Celts somehow "disappeared" in most of Britain and wound up huddling on the margins in Wales or giving it the old college try in Cornwall but dying out eventually.

Rather, modern Brits are, genetically, full to bursting with chromosomal Celticity (a recommended source is Stephen Oppenheimer's new The Origins of the British). The Celts held on just fine -- and increasing evidence suggests that one of their most vibrant legacies has been leaving many of the features of their notoriously unique languages in English.

Not so many that English is really Celtic with Germanic words -- that is a tasty notion in its counterintuitiveness, but ultimately cannot stand. However, there is increasing evidence that there is enough Celtic in English that standard treatments might well one day start covering it substantially.

As such, I was tickled, albeit perplexed, to find a historian writing casually as if the Celtic Hypothesis were accepted canon -- because I have come to hope that someday it will be.

Posted by John McWhorter at 02:57 AM

January 13, 2007

Canada is different from the US

News of what US politicians think about language seems almost always to be about their efforts to make English the official language and discriminate against other languages. In Canada, things are a bit different, as shown by this story [subscription required] in the Vancouver Sun about two British Columbia MLAs (the equivalent of state legislators), Sue Hammell and Bruce Ralston, who are learning to speak Panjabi, the first language of about 30% of their constituents.

Interestingly, in light of complaints by English First-ers in the US that immigrants seek to impose their languages on Americans:

Ralston said constituents who come to his community office don't expect him to speak Punjabi. Those who are not fluent in English usually bring along a relative or friend who is. Learning basic Punjabi "is really something I've decided to do, rather than something people have demanded of me," he said.

Addendum 2007-01-14: It has been brought to my attention that you can't read the article I linked to without a subscription to the Vancouver Sun, which I have, but most of you probably don't. I don't think that reproducing the entire article would constitute fair use, but here's a little bit more:

Hammell and fellow New Democrat Bruce Ralston, the MLA for Surrey-Whalley, have both enrolled in beginner's level Punjabi classes at the Surrey campus of Simon Fraser University -- a fact the university disclosed in a release advertising the Jan. 20 start of a new semester of Punjabi classes.
Hammell, meanwhile, says she is "past beginners" -- an accomplishment acknowledged by the SFU release, which said Hammell "can now read the language with some confidence."

Judging from the attempt to conceal John Kerry's ability to speak French, that would probably ensure her defeat in the US.

Posted by Bill Poser at 06:44 PM

There's a bunch of reasons -- or are there?

A few days ago, the Language Log bat signal went up over Livejournal, where mrs_cake wrote ("This is the weather of our discontent"):

Now, forward my trusty grammarians. On Language Log, my bible for all things language, I saw the following in an entry I came across while following linkage - thus I will never find it again. Sorry I can't reference this properly. Anyway, this was the linguistic stone I stumbled over (out of context I'm afraid):

There are a bunch of reasons for this.

I'm sure you see where I'm headed. I have some issues with singular vs plural (but I admit to subscribing to the 'singular they' school of thought wholeheartedly). Now. I would use "are" here because it sounds better (don't hit me). However, is there some rule that governs the use of singular vs plural "bunch"?

The Corpus that is Googles yields: "there is a bunch" = 184,000 / "there are a bunch" = 988,000 results. Usage figures would indicate that bunch is used with the plural much more often than with the singular.

While I'm at it, what about "number"? I'd always say "There is a large number of reasons". Is this correct?

Modern standard usage is heavily in favor of "there are" with noun phrases like "a bunch of" and "a number of", as mrs_cake observes and we confirm below. But there are a bunch of interesting reasons why mrs_cake might be puzzled about this. It's at the intersection of uncertainty about what verb forms to use with subjects like "a <collection-word> of Xs", and uncertainty about the developing indeclinability of "there's". So nobody better hit her, OK? Her preferred usage is the standard one, and at the same time, her ambivalence about it is well founded and rational.

The cited string {"are a bunch of reasons"} doesn't seem to occur in the Language Log archive's 4,000+ posts. However, {"are a bunch of"} occurs six times. (The six bunches in question are "flaming homosexuals", "factors", "uncultured yahoos", "ugly chain-whipping mofos", "responses", and "competing landscape-oriented ape-brain simulators for Mac OS X", indicating the breadth of our coverage as well as the context's informality and apparently negative affect; but I digress.) Three of these examples are in the frame "there are a bunch of".

The string {"is a bunch of"} occurs three times. In this case, the bunches are "sissies", "little words that Paglia probably couldn't care less about", and "pompous turkeys". Three out of three are negative this time -- but none of them involve the frame "there is a bunch of". And the contracted alternative {"there's a bunch of"} doesn't occur at all.

As mrs_cake points out, counts out on the web are consistent with ours in preferring "there are a bunch of Xs" to "there is a bunch of Xs". But she omitted the common contracted option, "there's", which is overall about three times commoner than "there is" in this frame, and substantially boosts the proportion of bunches with (apparent!) singular agreement:

  Google Yahoo MSN Language Log
there are a bunch of
there is a bunch of
there are / there is
Percent is
there're a bunch of
there's a bunch of
there are / there's
there('re| are) / there('s| is)
Percent is or 's
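
The "Percent is" rows in a table like this are plain shares of the singular forms among all hits. A minimal Python sketch of that arithmetic follows; the function name is hypothetical, and the only real numbers used are the two Google counts mrs_cake reported.

```python
def percent_singular(singular_hits, plural_hits):
    """Share of singular-agreement hits, as a percentage of all hits.

    singular_hits: hits for "there is" (optionally plus "there's").
    plural_hits: hits for "there are" (optionally plus "there're").
    """
    total = singular_hits + plural_hits
    if total == 0:
        raise ValueError("no hits to compare")
    return 100.0 * singular_hits / total

# mrs_cake's Google figures: "there is a bunch" vs. "there are a bunch".
print(round(percent_singular(184_000, 988_000), 1))  # prints 15.7
```

To get the "Percent is or 's" row, the same function would be called with the "there is" and "there's" counts summed in the first argument and the "there are" and "there're" counts summed in the second.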

There's a bunch of different things going on here. The first is the question of whether "a bunch of Xs" is singular or plural. In modern standard usage, it's overwhelmingly treated as plural. In terms of Google counts, the following examples make the point:

a bunch of __ is
a bunch of __ are
proportion "is"
"bunch" as subject in sample
corrected proportion of "is"

[The reason for the "corrected proportion" in the table above is that in most examples of "a bunch of Xs is", bunch is not the subject of is. Thus

blind juggling for a bunch of people is hard
what kind of person would think that pretending to kill a bunch of people is funny?

But there are a few examples where bunch is really the subject, e.g.

besides a trip to Boston next weekend a bunch of people is going to Delaware in June.
the sort of ambiance that enriches the side-conversations that ensue when a bunch of people is making sandwiches

So I estimated the percentage by checking the top 20 hits -- I wish Google would give you the option to see 20 random hits! -- and corrected accordingly. The correction is probably not very accurate, but the whole thing is an order-of-magnitude exercise anyhow...]
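
One way to read that correction, sketched in Python: scale the raw "is" count by the fraction of sampled hits in which bunch really is the subject, then recompute the proportion. The post doesn't spell out the exact arithmetic, so this reading is an assumption, and the function name and the counts below are made up for illustration.

```python
def corrected_is_proportion(raw_is, raw_are, subject_in_sample, sample_size=20):
    """Estimate the proportion of "is" after discounting false hits.

    raw_is, raw_are: raw search counts for "a bunch of X is" / "... are".
    subject_in_sample: sampled "is" hits where "bunch" is really the subject.
    sample_size: number of "is" hits inspected by hand (20 in the post).
    """
    if not 0 <= subject_in_sample <= sample_size:
        raise ValueError("sample count out of range")
    estimated_is = raw_is * subject_in_sample / sample_size
    return estimated_is / (estimated_is + raw_are)

# Hypothetical counts: 1,000 raw "is" hits, 9,000 "are" hits,
# and 2 of the 20 inspected "is" hits with "bunch" as the true subject.
raw = 1000 / (1000 + 9000)
corrected = corrected_is_proportion(1000, 9000, 2)
print(round(raw, 3), round(corrected, 3))
```

With a sample of only 20 non-random hits, the corrected figure is good for the order of magnitude and not much more.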

But the second question is what's going on with "there is" vs. "there's".

There's a significant tendency for "is" to increase relative to "are" in the frame "there __ a bunch of Xs" compared to "a bunch of Xs __". However, the big news is the widespread usage of the contracted form "there's a bunch of Xs":

places folks songs reasons
There is a bunch of __
There are a bunch of __
There's a bunch of __
are / is
are / 's
proportion "is"+"'s"

We should note in passing that there seems to be quite a bit of word-to-word variation in usage here, among various values of X in "there's a bunch of Xs". And a quick check suggests that the pattern is consistent across search engines, though the details differ -- thus comparing things to people, usage shifts in a major way from "there are a bunch of ___" to "there's a bunch of __".

                               Google             Yahoo               MSN
There is a bunch of things
There are a bunch of things
There's a bunch of things
Percent (is - are - 's)        0.8 - 96 - 3.2     2.5 - 84.9 - 12.6   2.8 - 73.9 - 23.3
There is a bunch of people
There are a bunch of people
There's a bunch of people
Percent (is - are - 's)        1.1 - 68.9 - 30    2.4 - 61.4 - 36.2   7.4 - 59.2 - 33.4

But the main point is that overall, "there's + <plural noun phrase>" has a different usage pattern than "there is + <plural noun phrase>". Arnold Zwicky and I discussed this a couple of years ago -- When "there's" isn't "there is" (9/1/2005). Arnold put it this way:

"there is" + <plural noun phrase> is indeed nonstandard (and somewhat more common in the south and south midlands than elsewhere, I believe -- I'm away from my sources on this today) , but "there's" + <plural noun phrase> should really be characterized, in current English, as merely informal/colloquial, rather than nonstandard.

A theory that fits the facts, I think, is that there's is on its way to be re-analyzed as an indeclinable form like French il y a or Spanish hay. For most people, it co-exists with the older there is/are pattern, as an informal/colloquial variant. The smaller increase in the proportion of "there is + <plural noun phrase>", compared to "<plural noun phrase> is", is presumably due to uncertainty about how to spell the indeclinable form.

And bunch, of course, is already informal/colloquial, so "there's a bunch of __" fits nicely. And talking about "reasons" is arguably less colloquial than talking about "people", so that might be why "there's a bunch of reasons" is so much less common, relatively speaking, than "there's a bunch of people" or "there's a bunch of places". Unfortunately, I don't have any plausible story to tell about why "there's a bunch of things" is also frequentistically impoverished.

What about the tendency of Language Loggers to treat X in "a bunch of Xs" as negatively evaluated? (Actually, in most of the cited examples, we're attributing the negative evaluation to someone else, by means of free indirect speech -- a topic for another post.)

This appears to be a rhetorical peculiarity of our own, not a general characteristic of the language. On the web, there are bunches of nutjobs, knuckleheads, masochistic motherfuckers, and useless truths, but the web also offers equally large bunches of imperiled children, really good liberal bloggers, excellent free choices, and applications that are cool.

As for the grammatical number of a number of and the general class of similar expressions, it would take several separate posts to do it justice. But I'll note in passing that {"there are a number of"} occurs 31 times in the Language Log archive, while {"there is a number of"} doesn't occur at all. And if we ask the LION archive about usage in the literary canon, broadly construed, we get the following distribution:

  Poetry Drama Prose
there is a number of
there are a number of

The two poetic hits for "there is a number of" date from 1589 and 1647. The dramatic example is from 1783, and the prose examples are from 1771, 1817, 1838, and 1850.

The poetic hits for "there are a number of" date from 1810, 1840, 1937, 1937, 1991, 1991, 1994, 1994, 1996, 1997 (though the two 1937 items are from modern footnotes in an edition of Swift's poems, and thus not strictly speaking poetry at all).

The prose examples of "there are a number of" start in 1761 with Charles Batteux's A Course of the Belles Lettres:

There are a number of other actors, whose parts, though not so considerable as the foregoing ones, are nevertheless each of them characterised, sometimes by an historical stroke, at others by a particular and personal incident, or circumstance remarkably interesting.

They continue through Hazlitt's 1818 Lectures on the English Poets:

There are a number of good lines and good thoughts in the Cooper's Hill.

John Keats' 1819 letter to George and Georgiana Keats:

This Winchester is a place tolerably well suited to me; there is a fine Cathedral, a College, a Roman-Catholic Chapel, a Methodist do, an independent do,---and there is not one loom or any thing like manufacturing beyond bread & butter in the whole City. There are a number of rich Catholic‹s› in the place. It is a respectable, ancient aristocratical place---and moreover it contains a nunnery.

William Makepeace Thackeray's 1848 Vanity Fair:

THE kind reader must please to remember ---while the army is marching from Flanders, and, after its heroic actions there, is advancing to take the fortifications on the frontiers of France, previous to an occupation of that country,---that there are a number of persons living peaceably in England who have to do with the history at present in hand, and must come in for their share of the chronicle.

And Mark Twain's 1873 The Gilded Age:

There are a number of prarie dogs running around.

A few more contemporary counts of usage in somewhat authoritative sources:

  "there are a number of" "there is a number of" Percent "is"
Google News
NYT Archive (since 1981)
The Guardian (online search)
The Atlantic (1857 to present)


So mrs_cake's preference for "there is a large number of reasons" puts her very much in the modern minority, though more in tune with the fashions of earlier times. But again, no one should hit her -- this is a word-rage-free zone.

[Update -- John Cowan writes:

J.R.R. Tolkien once received a letter (addressed to "any Professor of English Language") asking him about the rectitude of "A large number of walls is/are being built", and saying that "big money" was riding on the issue. He answered, of course, that you can say what you like. His original reply is not in print AFAIK, but a letter to someone else referencing it is in The Letters of JRRT.]


[Update: more here and here.]

Posted by Mark Liberman at 10:37 AM

More alleged trouble with alleging

Kelly Parnell found a lovely headline mistake at a page owned by the Rochester Democrat and Chronicle (the link will come later). It said:

Driver in fatal hit-and-run allegedly turns self in

The wrong place for that awkward police-talk modifier, of course (Language Log has discussed such things before; see e.g. this post by Arnold Zwicky). As Kelly noted, what was alleged was that the hapless individual had been involved in a fatal hit and run (a bad thing), not that she turned herself in (a relatively good thing). It's absolutely certain that she turned herself in — she was remanded to the county jail. The clearly ill-chosen modifier placement in the original headline got me thinking about just how many places there were to try:

  1. Alleged driver in fatal hit-and-run turns self in
    (Was she really at the wheel?)

  2. Driver allegedly in fatal hit-and-run turns self in
    (Was her vehicle ever really there?)

  3. Driver in allegedly fatal hit-and-run turns self in
    (Did anybody really die?)

  4. Driver in fatal alleged hit-and-run turns self in
    (Was it really a hit-and-run?)

  5. Driver in fatal hit-and-run allegedly turns self in
    (Did she really show up at the police station?)

  6. Driver in fatal hit-and-run turns alleged self in
    (Was it really herself that she turned in?)

Kelly and I were crestfallen when we found the people at the paper had fixed their error within a few hours (though not soon enough: as of January 13 searching on Google News for "allegedly turns" would still find the original). The headline as revised at the newspaper's site makes perfectly good sense now. But I wonder if you can guess which of the above possibilities it was actually changed to?

You can find out by going to the page in question and simply reading the new headline.

Posted by Geoffrey K. Pullum at 12:47 AM

More on how the basil leaves

An amazingly rich crop of email correspondence from my simple little post about how on earth "basil leaves" could have been translated for a grocer's sign into a Spanish phrase meaning "basil departs". A correspondent called Trevor told me he thinks http://www.translationgold.com/ might have been responsible (but hey, we're Language Log, not Blame Log). Mark Reed thinks maybe the grocery store used the same translation service as his local Chinese restaurant, where he has noticed that the Spanish menu offers chow mei divertido ("chow mei fun": geddit?). Tako Schotanus thinks it was a print dictionary, and notes: "a lot of people don't know how to properly use a dictionary ... [and] will not even look at all those strange little abbreviations that will tell you if a word is a verb or a noun." Good point, that. Clarissa Ryan tested out Google and Babelfish, and found that both gave the correct translation for "basil leaves" or even "the basil leaves," but the incorrect translation for "the basil leaves and branches can be used." Cute syntactic discovery! And Nancy Friedman had perhaps the most interesting idea of all: that it might have been a transcription error rather than a translation error. The original might have read albahaca seca (dried basil), which somehow could have migrated via bad handwriting to albahaca seva and thence to albahaca se va. Ingenious!

Meanwhile, Dick Margulis wrote that he regards the mistranslation as serendipitously and poetically true: "Men leave, basil leaves; rosemary is for remembrance." And Barbara Zimmer also noticed that "basil leaves" is like "fruit flies", and had a further question: she asks whether there is a special term for such phrases:

Basil leaves
Fruit flies
Cracker jacks
Banana peels
Cough drops
Apple fritters
Juice boxes
Bean sprouts
Onion rings
Bed springs

The answer is that there's no special name: they're just two-word noun phrases that happen to be homophonous with two-word declarative clauses. You'll get one every time you can find a pair (AB) in which all of the following conditions are met:

  1. A is a noun that can stand alone (with no determiner);
  2. B is a verb that needs no complement;
  3. B ends in the 3rd singular present inflection "-s";
  4. the stem of B also happens to be a noun stem;
  5. as a noun stem, B inflects regularly for plural;
  6. the plural noun B can occur with the noun A as an attributive modifier.

A tricky set of constraints to comply with; yet Barbara Zimmer found ten cases in all. It's quite amazing what we can do with our language even when we're just playing.
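
The six conditions can be turned into a mechanical check. Below is a minimal Python sketch under heavy simplifying assumptions: it works on inflected forms looked up in a tiny hand-built lexicon, so conditions 3 to 5 collapse into asking whether B is listed both as a 3rd singular verb form and as a plural noun, and it does no real morphology. The class, the function, and the word lists are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToyLexicon:
    bare_nouns: frozenset        # nouns that can stand alone (condition 1)
    intransitive_3sg: frozenset  # "-s" forms of verbs needing no complement (2, 3)
    plural_nouns: frozenset      # plural noun forms (stand-in for 4, 5)
    compounds: frozenset         # attested (A, B) modifier-noun pairs (6)

def is_clause_homophone(a, b, lex):
    """True if the noun phrase "A B" can also be read as the clause "A B"."""
    return (
        a in lex.bare_nouns
        and b.endswith("s")
        and b in lex.intransitive_3sg
        and b in lex.plural_nouns
        and (a, b) in lex.compounds
    )

# A toy lexicon built from a few items on Barbara Zimmer's list.
LEX = ToyLexicon(
    bare_nouns=frozenset({"basil", "fruit", "cough", "bed"}),
    intransitive_3sg=frozenset({"leaves", "flies", "drops", "springs"}),
    plural_nouns=frozenset({"leaves", "flies", "drops", "springs", "salads"}),
    compounds=frozenset({("basil", "leaves"), ("fruit", "flies"),
                         ("cough", "drops"), ("bed", "springs"),
                         ("fruit", "salads")}),
)

for pair in [("basil", "leaves"), ("fruit", "flies"), ("fruit", "salads")]:
    print(pair, is_clause_homophone(*pair, LEX))
```

The hard part, of course, is the lexicon and not the check: deciding which verbs really need no complement and which compounds are attested is exactly the knowledge the grocer's translation tool lacked.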

Update: Barbara Partee recalls singing a song (at camp when she was a kid) with the words: "Have you ever seen a house fly, a house fly, a house fly? Have you ever seen a house fly? Now you think of one!". New verses would be made up on the fly, the guiding constraint being that a new verse should be based on a pair (AB) such that A is a noun in its singular inflected form, B is a verb in its plain form, and A B is a compound noun or a noun phrase, so that there would be an ambiguity between a noun phrase sense of A B and another sense in which see A B is an instance of the perception/causation verb phrase construction meaning "see A perform the activity denoted by B". Another game that's easier to play than to describe accurately.

Posted by Geoffrey K. Pullum at 12:07 AM

January 12, 2007

Our Polish mascot

It's nice to see that Language Log's Polish mascot (Lech, the legendary founder of Poland) is finally alluded to in our, um, pages, in the head of Roger Shuy's recent posting, "Language Log Pole vaults into the future".  Lech is, of course, the patron saint of hard copy.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 02:56 PM

The sad anniversary of Air Florida Flight 90

Twenty-five years ago, as my Eastern Airlines flight from Oklahoma City was landing at Washington National Airport, I noticed the pilot's apparent hesitancy as he brought the plane down in a blinding snowstorm. We bounced on the runway and eventually pulled up at the gate. When I got off, I was surprised to find an almost empty terminal. Routinely, I went outside to take a taxi home. No taxis. So I crossed the street to board the Metro. After sitting on it for some thirty minutes, I learned that there had been a fire on the Metro on some other line, causing the entire system to shut down. My home was only about five miles from the airport so I took the only transportation available--my legs. As I walked toward Key Bridge, it was clear to me that walking was faster than the cars and trucks that were sitting motionless on the streets. Eventually I reached a pay phone, where I called my wife to tell her that I'd be late getting home. She cried in relief, explaining to me that the radio reported a plane crash at National and she was terrified that I might be on it.

The date was January 13, 1982. The crash was that of the now famous Air Florida Flight 90. I later learned that my own flight was the last one to land before the airport closed. What was so eerie to me was that as I walked home, I could see or hear nothing to indicate the tragedy that killed 78 people. In the blinding snow I could hear no sounds of helicopters hovering. I could hear no sirens. Even though the crash was on the next bridge east, the 14th Street Bridge, it was close enough that one would think some noise or sights might be evident. I guess it was that kind of storm.

Today's Washington Post describes how this crash left a legacy of failed communication in the airline business, especially what the article calls the "authoritarian cockpit culture dominated by captains." The conversation among the captain and others, while still on the ground and shortly after they took off, was recorded for history, including the following excerpts that may illustrate what the Post article was talking about.

Captain: It's not really that cold.
First Officer: It's not that cold, cold, like ten with the wind blowing, you know. People's going to deplane in the snow here. Piedmont's going to park it on the ramp.
Captain: Here comes the chain tractor.

The First Officer indirectly expressed his concern here, which the Captain ignored. Other instances of the crew's concerns about taking off follow:

First Officer: Boy, this is shitty. It's probably the shittiest snow I've ever seen.
Captain: (inaudible) go over to the hangar and get deiced.
First Officer: Yeah, definitely.
Stewardess: The tire tracks in the snow, is that the way ours are, that low to the ground too?

          *      *      *
First Officer: What's the release good for, one hour? One hour release? Ha, ha. God, he said LaGuardia is not taking anybody. It's early yet. We may end up in Kennedy or somewhere, you never know (sound of laughter).

          *      *      *
Captain: Tell you what, my windshield will be deiced. I don't know about my wing.
First Officer: Well, what we really need is the inside of the wings anyway, the wing tips are gonna speed up by eighty anyway, they'll, they'll shuck all that other stuff (sounds of laughter).

       *      *      *
First Officer: Yeah, Palm thirty-five's in the holding pattern right now.
Captain: Gonna get your wing now.
First Officer: D'they do yours? Can you see your wing tip now?
Captain: I got a little on mine.
First Officer: A little? This one's got about a quarter to half an inch on it all the way. Look how the ice is just hanging on his, uh, back there. It's impressive that these big old planes get in here with the weather this bad, you know? It's impressive.

       *      *      *
First Officer: See all those icicles on the back and everything? ... See this difference in that left engine and the right one? Don't know why that's different ... I'm certainly glad there's people taxiing on the same place I want to go 'cause I can't see the runway, taxiway without those flags.

       *      *      *
First Officer: Boy, this is a losing battle here on trying to deice those things. It gives you a false feeling of security, that's all it does.
Captain: That, uh, satisfies the Feds.

       *      *      *
First Officer: Boy, I'll bet all the school kids are just *** in their pants here. It's fun for them. No school tomorrow, yahoo (sounds of laughter).

       *      *      *
First Officer: Let's check these tops again, since we been sitting here a while. I think we get to go here in a minute.

       *      *      *
First Officer: Slushy runway. Do you want me to do anything special for this or just go for it?
Captain: Unless you got anything special you'd like to do.

Air Florida Flight 90 then took off. There isn't much transcript because the plane reached only to the 14th Street Bridge before ramming it and crashing into the Potomac River. Here are excerpts again:

First Officer: God, look at that thing. That doesn't seem right, does it? Ah, that's not right.
Captain: Yes it is, there's eighty--
First Officer: Naw, I don't think that's right... Ah, maybe it is.
Captain: Hundred and twenty.
First Officer: I don't know.

       *      *      *
First Officer: Larry, we're going down, Larry.
Captain: I know it.

(sound of impact)

There were many problems that led up to this tragedy, but, as the Post article observes, at least one of these may be related to the language used in the cockpit.

Posted by Roger Shuy at 01:42 PM

A conscious negotiation, if not indeed a virtual exchange of views

Not long after noting Plug Loafsley's v-like /r/, Thomas Pynchon takes up the phonetic motivations for sound change in a more direct (though less human) fashion. It's page 406 of Against the Day, and we're back with the Chums of Chance on the Inconvenience, now hanging around the First International Conference on Time Travel at Candlebrow U.:

One night after Evening Quarters, the Tesla device came squawking to life, and the boys gathered around to listen. "Having taken delivery," announced a deep, reverberant voice, "from duly authorized agent Alonzo Z. Meatman of the map informally known as the Sfinciuno Itinerary, signing all receipt forms properly, you are directed to set course immediately for Bukhara in Inner Asia, where you will report T.D.Y. to His Majesty's Subdesertine Frigate Saksaul, Captain A. Zane Toadflax, Commander. It is assumed that the Inconvenience already has a complete allocation of current-model Hypopsammotic Survival Apparatus on board, as no further expenditure for that purpose will be approved."

Puzzled about what that "Hypopsammotic Survival Apparatus" might be, the Chums consult Professor Vanderjuice, who directs them to Roswell Bounce:

As early as 1899, the Professor had informed them, Roswell had grasped the principles of what would become the standard-issue Hypopsammotic Survival Apparatus or "Hypops", revolutionizing desert travel by providing a practical way to submerge oneself beneath the sands and still be able to breathe, walk around, so forth.

"You control your molecular resonance frequencies, 's basically all it is," explained Roswell, "including a fine-adjustment feature onto it to compensate for parameter drift, so as to keep everything solid-looking but dispersed enough that you're still able to walk through it all 'th no more effort than swimming in a swimming hole. Sonofabitch Vibe Corp. stole it from me, and I feel no hesitation about beating their prices. How many were you looking for?"

The "Sfinciuno Itinerary", by the way, is an optically-encrypted map of old Venetian trading posts in central Asia, which shows the location of the lost subterranean city of Shambhala.

Some time later, the Saksaul puts in at the buried port city of Nuovo Rialto.

Nearby loomed a high, ruinous structure of great antiquity, of some red-brown color suggestive of blood spilled none too recently, whose supporting pillars were torch-bearing statues male and female, and whose pediment was inscribed in an alphabet invented, according to Gaspereaux, by Mani himself . . .

It was here, evidently, that the sand-frigate planned to tie up. After evening "chow", enjoying a cigar on the fantail, Chick heard a high-pitched screaming, which seemed to him almost articulated into speech. He located a pair of under-sand goggles, slipped them on, and peered into the darkness beyond the settlement walls. Something large and heavy came thundering by, in high swooping hops, and Chick thought he recognized the smell of blood. "What in Creation was that?"

Gaspereaux had a look. "Oh. Local sand-fleas. Always coming round to see what's what whenever a new ship pulls in."

"What are you talking about? Whatever just went by was the size of a camel."

Gaspereaux shrugged. "Down here they are known as chong pir, big lice. Since the first Venetians arrived, these creatures, following a diet exclusively of human blood, have grown over the generations larger, more intelligent, one ventures to say more resourceful. Feeding upon the host is no longer a matter as simple as mandibular assault but has evolved into a conscious negotiation, if not indeed a virtual exchange of views--"

"People down here talk to giant fleas?" inquired Darby with his accustomed directness.

"Indeed. Usually in a dialect of ancient Uyghur, though, owing to the mouth structure unique to Pulex, one finds certain difficulties with phonology, notably the voiced interdental fricative--"

Cute. But this seems to be an unusual case where Pynchon's fantasy runs aground very quickly on the shoals of fact. In the real world, the nature of insect respiration means that mouth structure (unique to Pulex or otherwise) ought to be the least of an insect's difficulties in generating fricatives of whatever place and manner.

The respiratory system of insects (and many other arthropods) is separate from the circulatory system. It is a complex network of tubes (called a tracheal system) that delivers oxygen-containing air to every cell of the body.

Air enters the insect's body through valve-like openings in the exoskeleton. These openings (called spiracles) are located laterally along the thorax and abdomen of most insects -- usually one pair of spiracles per body segment. Air flow is regulated by small muscles that operate one or two flap-like valves within each spiracle -- contracting to close the spiracle, or relaxing to open it.

After passing through a spiracle, air enters a longitudinal tracheal trunk, eventually diffusing throughout a complex, branching network of tracheal tubes that subdivides into smaller and smaller diameters and reaches every part of the body. At the end of each tracheal branch, a special cell (the tracheole) provides a thin, moist interface for the exchange of gasses between atmospheric air and a living cell. Oxygen in the tracheal tube first dissolves in the liquid of the tracheole and then diffuses into the cytoplasm of an adjacent cell. At the same time, carbon dioxide, produced as a waste product of cellular respiration, diffuses out of the cell and, eventually, out of the body through the tracheal system.

This respiratory system is not only separate from the insect's circulatory system, it also doesn't share the mouth with the digestive system, so that mouth parts can't be used to modulate an airstream. In fact, as far as I know, insect sounds are all produced by rubbing body parts together, rather than by using the reptilian and mammalian trick of exciting resonant cavities with sound (whether hiss or buzz or blat) generated by the aerodynamics of constricting a respiratory airstream. And the sound-generating body parts are generally wings or legs rather than mouthparts -- though I guess all bets are off for camel-sized fleas, which could only have grown so large by somehow supplementing or extending the usual insect tracheal system so that body size is not limited by the physics of gas diffusion.

Of course, mouth-part peculiarities can certainly have phonetic effects:

Posted by Mark Liberman at 05:50 AM

January 11, 2007

Language Log pole vaults into the future

In a startling and totally unexpected announcement today, Language Log's Board of Directors, speaking from their spacious conference room at Language Log Plaza, shocked the blogosphere by revealing that they have voted to discontinue all electronic posts as of April 1, 2007. Instead, Language Log is to become a hard copy journal, flying in the face of the opposite modern trend away from paper versions to electronic formats. From now on, interested readers will have to subscribe to this new journal (yet to be named) for a substantial sum of money (yet to be determined).

When asked why this change is being made, an unnamed Language Log spokesperson said:

A hard copy journal is what the public really wants. It will give Language Log staff a chance to offer more thoughtful language science information, much the way readers now get it from the media, such as the BBC Science Section. Anyway, the public appears to respond better to hard copy, as evidenced by their continuing belief that females speak three times more words than males, that Eskimos have 17 (or more) words for snow, that British cows moo in regional dialects, that restrictive clauses always take "which," and that the English lexicon is about to reach a million words. We've done our very best to counter this nonsense in our outdated electronic format but the Board of Directors believes our writers will be much more effective if, in the future, their information is printed on paper.

Ironically, the timing of this unusual and highly debated decision comes on the heels of the Linguistic Society of America's recent announcement that it is going in exactly the opposite direction, changing its book notices section from print to an electronic format. Blackwell Publishers has just begun a new electronic journal, Language and Linguistics Compass, and many other electronic journals are already up and running or are soon to become available, illustrating how overcrowded this outmoded approach is becoming.

So, in order to serve its faithful readers even better, Language Log is, as usual, at the forefront of progress.

Posted by Roger Shuy at 01:10 PM

Common Sense

We'd all be a lot safer, I suspect, if the intelligence functions of the Department of Homeland Security were turned over to the alumni-association industry. Those people are efficient, all-knowing and politely relentless. As I've moved through life since obtaining an MIT degree in 1975, I've apparently been adopted by the local MIT Club of every region that I've ever lived in, and every one of them sends me multiple email reminders of their frequent and interesting events, none of which I've ever actually attended.

The most recent flurry of alumnospam advertised a lecture tour by Henry Lieberman from the Media Lab, on the topic "Beating Common Sense into Interactive Applications". The various emails in the series featured catchy graphics, like the one on the right, and copy like this:

Would your computer understand the joke? Would it have more 'common sense' than Gracie? Not now, but the work in the Software Agent's Group lead by Henry Lieberman is working to bridge that gap. Creating and linking common sense knowledgebases to the problems of interactive applications.

This emerging technology allows computers to know things indirectly. Not by facts or structures in the application. But by inferring it from the knowledge of human experience that becomes 'common sense'. [...]

This technology is at the core of the next generation of web functionality…and this is your chance to here about it from one of the researcher's most linked on the web. And a chance to network with other alumni of course!

I'm tempted to say that failure to understand old comedy routines is not my computer's biggest current problem; but that would be a cheap shot. So what I'll say instead is that someone should do a magazine piece on the way that classical AI, driven underground by the ascendency of Machine Learning, has morphed into the Semantic Web and related research.

Earlier LL posts on related themes:

"AI is brain dead" (7/31/2003)
"Ontologies and arguments" (11/11/2003)
"Borges on metadata" (11/13/2003)
"Trees spring eternal" (11/23/2003)
"The AI gnomes of Zurich" (11/26/2003)
"Retrospective: the semantics of harpers.org" (1/31/2004)
"Thesauri, SKOS and terminology variation" (3/16/2004)
"The passivator" (4/16/2004)
"Their GO-mark'd love" (10/23/2004)
"EMELD" (7/2/2005)

[Neal G. writes:

What struck me about your Burns & Allen post is that at first I didn't really get the joke. I thought that Gracie was just being a prescriptivist who thinks that "take" shouldn't be used synonymously with "bring."

It took me a minute or two to realize that the joke really turned on an ambiguity in how the word "her" was being used—as a possessive determiner or as an indirect object. (I apologize if my terminology isn't exactly right; I'm not a linguist, I just play one on TV). What George intended his sentence to mean was "I took flowers to her," but Gracie heard it as something like "I took flowers belonging to her out of her room." The joke wouldn't have worked if the person in the hospital had been a man, because each of the two functions is expressed in a different word (lexeme? lexical item? whatever): "him" and "his." That means the ambiguity can't arise in the first place.

The fact that I didn't immediately get the joke made me wonder if maybe using "take" as synonymous to "bring" wasn't as well established back when Burns & Allen were doing this bit as it is now. Because if you don't pick up on the ambiguity at least subconsciously, the joke isn't very funny.

Of course, the other possibility is that I was just being dense.

(Let me say first that despite Neal's diffidence about terminology, the logic of his analysis is exactly correct.)

I don't think that the relevant aspects of the sense and usage of bring and take were any different 50 years ago than they are today. I don't have any information about relative frequency, but it's clear that the construction "take <someone> <something>", in the sense of transferring it to them as a gift, has been part of the English language for well over a century. For example, the OED contains these examples (from the entries for shine, preserve, and medlar, respectively):

1847 ROBB Squatter Life (Bartlett 1860), To make a shine with Sally, I took her a new parasol.
1854 MRS. GASKELL North & S. xx, Perhaps, I might take her a little preserve, made of our dear Helstone fruit.
1881 R. D. BLACKMORE Christowell xxxvi, We will take her some medlar jelly.

So I don't think that the problem is a change in the language. But I don't think that Neal is dense, either. A crucial part of putting the joke across was Gracie's reputation for ditsy misunderstandings of this general type, and (in this case) her eloquent unspoken expression of concern at George's rude behavior. Out of context, the joke is much harder to get than (say) Groucho's famous line, "Outside of a dog, a book is man's best friend. Inside of a dog it's too dark to read." ]

[Steve J. writes:

It took me a minute to get the joke too. The reason I believe is that in the written version here we have both verbs italicized but the joke does not play on the difference between verbs.

The example seems a favourite amongst those involved with AI.

Googling, I've come across these three examples:
where the difference between the two parsings is correctly given,
and direct references to the George Allen joke here:
where "brought" is italicized but not took.

There is also another example here

where "took" is circled and "brought" is italicized, so it looks like the problem might have started with Lenat.

Presumably the double italicizing in the alumnospam was done by the person from MIT alumni responsible for publicizing the meeting. This does suggest that work on improving individual human intelligence might be a more fruitful endeavour than worrying about the smarts of our computers.

Hmm. Do you mean the intelligence of the individual designing the invitation, or the intelligence of the individuals reading it? As for Doug Lenat, when someone has a unit in the (extended) SI system named after him, you have to cut the guy some slack. ]

[Matt writes:

Maybe this doesn't warrant an update to the post, but FYI, Wikipedia has a different version of the "took her flowers" joke which is set up much more obviously:

(from http://en.wikipedia.org/wiki/Gracie_Allen )

George: (looking at Gracie, who is arranging a large vase of beautiful flowers) Grace, those are beautiful flowers. Where did they come from?

Gracie: Don't you remember, George? You said that if I went to visit Clara Bagley in the hospital I should be sure to take her flowers. So, when she wasn't looking, I did.

There's no reference (and it IS Wikipedia), so who knows if this is the original (no doubt they did this kind of gag a few different ways), but it does take care of the "Isn't that a little too obscure to be funny?" issue. ]


Posted by Mark Liberman at 08:18 AM

January 10, 2007

Is it down cigar head can pull out necessary?

The Language Log bat signal can be sent out from anywhere... even from an automobile work-light in northern Ghana. John Schaefer passes along the packaging from a Chinese-made work-light that his brother found in a store in the Ghanaian city of Tamale. The text on the package is an impressive piece of Chinglish found poetry.

You can click on the picture to the right for the full-sized image, or just follow along with this transcription, in five eloquent stanzas:

Operation method :
1 . The regular whole light of
magnet in the base and nece
ssary position
2 .  the fan-shaped tooth which
locks the organization
establishes the lamp holder
on different angle and position
3 . is it down cigar head
can pull out necessary
4 wire . insert some
cigarette device
5 in . finish using directly
cigar head to draw, in
the rotationoverlayed
before the wire is black
through the head is
deposited in  In the
shell of one

I particularly like the unexpectedly abrupt ending, with its Joycean echo of "A way a lone a last a loved a long the". John Schaefer appeals to Language Log for help with a number of imponderables:

I guess I want to know:
(1) Was this a case of reading across or down or otherwise backwards/sideways in the direct back-translation from the Chinese characters, or
(2) was it a case of the layout getting garbled in some computer program, or
(3) is it just somebody's best stab for packaging heading to anglophone Africa (where, really, nobody needs instructions to figure out how to work a flashlight)?
In any case, where did "cigar head" come from?

I have no good answers for any of these questions, though I assume "cigar head" refers to the plug that's inserted into the cigarette lighter. Further research may involve contacting the manufacturer, which I managed to track down through the wonders of Google. This particular item looks to be the TL1155 Spotlight, manufactured by Ningbo Taller Electrical Appliance Co., Ltd. (see here, here, and here). I propose a Language Log fact-finding mission to Ningbo Taller headquarters, located in the Simen Industrial Park of Ningbo, in China's Zhejiang province. From Ningbo we could then head out to Ghana for an on-site inspection of Tamale, before returning to the comfort of Language Log Plaza. I'd expect funding agencies to jump at the chance to underwrite this trip, since it's all in the name of improving cross-cultural understanding in this confusing era of globalization and transnational commodity flows.

[ Previous posts on Chinglish:
"Engrish explained" (3/11/06)
"A grander Chinglish" (5/15/06)
"Regale in basilica" (5/18/06)
"A less grand Chinglish" (5/30/06)
"GAN: Whodunnit, and how, and why?" (5/31/06)
"Further thoughts on the riddle of GAN" (6/3/06)
"The shrimp did what to the cabbage?" (9/11/06)
"And next, facial poo" (10/25/06) ]

Posted by Benjamin Zimmer at 12:23 PM

Semantic drift in the classroom

Amazingly enough, the string {"making off with tons of booty"} is unknown to Google.

[Update -- David Denison writes:

Nice cartoon on Language Log just now. Just to tell you that it's happened to me. In a history of the lang. class a couple of years ago I used the word booty in its original sense, not so much cluelessly as thoughtlessly - didn't see the risk till the word was leaving my lips - and sparked a fit of the giggles in a couple of students. Had to come over all stern and middle-aged.

But the piece about the Sunday Times was more important, and horribly dispiriting. I'd love to believe your closing sentence.

Keep up the good work - Language Log is a much better excuse for not getting on with my own work than almost any others I've been able to come up with.]


[Update #2 -- Dennis Paul Himes writes:

The first thing I noticed about that Zits "booty" comic when I read it was that the teacher's statement was just as correct for the students' alternate meaning of "booty" as it was for the intended meaning. DNA analysis of Icelanders has confirmed that their ancestry contains a great deal of Celtic booty.

A truly generation-bridging response!]

Posted by Mark Liberman at 07:22 AM

The ultimate nightmare becomes an everyday reality

According to The Next Hurrah ("A wolf in gay sheep's clothing: Corruption at the London Times" 1/4/2007):

This week the Sunday Times of London published a pack of lies so transparent, so thoroughly discredited, that its appearance can't be chalked up to mere journalistic sloppiness. Rather, the timing of the piece, its willful disregard of the truth, and the behavior of the journalists themselves indicate a deliberate political hit job purposefully dressed up in the garb of one of the most internationally respected newspapers.

Harsh words, but apparently true. Except maybe the story wasn't "a deliberate political hit job", but just an example of the arrogance and incompetence of (some) journalists. And maybe it wasn't a pack of lies, exactly, but rather an example of a different rhetorical category.

It's true that the Times article baldly asserted multiple blatant falsehoods about research by Charles Roselli. One simple and concrete example out of many: the article asserted that "The animals’ skulls are cut open and electronic sensors are attached to their brains", while in fact, the cited experiments involved no surgery of any sort -- rather "Pregnant ewes (n = 10) were treated with the aromatase inhibitor 1,4,6- androstatriene-3,17-dione (ATD) during the period of gestation" and the effects on the sexual preferences of their offspring were observed. (Though it doesn't matter, there was essentially no effect.) For an inventory of other falsehoods in the story, you can read the long discussion in the cited blog entry and its links.

And it's true that the anti-vivisection group PETA was the source of at least some of these falsehoods. However, a quick web search doesn't turn up any evidence that the writers Isabel Oakeshott and Chris Gourlay are spear-carriers for PETA, or for gay-rights groups either. (It does turn up at least one earlier case where Oakeshott has been accused of making things up, but the victims were conservative politicians rather than biologists.)

But even if ideology motivated the writers, it seems surprising that they (and their editors) would be willing to print such plain and easily-refuted whoppers. You'd think that an accomplished journalist who wanted to trash Roselli's work would have taken the time to craft a slanted article that didn't actually tell any out-and-out lies. And there doesn't seem to have been any external reason for deadline pressure getting in the way of this, since the research in question was published in June of 2006, and PETA's complaints about it were published (and debunked) last August.

The Next Hurrah asks

Is this just journalistic sloppiness? If they had googled "charles roselli" they would have seen my post debunking these claims, fifth hit from the top. If the reporters had read Roselli's papers, they would have known they got the conclusions wrong.

It's certainly puzzling. Oakeshott and Gourlay apparently never bothered to read Roselli's article (C.E. Roselli et al., "The effect of aromatase inhibition on the sexual differentiation of the sheep brain", Endocrine 29(3) 501-11, 2006); and it appears they didn't do any significant web research -- or at least they didn't pay any serious attention to what they found.

But on balance, it doesn't seem likely to me that their Times article was "a deliberate political hit job". Rather, it seems to have been one of the modern "bible stories" that are published so often these days in the guise of science journalism.

You start with a grabby narrative with mythic resonances -- here it's one about scientists using neurosurgery and hormone injections into the brain to "cure" homosexuality, testing their techniques by cruel experimentation on cute little sheep. This can come from a publicist's press release, or a story circulating on the dinner-party circuit, or an author on a book tour, or a catchy tale from anywhere at all -- this one seems to have come from the PETA web site and from the anger of Martina Navratilova and other gay-rights activists at what they perceived as anti-gay science. (According to Michael Grew, "Gay sheep experiments outrage campaigners", Pink News 1/2/2007, Navratilova wrote a letter of protest to Roselli's university in November.)

Then you add journalists and editors eager to create some buzz. The fairy tale about fixing gay rams with brain surgery is definitely buzz-worthy, sure to rile up the anti-vivisectionists and the gay rights activists, and maybe the religious right too. Does it have any correspondence whatsoever to the facts of the world or even to the claims of the research? Who cares? Not the reporters or their editors, apparently. So they bang it out in the form that's appropriate for their medium, and we're off.

But it's important to note that these people are not lying, exactly. They simply don't care one way or another about what the facts are, and this shifts their work out of the category of lies and into the category for which Harry Frankfurt has suggested the technical term bullshit:

What bullshit essentially misrepresents is neither the state of affairs to which it refers nor the beliefs of the speaker concerning that state of affairs. Those are what lies misrepresent, by virtue of being false. Since bullshit need not be false, it differs from lies in its misrepresentational intent. The bullshitter may not deceive us, or even intend to do so, either about the facts or about what he takes the facts to be. What he does necessarily attempt to deceive us about is his enterprise. His only indispensably distinctive characteristic is that in a certain way he misrepresents what he is up to.

This is the crux of the distinction between him and the liar. Both he and the liar represent themselves falsely as endeavoring to communicate the truth. The success of each depends upon deceiving us about that. But the fact about himself that the liar hides is that he is attempting to lead us away from a correct apprehension of reality; we are not to know that he wants us to believe something he supposes to be false. The fact about himself that the bullshitter hides, on the other hand, is that the truth-values of his statements are of no central interest to him; what we are not to understand is that his intention is neither to report the truth nor to conceal it. This does not mean that his speech is anarchically impulsive, but that the motive guiding and controlling it is unconcerned with how the things about which he speaks truly are.

Timothy Noah asked "Why should bullshit be so prevalent now?" and answered

The obvious answer is the communications revolution. Cable television and the Internet have created an unending demand for information, and there simply isn't enough truth to go around. So, we get bullshit instead. Indeed, there are some troubling signs that the consumer has come to prefer bullshit. In choosing guests to appear on cable news, bookers will almost always choose a glib ignoramus over an expert who can't talk in clipped sentences.

But my own guess is that the desire to create a buzz, regardless of the facts, has always been strong among journalists, and is only kept in check by a concern to avoid a high probability of significant damage to individual and corporate reputations -- or bank accounts. As Tim Jackson recently observed on the BBC's World Service,

The only time that a- that a journalist, whether it's television or radio or newspaper uh tends to actually be subjected to really detailed scrutiny of what he or she is doing is if there's a court case. But I believe in a growing trend this ultimate nightmare is actually going to become an everyday reality for journalists around the world. The oddity is, and what I think what the newspapers fail to grasp, is that something has changed in the world of journalism.

Indeed. Thanks to the democratization of media by the internet, a much larger fraction of journalistic bullshit is effectively challenged in the court of public opinion.

In other words: it's not that there's more bullshit, there's just more bullshit detection.

This doesn't usually lead to libel judgments, but it sometimes affects careers, as Dan Rather and Eason Jordan can testify. It remains to be seen whether this will lead to a different ethos among journalists. One sign to look for: will writing and publishing a story full of obvious and blatant falsehoods have any impact on the careers of Isabel Oakeshott, Chris Gourlay and their editors at The Sunday Times? My bet, alas, is on "no".

The media may not enforce accountability unless blatant falsehoods are printed about powerful people, or travel vouchers are falsified, or one newsroom faction wins out over another. But there's still a cumulative effect on public opinion. A generation of young intellectuals is gradually learning the lesson that everything they read and hear is likely to be bullshit, even when it comes from sources like The Sunday Times or CBS News. This is a bad thing for society at large, but it should be especially bad for the (employees and stockholders of the) news media. So if the economists are right about rational choice, you'd expect sooner or later to see some news sources that claim to tell the truth, and put real effort into ensuring that the claim is not bullshit.

[Update: Ben Goldacre has an excellent piece on this case in the Guardian: "Gay sheep? Let's get the facts straight", 1/13/2007.]

Posted by Mark Liberman at 07:18 AM

January 09, 2007

Dilbert becomes a sales engineer

Doth any man doubt, that if there were taken out of men's minds vain opinions, flattering hopes, false valuations, imaginations as one would, and the like, but it would leave the minds of a number of men poor shrunken things, full of melancholy and indisposition, and unpleasing to themselves?

Posted by Mark Liberman at 01:26 PM

The syntonic phonetics of Pynchon's pitchuhv

At 1,085 pages, Thomas Pynchon's Against the Day reminded one Amazon reviewer of Ambrose Bierce's comment: "The covers of this book are too far apart".

The situation is worse than a simple page count suggests, because every aspect of every event in the book is tied to a burgeoning tangle of associations. And the connections within the book are bad enough, but Pynchon's works are set in real places and times, and web search makes it easy to follow his hints off into one intellectual thicket after another, as you can see from the links at the bottom of this page.

I've been making a little list of bloggably linguistic passages as I make my way through Against the Day -- the tally is up to 28 items now -- but I've found it hard to turn any of these into blog posts in the time that I can devote to the task. Under the influence of his infectiously divergent style, my Pynchon posts tend to overflow the breakfast hour.

This post -- an attempt to explain why Pynchon rendered picture as "pitchuhv" in the speech of a Hell's Kitchen street urchin circa 1900 -- took me two full breakfasts and part of a third one, and it's only seeing the light of day now because I decided that enough is enough, or (you may well think) too much.

Turn to page 397, if you're following along at home. We're back with the Chums of Chance, last seen on Language Log as their hydrogen skyship Inconvenience approached the World's Columbian Exposition of 1893 in Chicago. Several adventures later, Pynchon opens a chapter by reminding us that his world is an artificial wilderness of cryptic correspondences, like Baudelaire's forest of symbols, but mathematically amplified and more than a bit more menacing:

In New York for a few weeks of ground-leave, the boys had set up camp in Central Park. From time to time, messages arrived from Hierarchy via the usual pigeons and spiritualists, rocks through windows, blindfolded couriers reciting from memory, undersea cable, overland telegraph wire, lately the syntonic wireless, and signed, when at all, only with a carefully cryptic number -- that being as nigh as any of them had ever approached, or ever would, to whatever pyramid of offices might be towering in the mists above.

So what is "the syntonic wireless", I wondered? A web search turns up this item from Electrical Review, June 29, 1901:

After the reading of Mr. Marconi's paper, which was published in full in the ELECTRICAL REVIEW for June 15 and 22, before the Society of Arts, in London, Professor W. E. Ayrton being in the chair, the following discussion took place...

The chairman: Although still far away, he thought they were gradually coming within thinkable distance of the realization of a prophecy he had ventured to make four years before, of a time when if a person wanted to call to a friend he knew not where, he would call in a loud, electromagnetic voice, heard by him who had the electromagnetic ear, silent to him who had it not. "Where are you?" he would say. A small reply would come, "I am at the bottom of a coal mine, or crossing the Andes, or in the middle of the Pacific." Or, perhaps, in spite of all the calling, no reply would come, and the person would then know that his friend was dead. Let them think of what that meant, of the calling which went on every day from room to room of a house, and then think of that calling extending from pole to pole; not a noisy babble, but a call audible to him who wanted to hear and absolutely silent to him who did not, it was almost like dreamland and ghostland, not the ghostland of the heated imagination cultivated by the Psychical Society, but a real communication from a distance based on true physical laws. On seeing the young faces of so many present he was filled with green envy that they, and not he, might very likely live to see the fulfillment of his prophecy.

And the second hit was Oliver Lodge and Alex Muirhead, "Syntonic Wireless Telegraphy; With Specimens of Large-Scale Measurements", Proceedings of the Royal Society of London, 82(554) 227-256, 1909, which starts this way:

The absence of effective tuning is one of the marked features of wireless telegraphy as at present usually conducted in practice.

In many cases, messages are disentangled from a crowd of superposed disturbances, i.e. from other messages, largely by the skill of the receiving telegraphic operator, who, by the exercise of selective attention, manages to interpret and read what is intended for him; the process being identical with the ordinary human faculty whereby a conversation can be listened to amid general talking and a crowd of other noises at a dinner table.

The OED glosses syntonic a.2 as

1. Electr. Denoting a system of wireless telegraphy in which the transmitting and receiving instruments are accurately ‘tuned’ or adjusted so that the latter responds only to vibrations of the frequency of those emitted by the former; also said of the instruments so ‘tuned’.

2. Psychiatry. Denoting the responsive, lively type of temperament which is liable to manic-depressive psychosis.

And syntonic a.1 antedates both the wireless and the psychiatric usages. The "syntonic comma", also known as the "comma of Didymus", is the difference between the major third created by four intervals of a perfect fifth (e.g. C,G,D,A,E) and the just interval of a major third, corresponding to the ratio 5/4. In decimal terms, 5/4 is obviously 1.25. A fifth in just intonation is the ratio 3/2. (3/2)^4 = 81/16, which is two octaves (64/16) plus an interval 81/64, or 1.265625. The "syntonic comma" is (81/64)/(5/4) = 81/80 = 1.0125, which is about a fifth of a semitone. (An equally-tempered third -- 2^(4/12), or about 1.259921 -- roughly splits the difference.)
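For anyone who wants to check the arithmetic, here is a minimal Python sketch. It is my own illustration rather than anything from the sources quoted above, and the function and variable names are just made up for the purpose. It uses exact fractions for the just intervals and converts ratios to cents (1200 per octave, so 100 per equal-tempered semitone).

```python
from fractions import Fraction
import math


def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200 * math.log2(float(ratio))


fifth = Fraction(3, 2)        # just perfect fifth
just_third = Fraction(5, 4)   # just major third

# Four stacked fifths (C-G-D-A-E), brought back down two octaves (a factor of 4).
pythagorean_third = fifth ** 4 / 4

# The syntonic comma is the ratio between the two kinds of major third.
syntonic_comma = pythagorean_third / just_third

# Equal-tempered major third: four semitones of 2**(1/12) each.
tempered_third = 2 ** (4 / 12)

print("third from stacked fifths:", pythagorean_third, float(pythagorean_third))
print("syntonic comma:", syntonic_comma, float(syntonic_comma))
print("comma in cents:", round(cents(syntonic_comma), 2))
print("equal-tempered third:", round(tempered_third, 6))
```

The comma comes out at a bit over 21 cents, and the equal-tempered third sits between the just third and the third built from stacked fifths.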

The ancients saw the syntonic comma as one of several troubling flaws in the design of the universe, another being the irrationality of the diagonal of a unit square. Though for some of the Pythagoreans, such things hinted at hidden messages...

OK, back to Central Park, and the more specific message that arrives from Hierarchy:

One midnight, with the usual absence of ceremony, a street-Arab in a stiff hat and a variety of tattoos appeared and with an ingratiating leer handed over a grease-stained envelope. "Here you go, my good lad," Lindsay dropping a silver coin into the messenger's hand.

"'Ey'! Whut's 'is? some koindt of a sailboat pitchuhv on it! whuh country's dis from, I eeask yiz?"

The silver coin in question is a 50-cent piece commemorating the Columbian Exposition of 1893. In the back-and-forth that follows, the urchin mentions "the time machine" in a way that makes Chick Counterfly suspicious:

"We must talk about this further. Where can we find you?"

"Evvrands to vrun vroight now. So I'll be back." Before Chick could protest, the impertinent nuncio had vanished into the sylvan surroundings.

The impertinent nuncio's name turns out to be "Plug" Loafsley, and some of Pynchon's readers may be baffled by the orthographic phonetics of Plug's "pitchuhv", "evvrands" and "vrun", even if they are familiar with the New York dialect that he speaks. I was certainly taken aback, until I tried pronouncing them to myself a few times. The confusion arises because the feature in question is not very often caricatured in print, and Pynchon's notation is unusual and perhaps confusing.

The sound in question (I think) is not the voiced labiodental fricative usually represented in English by the letter 'v'. Rather, it's a bilabial approximant, sometimes close enough to create the turbulent flow that makes for a bilabial fricative, for which the International Phonetic Association recommends the symbol [β].

So I think that when Plug said "picture on", which in a more standard and formal American pronunciation (though still one without syllable-final /r/) might have been [ˈpɪk.tʃə.ɹən], he said something like [ˈpɪ.tʃə.βən] instead.

The simplification of the medial cluster by omitting the [k] closure is common to all forms of the language -- even radio announcers do it except in unusually careful pronunciations. Thus notating it (as is commonly done in spellings like "pitcher" for "picture") is an example of "eye dialect", in the sense of "unusual spellings for perfectly ordinary pronunciations, functioning to suggest that the speaker is uneducated or crude", as Arnold Zwicky put it a couple of months ago.

But what about the substitution of a bilabial fricative for /r/? Where does that come from? Given Pynchon's fondness for weaving science into his stories, it seems only fitting to explain further.

American English /r/ is an unusually variable consonant. In an earlier Language Log post, David Beaver reproduced a figure from M. Tiede, C. Holland and K. Choe, "A new taxonomy of American English /r/ using MRI and ultrasound", JASA 115(5) 2633-2634, 2004.

What ties these diverse articulations together, as I have always understood it at least, is the desire to lower the frequency of the third resonance of the vocal tract (the "third formant" or, colloquially, "F3").

My old lecture notes for Linguistics 520, Introduction to Phonetics, included a module on the "Qualitative Theory of Formant Values" (this version is from the fall term of 1998). We consider the standing waves corresponding to the resonances of

a uniform tube, significantly longer than it is wide, and closed at one end while open at the other. If we restrict our attention to frequencies substantially below those whose wavelength is (twice) the radius of the tube, then we need to consider only the standing waves parallel to the long axis of the tube. These longitudinal standing waves will be perfectly sinusoidal in shape, since the tube is uniform in cross-section. They will have a (velocity) node at the closed end (since the particles immediately adjacent to the boundary are not free to move). Equivalently, there will be a pressure antinode at the closed end, since the hard boundary permits the pressure to vary. At the open end, there will be a velocity antinode (since the opening permits the air to move freely back and forth), or equivalently, a pressure node (since there is nothing for the pressure to "push against").

There are infinitely many sinusoids consistent with these boundary conditions in a tube of length L, having wavelengths 4L/1, 4L/3, 4L/5, ..., and frequencies C/4L, 3C/4L, 5C/4L [where C is the speed of sound] ...
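To put numbers on those formulas, here is a small Python sketch. The tube length of 17.5 cm and the speed of sound of 35,000 cm/s are round textbook-style figures that I'm assuming for illustration; they are not taken from the lecture notes, and the function name is made up.

```python
def tube_resonances(length_cm, n_formants=3, speed_of_sound_cm_s=35000.0):
    """Resonances (Hz) of a uniform tube closed at one end, open at the other.

    The k-th resonance is (2k - 1) * C / (4 * L), for k = 1, 2, 3, ...
    """
    return [
        (2 * k - 1) * speed_of_sound_cm_s / (4.0 * length_cm)
        for k in range(1, n_formants + 1)
    ]


def tube_wavelengths(length_cm, n_formants=3):
    """Matching wavelengths (cm): 4L/1, 4L/3, 4L/5, ..."""
    return [4.0 * length_cm / (2 * k - 1) for k in range(1, n_formants + 1)]


# Assumed values: a 17.5 cm tube and C = 35,000 cm/s.
print(tube_resonances(17.5))   # [500.0, 1500.0, 2500.0]
print(tube_wavelengths(17.5))
```

Under those assumptions the first three resonances land at 500, 1500 and 2500 Hz, the familiar idealized values for a neutral vowel; a shorter tube pushes all of them up proportionally.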

Here's a figure showing the (particle velocity) standing waves for three formants:

And you can often predict (or at least understand) the resonance effects of vocal-tract changes by using the following rule of thumb:

Constriction at a velocity node of a standing wave raises the frequency of this standing wave. Expansion at a velocity node lowers the corresponding frequency.

Constriction at a velocity antinode of a standing wave lowers the frequency of this standing wave. Expansion at a velocity antinode raises the corresponding frequency.
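To see where this rule of thumb points for a given formant, it helps to list where the velocity nodes and antinodes fall. The sketch below is my own illustration, again assuming the idealized uniform tube closed at one end and open at the other, so that the velocity pattern of the n-th resonance is proportional to sin((2n-1)πx/2L), with x measured from the closed end. Positions are given as fractions of the tube length; the function names are hypothetical.

```python
from fractions import Fraction


def velocity_nodes(n):
    """Velocity nodes of the n-th resonance, as fractions of tube length
    measured from the closed end (0 = closed end, 1 = open end)."""
    return [Fraction(2 * k, 2 * n - 1) for k in range(n)]


def velocity_antinodes(n):
    """Velocity antinodes of the n-th resonance, same convention."""
    return [Fraction(2 * k + 1, 2 * n - 1) for k in range(n)]


for n in (1, 2, 3):
    nodes = ", ".join(str(x) for x in velocity_nodes(n))
    antinodes = ", ".join(str(x) for x in velocity_antinodes(n))
    print(f"resonance {n}: nodes at {nodes}; antinodes at {antinodes}")
```

For the third resonance this gives velocity antinodes at 1/5, 3/5 and 5/5 of the way from the closed end, that is, one at the open end and two inside the tube. In the idealized model, those are the three places where a constriction should pull the third resonance down. A real vocal tract is not a uniform tube, so the mapping onto anatomy is only approximate.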

What the various American /r/-articulations have in common is constrictions at two or three of the third-formant antinodes -- that is, constrictions that lower the third resonance of the vocal tract. Here's one of the examples from Tiede et al. 2004, with the three constrictions indicated:

And here's an example of the lowering of F3 (and F1 and F2 as well) that signals an /r/ in American English. This one is Louann Brizendine saying "in utero", during an NPR interview a few weeks ago (for details, see "The spread of bogus numbers in the meme pool", 12/16/2006). I've traced the third formant in the spectrogram:

The tongue articulations for /r/ shown in Tiede et al. are extremely diverse, but everybody seems to retain the lip rounding.

Now, it's common for children and some adults to pronounce /r/ in a way that retains the lip rounding and some back-of-the-tongue constriction as well, but produces something more like the labiovelar approximant [w] than [ɹ]. Popular culture offers Elmer Fudd's characteristic references to Bugs as that "wascally wabbit". There's a whole product line ("The Entire World of R") devoted to speech therapy for this problem.

Maybe Plug Loafsley just has an individual speech disorder. But over the years, I've known several people from New York City (and Brooklyn more specifically) who produced /r/ in syllable-initial or intervocalic contexts as a very closely-articulated bilabial approximant, with variable amounts of rhotic flavor mixed in. This, I think, is what Pynchon is trying to represent by writing "Evvrands to vrun vroight now."

I don't have any recordings at hand, and I don't know of any studies of this phenomenon -- if you can contribute a sound clip or a citation, let me know.

Some other Pynchon posts:

"Doing the Kenosha Kid" (7/30/2004)
"How alphabetic is the nature of molecules" (9/27/2004)
"Birlashdirilmish yangi Turk alifbesi" (9/27/2004)
"Prescriptivism in literature" (11/26/2006)
"Crimson = worm?" (12/8/2006)
"Rinehart" (12/9/2006)

[Update -- Adam Braff reports that "back when I was studying phonology at Brown, this pronunciation was thought to be a Rhode Island thing", and points to the rendition of "Rhode Island" as "Vo Dilun" by the columnists Philippe and Jorge in the Providence Phoenix, e.g. in this year-end summary of 2006, which explains the outcome of a local political contest this way:

“Laughing Boy” Carcieri prevailed over “Munster Head” Fogarty. The Don’s superior PR skills and ebullient personality do not make up for how his values are those of wealthy and powerful corporate Republicans, not those of working class Vo Dilun.

Two other examples from the same source:

The year's biggest non-revelation had to be a major insurance company's report that Vo Dilun has the worst drivers in the nation.


Ah-Leen [recently fired local radio personality, Arlene Violet] knows Vo Dilun as well as anyone. She has that local accent that sounds like broken glass, and when any big legal situation emerges, such as the Bud-I's racketeering case or the Station fire, she was the go-to gal for a take on the legal issues. We now have the functional moron Sean Hannity, a Dubya butt-boy from Fox News, taking Ms. Violet's afternoon drive-time slot. Boy, there's a genius move.

Topic for another post: local columnists using non-obvious eye-dialect versions of local place names, like Fluffya for Philadelphia. ]

[Update 1/9/2007 -- Eric Christopherson writes:

Are you sure Pynchon's spelling doesn't represent a *labiodental* approximant? According to (the ever incontrovertible) Wikipedia, it occurs for [r] in some British speakers, as well as speakers in Boston and New York City. I don't know what the Rhode Island sound is, but I seem to remember it sounding more labiodental when a former employer of mine, who was from there, said it.

As for "picture," I didn't realize the /k/-less pronunciation was so prevalent. I tend to say /k/, but then I also pronounce an /l/ in words like "calm."

My memory of the sound is more bilabial, but memory-based phonetics is a dubious enterprise all around. Does anyone have a video with a close-up of the lower face? ]

Posted by Mark Liberman at 07:22 AM

January 08, 2007

Brand Name Homogeneity

The latest incident in the Québec language wars is an outcry against Imperial Oil's plan to change the name of the convenience stores at its 54 Esso gas stations from Marché Express to On the Run. Their stated goal is to have all of their convenience stores go by the same name "in North America". Francophones are objecting to this as yet another bit of erosion of the use of French.

The funny thing here is that the gas stations themselves do not go by the same name. Imperial Oil is the Canadian subsidiary of ExxonMobil, whose gas stations in the United States use the brands Exxon and Mobil. ExxonMobil has both Exxon stations and Mobil stations because it is the result of a 1999 merger between Exxon and Mobil. Prior to the merger, Exxon had adopted the name Exxon in order to unify the several brands it was using, in part due to previous mergers, but also because it was prohibited from using the Esso brand in some states due to the 1911 Standard Oil antitrust agreement.

Marketing is something that makes absolutely no sense to me so perhaps this is obvious to people who understand such things. While I doubt that changing the names of these convenience stores will have a significant impact on the French language, at the same time I am hard put to imagine why Imperial Oil thinks that whether they use the same name or different names for their convenience stores will make a difference to their bottom line. After all, the convenience stores are part of the gas stations, and the gas stations within Canada all have prominent Esso signs.

Posted by Bill Poser at 04:40 PM

Linguistic Incompetence in the US Government

One of the less publicized observations of the Iraq Study Group is that, of the 1,000 employees of the US Embassy to Iraq, only 33 speak Arabic at all, only SIX fluently. As Roger pointed out a couple months ago, the situation at the FBI is no better. The mind boggles.

Posted by Bill Poser at 03:50 PM

Marxist quotation

Ok, you can stop sending me messages about the Marx Brothers.  There was an allusion I just didn't get.  This was back in what I thought of as a trivial piece on who(m) referring to decidedly non-human entities, in Slavoj Zizek's imagined attitude of former Iraqi minister Muhammad Said al-Sahhaf:

There was something refreshingly liberating about his interventions, which displayed a striving to be liberated from the hold of facts and thus of the need to spin away their unpleasant aspects: his stance was, "Whom do you believe, your eyes or my words?"

This was, surely intentionally, a Marxist quotation.

Mae Sander, Zev Handel, and Dave McDougall got the Marx connection, and Seth Finkelstein cited a Richard Pryor version of it.

The primary ancestor is the line from the Marx Brothers' movie Duck Soup, usually reported as:

Who are you going to believe, me or your own eyes?

This is widely attributed to Groucho Marx, which is close but not quite right: as the wikiquote page for Groucho tells us, the line was actually spoken by Chico Marx -- though at the time he was dressed up as Groucho, so any confusion would be entirely understandable.  In any case, John Baker has now directed me to Fred Shapiro's excellent Yale Book of Quotations, where on p. 497 you can find what Chico actually said, without the auxiliary are and with gonna instead of going to:

Who you gonna believe, me or your own eyes?

I have seen Duck Soup many times, but this line seems not to have stuck in my memory, so I didn't hear any resonance of it in Zizek's imagined quote.

At some point, the Chico quote has been "improved" by making explicit the claim that the addressee's perceptions are deceived:

Who are you going to / gonna believe, me or your lying eyes?

It's often quoted in this version.

The Zizek version deviates from the original in several other directions. First, there's the accusative whom, which strikes me as just wrong.  My correspondents have suggested several sources for the accusative: that Zizek wrote who but an editor "corrected" it to whom; that Zizek himself wrote whom as the "correct" form; or that Zizek chose whom because it was the sort of thing that Said al-Sahhaf might have said.

Next, my words instead of me.  This really doesn't work for me, as I said in my previous posting.

Then, your own eyes simplified to your eyes, probably for parallelism with my words.

Finally, reversing the 1st and 2nd person expressions.  Where Chico had 1st before 2nd (me or your own eyes), Zizek's version of Said al-Sahhaf has the reverse (your eyes or my words).  Chico put the reference to himself first, and he put the shorter, lighter expression (me) before the longer, heavier one (your own eyes) -- both good moves.  Merely reversing these (your (own) eyes or me) isn't nearly as good, though my words for me improves the prosody (but see above).

In any case, the result is inept, and fairly far from the original.  Now Zizek -- whose name is properly Žižek, with two hačeks -- is certainly well acquainted with popular culture (check out his website and the Wikipedia page), so my guess would be that he intended this version to be a reflection on Said al-Sahhaf, using an allusion supplied by Zizek himself.

I'll send him e-mail.

[Further complexity: Dave McDougall notes that Zizek has used this exact quote on other occasions (and seems to be the only person to use this version), attributing it to Groucho Marx in a Marx Brothers film (not identified).  But at least once, in a discussion of opera, he refers to "the well-known joke from the 18th century comedy, when a wife caught by her husband in bed with a lover denies the obvious and adds: 'Whom do you believe, your eyes or my words?' "]

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 03:02 PM

Linguistic incompetence in Britain

Back on 16 December the Economist took up not only the three new official languages of the EU (reported on here), but also the monolingualism of Britons.  There was a leader (U.S.: editorial), on p. 14, and a news story, on pp. 55-6, about how Britons are "God's worst linguists": "just 30% of Britons can converse in a language other than their own (only Hungarians did worse)", and fewer and fewer young people are studying languages in school.  (Why bother?  Everyone else is learning English!)

We chatted about these pieces over sherry in the Writers' Lounge at Language Log Plaza at the time, but the story didn't seem significant enough to merit blogging.  Even in Europe, the Hungarians (are said to) do worse than the Britons (though only by one percentage point), and surely Americans and Australians would give the Brits a run for their money in the foreign language incompetence sweepstakes.  (Actually, that 30% figure seems high.   The figures are from a European Commission report of 2005, based on a questionnaire administered to 29,328 people, and they represent the percentages of people in each member state who "assert that they can speak at least one other language than their mother tongue at the level of being able to have a conversation."  We might expect such self-reports to be a bit, well, optimistic.)

Now there's a new twist: are Britons competent even in their own language?

In a letter in the 6 January issue of the Economist (p. 14), Jason Smith of London raises the issue:

With regard to your leader lamenting the willingness [I would have said "unwillingness"] of the British to learn Johnny Foreigner's native tongue, perhaps you could turn your attention to persuading Britons to master their own language first...  I recently received a marketing leaflet advising me: "Dont wait for new year sales when there in stock now".

Yes, it's about SPELLING (including punctuation and capitalization).  I was prepared to see a rant about non-standard grammar or the innovations of the young or even the appalling pronunciations of Estuary English, but instead -- in response to pieces about people's ability to CONVERSE in languages other than their mother tongue -- we get a hell-in-a-handbasket letter about spelling.

English orthography is cunningly mined with traps for the unwary, and ordinary people writing in English have never been particularly good at avoiding the traps.*

[* Bob Yates addresses the issue in e-mail to me:

Whenever I read statements that "things" are sliding downhill, I remind myself of this petition written by [a] white woman in Miller County, Georgia in September, 1863 to Jefferson Davis.  Fortunately, Williams, who says this was typical of such petitions from women written to Davis, did not fix the spellings.
Our crops is limited and so short [that we] cannot reach the first day of march next. . . . But little [illegible] of any sort to Rescue us and our children from a unanumus starveation. . . . We can seldom find [bacon] for non has got But those that are exzempt from service . . . and they have no humane feeling nor patreotic prinsables in thare harts. . . . they care not ef all the South and its effort fail and sink so they swim. . . An allwise god who is slow to anger and full of grace . . . will send down his forty and judgement in a very grate manar [on] all those our leading men and those that are in pwere if thare is no more favors show to those the mothers and wives and of those hwo in poverty has with patrootism stood the fence of Battles. . . I tell you that with out some grate and speadly alternating in the conduckting of afares in this our little nation god will frown on it and that speadly.

Source: Williams, David. (1998). Rich Man's War. Athens: University of Georgia Press. (p. 113-4)

This is not some email exchange. This is a text written to the leader of [the] country. ]

In addition, spelling ability is at best weakly connected to measures of verbal facility, intelligence, and the like.  Being "good at" spelling in English (note: in ENGLISH) is one of those odd language-related talents like being good at Double-Crostics or being able to talk ad lib in iambic pentameter or being able to rap.  If you're not good at spelling, you'll have to find a way to work around that in circumstances where spelling is important, but you shouldn't let spelling problems prevent you from writing; after all, some spectacularly bad spellers are very good writers indeed.

On the other hand, if you can't frame what you want to say in ways that your intended audience will be able to understand without hard work, you're in trouble.  If this is in your mother tongue, you have some sort of disability.  I've spent a fair amount of time with people with such disabilities, and it's distressing all around.  (Maybe I'll be able to post about this, maybe not.  The case I know the most about is my partner of 26 years, now dead.)

Two notes: a claim about the ability to hold a conversation in a language other than your mother tongue got turned into a rant about writing, in fact about the mechanics of writing, thus trivializing a serious issue; and all deviations from correctness, all kinds of "incompetence", are treated as equivalent, thus elevating poor spelling to a significance it doesn't deserve.  Why, someone who would spell they're as there might be capable of anything!

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 12:12 PM

Accusatory headline

It's been a very long time since we've written about the way newspapers report on people suspected of, alleged to have committed, accused of, or charged with criminal behavior of one kind or another.  The usual situation is that a paper, in an effort to avoid asserting guilt, ends up writing that "the unidentified suspect fled the scene" or "so-and-so was charged with allegedly embezzling $10,000" or the like.  But things can go wrong in the other direction as well.

Here's the Palo Alto Daily News of 6 January, with this headline on p. 4:

not guilty

Oh no, this really won't do.  Describing Marco Antonio Carlos as a "shooter" is definitely over the line; it asserts that he shot someone (a teenager from Newark, CA), but that's exactly the sort of claim of guilt that newspapers usually want to avoid. The paper could have referred to "shooting suspect" instead of "Redwood shooter", but, I'd speculate, in the heat of getting the paper out, it missed the problem.  Headline writing is a hell of a job.

Now, a footnote about the multiple ambiguity of "Redwood shooter".  Almost any expression plucked from its context is multiply ambiguous, but headlines (given that they are so compressed) are especially likely to invite multiple interpretations.

So: a "Redwood shooter" could be a person who shoots (into) redwoods with some sort of weapon; a person who shoots redwood, or redwoods, out of some sort of weapon (a cannon, perhaps); a weapon for shooting redwoods, in either sense of "shooting redwoods"; or a person who does shooting in redwood forests.  Or, in this case, given that the headline is in a Palo Alto paper, a shooter from Redwood City CA.  Context is important: this abbreviation might not work elsewhere, even in the Bay Area, and it certainly wouldn't work outside of it -- in Atlanta, say.

(Please don't send me other interpretations.  There probably are a zillion of them.  I'm still trying to cope with people complaining that I didn't get ALL of the possible interpretations of "We saw her duck" in my last posting on headlines, linked to above.)

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 10:43 AM

AP bollixes WOTY coverage (again)

A year ago, when the American Dialect Society selected truthiness as the 2005 Word of the Year, the Associated Press article on the vote managed to omit any mention of the word's vociferous booster, Stephen Colbert of "The Colbert Report." At the time I predicted that the snub would "serve as more fodder for Colbert's put-upon persona of perpetual outrage." Sure enough, Colbert waged a splenetic assault on the AP (with wordanista Michael Adams caught in the crossfire). You would think the folks at the AP might have learned their lesson this time around, making sure that their '06 WOTY reporting was error-free. No such luck.

Here's how the AP article appears in dozens of news outlets both in the US and overseas (such as ABC News, CBS News, FOX News, the San Francisco Chronicle, the Miami Herald, the Houston Chronicle, the Toronto Globe & Mail, the Guardian, the International Herald Tribune, etc., etc.):

'Plutoed' Chosen As '06 Word of the Year
ANAHEIM, Calif. (AP) -- Pluto is finally getting some respect - not from astronomers, but from wordsmiths.
"Plutoed" was chosen 2006's Word of the Year by the American Dialect Society at its annual meeting Friday.
To "pluto" is "to demote or devalue someone or something," much like what happened to the former planet last year when the General Assembly of the International Astronomical Union decided Pluto didn't meet its definition of a planet.
"Our members believe the great emotional reaction of the public to the demotion of Pluto shows the importance of Pluto as a name," said society president Cleveland Evans. "We may no longer believe in the Roman god Pluto, but we still have a sense of personal connection with the former planet."

Language Log readers might recognize that quote from Cleveland Evans, which I cribbed from the ADS press release for my post about the news from Anaheim. And the more eagle-eyed might also recall that this quote was not actually WOTY-related: it was about the selection of Pluto as Name of the Year by the members of the American Name Society, rather than the ADS vote for the verb pluto as Word of the Year. Indeed, Cleveland Evans is president of the ANS, not the ADS, as the press release makes clear. (For the record, the incoming ADS president is Bill Kretzschmar, taking over for Joan Hall.)

Since the ADS hosted the ANS vote this year, and the two winners ended up being similarly Plutonic, it's easy to see how the AP slipped up. Still, it would only require a quick double-check of the press release (or, say, a phone call or email to someone involved with the selection) for the reporter to get the relevant facts straight. It's enough to make members of these two fine scholarly associations feel devalued... plutoed, even.

Posted by Benjamin Zimmer at 01:04 AM

January 07, 2007

Ultimate avoidance

We continue the comic-strip theme with one about the ultimate in taboo avoidance:

(Thanks to Kevin Hayes for the pointer to Ruben Bolling's Tom the Dancing Bug.)

In case you're curious about slang words using one letter, there's a webpage about them.  But you can collect more instances, not on this webpage, by searching on R-word, T-word, D-word, etc.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 03:09 PM

Comic language

In the comics back on 28 December, we read about Griffwords, expressions dear to the cartoon character Griffy (cartoonist Bill Griffith's version of himself) in Zippy.  Echoes of old comic strips:

Some notes, most suggested by postings to the American Dialect Society mailing list:

According to this website, Nov shmoz ka pop? comes from Gene Ahern's strip The Squirrel Cage:

The protagonist of The Squirrel Cage (which was first seen on Sunday, June 21, 1936) was a little hitchhiker referred to only as "The Little Hitchhiker". He had a long, white beard, wore an enormous tam on his head, and covered his body with a black smock or overcoat (the beard got in the way of knowing for sure)...

The little hitchhiker would get into strange little adventures by standing beside the road, his thumb out, uttering phrases in some incomprehensible language, a few of which were translated but most not. The most frequent of them, "Nov shmoz ka pop?" (never translated), etched itself into the American consciousness to the point where, to this very day, many people still wonder how that silly thing ever got into their heads.

Notary Sojac were the words on a sign that hung on the wall in the Smokey Stover strip (1935-1973).  From the Wikipedia page:

Smokey Stover was a comic strip written and drawn by Bill Holman from March 10, 1935 until he retired in 1973. It was distributed through the Chicago Tribune and was the longest lasting of the comic strips of the "screwball comedy" genre.

The strip featured Smokey the firefighter, in his two-wheeled fire truck called "The Foomobile", fire chief Cash U. Nutt, and his wife Cookie, with her question-mark pompadour. Odd bits of philosophy, and recurrent signs carrying bizarre phrases such as "Notary Sojac" and "1506 Nix Nix" were featured in the strip. (Holman described the phrase "Notary Sojac" as Gaelic for "horsecrap" and as Gaelic for "Merry Christmas".)

"Foo" was one of these recurring nonsense words and was taken up by World War II's "Foo Fighters". Foo may have been inspired by the French word for fire, feu, but Holman never gave a straight answer as to the origin.

Nize baby comes from Milt Gross's Gross Exaggerations

"Gross Exaggerations" began as an illustrated column in the New York World. What made it unique, besides Gross' homespun drawing style, was the use of phonetic dialect in the dialogue. The dialect was based on that of Jewish immigrants who were struggling to make themselves understood in a new language.

"Hollo! Hoperator! Hollo! Who's dere by de shvitzbud? I vant Haudabon--hate--vun--ho--fife. Hate! HATE! Vun, two, tree, fur, fife, seex, savan, HATE!"

The column featured the dialogues between stereotypical Jewish mothers conversing out the windows of their tenement. First Floor and Second Floor were the indications of who was speaking, with an occasional interjection from Third Floor. On the Fourth Floor, there's a baby. So not only were the columns about life in New York, they occasionally strayed into what could only be considered Fractured Fairy Tales told to entertain the "nize baby." One might be "Nize ferry-tail from Elledin witt de wanderful lemp", another "from Jack witt de binn stuck."...

Nize Baby was published in book form in 1926 to immediate success.

Banana Oil was another Milt Gross strip, also from the 20s.  Gross might be the source of this expression as slang meaning 'nonsense; insincere or insane talk or behaviour' (OED); the strip did appear before the OED's first cite, from Wodehouse in 1927.

Jeep comes from Elzie Crisler Segar's strip Thimble Theatre (first published in 1919), which eventually became Popeye.  From the Wikipedia page for Popeye:

Other regular characters in the strip were J. Wellington Wimpy, a moocher and a hamburger lover who would "gladly pay you Tuesday for a hamburger today"; George W. Geezil, a local cobbler who speaks in a heavily affected accent and habitually attempted to murder or wish death upon Wimpy; Poopdeck Pappy, Popeye's belligerent and woman-hating father; and Eugene the Jeep, a yellow, vaguely dog-like animal from Africa with magical powers.

According to the OED, the vehicle name jeep comes from "general purpose", "prob. influenced by the name 'Eugene the Jeep', a creature of amazing resource and power, first introduced into the cartoon strip 'Popeye' on 16 March 1936".

Potrzebie comes from Mad magazine.  It even has its own Wikipedia page:

Potrzebie is a Polish word popularized by its non sequitur use as a running gag in the early issues of Mad not long after the comic book began in 1952. The word is pronounced "pot-SCHEB-yeh" in Polish and is a declined form of the noun "potrzeba" (which means "need"), but in "English" it was purportedly pronounced "PAH-tur-zee-bee" or "POT-ra-zee-bee." Its Eastern European feel was a perfect fit for the New York Jewish style of the magazine.

Mad editor Harvey Kurtzman spotted the word printed in the Polish language section of a multi-languaged "Instructions for Use" sheet accompanying a bottle of aspirin, and Kurtzman, who was fascinated with unusual words and Yiddishisms, decided it would make an appropriate but meaningless background gag. After cutting the word out of the instruction sheet, he made copies and used rubber cement to paste "Potrzebie" randomly into the middle of Mad satires.

Not yet in the OED.

Mad was also responsible for axolotl (the name of a Mexican salamander) as a nonsense reference and ferschlugginer (adapted from Yiddish) as a sort of all-purpose modifier of negative affect.  Ferschlugginer hasn't made it into the OED yet; axolotl, of course, is there, but without a reference to Mad.

So the Griffwords go back 70+ years.  A lot of past to dwell on.

(Thanks to the ADS-L posters on this topic: in alphabetical order, John Baker, Wilson Gray, Larry Horn, Jon Lighter, Alison Murie, and Dennis Preston.)

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 02:38 PM

Today's comics crop

There's literature on this:

And as usual, managerial creativity is unappreciated:

Posted by Mark Liberman at 08:53 AM

The basil goes away

Craig Daniel wrote to me a few weeks ago to say:

A local grocery store sells small bags of herbs and spices labelled in English and Spanish. The Spanish labels, however, reveal one small surprise - what is called "basil" in English is, in Spanish, known as "la albahaca se va" - "the basil goes away".

After puzzling over this when I first saw it, I realized it was probably a poor attempt at translating "basil leaves" in which "leaves" was misinterpreted as the verb. I chalked it up to machine translation, probably Babelfish, and left it at that. But the other day a friend of mine discovered that Babelfish translates "basil leaves" as "hojas de albahaca", which is correct.

So I have no idea where the error that led to "basil" being labelled as "the basil goes away" came from.

And I have no idea either. Where did this grocery store get its embarrassingly silly translation from? An ordinary print dictionary?

Posted by Geoffrey K. Pullum at 01:49 AM

January 06, 2007

Benny, call home

This week I got a call from an Illinois lawyer, who requested a copy of an article I wrote over forty years ago. He told me that he needed my article for a court case in which his clients were suing a rubber factory because of permanent injuries they allegedly received from using benzene in the process of producing automobile tires. My article was called "Tire Worker Terms," a bit of lexicography concerning the specialized vocabulary of tire workers. Pretty tame text. Hardly the stuff of lawsuits. But it kindled some thoughts about the work linguists do.

Back in those days I wanted to jumpstart my academic career by getting some articles in print so I decided to tap into my earlier grad school days, when I worked the night shift at one of Akron's largest tire companies. With the endorsement and help of the plant manager and other officials, I supplemented my own two-year experience there with additional fieldwork, interviewing dozens of helpful, talkative, and knowledgeable tire workers all over the plant (note: once a fieldworker, always a fieldworker).

My research was published in American Speech (volume 39, number 4, December, 1964). It wasn't much of an article and I haven't had reason to think about it since and, mercifully, neither has anybody else. Until this week, that is.

Then came the phone call. "Of course we used benzene," I told the lawyer. "Why does this matter?" He told me that his clients believe using benzene on the job caused them to contract leukemia. "So why on earth do you want my article?" I asked. His answer: "Because representatives of the rubber company say that benzene, a.k.a. benny, was never used in the production of tires."

I dug out a copy of my article, which listed and defined some 200 terms associated with the tire production process. I thought I could remember "benny" as one of them but after all these years I had to check, just to be certain. Sure enough, in the B section was the following:

BENNY, n. A gasoline mixture used by the tirebuilder (see BUILDER) to clear foreign elements from the surface of the tread or fabric with which he is working and add tack to the various plies and treads.

I know, there's a "he" in my definition. But please remember that this was 1964, long before we knew any better. As I read this to the lawyer on the phone, he told me that this definition was exactly what he needed to impeach the testimony of the company representative who had strongly denied that benzene ever played a part in the tire building process.

The sobering thing about all this is that we linguists often can't predict how our articles and research may turn out to be helpful in ways we never thought about. We find a problem, gather our data, analyze and generalize as much as possible, and write up our results. Then 43 years later we get a phone call.

Neat, huh?

Posted by Roger Shuy at 05:53 PM

Arcana from the cabal

Here in Anaheim at this year's Secret Cabal of the Linguistic Elite, otherwise known as the annual meeting of the Linguistic Society of America, the seers and acolytes are bartering all sorts of fascinating arcana. One that caught my eye was a presentation by Cati Brown, Tony Snodgrass, Michael Covington, Susan Kemper and Ruth Herman, "Measuring propositional idea density through part-of-speech tagging". Their abstract:

We present a computer program, CPIDR (Computerized Propositional Idea Density Rater), that measures idea density automatically through part-of-speech tagging. Idea density, the number of propositions per N words, is a useful measure of discourse complexity and of possible cognitive impairment on the part of the speaker. Propositions correspond roughly to verbs, adjectives, adverbs, prepositional phrases, and conjunctions (Snowdon et al. 1996). By counting these parts of speech and then applying readjustment rules for particular syntactic structures, we closely replicate the proposition counts given by the standard Turner & Greene method.

This is part of the CASPR ("Computer Analysis of Speech for Psychological Research") project. And the part about "idea density" being "a useful measure of ... possible cognitive impairment" seems, believe it or not, to be a bit of an understatement.

The research behind this notion comes out of what is colloquially known as the "nun study", summarized in D. A. Snowdon, S. J. Kemper, J. A. Mortimer, L. H. Greiner, D. R. Wekstein and W. R. Markesbery. Linguistic ability in early life and cognitive function and Alzheimer's disease in late life. JAMA Vol. 275 No. 7, February 21, 1996.

Two measures of linguistic ability in early life, idea density and grammatical complexity, were derived from autobiographies written at a mean age of 22 years. Approximately 58 years later, the women who wrote these autobiographies participated in an assessment of cognitive function, and those who subsequently died were evaluated neuropathologically. [...] Cognitive function was investigated in 93 participants who were aged 75 to 95 years at the time of their assessments, and Alzheimer's disease was investigated in the 14 participants who died at 79 to 96 years of age. [...] Low idea density and low grammatical complexity in autobiographies written in early life were associated with low cognitive test scores in late life. Low idea density in early life had stronger and more consistent associations with poor cognitive function than did low grammatical complexity. Among the 14 sisters who died, neuropathologically confirmed Alzheimer's disease was present in all of those with low idea density in early life and in none of those with high idea density.

I discussed this work at greater length a couple of years ago ("Writing style and dementia", 12/3/2004). I'll reprise here what I wrote about the specific numbers behind the autopsy correlations:

According to the study's summary table, the mean "idea density" in early life autobiographies for nuns whose autopsied brains "met neuropathologic criteria for Alzheimer's disease" was 4.9 (95% confidence interval 4.6-5.3), while for nuns whose brains were free of Alzheimer's symptoms, the mean "idea density" was 6.1 (95% confidence interval 5.6-6.6).

This is major wizardry, which I find striking on several levels.

First, it's amazing that quantification of writing style at age 22 works to predict dementia six decades later, and apparently works so well. The N for the brain-autopsy part of the study is not very large (just 14), but the results are still impressive; and the sample of 93 in whom old-age cognitive function was correlated with early-life writing style is reassuring. It would be nice to see results from a much larger epidemiological study. But the nun-study results themselves suggest that "idea density" in writing samples might be the basis of a screening test for Alzheimer's, whose predictive value would compare favorably to many tests that are in common use to screen for other diseases.

Second, the fact that "idea density" worked better than other metrics that the researchers tried, especially "grammatical complexity", is puzzling and therefore interesting.

And third, the fact that this particular way of measuring "idea density" turned out to work so well is puzzling and therefore interesting.

The "idea density" concept comes from Kintsch, W. (1972) "Notes on the structure of semantic memory", in E. Tulving and W. Donaldson (eds) Organization of Memory, pp. 247–308. New York: Academic Press, and Kintsch, W. & J. Keenan. Reading rate and retention as a function of the number of propositions in the base structure of sentences. Cognit. Psychol. 5: 257-274 (1973). As the title of the second paper suggests, this work developed out of early work in transformational grammar, initiated by Noam Chomsky's Syntactic Structures, based on some earlier work by Zellig Harris, and carried forward in the early 1970s under the rubric of "generative semantics". Kintsch et al. interpreted these theories in a particular way in deciding how to count the "ideas" or "propositions" expressed by an English sentence. Other interpretations of these or other theories, before and since, would come out with very different "idea density" counts for the same sentences interpreted in the same way.

For example, this metric treats "the cat ate the rat" as one proposition, while "the cat ate today" is two. That's because a verb and its arguments (e.g. subject and object) are treated as a single proposition, while modifiers such as adverbs and adjectives are treated as adding separate propositions. As I understand it, determiners like "the" or "a" are not counted, nor are plurals or auxiliary verbs. Explicit connectives are counted, but implicit ones generally are not: thus "the cat appeared; the rats scattered" is two propositions, but "when the cat appeared, the rats scattered" is three. Some complex nominal constituents are treated as elementary units -- thus in Snowdon et al. 2000, they give this example of calculating idea density:

The following sentence from an autobiography illustrates the method used to compute idea density: "I was born in Eau Claire, Wis., on May 24, 1913 and was baptized in St. James Church." The ideas (propositions) expressed in this sentence were (1) I was born, (2) born in Eau Claire, Wis., (3) born on May 24, 1913, (4) I was baptized, (5) was baptized in church, (6) was baptized in St. James Church, and (7) I was born...and was baptized. There were 18 words or utterances in that sentence. The idea density for that sentence was 3.9 (i.e., 7 ideas divided by 18 words and multiplied by 10, resulting in 3.9 ideas per 10 words).
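The arithmetic in that example is easy to check. Here is a minimal Python sketch; the function name `idea_density_per_10` and the whitespace-based word count are my own assumptions for illustration, not anything taken from Snowdon et al., though splitting on whitespace does happen to reproduce their count of 18 for this sentence.

```python
# Worked check of the idea-density example quoted above.
# The names and the whitespace tokenization are illustrative assumptions.

SENTENCE = ("I was born in Eau Claire, Wis., on May 24, 1913 "
            "and was baptized in St. James Church.")

# The seven propositions listed in the quoted passage.
PROPOSITIONS = [
    "I was born",
    "born in Eau Claire, Wis.",
    "born on May 24, 1913",
    "I was baptized",
    "was baptized in church",
    "was baptized in St. James Church",
    "I was born...and was baptized",
]


def idea_density_per_10(n_propositions: int, n_words: int) -> float:
    """Return the number of propositions per 10 words."""
    if n_words <= 0:
        raise ValueError("n_words must be positive")
    return 10 * n_propositions / n_words


n_words = len(SENTENCE.split())  # whitespace-delimited tokens
density = idea_density_per_10(len(PROPOSITIONS), n_words)
print(n_words, round(density, 1))  # 18 3.9
```

The hard part, of course, is not the division but deciding what goes into the list of propositions in the first place.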

There have been many different ways of thinking about how to represent the meaning of sentences and discourses, and for each representational theory, there could be many different ways of quantifying its count of elementary parts. I don't think that most semanticists these days would be inclined to make the same choices that Kintsch et al. did, 35 years ago -- for example, it seems odd to say that "St. James" adds an extra proposition so that "in St. James Church" contributes two propositions, while "in Eau Claire, Wis." and "on May 24, 1913" each add just one, ignoring the nominal substructure in those cases.

But maybe today's choices would be worse ones; I don't know. It's hard to argue with success. Then again, maybe a different metric would result in even better clinical prediction. This is an excellent example of why "executable articles" are a good idea -- if the nun-study texts, the details of their "idea density" analyses, and the associated clinical data were available, it would be just a few hours' work to compare alternative metrics. This is also a good example of why the idea faces some non-trivial obstacles, since such data is (properly) protected by privacy considerations that would have to be respected by any method for offering research access.

Meanwhile, it's exciting that Cati Brown and the rest of the CASPR people have designed and implemented a program that computes the "idea density" metric automatically -- and according to their poster, "agreed with the group of human raters appreciably better than the raters agreed with each other (r = 0.942, or 0.969 if one outlier is excluded, vs. r ≥ 0.82 for human vs. human)". Not only that, but they list among their planned future work to "Package CPIDR as a shareable software package", and to "Factor propositional idea density into its components (verb density, adjective density, etc.) and determine the neuropsychological relevance of each". I've been a bit puzzled about how little follow-up there seems to have been to the nun study, both in neuroscience and in linguistics; maybe the CASPR work will change that.
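To make the part-of-speech idea concrete, here is a rough sketch of the first step described in the CPIDR abstract: count the tags that correspond roughly to propositions, then divide by the number of words. This is emphatically not CPIDR's code. It assumes hand-supplied Penn Treebank tags, uses preposition tokens as a stand-in for prepositional phrases, and omits the readjustment rules entirely (so auxiliary verbs, for instance, would be wrongly counted). All the names and the example sentence are hypothetical.

```python
# Rough sketch of tag-based proposition counting (NOT the CPIDR implementation).
# Assumptions: Penn Treebank tags supplied by hand, no readjustment rules.

# Tag prefixes taken as roughly proposition-bearing: verbs (VB*),
# adjectives (JJ*), adverbs (RB*), prepositions (IN), conjunctions (CC).
PROPOSITION_TAG_PREFIXES = ("VB", "JJ", "RB", "IN", "CC")


def count_words(tagged):
    """Count tokens containing at least one letter or digit (skips punctuation)."""
    return sum(1 for word, _ in tagged if any(ch.isalnum() for ch in word))


def count_propositions(tagged):
    """Count tokens whose tag starts with a proposition-bearing prefix."""
    return sum(1 for _, tag in tagged if tag.startswith(PROPOSITION_TAG_PREFIXES))


def rough_idea_density(tagged, per=10):
    """Return the approximate number of propositions per `per` words."""
    words = count_words(tagged)
    if words == 0:
        raise ValueError("no words in input")
    return per * count_propositions(tagged) / words


# A made-up, hand-tagged sentence (hypothetical data, for illustration only).
EXAMPLE = [
    ("The", "DT"), ("old", "JJ"), ("cat", "NN"), ("slowly", "RB"),
    ("ate", "VBD"), ("the", "DT"), ("rat", "NN"), ("in", "IN"),
    ("the", "DT"), ("barn", "NN"), ("and", "CC"), ("slept", "VBD"),
    (".", "."),
]

print(count_propositions(EXAMPLE), count_words(EXAMPLE))  # 6 12
print(round(rough_idea_density(EXAMPLE), 2))              # 5.0
```

Feeding this the output of a real tagger would be the easy part; the readjustment rules for particular syntactic structures are presumably where the real work lies.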

Posted by Mark Liberman at 11:10 AM

January 05, 2007

Pluto got plutoed, but it still won WOTY

Breaking news from Anaheim, where the American Dialect Society is holding its annual meeting: the winner of the 2006 Word of the Year vote is (drum roll, please)... plutoed. To pluto, as the ADS press release states, means "to demote or devalue someone or something, as happened to the former planet Pluto when the General Assembly of the International Astronomical Union decided Pluto no longer met its definition of a planet."

Pluto actually received a double accolade in Anaheim. The American Name Society (which, like the ADS, holds its annual meeting in conjunction with the Linguistic Society of America) selected Pluto as its Name of the Year. ANS President Cleveland Evans said, "Our members believe the great emotional reaction of the public to the demotion of Pluto shows the importance of Pluto as a name. We may no longer believe in the Roman god Pluto, but we still have a sense of personal connection with the former planet."

Perhaps it was this "personal connection" that swayed the ADS and ANS voters, who felt compassion for Pluto's sad relegation to "dwarf planet" status. Or perhaps it had something to do with the selections taking place so close to Disneyland.

If you'd like to relive the plutoing of Pluto, check out the extensive Language Log coverage here:

"Gay marriage and counting the planets" (8/17/06)
"The Planet Tlahuizcalpantecuhtli" (8/20/06)
"New planet mnemonic: Language Log is there for you" (8/20/06)
"Piling on 'pluton'" (8/21/06)
"Good-bye plutons, hello Plutonian objects" (8/23/06)
"Iceballs, revisability, language, and intelligent life in the universe" (8/23/06)
"Pluto is a dwarf planet, but not a planet" (8/24/06)
"New planetary definition a 'linguistic catastrophe'!" (8/25/06)
"Dwarf planets and California lilacs" (8/25/06)
"Make Very Excellent Mnemonics: Just Start Using Noggin!" (8/25/06)
"A bad week for the lord of the underworld" (8/28/06)
"Taxonation without representation" (9/14/06)

[Update, 1/8/07: Don't trust the widely circulated Associated Press article on the WOTY vote. Details here.]

Posted by Benjamin Zimmer at 11:07 PM

The specialness of English

Speakers of English writing about their language are likely to trumpet the specialness of the language, in particular its enormous vocabulary.  We've returned repeatedly to the Vocabulary Size trope, most recently in a posting by Geoff Pullum:

Despite the fact that we have virtually no idea of how to measure vocabulary size rigorously and fairly (which is one thing differentiating vocabulary size from penis length), nobody cares: people are prepared (it would seem) to accept imaginary facts about how many words are known by groups of people about whom they know nothing (or about themselves, as with the Payack claims concerning English) as a reliable assay of intelligence level, or even the sophistication level of a whole language or culture, and to accept any kind of raving nonsense anyone comes up with by way of vocabulary counting.

English is said to have a humongous vocabulary, as a result of several factors: the combination of Germanic and Romance sources; within the latter, layers of earlier borrowings and later ones, based more directly on Latin (and Greek); and the willingness of English speakers to take in loans from a great variety of languages.  All this is commonplace, though annoying.  Now it's taken to the next level, in Sol Steinmetz and Barbara Ann Kipfer's The Life of Language (2006), on English words.  After a discussion of doublets like legal/loyal, regal/royal, and tradition/treason, Steinmetz and Kipfer conclude:

This is partly why English is the only language that has books of synonyms like Roget's Thesaurus.

Whoa!  English must be REALLY special, with so many words that it needs a special resource to catalogue them.

Background comment: the specialness of English stands along with claims about the specialness of other languages, Japanese and French most famously, but also a number of others; I have had speakers of Persian go on at length about the marvels of their language, including the ease with which it can be learned and its special suitability for poetry.

Another background comment: Roget's is not just a synonym dictionary (though it can be used as one), it's a thesaurus, a conceptual taxonomy (of the furniture of the world and the organization of thought).  It is one of the monuments of such large-scale taxonomies: Bishop John Wilkins's Real Character in the 17th century, the French Encyclopedists in the 18th, Roget in the 19th, and Carl Darling Buck's Dictionary of Selected Synonyms in the 20th.  [Addendum: to which we can now add the developing WordNet, a combination of dictionary and thesaurus organized on a number of dimensions.]  All are organized conceptually, and the last two are designed to supply lists of words for each of the conceptual categories (for English alone in Roget's case, for the Indo-European languages as a group in Buck's case).

Now, on to the central claim: that English is the only language with a resource like Roget's Thesaurus.  You would have thought that someone making such a broad-brush claim would have at least tried to check it out.  It takes only a few moments to find thesauruses and synonym dictionaries for a variety of languages; here, for example, is a review of four such books for Japanese, all in Japanese only.

Meanwhile, for Chinese, Dan Jurafsky points to two popular modern Chinese thesauruses, organized as semantic taxonomies with synonym lists:

Mei Jia-Ju, Zhu Yi-Ming, Gao Yun-Qi and Yin Hong-Xiang.  1986.  TongYi Ci CiLin [Synonym Dictionary/Thesaurus].  Hong Kong: Commercial Press.

Zhendong Dong and Qiang Dong (Chinese Academy of Sciences, China).  HowNet and the Computation of Meaning (with CD-ROM).  ISBN 981-256-491-8.

and Mark Liberman notes that thesaurus-making has a very long history in Chinese, going back to the Erya (links here and here), said to date from the 3rd century B.C. and organized mostly in terms of a semantic taxonomy.

No doubt there are many more examples to be found, especially given the many modern languages with several strata of vocabulary from different sources (Swahili, for example).  Even languages without such obvious stratification in their vocabulary have synonym dictionaries and/or thesauruses; there's a Duden synonyms dictionary for German, for instance.

My purpose here is not to start an inventory of thesauruses and synonym dictionaries -- please don't bombard me with further examples -- but just to show that it takes almost no work to discover that there are languages other than English with such resources, in some cases significantly antedating Roget's project.  Unfortunately, the idea of English as a special case was so powerfully attractive for Steinmetz and Kipfer that they didn't even make an effort.

Semantic taxonomies are very old indeed, usually organized in a kind of outline fashion, though in prose: things are animal, vegetable, or mineral; of the animals, there are the animals of the sky, the animals of the water, and the animals of the land; of the animals of the land, there are those that go on two feet, those that go on four, those that go on six, those that go on eight, and those that creep upon the land; etc.

It's a very natural idea to attach lists of words to the categories at the bottom level.  So it's not really a surprise that people were doing this a couple of millennia ago.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 12:47 PM

Language Log in the funny papers

We often comment on the comics, so it's nice when it comes back the other way. And this is the first daily comic strip that I can recall seeing with a URL in it. Bunny for 1/4/2007:

Here's a clickable link to the cited page, since Language Log URLs "look like they were hit with an ugly-stick", as someone once remarked, and you don't want to have to type it in by hand.

(I can remember the first time I saw a URL in an advertisement on the side of a bus, a decade ago -- that's when I knew that the internet had really arrived.)

Posted by Mark Liberman at 11:24 AM

Not the WOTY

It's become a New Year's tradition for individuals and organizations to evaluate the past and predict the future. For example, the American Dialect Society will have its annual "Word of the Year" vote this evening here in Anaheim, where I'm attending the co-located meeting of the Linguistic Society of America. (Ben Zimmer invited WOTY nominations just before Christmas -- "ADS WOTY: Make your nominations", 12/24/2006). A less democratic example of New Year's pontification: the BBC World Service's "Culture Shock" program for 1/1/2007. The website sets the haughty tone:

This week on Culture Shock, as we mark the beginning of the new year, Lawrence Pollard is joined by a council of experts who will tell us what 2007 has in store for us.

Our very exclusive line-up consists of a business innovator, a technology visionary and two of the world's leading trend spotters.

Our four prophets will give us a glimpse of the future as they each tell us what will be the defining trend of the next twelve months.

New trends need new terms, of course, and so the four "very exclusive" experts had four new words or phrases to offer. My own prediction: none of these terms are going to make the short list for next year's ADS WOTY. The concepts are all interesting, though not likely to be news to anyone who has been paying attention for the past few years. But I selected this particular display of ritual punditry to present to you because one of the identified trends is the changing relationship between the press and the public -- a topic we've touched on from time to time here at Language Log. And the "expert" who identified the "trend" contributed some interesting comments, along with one of the least promising neologisms in recent memory: zombietiming.

Here's my transcript of the expert's summaries at the start of the show:

Lawrence Pollard: Hello, I'm Lawrence Pollard, and welcome to a very special edition of Culture Shock; four leading experts in the fields of technology, business, marketing and trend prediction are here today, to tell us what 2007 has in store for us. Our four prophets will give us a glimpse of the future as they tell us each what they predict will be the defining trend of the next twelve months. And without further delay, let me introduce our council of experts. With me in London is one of our best-known writers on the latest and future developments in technology, David Rowan, whom you can also find writing in the pages of the London Times. Very briefly, David, what's your trend for two thousand and seven?
David Rowan: I'm very excited about crowd-sourcing, which is like out-sourcing, but using the power of on-line networks to make things and develop things.
Lawrence Pollard: Crowd-sourcing coming up soon, look forward to that; Tim Jackson is also here in London. Tim is a business entrepreneur and writer, recently selected as one of the one hundred global leaders of tomorrow, at the World Economic Forum in Davos, no less. In addition, business people in the new economy recently voted him one of the most important people they would like to have in their contact book. He's in our s- he's here, Tim, thanks for coming, again, briefly, what do you think will be shaping two thousand and seven?
Tim Jackson: I think the biggest trend is gonna be something which I call zombietiming, and it's exceptionally bad news for journalists, because it's a trend where people out there on the web will be checking facts of television, radio and newspapers, and calling people to account when they get them wrong.
Lawrence Pollard: So we could all be out of a job. Thank you Tim. We have more from that in a minute; joining us from Helsinki is Anna Moilanen, a former executive of Finland's leading advertising company. She now divides her time between predicting the future and working for Artek, an innovative design company. Anna, welcome; give us your idea of what new trend will come up in two thousand and seven.
Anna Moilanen: I want to talk about something that I call new utopia, or a journey from ((??)) to soul and what this means in brief is first of all saying no to mediocre dreams, so being courageous enough to have big dreams, and then also that culture and civilization are back in fashion.
Lawrence Pollard: A new utopia, coming soon, thank you very much indeed, from Finland, and last but far from least, Mary Meehan joins us from Minneapolis in the United States. Mary is the co-founder and executive vice president of Iconoculture, one of the world's leading consumer trend research agencies. She also co-authored the book "The future ain't what it used to be, The forty cultural trends transforming your job, your life, your world." That's what this program is about; Mary, thanks for joining us, in a nutshell, what is your big trend for two thousand and seven?
Mary Meehan: I'm looking at niche as the new mainstream; by ignoring consumer groups outside the mainstream, companies are leaving millions of dollars and pesos and pounds on the table, so it's innovative, progressive who are focusing on these niche groups, developing new products and services and gaining new market share and brand value.
Lawrence Pollard: Mary, many thanks indeed.

We now skip 20 minutes or so about crowd-sourcing, the new utopia, and niche as the new mainstream... Listen to it, if you like. (I'll confess again my secret shame: adolescent experience with Monty Python has left me with an ineradicable stereotype of BBC panel discussions. The accent, the format and the moderator's little verbal tics are enough to make me laugh out loud at entirely inappropriate times.)

The program ends with the discussion of Tim Jackson's trend. (To understand the background of the name, you'll need to read the discussion on the website www.zombietime.com of "The Red Cross Ambulance Incident: How the Media Legitimized an Anti-Israel Hoax and Changed the Course of a War", 8/23/2006.)

Lawrence Pollard: And so finally, we come to Tim Jackson for our last big trend of two thousand and seven. Now, Anna's just been telling us how two thousand and seven will become less cynical but uh Tim, uh you're predicting we're going to grow more skeptical; you've got the best name for a trend: "zombie timing" Uh now this goes back, I gather, to the enormous amount of argument in blogging sites and on the internet over the reporting of a rocket attack during the Lebanon war, basically um whether or not a Red Cross ambulance had been hit by an Isla- Israeli rocket, and if it had been reported properly by the news agencies, and so this is a story that just mushroomed, and went on, and your point is that this is going to go on and grow and it's gonna become one of the big things of two thousand and seven.
Tim Jackson: Absolutely. W- we all know that journalists work under tremendous time pressure; like tennis players or policemen or soldiers, they have to find the right balance between doing the best job they can and getting it done in time, finishing it by the deadline. I remember in- uh the first job I had was working as a reporter in Tokyo and my boss would growl at me across the office "don't get it right, get it written!" The only time that a- that a journalist, whether it's television or radio or newspaper uh tends to actually be subjected to really detailed scrutiny of what he or she is doing is if there's a court case. But I believe in a growing trend this ultimate nightmare is actually going to become an everyday reality for journalists around the world. The oddity is, and what I think what the newspapers fail to grasp, is that something has changed in the world of journalism. It used to be the case that readers decided a newspaper that they trusted, and then relied on the reporter once they picked their newspaper or their television station or their radio station. Now, it seems to be the case that story by story, journalists have to expect that if they're making a controversial claim they've got to back it up with proof. Now wh- the reason I call this trend "zombietiming" is because the website that brought together all this information is called zombietime.com. And do you trust bloggers more than you trust journalists? Absolutely not. The key thing about the story is that it's not about preferring one community of people to another. It's a way in a bit like "crowd-sourcing" that David was talking about before. It's about the power of a large number of people who are each individual experts in some microscopic field coming together to examine something. 
If I had written a report of something that happened, it would be very uncomfortable to me to find that the ten greatest experts in the world on that particular topic happened to come and read my story and scrutinize it. And I think that's something that's gonna- that's gonna become more and more common. Now how does this get bigger? Because there's- there isn't a limitless number of bloggers reading a limited number of you know BBC copy or CNN copy or- or newspaper or so and so, how does this actually develop? Well I think the fascinating thing is that it's a bit like a beehive. That there's no central organizing force directing the crowd of bloggers, telling them go and focus on this story, or go and focus on that story. What's happening is that thousands and- an increasing number, perhaps in future even tens of thousands of bloggers who are interested in different areas of the news will individually make decisions about what they find interesting.

I'm skeptical of the view that this is about time pressures and deadlines, or that it's about generalists vs. experts.

The bloggers who have detected various mainstream-media mistakes and outright frauds don't have more time on their hands than journalists do -- as individuals, at least, the bloggers generally have less time, since they have day jobs as well. And the overall number of bloggers effectively fact-checking the media is not that large. Overall, I suspect that the number of serious bloggers dealing with any given story is very small compared to the number of reporters and editors working on it.

And in most cases, the bloggers are not experts to start with. Instead, they learn what they need in order to do the critical evaluation that the journalists skipped. That was certainly the case with the famous CBS Rathergate memo, and the faked Reuters pictures, and it seems to be true of the zombietime ambulance story as well.

The problem, it seems to me, is that many journalists and media organizations appear to put accuracy rather low on their list of priorities. Of course, the desire not to look like an idiot is somewhat higher, but in the old days, as Jackson observed, a journalist's version of the facts (and their interpretation) was hardly ever subject to public scrutiny except in court. What's changed? First, web search means that WCFCYA quickly and conveniently; and second, weblogs and other social media publicize the results of fact-checking cheaply and effectively.

This has been obvious to most people for three or four years -- but it's nice to see the punditocracy catching on, even at the BBC.

The discussion continues:

Lawrence Pollard: How does that strike you, Mary Meehan in Minneapolis, is this something that you can imagine happening?
Mary Meehan: Oh, absolutely, this is a really interesting development, I think. The lack of trust that has been developing for institutions, media being one of those, you know, all the corporate scandals, government scandals, the church, you know, there is just this lack of trust out there; that lack of trust drives bloggers' needs to prove the media wrong, bust the myths and expose the lies, to you know, be it lies or not, and they have the power and the platform now to do it!
Lawrence Pollard: Now I've- I can-
Mary Meehan: So they- they just don't trust something they can't see through, it's that transparency that they- that they demand and expect, and they are going to expose it if they can't get at it. The concern would be, in proving or sourcing the bl- the bloggers' claims as well.
Lawrence Pollard: Mm, that is a-
Mary Meehan: Sounds like a lot more work for journalists, to me.
Lawrence Pollard: David Rowan, uh what do you think about this? You- you're actively involved in newspapers, at the sharp end?
David Rowan: Tim's right, and it's a very good thing that the invincibility of experts is being challenged, and in the Lebanon war as well you might remember there was a- a Reuters photograph of smoke [... somewhat inaccurate discussion of the Reuters photoshopped and posed photographs omitted...]
Tim Jackson: I don't think we should expect to see a decline overall in the trust of journalists; I think journalists are no more and no less perfect than investment bankers or policemen or sports stars. What I think it's about is transferring people's allegiance from the brand of the publisher to an individual reporter. And I think that we will come to know increasingly in detail which individual reporters uh can be relied on. And therefore I think the key trend is that journalists are going to have to be especially careful in making it clear whether they're reporting something that somebody claims or whether they are themselves asserting that it's true. I do think however there's one important warning that should be made for the future. Um inside newspapers and television stations there's often a job whose title is to be fireman and to be a fireman or fire fighter means to stay in head office and when something big happens in the world, whether it's a natural disaster or a war, you're the person that gets on a plane with a folder full of cuttings you fly into the country and you then report back within a day or so on the basis of almost zero knowledge. I think the days of the fire fighter are numbered, because fire fighters are inevitably less informed than an intelligent local.

I'm also skeptical that journalistic "branding" is generally being transferred from publications to individual journalists, or that "intelligent locals" are the solution.

It's true that I often evaluate writers individually, whether they work in traditional media or not. Focusing on the Middle East, for example, I tend to trust what Michael Totten has to say on his blog, whether or not I agree with his conclusions, because he's convinced me over time that he's honest and careful and insightful. And I'd continue to trust him if he started writing for the New York Times or doing reports for CNN. On the language beat, there are writers like Michael Erard and Jan Freeman and Nathan Bierma whose work I trust. And it's their bylines that matter, not the publications their work happens to appear in.

But in my opinion, the most accurate and insightful journalism around these days can be found in the pages of the Economist, which has no bylines at all. And I'm much more likely to trust something in the New York Times or the Washington Post if it's by a staff writer than if it comes from the AP or Reuters, whose standards seem to have become remarkably low. And I'm sorry to say that when I read something from BBC News, I start with the preconception that it's likely to be, well, sort of stupid, never mind whose byline is on it.

I don't think this is just me -- there's no sign that (say) Fox and NPR are losing their brand identity as distinctive outlets in the media marketplace.

As for using "intelligent locals" instead of generalist "fire fighters", that also seems to be a red herring. I'm all for getting locals -- whether the locality is geographic, cultural or intellectual -- more involved in public discourse. But locals are no more and no less likely than outsiders to care about accuracy or to think critically; and locals are more likely than outsiders to have a chip on their shoulder.

My fellow linguists are, on the whole, intelligent, knowledgeable and sensible people. And if the mainstream media hired more trained linguists as columnists and reporters, coverage of the language beat would surely improve. But my colleagues have the usual human range of attitudes towards the relative importance of finding the truth, telling a good story, and winning an argument. Expertise is only part of the story here, and maybe not the biggest part.

Posted by Mark Liberman at 10:58 AM


Slavoj Zizek writes, in an op-ed piece in today's New York Times (p. A17), about Muhammad Said al-Sahhaf, Iraqi information minister in the late days of Saddam Hussein's rule:

There was something refreshingly liberating about his interventions, which displayed a striving to be liberated from the hold of facts and thus of the need to spin away their unpleasant aspects: his stance was, "Whom do you believe, your eyes or my words?"


I would have written which (or maybe what), or reworded the whole thing: "Do you believe your eyes or my words?" or "Do you believe what you see or what I tell you?" or something else along those lines. 

I have occasionally collected who(m) referring to decidedly lower forms of life, like bacteria, usually from people who study them and have some attachment to them.  But eyes are not animate creatures, only parts of them, and words are straightforwardly inanimate.  Who(m) -- I'm not going to argue about the case-marking question here -- strikes me as decidedly odd.  You can see how someone would get into using who(m), with your eyes serving as one kind of metonymy (the part standing for the whole) and my words as another (the words standing for the person who produces them).  But it still won't fly.

Eliminating one of the metonymies -- "Who(m) do you believe, your eyes or me?" or maybe with your eyes moved away from the verb believe, in "Who(m) do you believe, me or your eyes?" -- improves things a bit, but only a bit.  Eliminating them both -- "Who(m) do you believe, yourself or me?" -- alters the meaning.  Who(m) has to go.

[Addendum: Zizek is not a native speaker of English, and so can be excused this infelicity.  But the Times has copy editors, and they haven't been shy in the past about altering copy; they should have fixed this one.]

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 10:48 AM

January 04, 2007

Less than three years: a policy revision

As you may know, we language writers often have to hold down two or more jobs in order to maintain the lifestyles to which we have become accustomed. I am Senior Contributing Editor here at the great Language Log corporation, but I also moonlight as a Professor of Linguistics at the University of California, Santa Cruz. And I have to participate in the governance of the university just like any other professor. In fact I currently chair a committee of the Academic Senate. And in that capacity I recently received a copy-edited proposed revision of promotion regulations in the Academic Personnel Manual, sent to me for comment by the University Committee on Academic Personnel (UCAP), in which I read the following:

Advancement to Step VI usually will not occur after [struck out: less] [inserted: fewer] than three years of service at Step V . . .

The change proposed (along with two other such alterations plus some more substantive changes; old wording in strikeout, new in underlined boldface) is an implementation of an old, old prescriptivist rule that insists less than N  X's is ungrammatical if you can count X's. This rule is bogus. And, I thought, they shouldn't ask professors of linguistics to chair committees if they don't want linguists' opinions. So I couldn't resist writing the following paragraphs as the opening of my committee's letter of comments:

The Committee on Career Advising has carefully examined the proposed new wording for APM 220-18b(4).

First, the Chair of CCA, in his personal capacity as a specialist in English grammatical structure and co-author of The Cambridge Grammar of the English Language, would like to remark that he notes with displeasure the work of the nitpicker on UCAP who has changed "less than N years" to "fewer than N years" in three places in the revised regulations. The change betokens ignorance of the grammatical facts. The phrase "less than" has always been preferred over "fewer than" with expressions denoting time units. For example, Alexander Pope, the greatest English poet of the early 18th century, wrote in a footnote in his superb translation of the Iliad that Priam "loses in less than eight Days the best of his Army." This is not a grammar error!

The rule UCAP imagines it is enforcing here originates in a very modest statement of a personal preference by one Robert Baker in 1770. The modern belief that it is a strict rule bears the same relation to good grammar that shoe fetishism bears to romantic love.

Our committee would have to concede, however, that fighting UCAP's misguided grammatical pedantry, and advising it to obtain a copy of Merriam-Webster's Concise Dictionary of English Usage, is strictly ultra vires. And so we turn to the content . . .

The exciting linguistic bit ends there, and it goes on to boring administrative stuff about the policy itself.

I wish I thought that the scales will fall from the eyes of the Chair of UCAP, and that he will see that he does not have to labor over perfectly intelligible and grammatical phrases like "after less than three years" and alter them to comply with a tentatively expressed personal preference by a modest scholar who has been dead roughly a quarter of a millennium, ignoring the good grammatical taste of the greatest poet of his age. But it won't happen.

The Chair in question is Professor Anthony Norman of the Riverside campus of UC. He is a very distinguished professor (emeritus) of biochemistry, way senior to me. When he got his B.A. from Oberlin in 1959, I was still trying to figure out the fingering of B7 on my first solid-body electric guitar. When he got his Ph.D. in 1963 I was a teenager playing piano in funky bars in Frankfurt with Sonny Stewart and the Dynamos. The likelihood that Professor Norman will ever believe a jumped-up young whippersnapper like me on a question of usage in his native language is zero, zip, nul, nada, zilch.

Professor Norman has, no doubt, a copy of the noxious little Strunk & White book The Elements of Style somewhere around his office, and he probably believes that everything it says is true. "Should not be misused for fewer" is what Strunk & White says about less: "Less refers to quantity, fewer to number." But that's not true. The actual picture of the overlapping uses of these words is much more complicated (for a truly expert discussion of the grammatical facts, see what Rodney Huddleston says on page 1127 of his masterful chapter on comparatives in The Cambridge Grammar of the English Language).

One thing that is particularly clear is that less is fine when you're talking about numbers of time units making up a quantity of time. But Professor Norman will believe William Strunk and E. B. White, not me.

You can't get a leopard to change his spots. In fact, now that I come to think of it, you can't really get a leopard to appreciate the notion that it has spots. You can explain it carefully to the leopard, but it will just sit there looking at you, knowing that you are made of meat. After a while it will perhaps kill you.

I'm quite sure Professor Norman would rather kill me than change fewer than three years back to less than three years the way it always was before in the UC academic personnel regulations. It will not matter that no one who has examined the record of English literature or the content of scholarly works on English usage could think this is a justified view. That's not how it works with usage. People will not listen to linguists on this sort of matter. I know that. But I thought I would write the opening paragraphs into my committee's letter anyway. No one will pay any attention. I'm howling into the wind. But hey, it's part of my job, just as studying the systemic transport and physiological responses of the steroid hormones 1a,25(OH)2D3 and 24R,25(OH)2D3 is part of Professor Norman's job.

The main difference is simply that (for reasons that still are not fully clear to me) I and everyone else will be fully prepared to take him to have professional expertise on the role of 24R,25(OH)2D3 in bone fracture healing, whereas he will never be prepared to take me or any linguist as expert in the matter of the normal distribution of comparative-form determinatives with count and non-count nouns in English NP structure. That's just the way it is.

Posted by Geoffrey K. Pullum at 08:20 PM

Languages with Birthdays

Since the problems with the notion that one language is older than another have come up yet again, I thought I'd mention that there are some exceptions to the principle that all languages are equally old. One sort of exception is that the ancestor of a language may go by another name. There is a sense in which Hittite is older than Lithuanian, namely that we recognize Hittite as a distinct language at an earlier date than Lithuanian. The language that today's Lithuanians were speaking 3,500 years ago was probably something intermediate between Proto-Indo-European and Proto-Balto-Slavic. We don't call that Lithuanian because it hadn't yet differentiated into Lithuanian, Latvian, Prussian, Russian, Czech, and so forth, but it was already Lithuanian in the sense that there was no later discrete linguistic event at which this language was replaced by Lithuanian.

That kind of exception is really just a matter of nomenclature. There are some real exceptions, that is, cases in which we really can talk about a language coming into existence at a certain, relatively recent, point in time. The clearest examples of languages with an identifiable beginning are constructed languages like Esperanto and Klingon. We might include in the same category artificially standardized languages, where the basic language already existed but the standardization can be assigned a specific date. Hindi, in the sense of the artificially Sanskritized Standard Hindi promoted by the government of India, can be said to date from 1950 although the Hindustani on which it was based goes back to Proto-Indo-European and beyond.

Another case in which it may make sense to talk about languages coming into existence at a particular time is that of pidgins and creoles. If it is correct to view pidgins and creoles as newly formed languages with no true parents rather than as descendants of the languages from which they derive their vocabularies, as is now widely, but by no means universally, held, they too can be said to have come into existence at a certain time.

Perhaps the most interesting and dramatic case is that of the signed languages which are now known to have been created de novo in recent times, when enough deaf people came together in a certain place to form a community. One such case that is well documented is that of Nicaraguan Sign Language, which came into existence in the 1970s and 1980s.

There is one other sense in which one can talk about one language being older than another, namely when one language separated from the main stock earlier than the other. We might say, for example, that Albanian, which forms a branch of Indo-European by itself, is older than French because it split off earlier. This is, at first blush, like the usage in biology in which, for example, invertebrates are described as older than vertebrates. Professional linguists avoid this usage, though, because there is a crucial difference in this regard between biology and linguistics. In biology the branch said to be "old" is always the one that is more conservative. "Old" species are primitive in the sense that they are of types that arise early in evolution. In linguistics, there is no such relationship between branching and language type because there is no such thing as a primitive language. There presumably was once such a thing, but if it was ever spoken by modern human beings it was so long ago that we have no knowledge of it. There is a real sense in which jellyfish are more primitive than human beings, but there is no sense in which Albanian is more primitive than French.

Posted by Bill Poser at 04:41 PM

The snow words myth: progress at last

It is not all gloom as regards the media's treatment of language. There are happy stories too. Ash Asudeh just sent me a little "whaddya know" piece headed "Snow Speak" that he scanned from an airline magazine (Holland Herald, published by KLM). It had an illustrative drawing of an Arctic hunter, and it was about snow words. Yawn, I thought. But this one was a real surprise. They had actually been talking to a linguist, it seems, or had at least once met one in a bar somewhere, and although what they said was not accurate, it was a lot closer to being accurate than the familiar nonsense that has been repeated so many times:

The idea that Inuit people have many more words for snow than English speakers is a myth. Most Inuit languages are "polysynthetic". Whereas English uses separate words in the sentence "the snow under the tree" an Inuit person would express this in one word. In fact, English has more words for different types of snow than most Inuit languages.

This still hasn't got everything right. An unsympathetic judgment would be that it's stuffed full of mistakes: (1) the language family is generally called Eskimo or Eskimoan, because it includes the Yup'ik languages of Siberia and Alaska as well as the Inuit languages from the northeastern half of Alaska across Canada to Greenland; (2) all eight Eskimoan languages are polysynthetic to a high degree, not just most; (3) the distinction between bases and derived words isn't even hinted at here, but it's crucial; (4) "the snow under the tree" is not a sentence, it's a noun phrase; (5) I don't think the definite articles in the latter phrase would typically come across in the meaning of a derived word, so the example is a bad one; (6) the point is not about what an Inuit person would do, it's about the structural resources an Eskimo language provides; (7) it's not clear that English has more words (who's counting?), it's just that it appears to be roughly comparable by most sensible ways of counting distinct genuinely snow-related lexeme roots. The point is that we want to count one for each family of derived words like snow, snowy, snowing, snowlike, snowstorm, etc.; if you don't do that, then Eskimoan languages not only have millions of words for snow, they have millions of words for fish, millions of words for coffee, millions of words for absolutely anything, which makes the whole discussion irrelevant to anything about snow.

So I would have written the paragraph more like this:

The idea that Eskimos have many more words for snow than English speakers is a myth. All eight Eskimo languages have extraordinarily rich possibilities for deriving new words on the fly from established bases. So where English uses separate words to make up descriptive phrases like "early snow falling in autumn" or "snow with a herring-scale pattern etched into it by rainfall", Eskimo languages have an astonishing propensity for being able to express such concepts (about anything, not just snow) with a single derived word. To the extent that counting basic snow words makes any real sense (it is often difficult to decide whether a word really names a snow phenomenon), Eskimo languages do not appear to have more than English has (think of snow, slush, sleet, blizzard, drift, white-out, flurry, powder, dusting, and so on).

That would be yet closer to accuracy, even further away from ever being a plausible nominee for a Becky award. It's a bit longer than the original, but not that long. I don't think it overstates anything (there is a great deal more to say about the different layers of lexicalization in Eskimo derived words, but I'm not trying to do a full essay here). You may think it couldn't possibly be that a language could have words with such complex meaning, but let me just add this. I once browsed for a while in the wonderful Comparative Eskimo Dictionary and came to the conclusion that it looked as if you should be able to make up a single word that would mean "They were wandering about gathering up lots of stuff that smelled like dead fish." I sent an email to Jerry Sadock, who is a serious Eskimologist, asking whether this was true. Back came an email. It contained one (West Greenlandic Inuit) word.

For a couple of discussions of some real facts about Eskimoan snow vocabularies, see the two references given in this post by Bill Poser (which is about New York English rather than Eskimoan, but as you'll see, the theme is there). See also the collection of links given in this post by Mark Liberman if the subject interests you at all.

Anyway, let me end by stressing the positive side again: even the inadequate paragraph that KLM's magazine printed, with all its minor errors, is vastly closer to being true and reliable information than most of what was ever said about Eskimo languages in all the magazines and newspapers and books of the 20th century. Things are improving! Congratulations to Holland Herald for taking a step out of the snowdrift of myth and legend.

Posted by Geoffrey K. Pullum at 12:38 PM

January 03, 2007

Precedent for Executable Articles

I agree with Mark that executable articles would be a good idea, both as a check on the validity of the data and its analysis and as a means of improving comprehension. A version of this idea that might serve as a model was actually created by Stephen Wolfram back in the 1980s. Among his other accomplishments, Wolfram is the creator of Mathematica, a computer system for doing mathematics. One component is what in Mathematica is called a notebook. A notebook is a document in which equations are executable code and graphs and other images are generated from code that remains present in the document so that it may be modified and re-executed.

The reader of a Mathematica notebook can change equations or numerical constants, redraw plots to reflect the changes, or change the graphical properties of plots to suit himself. A notebook is also convenient for the author since it provides a way of embedding the math and images in notes explaining what she is thinking.

Posted by Bill Poser at 09:03 PM

Department of pots and kettles

According to the BBC News ("When celebrities speak on science", 3 January 2007), "Sense About Science has urged stars not to dip their toes into tricky scientific issues without checking their facts first". According to its web site, "Sense About Science is an independent charitable trust promoting good science and evidence in public debates". This is certainly an area where the BBC makes a substantial impact.

In other news, Don Rumsfeld is said to be warning policymakers not to invade countries without first devising a plan for the occupation and committing enough troops to make it work.

Posted by Mark Liberman at 06:09 PM

Post number 4000

This is Language Log post number 4000. That is to say, the ID number that forms the penultimate part of its filename and URL (before the .html part) is 4000. (Following a common modern practice, I am omitting the comma from 4-digit numerals.) The number of posts actually available for you to read is less than 4000 because of deleted posts and posts that were begun and assigned filenames but not finished and published (there are about 3938 accessible posts up at the moment), but I want to pretend it is the 4000th post in order to use this post to make a small pedagogical point about the rules for forming ordinal numerals with bases formed from digits (1st, 2nd, 3rd, 4th, 5th, etc.).

The rules are as follows. You use -st after any number ending in 1 other than those ending in 11, and -nd after any number ending in 2 other than those ending in 12, and -rd after any number ending in 3 other than those ending in 13, and -th as the default everywhere else.
[Historical notes: (1) -nd and -rd were spelled -d in American sources in the 1800s and early 1900s, so you get 2d and 3d. But this seems to have faded away: "the 2d" gets almost no hits now, and most are irrelevant references to 2-dimensionality. (2) The practice of typesetting these suffixes as superscripts was built into Microsoft Word as a factory-set default, and that was one of the many small points that made it fairly easy for the blogosphere to show that the Texas Air National Guard memos that pretty much brought Dan Rather's career to an end were crude and stupid forgeries typed on equipment thirty years too young to have come out of a military officer's office in 1972.]
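
For readers who would rather see the rule as code, here is a minimal sketch in Python. The function name `ordinal` is invented for illustration, and the sketch assumes non-negative whole numbers and the modern -nd/-rd spellings rather than the older American 2d/3d.

```python
def ordinal(n: int) -> str:
    """Return n with its English ordinal suffix, e.g. 4002 -> '4002nd'.

    Assumes n is a non-negative integer.
    """
    # Numbers ending in 11, 12 or 13 are the exceptions: they take -th.
    if n % 100 in (11, 12, 13):
        suffix = "th"
    else:
        # Otherwise the last digit decides, with -th as the default.
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 11, 12, 13, 21, 112, 3991, 4000, 4001, 4002):
        print(ordinal(n))
```

Checking the last two digits before the last digit is what keeps 11, 12 and 13 (and 111, 112, 113, and so on) from being mistaken for ordinary numbers ending in 1, 2 or 3.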

This is, of course, the same rule that a mystery correspondent in Lithuania calling himself "Becky Miranda" violated in an email to me on September 24, 2004, tempting me to read the message with a Subject line about a "tentative meeting on the 2th", and thus revealed that he was in fact a spammer with inadequate experience in using English. No, Becky, I warned him at that time: The number 2 ends in 2 and does not end in 12, so that would be 2nd (or perhaps 2d for spam sent to American addresses). [Dialect variation note: Actually, Americans tend to have a strong preference for "February 2" to "February 2nd". Nerdy note: Josh Millard points out that it might have been random number selection and sloppy scripting that Becky was guilty of: just picking a random number between 1 and 31 and letting the script put "th" on the end would be right more often than not, and perhaps the spammer just didn't care that much about a few million ungrammatical Subject headers about imaginary meetings going out. Josh is quite right, of course. It could have been programmer laziness rather than grammatical ignorance.]
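
Josh's "right more often than not" can be checked by counting. The sketch below assumes, as in his scenario, that the script picks a day of the month from 1 to 31 and always appends "th"; the helper name `takes_th` is made up for the purpose.

```python
def takes_th(n: int) -> bool:
    """True if the ordinal of n is written with -th under the usual rules."""
    # 11, 12 and 13 take -th despite ending in 1, 2 and 3.
    if n % 100 in (11, 12, 13):
        return True
    # Everything else takes -th unless it ends in 1, 2 or 3.
    return n % 10 not in (1, 2, 3)


days = range(1, 32)  # possible days of the month, 1 through 31
th_days = [d for d in days if takes_th(d)]
wrong_days = [d for d in days if not takes_th(d)]

print(f"{len(th_days)} of {len(days)} days are correct with a blanket 'th'")
print("days that come out wrong:", wrong_days)
```

On these assumptions a blanket "th" is right for 24 of the 31 days and wrong only for 1, 2, 3, 21, 22, 23 and 31, so a lazy script would get away with it a little over three quarters of the time.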

So if we took the file ID numbers as canonical (which as I said, they actually aren't), then we'd say that the last ten posts on Language Log were the 3991st and 3992nd (or 3992d; by Arnold Zwicky), the 3993rd (or 3993d; by Mark Liberman), the 3994th (by Ben Zimmer), the 3995th (again by Arnold Zwicky, actually appearing earlier than the 3994th through an accident of editing delays, another reason why there is actually a metaphysical difference between post number 3995 and the 3995th post), the 3996th and 3997th (by Bill Poser), the 3998th and 3999th (by Mark Liberman), and this one by me, the 4000th. The next one will be the 4001st, and after that the 4002nd (or 4002d). Not the 4002th, Becky. See?

By the way, Becky Miranda has absolutely nothing to do with the nickname "Beckies" for the new Goropius Becanus Prizes that Language Log plans to announce every year. It is simply one of those strange coincidences, that's all.

[Last revision: January 9, 2007. Thanks to Peter Howard, Chris Lance, Josh Millard, and Mart Kuhn for corrections and comments. Who are these people who say that blogs aren't refereed? They're refereed by everybody!]

Posted by Geoffrey K. Pullum at 11:01 AM

Executable articles

Here's an idea whose time has come: scientific and technical papers should include an explicit, executable recipe for generating their numbers, tables and graphs from published data.

Traditional scientific and technical journals require authors to specify their materials, methods and analytic techniques precisely enough to permit replication, because replicability is the foundation of the scientific method and the engine of technological progress. Now that the scientific and technical literature has become a networked digital archive, we can do better. We can expect articles to include an executable -- and readable and modifiable -- procedure for turning published data into the numbers, tables and graphs that play a role in their argument.

In a sense, such executable articles are self-replicating. Of course, genuine replication requires application to new data; but executable articles lower the barriers to such generalization. And there's certainly also a benefit to re-implementation of complex algorithms, to avoid the possibility of bugs or perniciously special cases -- but executable articles make this kind of replication more likely as well, just because they make it so much easier to get in the game at some level in the first place.

Among the many good consequences, I'd like to emphasize three:

  • Minimizing fraud and error, especially the grayer (and commoner) forms, such as selection of atypical examples, omission of important caveats, and use of inappropriate models;
  • Speeding up the virtuous cycle of science and technology, by lowering the barriers to entry into new research areas, and by making larger amounts of data available for re-use;
  • Fostering education on a broad front, by letting students learn by doing, especially students who don't happen to be in one of the few places where crucial new data or new techniques are available.

There are many examples where this sort of thing is starting to happen. But it's far from being the norm, and there are plenty of problems in the way of making such practices more general. For example:

  1. How can authors specify access to data in a durable way? It's not good enough to say "go to my web site and download the tarball" (though this is better than nothing). We need a set of practices for archiving datasets, and for providing standard, durable identifiers for particular versions. And we need stable ways to refer to an arbitrary piece of such a dataset, which can be executed to create a well-defined and reliable output form.
  2. How can authors specify their analysis? There are numerous general- and special-purpose languages that people use in their research. Some are (now) widely available and easy for others to use -- code in Matlab or R or Java or Python -- but even for these, there can be version or operating-system or other contextual problems, and accessing archived data in a portable way poses additional challenges. Should some sort of autoconf system be used? And for how many years do we want to guarantee that an executable analysis should remain executable?
  3. How can authors deal with issues of privacy and IPR restrictions on data or code? There will often be IRB ("Institutional Review Board") or similar restrictions on open publication of some raw data -- are there ways to anonymize such data that are morally, legally and scientifically acceptable? Do IRB protocols need to be adjusted to make this possible? What about data whose distribution is restricted by copyright issues?

And the biggest problem, of course, is the cultural conservatism of the academy.

As we look across the disciplines of science and engineering, we can see the seeds of plausible solutions to each of the problems, as well as some subdisciplines where moves in this direction are already underway. But I don't know any subdisciplines where executable articles, in the full sense, have become the norm. And I don't know of any complete and general solutions to the problems -- indeed, I doubt that a single solution is appropriate across all the diverse types of data, algorithms and disciplines.

The way to make progress, in my opinion, is for people to start experimenting more widely. This could be done by encouraging experimental but regular publication of executable articles in existing journals, or by starting new journals that specialize in such papers. Scientific and technical societies in relevant areas could also play a useful role in encouraging this development.

And funding agencies could do a lot to foster the development of needed infrastructure, and to encourage its use.

In the language-related fields where I work, executable articles would (in my opinion) be an especially good thing. If you're interested in pursuing the idea, let me know.

Posted by Mark Liberman at 09:03 AM

The envelope, please

In the larger auditorium at Language Log Plaza, Geoffrey Nunberg, secretary-general of the Goropius Becanus Prize Committee, stands at the podium. As the audience falls silent, and 16th-century Antwerp guildhouses are projected on the screen behind him, Nunberg reads from the Wikipedia:

Johannes Goropius Becanus (1519-1572) was a Dutch physician, linguist, and humanist. He was born Jan Gerartsen in the town of Gorp, situated in the municipality of Hilvarenbeek. As was the fashion of the time, Gerartsen adopted a latinized surname based on the name of his birthplace, Goropius being rendered from "Van Gorp" and Becanus referring to "Hilvarenbeek."

He studied medicine in Leuven, and became physician to two sisters of Charles V: Marie and Eleonore, who were based in Brussels at the time. Philip II, the son of Charles V, wanted him also as his doctor and offered him a rich income. Goropius refused and established himself as medicus (town doctor) of Antwerp in 1554. Here, free of courtly intrigues, Goropius dedicated himself completely to the study of languages.

Goropius dedicated himself to studying antiquity during this time, and became fluent in many languages. Goropius theorized that Antwerpian Flemish, or Brabantic, spoken in the region between the Scheldt and Meuse Rivers, was the original language spoken in Paradise.

A corollary of this theory was that all languages derived ultimately from Brabantic. The Latin word for “oak,” quercus, Goropius derived from werd-cou (“keeps out cold”); the Hebrew name “Noah” he derived from nood (“need”). Goropius also believed that Adam and Eve were Brabantic names (from Hath-Dam, or “dam against hate”; and Eu-Vat, “barrel from which people originated,” or from Eet-Vat, “oath-barrel,” respectively). Another corollary was the placement of the Garden of Eden itself in the Brabant region. In the book known as Hieroglyphica, Goropius also proved to his satisfaction that Egyptian hieroglyphics represented Brabantic.

Leibniz, himself no stranger to strange ideas about language, is said to have recognized the transcendent quality of this work by coining the French verb goropizer ("goropize") to mean "invent absurd etymologies". So it's fitting that in honor of Dr. Goropius Becanus, an anonymous benefactor has endowed the prestigious Goropius Becanus Prize, awarded to people or organizations who have made outstanding contributions to linguistic misinformation.

As you can imagine, the list of nominees for the first Goropius Becanus Prize (or "Becky" for short) was a long one. Nevertheless, the committee was able to reach a unanimous verdict, which Nunberg will announce today at 12:00 p.m. ET on Fresh Air. Tune in then to learn who won (though it's rumored that an advance text of Nunberg's speech has been leaked).

Some background on likely contenders is available from these Language Log posts:

The BBC: "Parrot telepathy at the BBC"; "The most untranslatable word"; "Tudor linguistic homogeneity"; "The Agatha Christie Code: Stylometry, serotonin and the oscillation overthruster"; "Sad knights who say nü"; "It's always silly season in the BBC Science Section"; "Vicky Pollard's revenge".

Dr. Leonard Sax: "David Brooks, cognitive neuroscientist"; "Are men emotional children?"; "Of rats and (wo)men"; "Leonard Sax on hearing"; "More on rats and men and women"; "The emerging science of gendered yelling"; "The vast arctic tundra of the male brain"; "Girls and boys and classroom noise".

Dr. Louann Brizendine: "Neuroscience in the service of sexual stereotypes"; "Sex-linked lexical budgets"; "Sex and speaking rate"; "Yet another sex-n-wordcount sighting"; "The main job of the girl brain"; "The superior cunning of women"; "The laconic rapist in the womb"; "Open-access sex stereotypes"; "David Brooks, neuroendocrinologist"; "Gabby guys: the effect size"; "'Every 52 seconds': wrong by 23,736 percent?"; "Two new reviews of Brizendine"; "Word counts"; "Sex differences in 'communication events' per day?"; "Busy tongues".

Paul J. J. Payack: "986,120 words for snow job"; "Hackery, quackery, schlock"; "Say anything".

[In this connection, I'd like to draw your attention to a post from the Homeric era of blogging (Language Hat, "Beginnings", 7/31/2002), where Jim Bisso comments that "The sad thing about Goropism is that within it lie the seeds of the evil nexus of nationalism, racism, and linguistic chauvinism." You could turn that comment into a terrific book on the Dark Side of the Enlightenment.]

[Update -- Alex Baumans writes:

As a native of Leuven, and hence speaker of the Language of Paradise, I'd like to commend you on the excellent choice of patron for the Language Log Prize. In the version I heard way back at the university, Becanus pointed more specifically to the dialect of Antwerp as the direct descendant of Edenic. Given the generally good opinion the inhabitants of Antwerp have of themselves (they refer to Antwerp as 'De Metropool' or 't Stad', rather like ancient Roman 'Urbs') we found this very typical.

On rereading the Wikipedia article about Becanus, I would venture that it was written by a compatriot, probably from Dutch Brabant. He carefully avoids every mention of Flemish and accurately describes the language as Brabantic. While in Belgium the term 'Flemish' has come to mean the variants of Dutch spoken there, it is, strictly speaking, a separate dialect (and a quite incomprehensible one at that, made up principally of laryngal fricatives and glottal stops). On the Dutch side of the border, there is no such confusion. An inhabitant of Noord Brabant wouldn't dream of saying that he spoke Flemish, let alone that Adam and Eve spoke it.

I believe that Dr. Goropius Becanus himself called it "Cimbrian", which he derived from the name of Gomer, son of Japheth. Before the pronunciation was corrupted by the Hebrews, of course. By a curious coincidence, "Gomer" has become American slang for a person of certain characteristics, derived (I suppose) from Gomer Pyle of the Andy Griffith Show.]

Posted by Mark Liberman at 08:20 AM

Notes on Chinese Character Simplification

Mark's post on Chinese character Simplification cites a number of pieces that might give the impression that critics of Simplification are irrational enemies of progress. There probably are some people who just hate the idea of change, and others who hate any kind of simplification for fear that it will let the rabble in, but I don't think that it is fair to characterize criticism of Simplified Characters as due to unthinking conservatism. Rather, I think that there are good reasons to be critical of many of the changes made as part of the simplification process.

Three observations underlie my belief. The first is that many people who are critical of Simplified Characters, myself included, are not in general conservative, and in fact are sympathetic to writing system reform both in Chinese and in other languages. As regular readers know, I even favor reform of English spelling. The second is that most people who dislike Simplified Characters dislike some simplified characters and not others. The third is that there is a fairly strong correlation across critics of which simplified characters they like and which they dislike. These observations suggest that we critics dislike particular aspects of the simplification process for systematic reasons.

One reason for dislike of many Simplified Characters is that the simplifications have disrupted the relationship between characters and radicals. Over 90% of Chinese characters consist of two parts. The first part, known as the radical, reflects the semantic class of the character. Characters having to do with wood or trees, for example, usually have the radical 木. Some examples are: 橚 "tall and straight (of trees)", 榛 "hazel tree", 柮 "wood scraps", 檀 "sandalwood". Characters having to do with speech often have the radical 言. Examples are 詩 "poetry", 話 "language", 讀 "to read". The remainder of the character, which usually appears to the right of the radical or below it, typically reflects the pronunciation of the character. There is, for example, no semantic connection between "poetry" and the remainder of the character for poetry, 寺 "temple". The combination of 言 and 寺 to make the character for "poetry" is due to the fact that poetry falls into the general semantic category of language and that the word for poetry in Chinese approximately two thousand years ago sounded similar to the word for "temple".

There are many characters that have 雨 "rain" as radical. These include: 雪 "snow", 霏 "to fall (of snow)", 雹 "hail", 露 "dew", 電 "lightning, electricity". This last, however, has been simplified to 电; it has lost its radical. Many people dislike simplifications of this type because they think that delinking characters from their radicals disrupts the system. I've chosen this example in part because this is a case in which one might argue that the principal current meaning is "electricity" and that this has so little relationship to "rain", "snow", and so forth that it is not a disadvantage and indeed is perhaps a virtue to dissociate it from the characters with the rain radical. In most cases, however, the semantic relationship persists and the semantic information provided by the radical is arguably useful to the reader.

Another factor is that many Simplifications violate structural principles governing the well-formedness of Chinese characters. Here is the traditional form of "to study" 學. Its Simplified counterpart is 学. The simplified form has been standard in Japan since the reform of the writing system after the Second World War. I've never met anybody who objected to the Simplified form. It looks just fine. In fact, the traditional form is difficult to write without making it look topheavy, though I think it looks rather dignified in such contexts as the bronze plaques at the entrances to universities.

An example of the problem is the character whose traditional form is 氣. In Japanese the usual form is 気, which virtually no one objects to. The form used in Mainland China is still further simplified: 气. Many people, including myself, object to this simplified form on the grounds that it violates the symmetry principles governing the form of Chinese characters. To put it intuitively, it looks like it is about to fall over on its side as there is no longer anything to hold it up on the left. Contrast it with 汽 "steam", in which the water radical, on the left, holds up the left side.

The fact that even vociferous opponents of Simplified Characters use simplified forms in their own handwriting is not the evidence of inconsistency that some people make it out to be. In Chinese as in English people have different ideas of what is appropriate in informal handwriting and what is appropriate in print. Some of these are subjective, probably aesthetic. An example of this type is traditional 門, which has the Simplified form 门. The simplified form is not an innovation but is a familiar form of long standing, used in informal handwriting. In Japan, where the traditional form 門 is the official form, you'll often see the simplified form 门 on handwritten posters in markets and things like that. Nobody has any problem with the simplified form per se. The problem with it, for critics of Simplification, is precisely that it looks like a handwritten form. To me it looks just as odd in print as a cursive English letter would in printed English.

There are also non-aesthetic reasons for accepting in handwriting simplifications that are not accepted in print. One of them is the fact referred to by Victor Mair, that simplified forms are often more phonological than their traditional counterparts. Chinese "dialects" are quite varied, to the extent that by the criteria used in other places "Chinese" consists of a number of distinct languages. Many forms of Chinese are mutually incomprehensible. Since the syntax of the "dialects" does not vary dramatically, written Chinese is more-or-less accessible regardless of dialect.

Take the character 水 "water". In Cantonese it is pronounced [sui], in Standard Chinese [ʃwe], in Tianjin dialect [swr̩]. If Chinese were written phonologically, this word, like many others, would look quite different in different dialects. The desire for a pan-dialectal writing system has been one of the arguments against romanization, but it is also an argument against greater phonologization of hanzi. The same considerations do not apply in informal handwriting since handwriting is most likely to be addressed to people who speak the same dialect and on familiar subjects.

To this argument some people respond that more and more people know Standard Chinese. That is true, although one can argue over the extent to which Standard Chinese has spread and the extent to which policies favoring Standard Chinese discriminate against poor rural people in some areas. In any case, even if the argument for pan-dialectal writing disappears as a result of the spread of Standard Chinese, the association of Simplification with Standardization is in and of itself a strike against it for the many people who oppose the subordination of their own variety to the Standard.

Chinese character Simplification was based on the naive assumption that reducing the number of strokes would make characters easier to learn and write, without a sophisticated understanding of the tradeoffs involved. That is partly because it was part of a political revolution, in the course of which technical reforms are often botched, but also because the study of reading and writing and writing system design was quite primitive and unempirical. Although there have been some advances, it still is.

Posted by Bill Poser at 02:26 AM

Why is Basque an Ancient Language?

In an article entitled "Peace at Last" in this month's Smithsonian Magazine, Joshua Hammer describes the Basque country as "an enclave marked by an ancient language". What does this mean? Some languages, such as Egyptian, Sumerian, and Hurrian, are referred to as "ancient" because they were used in ancient times and are no longer in use, but that cannot be what Hammer means since Basque is still spoken today. Since most languages are the result of a continuous transmission that stretches indefinitely far back into the mists of time, it doesn't make sense to describe one language as older than another. Most frequently, people who describe one language as older than another are confusing the origin of the language with the date of first attestation in writing, a confusion that Sally Thomason has written about here and here. But that can't be what Hammer means either. Basque is first attested only about five hundred years ago, much later than its Romance neighbors. Even its relative and possible ancestor Aquitanian is attested only from the Roman period.

This seems to be an instance of the same phenomenon as in the description of Yucatec Mayan as "ancient", discussed here, here, and here, where it apparently means that the language is exotic and associated with putatively ancient mysteries. The Basque are different from their neighbors, apparently the only (cultural and linguistic) survivors of the pre-Indo-European population of Western Europe, which once caused them to be regarded as devil worshippers.

Posted by Bill Poser at 01:12 AM

January 02, 2007

Googlefreude, Googleschaden, Schadengoogle...

Lynne Murphy, an American expat teaching linguistics at the University of Sussex, runs a wonderful little blog called Separated by a Common Language, exploring the differences (often quite subtle) between American English and British English. She recently held a Word of the Year contest with three categories: "Most useful import from American English to British English," "Most useful import from British English to American English," and "Best word invented by a reader of this blog." The winner of the first category is the evocative muffin top ('the bulge of flesh hanging over the top of low-rider jeans,' as defined by the American Dialect Society in its 2005 WOTY voting, where it was a runner-up for "Most Creative"). The second category was won by the pejorative wanker (and its derivatives), which certainly has been heard more and more frequently on the left side of the Pond.

The winner of the last category, best word coined by a reader of SbaCL, is Googleschaden. This word's history began when Andrew Sullivan invented the similar Googlefreude, defining it on his blog as "the way in which pundits' past pontifications can now come back to haunt them." When a commenter noted this coinage on SbaCL, Paul Danon suggested Googleschaden might be more appropriate, "since that connotes the grief rather than the joy." An anonymous commenter followed up with Schadengoogle, "to make the parallel to Schadenfreude a little clearer." But by then Googleschaden had already caught on, thanks to a mention on Sullivan's widely read blog. So Googleschaden won out, even though Schadengoogle is a marginally better neologism in my estimation. And I'm still not convinced Googlefreude was all that bad anyway.

Part of the problem, I think, is with Sullivan's original definition of Googlefreude, inspired by a reader digging up a December 2003 post by Markos "Kos" Moulitsas anointing Howard Dean and Wesley Clark as the unassailable frontrunners in the Democratic presidential primary. (Kos is attempting a similar anointment this time around with Barack Obama.) So Googlefreude would seem to evoke not so much "the way in which pundits' past pontifications can now come back to haunt them" but rather "the joy in exposing a pundit's (poorly predictive) past pontifications."  This is not so different from chuckling over such notoriously bad calls as "I think there is a world market for maybe five computers" (IBM chairman Thomas Watson in 1943) or "Guitar groups are on their way out" (Decca Records exec Dick Rowe rejecting the Beatles in 1962). Now it's even easier to unearth such howlers from the blogoscenti, since every misguided pronouncement is archived for the world to see at a moment's Googling.

Thus if the operating emotion is understood as the pleasure in Googling up these unfulfilled prophecies, then there's no problem in using Googlefreude on the analogy of Schadenfreude as the joy (freude) in another's misfortune or harm (schaden). Googleschaden could define something a bit different — the grief suffered by the victim of the Google-enabled disclosure. But if the schaden element is to be preserved, then I agree with the anonymous commenter on SbaCL that Schadengoogle seems preferable, since the whole point is to resonate with that ever-useful German loanword Schadenfreude.

Beyond the semantics of grief and joy, however, Googlefreude works better to my ear because it more closely follows how English speakers create new blends out of lexical material that is from a foreign source or is otherwise compositionally opaque. Previously I've posted about such blends as Jobdango, an employment website combining job with the last two syllables of fandango, and Infogami, a Web application combining info with the last two syllables of origami. The blend components -dango and -gami don't actually mean anything in themselves but are intended to evoke (at least vaguely) the full words fandango and origami. I termed this process "cran-morphing," since segments like -dango and -gami are treated as if they were combinable morphemes semantically linked to the fuller words, akin to cran-grape and cran-apple taking on the cran- of cranberry. Quoting myself:

One significant aspect of cran-morphing is that it completely reanalyzes a segment, regardless of what semantic content the segment may have had earlier in its history, whether in English or another originating language. Cheeseburgers and turkeyburgers don't have anything to do with the inhabitants of a burg, just as Monicagate and Plamegate don't have anything to do with gates.

With this in mind, it's better to think of Sullivan's Googlefreude not so much as a straightforward compound combining Google with the German word freude, but rather a cran-morphed version of Schadenfreude with Google replacing the first two syllables. (It helps that the metrical pattern of the original word is maintained as well.) Most English speakers who use the word Schadenfreude do not actively analyze its composition into the German components schaden + freude. Rather, it's just a long Germanism with an unusual meaning, a meaning that remains crystallized in -freude when combined with other elements like Google.

Another online coinage involving a similar reanalysis of Schadenfreude is Bushenfreude, as used in two articles by Slate columnist Daniel Gross in 2003 and 2004. Like Googlefreude, Bushenfreude suffers from a rather weak definition by its coiner. Gross glosses it as the "weird mix of confusion, annoyance, exhilaration, and anger" suffered by rich Democrats benefiting from President Bush's tax cuts. He writes, "They were enjoying their extra income while loathing its source — a Republican in the White House and Republican-controlled Congress." On the face of it, this doesn't seem to have much to do with the original sense of Schadenfreude, since there's no joy experienced in the suffering of others. Rather, it amorphously recalls the mixed emotions of pleasure and pain encompassed in the German loanword, an evocation that remains even though the first syllable has been overwritten with Bush. (Again, it seems important to maintain the metrical pattern of Schadenfreude in creating the blend.)

There have been other English-language riffs on Schadenfreude that maintain all but the word's first syllable. There's blondenfreude, defined by the New York Times' Alessandra Stanley in 2002 as "the glee felt when a rich, powerful, and fair-haired business woman stumbles" (Martha Stewart being Stanley's case in point). Howard Dean's meltdown in the 2004 Democratic primaries inspired Deanenfreude. Jonah Goldberg of the National Review is partial to Frankenfreude, or "a state of restrained glee at the failures or setbacks of Al Franken." (That one handily bridges the two blend elements with a linking -en-.) And Andrew Sullivan has been down this path before: in an online piece for The New Republic just before Election Day '04 he used votenfreude, which he defined as "the assimilation of other voters' agony." As with his later Googlefreude, Sullivan was unconcerned that the "agony" segment of the original Schadenfreude had been partially overwritten, and I doubt he would have considered schadenvote or voteschaden as possible alternatives. The -(en)freude segment appears to be crucial for such blends, regardless of whether the emphasis is on joy or grief.

Finally, I should note that Sullivan wasn't the first to come up with the coinage Googlefreude. Back in January 2004, the blogger Ann Althouse reported this usage from her son John Althouse Cohen (with a much more straightforward definition than Sullivan's):

They are practicing Googlefraud, and in doing so they're preventing you from maintaining the feeling you had earlier today: what the Germans call Googlefreude (joy derived from Google).

After all of these X(en)freude blends, it may be time to retire this word-coining mechanism for a while. There ought to be a word for this kind of neologism fatigue... How about slangeweile, combining English slang with German Langeweile 'boredom, ennui'?

Posted by Benjamin Zimmer at 02:38 PM

One last squean and knurl

A while ago I reported on a Zits cartoon featuring guidelines for a high school dance, titled:

Winter Mixer guidelines
Provocative dress and lewd behavior are prohibited

and proscribing the following list of activities:

Grinding, bumping, moshing, mashing, licking, squeaning, shoving, sledging, rolling, kicking, wallowing, freaking, pronking, booty dancing, fondling, and whole- or half-body knurling will not be permitted.

In tracking down this vocabulary I eventually extended my search from lewd behavior to take in other sorts of behavior that would be out of place at a school dance, in particular drug use.  Still, some of the items looked suspicious (while others were clearly genuine), and the whole list seemed jokey -- definitely entertaining, in fact -- rather than like a report on current teen vocabulary.

Eventually squean (the noun) was traced to the world of cartoonists.  Now I have a final report on the Zits vocabulary list, including some information from one of the cartoonists.

E-mail, in the order of its arrival:

From our own Ben Zimmer, links to his ADS-L postings of a year ago on the history of freak dancing / freaking.  I'm hoping Ben will post a bit here on the topic.

From Idris Mercer, a link to the 11/19 Mark Trail strip, about antelopes and their inclination to pronk, and a link to the comments section at Comics Curmudgeon, where occasional comments about the delights of the verb pronk began springing up.

From Jake Schneider, identifying rolling as associated with Ecstasy use.

From Keith Handley, providing the link between squean and comics, already reported on.  Since then, I've checked Mort Walker's The Lexicon of Comicana to verify that the squean is in there.  But neither the knurl nor knurling is.

From Russell Borogove, suggesting a possible relationship between knurling and the game of knurdling, a.k.a. bottle walking, described as follows:

Mark a line on the floor, and stand behind the line with a beer bottle in each hand. Ensuring that your feet remain behind the line, "walk" forward with the bottles in your hands, and plant one of the bottles upright as far from the line as you can manage. Then, using the remaining bottle only, "walk" back to the line and return to your standing position, still ensuring that your feet remain behind the line.

Not something for a high school dance, I'd think, but probably not relevant here, especially in the context of "whole- or half-body knurling".  [Addendum: also check out nurdle at Grant Barrett's Double Tongued site (thanks to Ben Zimmer).]

And then, from Brad Skaggs, a report on an e-mail exchange with the cartoonists who draw Zits.  Skaggs specifically asked about squeaning and knurling.  Jim Borgman responded with delight at the attention of linguists -- but without an answer to Skaggs's question.  He did say that "most of the terms came straight off a note sent home by my teenage daughter's high school principal in advance of a homecoming dance", adding that he and Jerry Scott embellished the list some, "but not always where your linguists suspected" (though he doesn't identify which items were their embellishments).

This would be a good time to stop searching for the sources of the words on the list.  I suspect that quite a few people were playing around here: the principal's sources (presumably teenagers at the school), maybe the principal himself, and certainly Scott and Borgman.  The overall effect of the list is an entertaining avalanche of words, some familiar, some puzzling.  They might all be teen slang, somewhere, sometime.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 02:07 PM

Vicky Pollard's revenge

A couple of weeks ago, we batted around the witless lead of a BBC story about an effort to improve British kids' vocabulary:

Britain's teenagers risk becoming a nation of "Vicky Pollards" held back by poor verbal skills, research suggests.
And like the Little Britain character the top 20 words used, including yeah, no, but and like, account for around a third of all words, the study says.
[emphasis added]

Arnold, Geoff and I pointed out that this kind of 20-word vocabulary coverage is true of pretty much everybody, including BBC News and the author of the study ("Britain's scientists risk becoming hypocritical laughing-stocks, research suggests"; "Only 20 words for a third of what they say: a replication"; "An apology to our readers").
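For anyone who wants to check this kind of coverage figure on a text of their own, here is a minimal Python sketch. It is an illustration only, not the procedure used in the study or in the earlier posts: the function names `tokenize` and `top_n_coverage` are made up for the example, and the tokenizer is deliberately crude (lowercased runs of letters and apostrophes), so the exact percentage will depend on how words are split and normalized.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (a crude, assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def top_n_coverage(tokens, n=20):
    """Return the fraction of all tokens accounted for by the n most
    frequent word types. Returns 0.0 for an empty token list."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    covered = sum(count for _, count in counts.most_common(n))
    return covered / len(tokens)


# Toy illustration: "but" is 3 of the 6 tokens in this catch-phrase,
# so the single most frequent word covers half of it.
sample = tokenize("Yeah but no but yeah but")
print(top_n_coverage(sample, n=1))  # 0.5
```

Running `top_n_coverage(tokens, n=20)` on a few thousand words of almost any English prose or speech is a quick way to see how much of it the commonest 20 words account for.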

But this story also introduced me to Vicky Pollard as a character and a stereotype -- and one of the things that I learned, watching Vicky Pollard clips on YouTube, is that her character is in fact portrayed as hyper-verbal. Her speech is full of stigmatized lower-class and youth-culture features; but her biggest communicative problem is using too many of the wrong words at the wrong time. So I decided to take a quick look at some measures of her word usage -- partly to confirm my impressions, and partly as a further illustration of some of the issues involved in quantifying vocabulary.

For readers outside of Britain, let's start with an introduction to Vicky. She's a character on a BBC radio and TV show called "Little Britain", played in drag by Matt Lucas as an irresponsible, overweight, rude teenage bully with six (or perhaps even twelve) children and a predilection for shoplifting. She's associated with a few catch-phrases -- "Yeah but no but yeah but...", "Oh my god I so can't believe you just said that!", "Don't go giving me evils!" -- and this seems to be what lies behind the BBC News assumption that her vocabulary is generally impoverished.

But how can we really tell whether or not this assumption is wrong? We need a way to measure her vocabulary and compare it with the vocabulary of other (real or fictional) people. One approach starts by looking at how the number of distinct words ("word types") grows as more and more words are spoken (or written). A graphical representation of this function is called a "type/token plot" or a "vocabulary growth plot". Let's look at some examples of vocabulary growth plots, starting with the transcripts of telephone conversations from the published Fisher 2003 corpus.

Mr. Y was the A side of conversation number 50. He was a 43-year-old male with 14 years of education, and on his side of that conversation, he produced 899 total words ("word tokens") involving 384 different vocabulary items ("word types"). There were fewer "types" than "tokens" because 34 of his 899 words were the, 32 were you, and so on. When Mr. Y had produced 100 word tokens, they fell into 77 types; at 200 tokens, he was at 128 types; and so on.

Ms. X, the A side of conversation 2462, was a 21-year-old female with 14 years of education. She produced 895 word-tokens of 202 lexical types. At 100 tokens, she'd produced 55 word types; at 200 tokens, she was at 89 types.
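
For readers who want to try this at home, here is a minimal Python sketch of how such type/token counts can be computed. The function names (`tokenize`, `vocabulary_growth`) and the crude tokenizer are my own assumptions for illustration, not the procedure used on the Fisher transcripts; the sample sentence is a fragment of one of Ms. X's turns.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Assumption: a word is a run of letters and apostrophes. Real
    transcripts need more care (disfluencies, abbreviations like "t.v.").
    """
    return re.findall(r"[a-z']+", text.lower())

def vocabulary_growth(tokens, step=100):
    """Return (tokens_so_far, types_so_far) pairs.

    The curve is sampled every `step` tokens, plus once at the very end.
    """
    seen = set()   # distinct word types encountered so far
    curve = []
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok)
        if i % step == 0 or i == len(tokens):
            curve.append((i, len(seen)))
    return curve

sample = "I don't know I mean they're just entertaining that that's all it is"
print(vocabulary_growth(tokenize(sample), step=5))
# [(5, 4), (10, 9), (13, 12)]
```

With a full conversational side and `step=100`, the first two pairs would correspond to the "types at 100 tokens" and "types at 200 tokens" figures quoted above, though the exact counts depend on tokenization choices.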

If we plot the vocabulary growth curves for these two individuals, we get something like this:

You can get a sense of why the two vocabulary growth curves came out differently by looking at a few selected conversational turns from Ms. X and Mr. Y:

X: Yeah I have seen them before, but I don't know, I don't really have time to watch t.v. so I mean I watch them because they're pretty entertaining, but you know I wouldn't actually plan you know to to to plan a time to um to watch them or something like that, no.
I don't know I mean they're just entertaining that that's all it is, I don't think that there's really that these people are really in danger or anything, so mn it's just pure entertainment to me.
So I cannot you know devote my whole attention to whatever's on t.v. 'cause you know there's other stuff that you know I just want to do at the same time, so if I'm watching like news or if I'm watching you know like a talk show or something, I can do everything at once. If I'm watching a movie or if I'm watching, I don't know, something that, you know, I can't really do it um so.
Y: Precisely, plus let us face it, there aren't any commercials on t.v. telling us say something bad about someone else. No, to do that you've got to watch things like America's Stupidest Home Videos or Cops -- an exotic form of gossip, you know.
I'd love to have something where you know we just stick a little old lady with a flat tire by the side of the road and a camera watching.
So unfortunately I came to the conclusion that uh in this day and age what is really needed is the sleazy scandalous tell all about Camelot

It's important to stress that the numbers from one short passage shouldn't be taken as a stable and reliable characterization of how a given person talks. Ms. X, a 21-year-old college student who "doesn't really have time to watch t.v.", is talking about television programs with a stranger, and she's producing a remarkably small proportion of contentful words. But maybe she's a biochemistry major and would totally max out the vocabulometer if the conversation were to turn to protein phosphorylation sites.

Still, a type-token plot is one way to get a good local picture of how someone is deploying words in speech or writing. The Fisher 2003 corpus involved 11,700 conversational sides, so there are a lot more of these vocabulary growth curves to plot. Such a plot is going to get pretty busy, but we can get a sense of what the distribution of word-type/word-token relationships in this corpus is like by plotting all 11,700 end-points:

In fact there are still too many datapoints to be able to see the denser regions very clearly. So here's a 2D kernel density plot with Ms. X and Mr. Y plotted on it, showing where their type-token totals fell in the overall distribution:

I've chosen them, of course, to bracket the top and bottom of the word-type distribution in the middle of the word-token range.

There were 762 speakers (out of 11,700) who produced between 875 and 925 word-tokens, inclusive. The average number of word-types for this group was 270, with a standard deviation of 24.

One of these 762 speakers was Mr. Y, the A side of conversation 50. The 384 different word-types that he produced were 4.7 standard deviations above the mean (for the group producing between 875 and 925 word-tokens).

The 202 word-types produced by Ms. X, the A side of conversation 2462, were 2.8 standard deviations below the group mean.
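
The arithmetic behind those two figures is just a standard score: subtract the group mean and divide by the group standard deviation. Here is a small sketch using the rounded values quoted above (mean 270, standard deviation 24); the function name `z_score` is my own, and because the inputs are rounded the results only approximately match the figures in the text.

```python
def z_score(value, mean, sd):
    """Number of standard deviations by which `value` differs from `mean`."""
    return (value - mean) / sd

# Rounded group statistics for the 762 speakers with 875-925 word-tokens.
GROUP_MEAN = 270
GROUP_SD = 24

for name, types in [("Mr. Y", 384), ("Ms. X", 202)]:
    print(f"{name}: {z_score(types, GROUP_MEAN, GROUP_SD):+.2f}")
# Mr. Y: +4.75
# Ms. X: -2.83
```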

With some difficulty, I transcribed Vicky Pollard's side of the dialogue in a skit about her attempt to get a job as a telephone sex operator. If we plot her vocabulary growth curve on the same scale as Ms. X and Mr. Y, we see that she's running neck-and-neck with Mr. BigWords:

Vicky has certainly got many problems with what she says and how she says it. But a limitation in the number of different words she uses is not one of them. As evidence, let's compare Vicky's vocabulary growth curve in her sex-worker skit with the start of a weblog entry dated 30 Dec 2006, written by a BBC editor, Kevin Bakhurst, about "Saddam's execution".

The Y (for Mr. Y from Fisher A50), the V (for Vicky Pollard) and the B (for Kevin the BBC editor) are all pretty much superimposed.

Just to make it clear that there are more than two possible vocabulary growth curves, here's Ms. X and Mr. Y again with the start of chapter 1 of Huckleberry Finn (the H curve) and the start of Geoff Nunberg's essay "Label Pains" (the N curve).

Next: how to fit a model to vocabulary-growth curves, and predict the future.

[The Fisher Corpus is part of a larger collection of recorded and transcribed telephone conversations that has been a favorite subject for Breakfast Experiments™ from Language Log Labs: see

"Young men talk like old women" (11/6/2005)
"Another Breakfast Experiment" (11/8/2005)
"Sex doesn't matter" (11/11/2005)
"Gabby guys: the effect size" (9/23/2006)
"Busy Tongues" (12/31/2006)]

[Thanks to alert LL readers for pointing out some typos in the first draft of this post. (1) I started with Mr. X and Ms. Y, but then swapped the letters to correspond to the chromosome names, and didn't change all the instances. I think the naming is consistent now. (2) By some trick of the brain and/or fingers, I identified Ms. X as 24 years old in one place and 22 years old in another. When I checked the speaker-demographics database for the published corpus, I discovered that she is in fact listed as 21, which makes sense for a junior in college, which is what the transcript suggests that she is. Apologies for the errors, and thanks for the peer review, which enabled me to catch and correct the errors within two hours. (It would have been faster, but I was busy with one of my regular jobs.)]

Posted by Mark Liberman at 07:24 AM

January 01, 2007

Girls, boys, and verb forms

The 12/22/06 issue of Science has a brief report (in Constance Holden's "Random Samples", p. 1845) about research on the learning of verb forms by children, research suggesting that "boys and girls employ slightly different strategies in language-learning".  Obviously a topic of interest here at Language Log Plaza.  The description in Science was surprisingly hard for me to follow, and I'm a morphologist (among other things), so I wondered whether non-linguist readers could figure things out.  As it turns out, the abstract of the article in question -- Joshua K. Hartshorne and Michael T. Ullman, "Why girls say 'holded' more than boys", Developmental Science 9.1.21-32 (January 2006) -- is significantly clearer than the Science account.

My confidence in the Science account was shaken when I noticed that it said the research was reported in the November issue of Developmental Science, though it turns out that the paper appeared in last January's issue (and was available on-line in December 2005), so it's not exactly late-breaking news.

In any case, the Science report (titled "He Said, She Said") explains:

As tots learn new words, they tend to "overregularize" verbs--that is, apply the past tense "-ed" even to irregular ones, saying "holded" instead of "held," for example.

To see whether the sexes differ, Michael Ullman and colleagues [in psychology at Georgetown] analyzed transcripts of utterances by 25 children--10 girls and 15 boys--between the ages of 2 and 5.  Because girls learn words faster and are more verbally fluent than boys, Ullman's team suspected that the girls would be better at irregular verbs.  But they found that the girls overregularized more than three times as often as did the boys.

Fine so far.  But the reference to overregularization (an entirely standard piece of terminology, by the way) is likely to suggest to readers that RULES are central to the phenomenon.  That is, at this point the reader is probably thinking that in  producing verb forms, girls apply rules much more than boys do; boys, presumably, produce forms they've memorized.  A reader who goes down this path will be mightily puzzled by what comes next:

By comparing how the tots handled words that sound similar, the researchers claim they could distinguish whether the children were using associative strategies or following rules in deciding verb endings.  When boys overregularize, they are more likely to use rule-governed, or "procedural," memory...  But girls are more likely to go with associations--because the past tense of blink is blinked, sink would become "sinked."

Whoa!  How to interpret this?  It sounds backwards.  And where did the associative strategies come from?

What we needed up front is something about how overregularization could happen in two different ways: by using an internalized rule (in "procedural memory") to compose a form "from scratch"; or by analogizing from forms in memory ("declarative memory") on the basis of similarities -- in particular, similarities in pronunciation -- between lexical items.  This would connect the Georgetown research to controversies in psycholinguistics about the roles of these two types of memory in the production of language.  And it would help in avoiding misunderstanding.

The Hartshorne and Ullman paper argues that girls use the analogizing strategy more than boys.  Here's the abstract:

Women are better than men at verbal memory tasks, such as remembering word lists. These tasks depend on declarative memory. The declarative/procedural model of language, which posits that the lexicon of stored words is part of declarative memory, while grammatical composition of complex forms depends on procedural memory, predicts a female superiority in aspects of lexical memory. Other neurocognitive models of language have not made this prediction. Here we examine the prediction in past-tense over-regularizations (e.g. holded) produced by children. We expected that girls would remember irregular past-tense forms (held) better than boys, and thus would over-regularize less. To our surprise, girls over-regularized far more than boys. We investigated potential explanations for this sex difference. Analyses showed that in girls but not boys, over-regularization rates correlated with measures of the number of similar-sounding regulars (folded, molded). This sex difference in phonological neighborhood effects is taken to suggest that girls tend to produce over-regularizations in associative lexical memory, generalizing over stored neighboring regulars, while boys are more likely to depend upon rule-governed affixation (hold+-ed). The finding is consistent with the hypothesis that, likely due to their superior lexical abilities, females tend to retrieve from memory complex forms (walked) that men generally compose with the grammatical system (walk+-ed). The results suggest that sex may be an important factor in the acquisition and computation of language.

The paper is a preliminary opening-up of research in this area.  As Steve Pinker notes in his comments on it in Science, the number of subjects is small, and as the authors note in their conclusion, there is probably considerable individual variation within the groups studied.  Science also cites Pinker's cautious characterization of the research as showing that "males and females sometimes use 'different mixtures of underlying processes' to arrive at the same results" (note "sometimes" and "different mixtures" -- "slightly different mixtures" earlier in the Science article).  Everybody has avoided seeing starkly drawn male-female differences in this research.  Let's hope the mainstream media and pop science writers do as well.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 06:05 PM

Conventionalized oaths

It's a commonplace that taboo vocabulary arises from reference to subjects that are culturally taboo, but then becomes conventionalized.  Eventually, people use the words without a thought to their origins or literal meanings; they're just words.

Case in point:  Jennifer Gilmore's report on her family's Christmas celebration, in the New York Times Magazine "Lives" column of 12/24/06, "Jewish Family Christmas":

My father, who is 100 percent Jewish, has always been obsessed with Christmas. He grew up in Minneapolis, in an unobservant household, and he considers it part of his childhood. ''I remember the lights, the trees,'' he used to say to my little sister and me. ''It was magical.'' He decorates the mantel with Christmas cards and tapes mistletoe to the doorways, and one year he even tried to get my mother, also Jewish, with a much more observant upbringing, to allow an evergreen wreath on our front door. ''I can't live with that,'' she said. ''I just can't. Nothing on the outside of this house. We're Jews, for Christ's sake.''

It's a mild oath, but oath it is, and definitely Christian in its origins.  Lots of Jews use "Jesus" as an exclamation, too.

zwicky at sign csli period stanford period edu

Posted by Arnold Zwicky at 12:13 PM

Happy New Year unless you're an EU bureaucrat

Happy New Year, unless you're a bureaucrat working for the European Union (EU), in which case, commiserations. Now that Bulgarian, Romanian, and Irish have been (as of today) added to the list of official languages of the EU, the usual formula (T = n² - n) yields a total of T = 23² - 23 = 506 distinct types of interpreter or translator needed: one for each cell of a 23 by 23 matrix, minus the 23 cells down the diagonal (since no one needs a French-to-French translator, except perhaps in order to understand Derrida, but let's ignore bad philosophy here).

The full list of languages is given at the end of this post, so that those who think they could write them all down can give that a try. I've already given you 3, so you only need to name the others; the maximum possible score for you glottodemography hobbyists is 20 out of 20. A prize is offered for those who get the maximum score: a free subscription to Language Log for the whole of 2007.

The value of the number T can be reduced if you care to assume that anyone who can handle Bulgarian to Irish can also do Irish to Bulgarian, but I have been told this symmetry assumption is typically false, at least for interpreters. If you assume symmetry, you can divide 506 by 2 to get 253 types (the Wikipedia article gives this number). But then again, if you assume asymmetry and also think real-time interpreting is so different from off-line document translation that the two kinds of task would need entirely different types of polyglot, then there may be as many as 1012 different types of linguistic specialist needed in principle to run the EU. By any of these measures, it is clear that the EU needs a massively expensive interpreting and translating bureaucracy, probably more cumbersome and expensive than can possibly be afforded.

As The Economist recently pointed out, what is probably going to happen, paradoxically, is that diversity of language use in the EU will decrease. If there were just two countries in the EU, say France and Germany, it would become de rigueur (may I use French?) for nearly everyone to speak French and German, and one would be embarrassed not to; but with 27 countries now, several of them multilingual, making it all but impossible that anyone could really follow legislative affairs in even a modest percentage of the languages, there will be a drift toward use of a lingua franca, and the most likely thing is that the lingua franca will be English. There won't be Bulgarian-Irish and Irish-Bulgarian translation along with English-Irish, Irish-English, Bulgarian-English, and English-Bulgarian to serve the Bulgarian, English, and Irish speakers, but only the last four of those pairs, if and when needed. (The actual probability of EU Members of Parliament or officials of the Brussels bureaucracy turning up who speak, say, Irish but not English is, of course, close to zero; and something similar could probably be said about Maltese, or Danish.)

If all that is needed is people to translate various documents into and out of English, the number of types of linguistic specialist needed (assuming asymmetry, as before) falls to (23 - 1) × 2 = 44 distinct types, which is a bit closer to the realm of possibility.
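
For anyone who wants to check the arithmetic, here is a short sketch of the four counts discussed above. The function names are hypothetical, chosen only for this illustration, and n = 23 is the number of official languages given in this post.

```python
def directed_pairs(n):
    """Ordered source-to-target pairs: an n-by-n matrix minus its diagonal."""
    return n * n - n

def symmetric_pairs(n):
    """Unordered pairs, assuming one person handles both directions."""
    return directed_pairs(n) // 2

def interpreting_and_translation(n):
    """Directed pairs counted twice: once for interpreters, once for translators."""
    return 2 * directed_pairs(n)

def hub_pairs(n):
    """Directed pairs when everything goes into and out of one hub language."""
    return (n - 1) * 2

n = 23  # official EU languages as of this post
print(directed_pairs(n))                # 506
print(symmetric_pairs(n))               # 253
print(interpreting_and_translation(n))  # 1012
print(hub_pairs(n))                     # 44
```

The contrast between the first and last lines is the point: the full matrix grows quadratically with the number of languages, while the lingua franca arrangement grows only linearly.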

And the real number needed reduces yet more when one realizes that the relevant officials of many European countries are often excellent at English and will not need to wait for translated documents. Bill Poser pointed out here that a UN survey to see what language different countries preferred to get their official correspondence in, out of the 6 official languages (Arabic, Chinese, English, French, Russian and Spanish), is said to have produced the results: 130 for English, 36 for French, 19 for Spanish, 0 for Arabic, 0 for Chinese, and 0 for Russian. If this is true, it must mean that more than a dozen countries voted against getting documents in their own national language. Go figure that one out! (My guess: better to have the same English text that other countries are using for the working meetings than to use a translation and then have to back-translate at some points to make sure that what you were given, and no one else is using, is accurate.)

Now for the quiz answer: here is the full list of 23 official languages of the EU as of today:

Bulgarian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Irish, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish, Swedish.


Update: No Happy New Year from Working Languages, where a very irritable post insists that in practice it's very different. The above "calculates, without for some reason attempting to find out what the actual situation is," it says; but in reality, because of multiple competences among translators and chaining (translate A to B and then B to C), about 70 types of specialist suffice, it says. In the first version of this post I mentioned in passing (following an article in The Economist) that only English, French, and German were working languages; but there is dispute about this: apparently all of the languages are working languages, at least in principle, so I took that statement out.

Posted by Geoffrey K. Pullum at 11:17 AM