Language Log: April 2006 Archives

April 30, 2006

Literary shoplifting

I spoke on a radio program this week — Robin Young's show Here and Now on WBUR in Boston (you can hear it here) — about poor Kaavya Viswanathan's sudden catastrophic emergence into notoriety. Listening to the broadcast interview, I realized I sounded very sad and thoughtful. This is not a happy story. I didn't feel like joking around. I feel sorry for this beautiful, articulate, and doubtless intelligent young woman who almost had it all — money, fame, status, and Harvard — and then managed to make herself world famous as a thief. But the fact is that although she has been manipulated and packaged, what has happened to her has very largely been her own fault. She could have prevented this. She shouldn't expect to be able to just smile her way out of it with a few deft locutions crafted by handlers ("wasn't aware"; "may have internalized"; "phrasing similarities"; "completely ... unconscious"; "unintentional errors"). And although she does not deserve to be destroyed for what she did — it's petty pilfering, not an armed robbery at Bank of America — I still think it is reprehensible — and I think that having someone as eloquent as Bill Poser at Language Log trying to construct a defense for her is more than she deserves.

To begin with, look at 22 words of Megan McCafferty's Sloppy Firsts, and the passage in Kaavya's novel that was very clearly stolen from it. It appears as part of a whole scene, where McCafferty's idea was to have a rumination on how far the intelligence of some female character had gotten her. McCafferty chose Sabrina, from Charlie's Angels; Viswanathan simply switched to Miss Moneypenny, James Bond's secretary. But the words expressing the thought were lifted verbatim. Here are the two word sequences, side by side. Pink indicates identity of wording; green indicates changed wording; yellow indicates the place where the exact words were kept but the order of two disjuncts in a coordination of adjectives was switched.

1	Sabrina	Moneypenny
2	was	was
3	the	the
4	brainy	brainy
5	Angel.	female character.
6	Yet	Yet
7	another	another
8	example	example
9	of	of
10	how	how
11	every	every
12	girl	girl
13	had	had
14	to	to
15	be	be
16	one	one
17	or	or
18	the	the
19	other:	other:
20	Pretty	Smart	[switched with line 22]
21	or	or
22	smart.	pretty.	[switched with line 20]

Notice in particular that the word brainy was italicized in both the original McCafferty passage and Viswanathan's copying of it! Unconscious recollection right down to font face selection? No. I think she had the McCafferty book right there, and copied the stuff out of it.
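The word-by-word comparison in the table can also be reproduced mechanically. What follows is only a minimal sketch, not the method used to build the table. The labels "same", "changed", and "switched" are illustrative stand-ins for the pink, green, and yellow markings, and the decision to ignore case and trailing punctuation is an assumption made here so that "Pretty" and "pretty." count as the same word.

```python
# Word sequences copied from the 22-row table above.
mccafferty = [
    "Sabrina", "was", "the", "brainy", "Angel.", "Yet", "another", "example",
    "of", "how", "every", "girl", "had", "to", "be", "one", "or", "the",
    "other:", "Pretty", "or", "smart.",
]
viswanathan = [
    "Moneypenny", "was", "the", "brainy", "female character.", "Yet",
    "another", "example", "of", "how", "every", "girl", "had", "to", "be",
    "one", "or", "the", "other:", "Smart", "or", "pretty.",
]


def normalize(word):
    """Lowercase a word and drop surrounding punctuation (an assumption)."""
    return word.strip(".,:;!?").lower()


def classify(a, b):
    """Label each aligned position as 'same', 'switched', or 'changed'."""
    a_norm = [normalize(w) for w in a]
    b_norm = [normalize(w) for w in b]
    labels = []
    for i, (x, y) in enumerate(zip(a_norm, b_norm)):
        if x == y:
            labels.append("same")
        elif any(
            j != i and a_norm[j] == y and b_norm[j] == x
            for j in range(len(a_norm))
        ):
            # The same two words appear at another position, swapped.
            labels.append("switched")
        else:
            labels.append("changed")
    return labels


labels = classify(mccafferty, viswanathan)
for number, (x, y, label) in enumerate(zip(mccafferty, viswanathan, labels), 1):
    print(number, x, y, label, sep="\t")
```

A comparison like this sees only the words. It says nothing about the italics on brainy, which are carried over in both passages.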

This is not just reliance on a snowclone. With a snowclone like "If the Eskimos have X words for snow, then Y must have Q words for Z", you fill in your choices of X and Y and Q and Z, but you intend your audience to recognize that they have heard "the Eskimos have X words for snow" before, possibly with your choice of X (though people do make up their own values for X); indeed, the potency of this old piece of nonsense stems precisely from the fact that "the Eskimos have X words for snow" (for many different values of X) has occurred thousands of times before, and almost anyone who reads magazines or newspapers or takes social science classes has run into it somewhere. (I did a five-day summer course on management and administration in the University of California once, and I heard the old story about the Eskimos twice from different lecturers during that one week!)

It's quite different with Kaavya Viswanathan's sordid little piece of literary shoplifting. Nobody had ever previously used the sentences and phrases that Viswanathan stole from Megan McCafferty, as far as I know (after some Googling around). And Viswanathan wasn't expecting people to recollect the phrasing and recognize the allusion. She thought, doubtless, that because she had read it years ago when she was a mere girl of thirteen, nobody would recollect it, so she could just purloin bits from it and represent them as her own. And she did it for profit, and she purloined a dozen other passages from the same author as well — an author who is still writing and trying to earn her living from her work.

Bill Poser is right that copyright is being misused today by big corporations determined to protect the profits they make from the intellectual property in which they invest. I don't agree with the extensions of copyright limits way past anything reasonable, so that rights of long-dead authors and film-makers and songwriters can continue to be asserted by greedy publishers or heirs decades after they should have become public-domain material for us all to quote and reproduce. But that greedy, profitmaking business world was the one that Viswanathan was eager to enter, and get her $500,000 contract from. If copyright is being enforced with ever more fierceness, that just makes it all the more important to take it seriously (even if you think the law should be changed). And even if the law should be relaxed, it surely shouldn't be relaxed so much as to permit an author to directly lift another active author's scene ideas and carefully chosen phrasings from works only four or five years old. I think this is a case where a civil judgment might well have been rendered against Little, Brown & Co.

It is quite true that the enthusiasm of greedy publishers and film owners for overly restrictive intellectual property laws often looks unsavory; but you can't logically use that in defending someone who's been doing something equally unsavory — stealing work product from a competitor who's trying to earn a living in the same market.

Similar remarks hold for the observations Bill Poser makes about the motives of Viswanathan's detractors ("they don't like chicklit; they think that the book isn't very original; they think that she is a spoiled rich girl; they resent the disproportionate academic and business success of Indians and are eager to take one down"). If some people hate prosperous young women or literature written for them, that's their problem. But it's irrelevant to Viswanathan's problem, which is that she got caught being unoriginal and dishonest for profit.

A person who criticizes Viswanathan's conduct because of a dislike or resentment toward a highly successful ethnic group is being contemptible and silly (the Indians who have come to America generally earn their success honestly, and I admire them). But it's not contemptible or silly to hold the view that nobody should steal someone else's phrases, sentences, or paragraphs and pretend they are her own (no, not even just a few).

Especially not when they're competing in a big-money market against the very novelist they're stealing from, or (I give fair warning) when they're writing a paper for a class in which the professor is me.

Posted by Geoffrey K. Pullum at 02:23 PM

Out-of-time, out-of-body seeing and hearing

David Giacalone of f/k/a is puzzled:

Why do so many news broadcasters -- from PBS's Jim Lehrer to ABC's Elizabeth Vargas -- end their show by saying "we'll see you here tomorrow"? Yes, the transitive verb "see" does have many meanings, but not one of them is "be watched by you."

David promises not to contribute to this illogical development ("The f/k/a Gang swears never to use the phrase 'we'll see you here tomorrow.' We might say: 'Come back and let our StatCounter perceive your presence.'"), and he asks for help in understanding it:

Maybe Mark Liberman at Language Log can explain (or, more likely, explain away) this language problem for us.

Although I've never had the pleasure of meeting Mr. Giacalone, this request continues a virtual conversation that I've enjoyed, and so I'm happy to offer what little insight I can. And luckily, it seems to me that Laura Cantrell offers a clue in today's NYT, in an Ideas and Trends piece on Bob Dylan's new radio show ("Play a Song for Me"):

In doing my own program, "The Radio Thrift Shop," on WFMU in Jersey City for the last dozen years, I've tried to make radio by hand, like a good pie, served from my studio right into, well, wherever you happen to be listening.

Clearly atmosphere is most readily evoked with a live format (most commercial and satellite radio not only is not live, it is assembled by computer), but it can be done with the out-of-time, out-of-body approach, and a recording artist like Mr. Dylan probably has a built-in advantage.

The best example may be Hank Williams. In the early 1950's, many artists did double duty behind the broadcaster's microphone, often appearing at early morning hours on programs with names like "Farm and Fun Time." Mr. Williams's "Health and Happiness Shows" were recorded in the WSM studios in Nashville to 16-inch acetate transcription discs, which were copied and shipped to radio stations across the country.

Though these shows were "canned," they have an immediacy that a lot of our modern, technology-assisted radio lacks. The thrill of hearing Hank encouraging his band to "play it like you mean it, boys," or cracking up at some goofy studio banter, switching sincerely to his closing "If the good Lord's willin' and the creek don't rise" give some sense of who he was beyond the records and the songs he left behind.

In fact, the full phrase that Hank always used at the end of his Health & Happiness Shows was "If the Good Lord's willing and the creeks don't rise, we'll be sure to hear from you again". That's pretty much the radio version of the inverted meaning in TV closings like "We'll see you here tomorrow."

It's clear enough, I think, why Hank said "we'll be sure to hear from you again" and not "you'll be sure to hear from us again". In the first place, he's making a promise for himself and his band -- it would be strange and even rude for him to try to commit the listener to tuning in again. He could have promised that "we'll be sure to play for you again", but that would highlight the very thing he wants to overcome, the one-way, non-interactive nature of the medium. He's trying to make listeners feel that he's right there with them, taking in their requests and their reactions as if he were playing a live roadhouse gig rather than a canned radio show.

Hank chose an image that emphasized the empathy he wanted to feel, and if he strayed a bit beyond the strict bounds of logic, surely an author of "one-breath poetry" can forgive him.

[David G. has responded:

To me, by skewing the meaning of the words, instead of saying something factually true that invited the listener to come back, Hank -- and I'm a fan of his music -- is manipulating the feelings of the listener and making our language a bit less useful as a tool of communication. Surely, if Hank tried his "But, I've been here with you all night" baloney with a sweetheart, from out of town, she'd call his bluff. And maybe sing "Your Cheatin' Heart."

I'll take refuge in a quote from Nabokov:

Art is never simple. To return to my lecturing days: I automatically gave low marks when a student used the dreadful phrase "sincere and simple"-- "Flaubert writes with a style which is always simple and sincere"-- under the impression that this was the greatest compliment payable to prose or poetry. When I struck the phrase out, which I did with such rage in my pencil that it ripped the paper, the student complained that this was what teachers had always taught him: "Art is simple, art is sincere." Someday I must trace this vulgar absurdity to its source. A schoolmarm in Ohio? A progressive ass in New York? Because, of course, art at its greatest is fantastically deceitful and complex.

As, in its own way, is ordinary language.

But perhaps we're all over-analyzing. If French speakers end phone conversations with "au revoir", or German speakers with "auf wiedersehen", they're not telling a lie, misusing their language, or even being especially artful. It's just an expression, as we can tell from the joke that's been repeated by generations of junior-high-school students:

A:	See ya later.
B:	Not if I see you first...

]

[Update: Joe Allegretti wrote:

I was reminded of a German-language radio program I listened to in the 1980s out of Trenton, NJ, from which the host would always sign off with "auf Wiederhören," which is, obviously, the more precise aural-oriented equivalent of "auf Wiedersehen." Apparently, this is also the common phrase used for phone conversations. Another interesting example of nuanced language for which precise English equivalents do not exist (except maybe "see you later," "talk to you later"?).

I've also used "auf Wiederschreiben" in correspondence, but found the following links interesting, regarding use of "auf Wiederlesen":
http://everything2.com/index.pl?node_id=977347
http://lists.debian.org/debian-tetex-maint/2004/05/msg00264.html

]

Posted by Mark Liberman at 01:02 PM

Punctuation tip's

Here's yet another complaint of apostrophe abuse in comic-strip form, this time from Steve Breen's Grand Avenue.

(Hat tip's to David Giacalone of f/k/a.)

Posted by Benjamin Zimmer at 10:52 AM

Courtleye makere saith copyinge was withovten entente

For a historical perspective on the Kaavya Viswanathan case, consider the accusations leveled by Frater Thomas Walsingham against Galfridus Chaucer:

Callynge hymselfe a "huge fan" of Mayster Boccacce his poesie, Mayster Chaucere dide adde, "Aware ich was nat of how much the wordes of Boccacce dide stikke in myn imaginacioun." Mayster Chaucere dide apologise to the soule of Boccacce and dide saye that his was the laste tyme he wolde model eny wrytynge upon hym in tyme to come, "saue for a smal werke in a frame-tale that ich endite at presente."

Posted by Mark Liberman at 09:33 AM

Kaavyagate Update

My defense of accused plagiarist Kaavya Viswanathan is definitely in the minority, but I have to say that I'm not impressed with the quality of most of what I've been reading in the press or on the blogs. There's a great deal of vituperation but very little light shed on the situation. People seem to enjoy joining the lynch mob. Many if not most people writing on this topic allege that substantial parts of Opal Mehta are word-for-word copies, which is false. They evidently haven't taken the trouble to look at the problematic passages. The argument that the similar passages must be due to intentional plagiarism continues to be made exclusively by the technique of bald assertion. No one has yet taken up my challenge of providing a plausible scenario for the alleged plagiarism. This piece by Jack Shafer purports to explain "Why Plagiarists Do It", but the reasons he gives don't provide a rational explanation of the instant facts. Some of the more interesting news is appearing at the Harvard Independent.

Some people are making a lot of the fact that Kaavya's publisher, Little, Brown, has withdrawn the book. I wouldn't assign that much evidential value. Publishers are scared of lawsuits, even if they aren't likely to lose. Defending a suit for copyright infringement can be very expensive. They may just be avoiding the cost of litigation.

A lot of people, especially on the blogs, seem to give away ulterior motives for their sniping: they don't like chicklit; they think that the book isn't very original; they think that she is a spoiled rich girl; they resent the disproportionate academic and business success of Indians and are eager to take one down.

Posted by Bill Poser at 02:23 AM

Arabic machine translation from Google Labs

Franz Och at Google Labs has announced interactive sites where you can try Arabic-English and English-Arabic machine translation.

I tried a random story on the Al-Hayat web site. Cutting and pasting from the story worked pretty well: the first two paragraphs came out as

In a step considered a retreat in a Tehran crisis of the nuclear file, Mohamed Saidi, deputy head of the Iranian Atomic Energy Agency, his country's readiness to "answer all questions of the International Atomic Energy Agency, including a return to the application of the Additional Protocol to the sudden inspection of nuclear facilities in Iran ".

This coincided with the Organization of the Islamic Conference "deep concern" developments in the Iranian nuclear file. and the adoption of its claim "dialogue and peaceful means in resolving the dispute". As called for the Russian Foreign Ministry Iran suspended active enrichment, and to ensure "full cooperation" with the IAEA.

Since I hadn't read the news since the story came out about Iran's conditional offer of concession, I learned from reading this translation that something had happened, and could see roughly what the new development was.

For some reason, submitting the URL for that page didn't work for me, producing garbled text. Submitting the URL of a random BBC Arabic page, however, worked well and produced fairly readable output. Among the remaining problems, I noted a tendency to fail to deal appropriately with some VSO (verb-subject-object) sentences:

He said the US delegate to the United Nations John Bolton that it is urgent to take firm action in this regard.

He warned Richard Brook former official in the Clinton Administration, The expected nomination for the office of the Ministry of Foreign Affairs if the Democrats win the presidency. to transform Iran into a nuclear power would mean exposing global stability at risk.

You can read some of the background of this work in these early Language Log posts:

The value of evaluation (7/30/2003)
NYT story on DARPA MT ... doesn't mention DARPA! (7/31/2003)

At that time, Franz Och was working with Kevin Knight at ISI, who is featured in the second post; and I believe (though I'm not certain) that the MT system whose output was quoted in the second post was one that Franz played a key role in creating.

In any case, you can see that there has been more progress since 2003.

[Full disclosure: Partha Pratim Talukdar, a Penn graduate student, was an intern at Google Labs last summer, working with Och's colleague Thorsten Brants, and I'm one of the authors of a paper resulting from that work: Partha Pratim Talukdar, Thorsten Brants, Mark Liberman and Fernando Pereira, "A Context Pattern Induction Method for Named Entity Extraction", CoNLL-X, June 8-9, NYC. ]

[Update: more detail about the Verb-Subject-Object issue can be found here.]

Posted by Mark Liberman at 02:03 AM

Negation day in the news

A report by Kerry Grens on All Things Considered, 4/29/2006, starts like this:

Today is negation day. The class works out how to say something wasn't, isn't, didn't or won't. Seven linguistics majors arrange themselves in a loose circle around native Kisii speaker Henry Gekonde, and press him for data.

The subject is a language documentation course at the University of New Hampshire, taught by Naomi Nagy, and focusing this year on Kisii or Gusii, with MA student Henry Gekonde acting as language consultant.

There are many such courses around the world -- though not as many as there should be -- but Prof. Nagy's course has at least one uncommon characteristic: local radio and TV crews, and a media page.

I've always thought that such "field methods" classes are not only interesting to linguists, but also inherently mediagenic. Even for people without much interest in other kinds of linguistics, it's fascinating to watch that kind of exploratory analysis unfold. But I never did anything to act on this belief, and I'm impressed that Naomi has involved local journalists in documenting her documentation course.

This particular course has a good personal-interest story line as well: Gekonde, who has a journalism degree from Indiana and worked as a copy editor on a paper near UNH, is getting a linguistics MA and plans to return to Kenya to create a dictionary for his native language, which is spoken by a million and a half people in Kenya but is not normally used in written forms.

[Naomi Nagy got her PhD in 1996 from the University of Pennsylvania, where I teach. Her dissertation was on Faetar, a Francoprovençal dialect displaced about 1,500 kilometers from southern France to a couple of villages in Apulia, near Naples, where migrants (soldiers from the army of Charles of Anjou?) settled some 600 years ago. ]

Posted by Mark Liberman at 12:33 AM

April 29, 2006

Can you speak in rhinoceros?

At the end of the LiveScience article on the starling controversy, a perverse piece of reasoning is attributed to Chomsky:

"[...] if someone could show that other animals had the basic property of human language, it would be of very little interest to the biology of language, but would be a puzzle for general biology," Chomsky said. "It's expected that if a species has some ability that has real selectional advantage, it will use it."

The premises seem fair, but Chomsky's conclusions are topsy-turvy.

Suppose someone has shown us "that other animals had the basic property of human language." In fact, let's suppose we actually have Gerald, a truly great ape who I just reported on. Gerald has some ability that allows him to manifest behaviorally "the basic property of human language": indeed, he's commandingly erudite. It seems clear (though this is only implicated by the above quote, not asserted) that Chomsky thinks the relevant ability would have "real selectional advantage," and that we should therefore expect that the ability is used in the wild. So now, what should we infer?

a. Chomsky's addled conclusion: Gerald would show that standard biological theory is wrong, since sometimes complex abilities evolve without leading to performance of any action that would confer selectional advantage.

b. The correct conclusion: Whatever innate cognitive capabilities Gerald has which enable him to process human language, gorillas must use these capabilities to perform tasks in the wild, tasks which confer selectional advantage. Having observed gorillas in the wild, we take it that those tasks are non-linguistic.

If I show you Gerald, you'll choose (b) every time, right? I mean, duh? Is the headline going to run: Gorillas supersmart: have been hiding it?

So Gerald's impressive language abilities would comprise part of the general intelligence that, if in the wild, he would use for dealing with nuts, bananas, and his mother. There would be no paradox for general biology, except to the extent that we would wonder about the conditions needing to obtain before cultural evolution of language might take place. And yes, of course the result would be of interest to the biology of language! Imagine you're the editor of the journal Language, and a paper reporting that an ape can converse freely in English and Greek comes in the door. The ape is apparently a co-author. Even though you believe every claim made in the paper, you reject it with the comment "Dear Professor Fielding and colleagues, unfortunately we cannot accept your submission to our journal, as the results are of no interest for linguistics." I think not.

Chomsky and his acolytes have long claimed that the ability to process language does not comprise part of the general cognitive abilities of a problem solving animal, but is an entirely disjoint ability dependent on special purpose neural structures. Some of the evidence for this position, e.g. evidence of localization of language processing centers in the brain, is quite compelling. But the hypothetical Gerald would blow all that evidence out of the water.

This week we learned that starlings have an ability that might loosely be described as linguistic, though you shouldn't expect one to be interviewed on the Tonight Show anytime soon. Here, I must admit I have some sympathy for Chomsky's position. We now know a little more about bird brains, but not much more about human language. Where I disagree with him is in the general principle he invokes, which seems to imply that even animals producing and comprehending grammatically correct English would be of no consequence for linguistics. Such a conclusion would be ludicrous.

By the way, many of you will recognize my title, taken from the song "Talk to the Animals" by Leslie Bricusse. And the immortal answer: "Of courserous, can't you?"

Posted by David Beaver at 08:50 PM

Wild? I was livid!

If only this piece of research from the early eighties had settled the never-ending animal language debate:

An aside: the collective noun "flange (of baboons)" originated from Gerald in the above "Not the Nine O'Clock News" sketch. Gerald corrects the professor so authoritatively that I remember being taken in by him myself. "Flange" is now one of three collective nouns Wikipedia and answers.com cite for baboons, the other two being "troop" and "congress". Funny thing about exotic collective nouns, as you'll gather immediately from a web search: their main use is in a language game we play that involves listing collective nouns, not in informative narratives about the animals in question. So, exotic collective nouns function primarily as a social marker, presumably showing a speaker's education and class. If Eskimos did have thousands of words for snow, I bet they'd use them for the same purpose.

Posted by David Beaver at 08:14 PM

Ma Ferguson, the apocryphal know-nothing

Eric Bakovic recently invoked the famous saying attributed to Texas Governor Miriam Amanda "Ma" Ferguson: "If the King's English was good enough for Jesus Christ, it's good enough for the children of Texas!" Bloggers commenting on Bush's opposition to "Nuestro Himno" (such as aldon of the Daily Kos) have also been recalling the quote, which is a favorite of another former Texas governor, Ann Richards. Though I haven't been able to find any firm attributions of the quote to Ferguson during her own lifetime (she died in 1961), humorous variations on this know-nothing assertion are attested all the way back to 1881.

The earliest variants on the saying that I've found relate to the 1881 English Revised Version of the Bible, with the sentiment being that Saint Paul preferred the King James Version:

"Preaching on the Bible; Pulpit Opinions of the New Version."
New York Times, May 23, 1881, p. 8
The Rev. Dr. Pentecost ... illustrated the tenacity with which people cling to the old Bible by telling a story about an agent of a Bible society who was trying to collect money in a country church for a new translation of the Bible. The agent asked an old farmer in the congregation to contribute. "What's the matter with the good old King James version?" the farmer replied. "That was good enough for St. Paul, and it's good enough for me."

"'The New Covenant' and its Critics."
J W Hanson. The Universalist Quarterly and General Review.
Boston: Oct 1884. Vol. 21; p. 465
Prof. Schaff pertinently observes: There are many lineal descendants of those priests, who, in the reign of Henry VIII, preferred their old-fashioned Mumpsimus Domine to the new-fangled Sumpsimus; even in the enlightened State of Massachusetts, a pious deacon is reported to have opposed the Revision of 1881 with the conclusive argument, "If St. James's Version was good enough for St. Paul, it is good enough for me!"
[Apparently quoting Philip Schaff's Companion to the Greek Testament and the English Version (1883).]

Nebraska State Journal, June 16, 1901, p. 12
"The Sketch," of London, says: "A new book on the history of the English Bible has a good story of a certain sprightly young deacon who, in preaching against the advocates of the revised version, startled his hearers by the contention that, if the authorized version was good enough for St. Paul, it was good enough for him!"
[Story also appears in: Davenport Daily Republican, February 27, 1902.]

Barry Popik of the American Dialect Society turned up a transitional form, where it is English, rather than the King James Version of the Bible, that is deemed good enough for Saint Paul:

New York Times, Jan 15, 1905 (Sunday Magazine), p. 8
Prof. Adolphe Cohn of Columbia University recently, in discussing the teaching of French and German in public schools, said that the attitude of a good many people on that subject was explained to him very aptly by a remark he had once overheard in a street car. Two elderly Irish women were talking about their children, when one remarked: "I won't let my child be taught Frinch."
"Why not?" inquired the other.
"Sure," replied the first, "if English was good enough for St. Paul to write the Bible in it's good enough for me."

"Language of St. Paul."
Puck, Sep 11, 1912, Vol. 72, Iss. 1854; p. 10
Among the Wesleyans of a century ago there was a well-known and eccentric preacher named David Mackenzie. ... He was a lay preacher of the old order, and was admitted without having read the prescribed "Wesley's Sermons," and the rest. He boasted of his lack of "book learning," and scornfully told a student of the new school, who was learning Latin, that "English was good enough for St. Paul; ain't it good enough for you?" — Youth's Companion.

So the saying was circulating widely with "St. Paul" rather than "Jesus" long before Ma Ferguson was elected governor of Texas in 1924. Though the Newspaperarchive database includes many Texas papers from the 1920s, I have yet to find any attribution of the "Jesus" quote to Ferguson during her administration (or afterwards, for that matter). I did find it credited to "a man in Arkansas" in 1927:

Chronicle Telegram (Elyria, Ohio), April 27, 1927
An official of the Rockefeller Institute states that, among hundreds of letters of denunciation received by the institution during the past year, one was from a man in Arkansas who took the view that all this modern education is dangerous, and that the new-fangled practice of grounding preachers in Latin and Greek is especially pernicious. They ought to be taught English, he said, adding in conclusion: "If English was good enough for Jesus, it's good enough for me."

Considering how the quote in all its variants has been used primarily to ridicule the backwardness of unnamed Christians (a farmer, a pious deacon, and so forth) wary of new approaches to the Bible, I highly doubt Ma Ferguson ever said it — or if she did, she probably would have said it in self-effacing jest. My guess is that this was a free-floating bit of preacher humor that unfairly got attached to Ma Ferguson, much as Winston Churchill attracts various apocryphal witticisms.

Posted by Benjamin Zimmer at 04:30 PM

More water cooler chat from Language Log Plaza

Another set of stitched-together emails, this time from Roger Shuy, Arnold Zwicky and Eric Bakovic:

RS:	Eric's excellent post on Nuestro Himno might have been called, Nuestro Himno-no-no. Now what's the next step? Changing the Spanish names of towns and cities in Texas? How about Body of Christ? Saint Anthony? Saint Mark? On the River? Wild River? The Pass? The Field?
AZ:	Why stop at Texas? there's a lot of Spanish out there to expunge. Arnold, in Tall Tree, waiting for a Saint Francis friend to bring two visitors from Flowery...
EB:	Well, you know, what's good enough for Jesus ... -- Eric (of St. James, apparently -- I never knew that!)

Roger was of course referring to Corpus Christi (which is actually Latin, but similar principles apply), San Antonio, El Paso, and so on. Arnold is in Palo Alto waiting for his visitors from Florida.

And Eric (in San Diego, I think, which in English would be Saint Didacus, as Ambarish Sridharanarayanan pointed out by email) is alluding to a quotation often attributed to Miriam Amanda "Ma" Ferguson, "[c]laimed to be said as she was holding a bible, about her reason for objecting to the teaching of Spanish in schools":

If the King's English was good enough for Jesus Christ, it's good enough for the children of Texas!

As the Wikipedia article on her explains,

She was an educated woman and fairly well-read, so it is somewhat unlikely that she actually ever uttered those words. That quote has also been widely attributed to many others, both before and since the time of Mrs. Ferguson.

The main issues in her first primary campaign were Prohibition and the Ku Klux Klan, with her opponent Felix D. Robertson being in favor of both, and Ferguson opposed. She trailed Robertson in the first round of the primary, but won a run-off, and then defeated the former dean of the University of Texas law school in the main election. More on Miriam Ferguson here, here, here, here.

Posted by Mark Liberman at 03:59 PM

The multilingual anthem

In the Washington Post's reporting on the "Nuestro Himno" controversy, David Montgomery wrote:

At least 389 versions [of "The Star-Spangled Banner"] have been recorded, according to Allmusic.com, a quick reference used by musicologists to get a sense of what's on the market. Now that [Jimi] Hendrix's "Banner" has mellowed into classic rock, it's hard to imagine that once some considered it disrespectful. The other recordings embrace a vast musical universe: from Duke Ellington to Dolly Parton to Tiny Tim. But musicologists cannot name another foreign-language version.

I don't know which "musicologists" Montgomery consulted, but Wikipedians have had better luck finding other foreign-language versions of the anthem. So far contributors to the Wikipedia page for "Nuestro Himno" have turned up examples in German, Yiddish, Samoan, French, and Latin. Not only that, they discovered a number of other Spanish versions reproduced on the website of the U.S. State Department. (Will this page be removed now that President Bush has declared that the anthem "ought to be sung in English"?)

A site on German lieder provides two translations into German, the first by Niklas Müller from 1861 and the second by an unknown composer published between 1861 and 1864. The first stanza of each rendering:

[Version 1:]
O, sagt, könnt ihr seh'n
Bei der Dämmerung Schein,
Was so stolz wir begrüßten
In Abendroths Gluten?
Dess Streiffen und Sterne,
Durch Kämpfender Reih'n,
Auf dem Walle wir sahen
So wenniglich fluten;
Die Raketen am Ort
Und die Bomben vom Fort,
Sie zeigten bei Nacht,
Daß die Flagge noch dort.
O sagt, ob das Banner
Mit Sternen besäet
Über'm Lande der Frei'n
Und der Tapfern noch weht?

[Version 2:]
O! sagt, könnt ihr seh'n
In des Morgenroths Strahl,
Was so stolz wir im schei-
denden Abendroth grüßten?
Die Sterne, die Streifen,
Die wehend vom Wall,
Im tödlichen Kampf
Uns den Anblick versüßten?
Hoch flattere die Fahne
In herrlicher Pracht,
Beim Leuchten der Bomben
Durch dunkle Nacht.
O! sagt, ob das Banner,
Mit Sternen besä't,
Über'm Lande der Freien
Und Braven noch weht?

On Mendele, a Yiddish literature and language mailing list, Leonard Prager supplied the lyrics for a Yiddish version of the anthem by Ber Gri (taken from In dinst fun folk; almanakh fun yidishn folks-ordn, New York: Book League of the Jewish People's Fraternal Order I.W.O., 1947, p. 112). The first stanza:

"Star spengld bener"
fun Frensis Skat Ki
O zog! konstu zen in likht fun sof nakht,
Vos mir hobn bagrist in demer-shayn mit freyd?
Di shtrayfn, di shtern -- in flaker fun shlakht
Fun di shuts-vent mir hobn mit bang in blik bagleyt.
Un der blits fun raket, un der knal fun kanon
Durkh der nakht gerufn hobn zey: es lebt di fon.
O zog! di fon mit di shtern iz zi nokh tsehelt
Iber land fun fraye un iber heym fun held?

Here are the lyrics to a Samoan version, which the Samoa News reports has been proposed as the official anthem of American Samoa:

Aue! se'i e vaai, le malama o ataata mai
Na sisi a'e ma le mimita, i le sesega mai o le vaveao
O ai e ona tosi ma fetu, o alu a'e i taimi vevesi tu
I luga o 'Olo mata'utia, ma loto toa tausa'afia
O Roketi mumu fa'aafi, o pomu ma fana ma aloi afi
E fa'amaonia i le po atoa, le fu'a o lo'o tu maninoa
Aue! ia tumau le fe'ilafi mai, ma agiagia pea
I eleele o Sa'olotoga, ma Nofoaga o le au totoa

The Wikipedia article also notes a reference to a French version of the anthem translated by an Acadian (Cajun) organization in Louisiana, though no lyrics are given. And finally, Christopher Brunelle has translated the third stanza into Latin (found on the abovementioned German lieder page as well as the Classics-L mailing list):

ubi nunc isti sunt
tam superbo voto
furiali pugna territos et clamore
esse nos cum terra
carituros domo?
caligata lues expurgatast cruore!
mercennarius et
servus effugiet
acherunta frustra nec servabit semet
et vexillum stellatum
vibrat in triumpho
libera in patria et in forti domo

In case you're wondering about the original lyrics for the anthem's rarely sung third stanza, here they are:

And where is that band
Who so vauntingly swore
That the havoc of war and the battle's confusion
A home and a country
Will leave us no more?
Their blood has washed out their foul footsteps' pollution!
No refuge can save
The hireling and slave
From the terror of death and the gloom of the grave,
And the star-spangled banner
In triumph shall wave
O'er the land of the free and the home of the brave.

[Update #1: The anthem was sung in the Uto-Aztecan language of Tohono O'odham at the 2004 Democratic National Convention.]

[Update #2: T. Carter Ross sends along a link to a French version of the anthem by the Cajun group Les Amies Louisianaises:

La Bannière Étoilée,
l'hymne national américain
(The Star Spangled Banner)
(Trad., P.D., French words David Émile Marcantel
Vocal arrangement Jeanette Aguillard)

O dites, voyez-vous
Dans la lumière du jour
Le drapeau qu'on saluait
À la tombée de la nuit ?
Dont les trois couleurs vives
Pendant la dure bataille
Au-dessus des remparts
Inspiraient notre pays.
Et l'éclair des fusées,
Des bombes qui explosaient,
Démontraient toute la nuit
Que le drapeau demeurait.
Est-ce que la bannière étoilée
Continue toujours à flotter
Au-dessus d'une nation brave,
Terre de la liberté ?

A clip of the song can be heard here.]

[Update #3: Some interesting discussion on MetaFilter, including a link to a 1936 audio clip of the anthem being sung on the radio in Yiddish, courtesy of the Yiddish Radio Project.]

[Update #4: More good linkage at Boing Boing, including the sheet music for a 1919 Spanish rendering of the anthem (one of the four on the State Department site).]

[Update #5: Jack Balkin links to sheet music for yet another Yiddish version of the anthem, a 1943 translation by Dr. Abraham Asen:

O'zog, kenstu sehn, wen bagin licht dervacht,
Vos mir hoben bagrist in farnachtigen glihen?
Die shtreifen un shtern, durch shreklicher nacht,
Oif festung zich hoiben galant un zich tsein?
Yeder blitz fun rocket, yeder knal fun kanon,
Hot bawizen durch nacht: az mir halten die Fohn!
O, zog, tzi der "Star Spangled Banner" flatert in roim,
Ueber land fun die freie, fun brave die heim!

And how about Polish?

Gwiaździsty Sztandar

O, powiedz, czy widzisz,
w świetle wczesnego świtu
to, co tak dumnie pozdrawialiśmy
w ostatnim migotaniu zmierzchu?
Czyje pasy i błyszczące gwiazdy
podczas strasznej walki
ponad wałami obronnymi obserwowaliśmy
jak wspaniale powiewały?
A czerwień oślepiających rakiet,
bomby wybuchające w powietrzu
dały dowód przez noc,
że nasza flaga wciąż była.
Refren:
O, powiedz czy gwiaździsta
flaga jeszcze powiewa
nad krajem wolnych
i ojczyzną odważnych? ]

[Update #6: Christopher Shea of the Boston Globe provides further background on many of the above translations here.]

Posted by Benjamin Zimmer at 11:05 AM

Bird (syntax) flu

According to an article by Yreka Bakery in the April 2006 issue of the Speculative Grammarian,

"An apparently new speech disorder a linguistics department our correspondent visited was affected by has appeared. Those affected our correspondent a local grad student called could hardly understand apparently still speak fluently. The cause experts the LSA sent investigate remains elusive. Frighteningly, linguists linguists linguists sent examined are highly contagious. Physicians neurologists psychologists other linguists called for help called for help called for help didn’t help either. The disorder experts reporters SpecGram sent consulted investigated apparently is a case of pathological center embedding."

There are unconfirmed reports that the ACL is working on a vaccine.

[See David Beaver's post yesterday for more on the etiology of this disorder.]

Posted by Mark Liberman at 12:18 AM

April 28, 2006

Nationalism in all its star-spangled glory

Today on NPR's Day to Day, and later on All Things Considered, there were short segments about a new Spanish-language version of the national anthem, called Nuestro Himno ("Our Anthem"), which "is getting huge airplay on Spanish-language radio stations across the nation ahead of pro-immigration rallies slated for Monday, May 1." You can hear these NPR segments, and a full version of Nuestro Himno, by following these links. The Spanish lyrics also appear there, followed by English re-translations, but this Wikipedia article appears to be more accurate.

I was struck by something toward the end of the Day to Day segment (italicized emphasis reflects speaker emphasis on that word):

At a news conference this morning at the White House, the president was asked whether the anthem should be sung in Spanish; the president responded, "I think the national anthem ought to be sung in English."

In the original broadcast I heard, there was audio of President Bush saying these words; I'm curious as to why the audio was replaced in the online version by a quotation spoken by Day to Day co-host Madeleine Brand. The audio of President Bush saying these words begins at the one-minute mark in the online All Things Considered segment, followed by what he said immediately afterward: "I think people who want to be a citizen of this country ought to learn English. And they ought to learn to sing the anthem in English."

This latter bit is also quoted directly in this NYT story (emphasis added):

President Bush said today that he thought the national anthem should be sung in English, not the Spanish language version released by a recording company recently. [...] After saying he did not consider the anthem sung in Spanish to have the same value as the anthem sung in English, Mr. Bush said: "I think people who want to be a citizen of this country ought to learn English. And they ought to learn to sing the anthem in English."

I was curious about what "the same value" refers to here, so I went to www.whitehouse.gov and found this release (emphasis added):

Q [from "Kelly"] Mr. President, a cultural question for you. There is a version of the National Anthem in Spanish now. Do you believe it will hold the same value if sung in Spanish as in English?

THE PRESIDENT: No I don't, because I think the National Anthem ought to be sung in English. And I think people who want to be a citizen of this country ought to learn English, and they ought to learn to sing the National Anthem in English.

[ Side note: recall what Madeleine Brand said on Day to Day: the president was asked whether the anthem should be sung in Spanish. Ask yourself: is that what "Kelly" asked? But I digress. ]

So the reason Bush believes the anthem won't "hold the same value if sung in Spanish as in English" is because he thinks it "ought to be sung in English"? I've been unimpressed by Bush's reasoning skills before, but ...

Luckily, there are folks out there who are more forthcoming about their reasons for believing that the anthem should (only) be sung in English. Take George Key from Southern California, who was interviewed for the Day to Day segment (Note the color code below: as above, co-host Madeleine Brand is in green, and guest George Key is in red.)

I think that's a terrible thing, that is awful. My thoughts are they should go someplace else and sing it.

George Key, great-great-grandson of (yes, you guessed it) Francis Scott Key, the man who wrote the original poem that eventually became the lyrics of The Star-Spangled Banner. George Key is half-Panamanian, but he cannot believe anyone singing the national anthem in Spanish could possibly understand the true meaning behind the song.

There was a man standing out on a ship watching the city of Baltimore being bombarded by the British at the time. [...] Had we lost that part of the war, we would be British subjects today. It was the second revolutionary war. And so for somebody to come in here now, who doesn't understand the concept of why that was written and the hardships that were endured by these people -- they just don't understand what they're doing.

You can read more about the story behind the national anthem here. Many of us with an American education may recall having learned (some of) this history, but would any of us have figured much of it out just from the lyrics -- especially given the fact that most if not all of us only learn and sing the first of the full four stanzas? (And if so, why would any of us English speakers need to learn the history in addition to the lyrics?)

Furthermore, on what basis does George Key believe that the Spanish lyrics don't tell the same story that the English lyrics tell? The translation (of the first stanza) is not perfect, of course; important factors such as fitting the lyrics to the same music, for example, account for some key differences. But here is the translation of the Spanish version back into English alongside the original English version. Can anyone honestly say that one version says more about the British bombing of Baltimore than the other?

Re-translation of Spanish version:

Do you see arising, by the light of the dawn,
That which we hailed so much when the night fell?
Its stars, its stripes yesterday streamed
In the fierce combat, as a sign of victory,
The glory of battle, in step with freedom,
Throughout the night they proclaimed: "It is defending itself!"
Oh say you! Is it still waving, beautiful, star-covered,
Over the land of the free, the sacred flag?
Its stars, its stripes, liberty, we are the same.
We are brothers in our anthem.
In the fierce combat, as a sign of victory,
In the fierce combat... (My people, keep fighting!)
...in step with freedom, (Now is the time to break the chains!)
Throughout the night they proclaimed: "It is defending itself!"
Oh say you! Is it still waving, beautiful, star-covered,
Over the land of the free, the sacred flag?

Original English version:

O say, can you see, by the dawn's early light,
What so proudly we hailed at the twilight's last gleaming?
Whose broad stripes and bright stars, through the perilous fight,
O'er the ramparts we watched, were so gallantly streaming!
And the rockets' red glare, the bombs bursting in air,
Gave proof through the night that our flag was still there:
O say, does that star-spangled banner yet wave
O'er the land of the free and the home of the brave?

[ Admittedly, the translation back into English has some significant problems, a taste of which is noted in the text above the re-translation here; I'd like to add that "¡Se va defendiendo!" would be better translated as "It is being defended!" rather than "It is defending itself!", but again, I needlessly digress. ]

I can only conclude from all this that learning and singing the Spanish version of the anthem, in and of itself, will not make anyone more ignorant of the history behind the anthem than those who learn the English lyrics. In fact, I daresay that encouraging folks to learn the Spanish version is likely to make many more of them curious about the history of the anthem (and of the country). How in the world can that be a bad thing?

[ I can also conclude that George Key appears to have some major issues with immigrants, or maybe with his own half-Panamanian-ness. But enough with the digressions. ]

Instead of making this sort of point, though, Day to Day co-host Alex Chadwick turns the tables on "Americans":

A recent Harris survey did show that two out of three Americans don't know the words to The Star-Spangled Banner [...]

The bold italics on Americans above reflects Chadwick's emphasis on this word -- but was it meant to suggest that these were English-speaking Americans who were surveyed, or (gulp) legal Americans, or what? I don't think the survey said anything about this issue; see this ABC News piece on the Harris survey, from almost two years ago, which leads us to The National Anthem Project website (of which "First lady Laura Bush has now become honorary chairwoman"), which has more information about the survey and a very abbreviated version of the history behind the anthem (including links to the anthem code and sheet music for the service version, the mariachi version, and the steel drum ensemble version -- how cool is that?)

In any event, if you just juxtapose this "result" from the Harris survey with George Key's pronouncement, all you get is that we all really just need to learn the English version of the anthem better. But this is wrong, wrong, wrong, as I pointed out above: the (original) English version is not particularly more informative about American history (or anything else of national/patriotic significance) than the Spanish version. The only problem seems to be that it's in Spanish, and that bugs some people -- and I'll never understand why.

With typical timeliness, the Wikipedia article on The Star-Spangled Banner that I have linked to several times above also includes the Spanish version of the first stanza (click here and scroll down) plus a little note under "Other" that says:

A Spanish language translation called "Nuestro Himno" ("Our Anthem") was created in 2006 as a show of support to Latino immigrants in the United States. Similar to the English version of the Canadian national anthem, which was set to the tune of the French version but is not related to the text thereof, this song or himno is merely inspired by and is only an approximate not a word by word translation of the stanzas selected from Key's poem. The lyrics are written above. As such no claim is made that it is the Spanish language version of the United States' national anthem which itself technically is only a part of Key's full poem.

(This is where I found the link to the Nuestro Himno Wikipedia article, also linked a few times above.)

Most interesting and relevant, though, is that there's a Spanish-language Wikipedia article on The Star-Spangled Banner, which has a briefer version of the history behind it (still, more comprehensive than the one at The National Anthem Project), with two full translations (La bandera estrellada and La bandera de estrellas), the latter dated 1919 by Francis Haffkine Snow -- see this Library of Congress entry, where it says that "[t]his version of the song was prepared by the U.S. Bureau of Education". The lyrics of Nuestro Himno appear to be derived from this 1919 translation; they are generally similar, though some lyrics appear to have been "altered to soften war references".

That last quote is from this ABC News article, with the offensive title "Spanish 'Star Spangled Banner' -- Touting the American Dream or Offensive Rewrite?" (This USA Today article title is better: "Spanish 'Banner' draws protest".) The author of the ABC News article (Jim Avila) seems to be unaware of the 1919 translation, and uses an even worse re-translation than I've found of Nuestro Himno into English to compare with the "classic English version" (which, oddly enough, also has errors here):

The current version will likely spark debate, because it is not an exact translation. Some of the classic lyrics have been changed for rhyming reasons while other phrases were altered to soften war references. For example:

English version: And the rockets red glare, bombs bursting in air gave proof through the night that our flag was still there.

Spanish version: In the fierce combat, the sign of victory, the flame of battle in step with liberty through the night it was said it was being defended.

There are several other small errors in Avila's article, two of which are worth noting here. One is the name "Jimi Hendrix" being spelled "Jimmy Hendrix"; Jimi's infamous solo Stratocaster rendition of the national anthem at Woodstock was also brought up in both NPR segments, by way of making the point (somewhat weakly, in my view) that there is at least some artistic merit to so-called "corruptions" of the anthem. The other is that "George Key" is identified as "Charles Key", who is quoted as saying:

"I think its a despicable thing that someone is going into our society from another country and ... changing our national anthem," Key said.

That sure sounds like George to me. But just to be sure, I googled -- and found that indeed there is another descendant of Francis Scott Key named Charles. But I doubt that Charles was quoted above, because the key search result, a January 30, 2005 Seattle Post Intelligencer article, says:

Charles Key, a 56-year-old Vietnam veteran from Bellingham, whose ancestor Francis Scott Key wrote the words of the U.S. national anthem, "The Star-Spangled Banner," says he's leaving because his country is no longer tolerant. "The land of the free and the home of the brave always meant to me that America was supposed to stand for freedom and diversity and tolerance. I don't think it does that any more," he told a reporter.

By contrast, George's next most recent news-worthy activity appears to have been, in 1995 at the age of 71, a push to save the Pledge of Allegiance in Orange County schools. (Speaking of which: more from George Key, and yet another poor re-translation of Nuestro Himno, can be found at the OC Register.)

Final note: in the sidebar of the ABC News article, there's a link saying: "VOTE Spanish-Language National Anthem: O.K.?" As of this writing, here's what it says. (Note in particular the form of the answers.)

[ Comments? ]

Posted by Eric Bakovic at 11:55 PM

Separating species with bullets

One version of the AP starling story, attributed to Seth Borenstein, ends with a quote from Jeff Elman:

What the experiment shows is that language and animal cognition is a lot more complicated than scientists once thought and that there is no "single magic bullet" that separates man from beast, said Jeffrey Elman, a professor of cognitive science at UCSD, who was not part of the Gentner research team.

If I weren't generally so skeptical of the accuracy of journalists' quotes, I'd tease Jeff for producing a self-refuting mixed metaphor. Surely a pretty reliable way to differentiate between human and beast, in cross-species encounters, is to ask who's using a weapon to kill whom? At least, this is a criterion with high positive predictive value though much lower sensitivity ...

(Mixed metaphor alert by email from Margaret Marks at Transblawg)

[Update: Ben Zimmer points out that the same AP story suggests that Seth Borenstein has a special talent for eliciting startling metaphors from cognitive scientists:

But starlings may be more apt vocalizers and have a better grasp of language than non-human primates. Monkeys may be trapped like Franz Kafka's Gregor Samsa, a man metamorphosized into a bug and unable to communicate with the outside world, Hauser suggested.

In the words of Heidi Harley, "What??"

Actually, I think I understand what Marc Hauser might be getting at, if he was quoted accurately. Perhaps he's staking out a position diametrically opposite to Wilhelm von Humboldt, who wrote that "The articulated sound, the foundation and essence of all speech, is extorted by man from his physical organs through an impulse of his soul; and the animal would be able to do likewise, if it were animated by the same urge."

Humboldt's idea was, I think, that the urge to communicate -- to act so as to affect others' knowledge and belief -- is the key thing, with the adaptations of the vocal organs and of the perceptual and motor-control systems being secondary consequences of the (initially inexpert and faltering) practice of communicative action. Hauser seems to be suggesting that monkeys have the urge, but evolution has somehow played them false, so that differential effectiveness of communicative action has not been able to act as a selective force.

Then again, maybe he's just licensing anthropomorphism with respect to monkeys. ]

Posted by Mark Liberman at 06:22 AM

Starlings linguists language loggers readers follow commented on the work of studied are damn smart!

As Mark just reported, it's difficult to know what conclusions we should draw from recursive starlings. The obvious conclusion is just that starlings are smart. Yup, and we humans are pretty smart too. We can do all sorts of tricky recursion. Center embedding, mind you, that's a problem. It normally gets covered in Linguistics 101 under the heading of performance versus competence, or language processing, or psycholinguistics or some such, and the basic point is that certain recursive structures apparently tax our processing abilities to the extent that only a theoretical syntactician could label them anything but ungrammatical.

In case you didn't quite figure out how the quadruple center embedding at the top of this entry could possibly mean anything at all, here is how it's built up:

Starlings are damn smart!
Starlings linguists studied are damn smart!
Starlings linguists language loggers commented on the work of studied are damn smart!
Starlings linguists language loggers readers follow commented on the work of studied are damn smart!
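
For readers who like to see the bracket structure spelled out, here is a minimal sketch of the build-up above. The function name `center_embed` and the two word lists are made up for illustration. The only point is that noun i is the subject of predicate i, so the nouns come out in order and the predicates in reverse order, the way nested brackets close.

```python
def center_embed(nouns, predicates, depth):
    """Build a center-embedded sentence from the first `depth` noun/predicate pairs.

    Noun i is the subject of predicate i, so nouns appear in order and
    predicates appear in reverse order, like nested brackets closing.
    """
    if not 1 <= depth <= min(len(nouns), len(predicates)):
        raise ValueError("depth out of range")
    words = nouns[:depth] + predicates[:depth][::-1]
    return " ".join(words)


# Hypothetical word lists, chosen to reproduce the four sentences in the post.
nouns = ["Starlings", "linguists", "language loggers", "readers"]
predicates = ["are damn smart!", "studied", "commented on the work of", "follow"]

for depth in (1, 2, 3, 4):
    print(center_embed(nouns, predicates, depth))
```

Run as is, the loop prints the same four sentences as the list above, one level of embedding at a time.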

If your brain is anything like mine, you probably find the third sentence in the sequence gently gliding over a cliff of realtime incomprehensibility, despite it being possible to reconstruct logically what it would have to mean. The fourth can only be understood by drawing mental lines between subjects and predicates and extending to the author of the sentence a deep trust that normally you'd reserve for someone with whom you were hopelessly in love. (By the way, see New speech disorder linguists contracted discovered! for further embedding inspiration.)

Faced with the facts about starlings' innate ability to learn Dyck languages, and with the facts about center embeddings for you and me, a contrarian might well conclude that yes, at last, we have firm and amazing evidence for a biologically unique language module. The trouble is, starlings have it, and we don't.

Posted by David Beaver at 04:08 AM

April 27, 2006

Starlings

There's been a lot of discussion recently in the popular press and in the blogosphere about Timothy Q. Gentner, Kimberly M. Fenn, Daniel Margoliash & Howard C. Nusbaum, "Recursive syntactic pattern learning by songbirds", Nature, 27 April 2006. (Also see Gary Marcus, "Startling Starlings", Nature, 27 April 2006.)

What's said to be at issue here is whether "European starlings (Sturnus vulgaris) accurately recognize acoustic patterns defined by a recursive, self-embedding, context-free grammar." The background for this question is a hypothesis about human linguistic abilities, which Gentner et al. describe this way:

Humans regularly produce new utterances that are understood by other members of the same language community. Linguistic theories account for this ability through the use of syntactic rules (or generative grammars) that describe the acceptable structure of utterances. The recursive, hierarchical embedding of language units (for example, words or phrases within shorter sentences) that is part of the ability to construct new utterances minimally requires a ‘context-free’ grammar that is more complex than the ‘finite-state’ grammars thought sufficient to specify the structure of all non-human communication signals. Recent hypotheses make the central claim that the capacity for syntactic recursion forms the computational core of a uniquely human language faculty.

Specifically, Gentner et al. are challenging the interpretation of a paper by Tecumseh Fitch and Marc Hauser, "Computational Constraints on Syntactic Processing in a Nonhuman Primate", Science, January 16, 2004 (discussed in Language Log here). Fitch and Hauser claimed to show that humans are able to handle a kind of grammar called context-free, whereas cotton-top tamarins can't do this, but can only handle finite-state grammars. In previous posts, I've been highly skeptical of Fitch and Hauser's interpretation of their results:

Hi Lo, Hi Lo, it's off to formal language theory we go (1/17/2004)
Humans context-free, monkeys finite-state? Apparently not. (8/31/2004)
Homo hemingwayensis (01/09/2005)
Rhyme schemes, texture discrimination and monkey syntax (02/09/2006)
Learnable and unlearnable patterns -- of what? (02/25/2006)

So I'm not surprised to learn that Gentner et al. were able to teach starlings the kind of patterns that Fitch and Hauser failed to teach monkeys. However, I don't think that Gentner's success tells us any more about grammatical abilities (or the lack of them) than Fitch and Hauser's failure did.

That's not because I think that these experiments were badly executed, or that their results are not interesting. There are two key problems, in my opinion.

First, we're asked to evaluate the claim that a creature does or doesn't have the ability to process types of sequences (conceptually) requiring an unbounded number of states, on the basis of experiments that deal with a small number of short sequences. Any competent computer science undergraduate can set up a finite automaton that can process the sequences used in these papers; and if she's taken a machine learning course, she should be able to set up several sorts of models that can learn the distinctions in question, without being able to deal with general context-free or "embedding" languages at all. (See here and here for some discussions of one way to approach this.)

And second, if we make a serious attempt to investigate what it would be like to have the ability to process general context-free languages, even in the case of fairly short strings, we will quickly find that humans fail the test, at least if we approach the problem in the way that Fitch, Hauser, Gentner et al. have done.

I'll try to explain and exemplify this in a minute, but first let me temper the generally skeptical tone of this and previous posts on the subject. I do believe that (most?) human languages genuinely involve recursive embedding; and I think it's quite possible that some types of birdsong, like some other sorts of animal activity, also involve embedding of "plans within plans", though perhaps not recursion in the strict sense of some type of unit forming part of another unit of the same type. I also think that experiments like these are well worth doing, and are likely to lead to some real insights about biological pattern processing.

However, I remain skeptical that such experiments are telling us which animals can process what type of grammar. To see why, let's start with the most basic and simplest context-free language, the Dyck language. As the Wikipedia explains:

In the theory of formal languages of computer science, mathematics, and linguistics, the Dyck language (Dyck being pronounced "deek") is the language consisting of those balanced strings of parentheses [ and ].

Sentences of this Dyck language include [], [][], [[]], [[][]], and so on.

Trivially, no strings starting with ] or ending with [ are in the Dyck language. And slightly less trivially, the Dyck language excludes strings like []] and [[] in which the numbers of [ and ] aren't equal.

Finally, strings with equal numbers of opens and closes are excluded if (as you scan from left to right through the string) a ] ever occurs when there is no earlier unmatched [. Another way to think about this is that as you scan from left to right, you add 1 to a total whenever you see [, and subtract 1 whenever you see ]. If the total is always non-negative as you scan through the string, and is zero at the end, then the string is in this Dyck language. Otherwise it isn't.
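
The running-total test just described fits in a few lines. This is only a sketch of that description; the function name `in_dyck` is my own, and treating the empty string as a member of the language is an assumption (the usual convention).

```python
def in_dyck(s):
    """Return True if s is a balanced string over the alphabet '[' and ']'."""
    total = 0
    for ch in s:
        if ch == "[":
            total += 1          # one more bracket waiting to be closed
        elif ch == "]":
            total -= 1
            if total < 0:       # a ']' with no earlier unmatched '['
                return False
        else:
            raise ValueError(f"unexpected symbol {ch!r}")
    return total == 0           # everything opened has been closed


for s in ["[]", "[][]", "[[]]", "[[][]]", "[]]", "[[]", "]["]:
    print(s, in_dyck(s))
```

The first four strings are accepted; `[]]` and `][` fail because the total goes negative, and `[[]` fails because the total is not zero at the end.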

Actually, the situation is just a little more complicated: in the general case, a Dyck language can have more than one pair of matching types of parentheses. You could represent these as being like ( ), { }, < >, etc., but the most general way is to think of them as parentheses or brackets indexed by integers, e.g. [₁ ]₁ , [₂ ]₂, [₃ ]₃, etc. Each type of parentheses in a Dyck language needs to match up pairwise in just the same way that a single type of parentheses does.
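
With more than one kind of bracket, a single running total no longer suffices, because you also have to remember which kind is waiting to be closed. One standard way to do that is a stack. The sketch below assumes four bracket pairs for illustration; the names `PAIRS` and `in_general_dyck` are hypothetical.

```python
# Assumed bracket inventory: each closer maps to the opener it must match.
PAIRS = {")": "(", "]": "[", "}": "{", ">": "<"}


def in_general_dyck(s, pairs=PAIRS):
    """Return True if every bracket in s is closed by its own kind, in nested order."""
    openers = set(pairs.values())
    stack = []                                # openers still waiting for a partner
    for ch in s:
        if ch in openers:
            stack.append(ch)
        elif ch in pairs:
            # The closer must match the most recent unmatched opener.
            if not stack or stack.pop() != pairs[ch]:
                return False
        else:
            raise ValueError(f"unexpected symbol {ch!r}")
    return not stack


for s in ["([]{})", "([)]", "(<", "<>{}"]:
    print(s, in_general_dyck(s))
```

Note that `([)]` is rejected even though each kind of bracket is balanced on its own: the pairs cross instead of nesting.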

Now, Dyck languages are a simple and obvious case of string-sets that can't be handled by a "finite automaton". The "non-finite" aspect here is that you might need to count up an arbitrary number of left parens before counting down the same number of right parens. You can try to handle that by setting up a state that means "I need to find one right paren", and another state that means "OK, now I need to find two right parens", and another state that means "now we need three right parens", and so on. That will work as long as the input never stacks up more left parens than the number of states you've set up -- but the number of left parens that might occur in the input is not bounded. (Though to handle a Dyck language with one kind of parenthesis and strings no longer than six, you'd only need three states, etc...)
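To make the "one state per pending right paren" idea concrete, here is a small Python sketch of a finite-state recognizer that works only up to a fixed nesting depth. The names (`make_bounded_recognizer`, `DEAD`) and the bookkeeping (a start state and a reject state in addition to the "need n right parens" states) are illustrative assumptions, not a standard construction from any library.

```python
DEAD = -1  # reject state: once entered, the string can never be accepted


def make_bounded_recognizer(max_depth: int):
    """Build a finite-state recognizer for Dyck strings nested at most max_depth deep."""
    # State d (0 <= d <= max_depth) means "d right brackets are still owed".
    transitions = {}
    for d in range(max_depth + 1):
        transitions[(d, "[")] = d + 1 if d < max_depth else DEAD
        transitions[(d, "]")] = d - 1 if d > 0 else DEAD

    def accepts(s: str) -> bool:
        state = 0
        for ch in s:
            state = transitions.get((state, ch), DEAD)
            if state == DEAD:
                return False
        return state == 0

    return accepts


accepts3 = make_bounded_recognizer(3)
print(accepts3("[[[]]]"))    # depth 3: within the machine's capacity
print(accepts3("[[[[]]]]"))  # depth 4: a perfectly good Dyck string, but rejected
```

However large `max_depth` is made, some balanced string nests one level deeper, so no fixed table of this kind covers the whole language.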

In fact, Dyck languages have a special relationship to the general class of context-free languages. I'll remind the techies in the audience that according to the Chomsky-Schützenberger theorem, every context-free language can be represented as a homomorphic image of the intersection of a regular language and a Dyck language. For everybody else, let's just say that any critter that can handle context-free languages in general ought to be able to deal with Dyck languages.
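For the techies, the theorem can be written out roughly as follows. This is the usual textbook formulation rather than anything specific to this post, and the symbols (D_k for the Dyck language on k bracket pairs, R for the regular language, h for the homomorphism) are conventional choices introduced here for illustration.

```latex
% Chomsky-Schützenberger representation theorem (standard statement).
% D_k : the Dyck language over k pairs of brackets
% R   : a regular language over the same bracket alphabet
% h   : a homomorphism from bracket strings to strings over \Sigma
For every context-free language $L \subseteq \Sigma^{*}$ there exist
$k \ge 1$, a regular language $R$, and a homomorphism $h$ such that
\begin{equation*}
  L = h(D_k \cap R).
\end{equation*}

% Small worked example with one bracket pair, written \ell (open) and r (close):
% take R = \ell^{*} r^{*}, h(\ell) = a, h(r) = b.
\begin{equation*}
  D_1 \cap \ell^{*} r^{*} = \{\, \ell^{n} r^{n} : n \ge 0 \,\},
  \qquad
  h\bigl(D_1 \cap \ell^{*} r^{*}\bigr) = \{\, a^{n} b^{n} : n \ge 0 \,\}.
\end{equation*}
```

In the example, the regular language forces all opens to precede all closes, the Dyck language forces the counts to match, and the homomorphism simply renames the brackets.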

But anyone who has ever written a computer program knows that it's not trivial for humans to keep track of balancing parentheses. Quick, tell me whether the parentheses in this expression balance or not:

(*p == c && (prop1(*(p+1)) || prop2(*(p+1))))

It's hard to tell -- and that's why many text editors for programmers flash the corresponding left parenthesis when you type a right parenthesis, or offer other forms of paren-tracking help.
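A few lines of Python can do the counting that the eye finds hard. The sketch below looks only at round parentheses and ignores every other character; `paren_report` is a made-up helper name, and the check is a plain counter, not a C parser.

```python
def paren_report(expr: str):
    """Return (balanced, max_depth) for the round parentheses in expr."""
    depth = 0
    max_depth = 0
    for ch in expr:
        if ch == "(":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == ")":
            depth -= 1
            if depth < 0:
                # A ')' with nothing left to close.
                return (False, max_depth)
    return (depth == 0, max_depth)


expr = "(*p == c && (prop1(*(p+1)) || prop2(*(p+1))))"
print(paren_report(expr))  # (True, 4)
```

So the expression does balance, with six opens, six closes, and a maximum nesting depth of four; dropping the final parenthesis makes the same check report a mismatch.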

It's no easier for humans to keep track of a Dyck language in acoustic rather than textual form. To illustrate this, I've written a little program that will map parenthesis-language strings onto sequences of two pitches -- e and c, if you're keeping track musically. Consider the higher pitch to represent an open parenthesis, and the lower pitch to represent a close parenthesis. Then you can listen to a sequence of such strings, and ask yourself whether each string is in the Dyck language or not.
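The mapping itself is simple enough to sketch in Python. This is not the program used to make the example: the function name is invented, and the choice of octave (MIDI note 64 for the E, 60 for the C below it) is an assumption, since the text only says "e and c".

```python
# Assumed pitches: E above middle C for "[", middle C for "]".
HIGH_E = 64  # MIDI note number for E4 (assumption)
LOW_C = 60   # MIDI note number for C4, middle C (assumption)


def to_pitches(s: str) -> list:
    """Map a bracket string to a list of MIDI note numbers, one per symbol."""
    notes = []
    for ch in s:
        if ch == "[":
            notes.append(HIGH_E)  # higher pitch = open bracket
        elif ch == "]":
            notes.append(LOW_C)   # lower pitch = close bracket
        else:
            raise ValueError(f"unexpected symbol: {ch!r}")
    return notes


print(to_pitches("[[]]"))  # [64, 64, 60, 60]
```

Turning the note numbers into an actual MIDI file is a separate step that this sketch leaves out.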

Go ahead, try it:

[Audio example (MIDI): http://languagelog.ldc.upenn.edu/myl/ldc/foo1.mid]

Let me make it easier for you. The sequence above either (a) starts out with a sequence of Dyck-language strings, and ends with a sequence of non-Dyck-language strings; or (b) starts with a sequence of non-Dyck strings, and ends with a sequence of Dyck strings. So all you need to do is to get used to whatever the pattern is when the music starts, and raise your hand when the grammar changes :-).

Consider yourself a starling, or a cotton-top tamarin, and try it again.

If you're like every other human I've tested, you find this task pretty hard. If you're quick and you care enough, you can count on your mental fingers, so to speak, adding one for each higher pitch and subtracting one for each lower pitch, and "parse" the sequence that way. But the difference between the Dyck and non-Dyck patterns is not, in the general case, cognitively salient to humans without intellectual scrutiny. In contrast, we don't need to count on our mental fingers to understand the structures of real spoken language. And I'd be astounded if the difference between Dyck and non-Dyck strings is cognitively salient to starlings or to any other animals either.

That doesn't mean that human languages shouldn't be analyzed in terms of something like context-free grammars. Nor does it tell us whether starlings' songs should be so analyzed. But I think it indicates that the ability to learn to discriminate auditory patterns generated by different sorts of grammars is probably not a reliable indication of what sorts of grammatical generation and analysis animals employ in natural, ecologically valid activities. At least, if this type of pattern-discrimination is the criterion, then humans can't handle context-free grammars either.

FYI, the "score" for the little musical interlude linked above is here:

[[[]]]][[]]
[][[[]][]]
[[[]]]]]][]
[][[[][[[[[]
[[[[[]][[[]
[[[[][[[]]
[[[[[]]]]]
[[]]][][[[]
[[]][[[][]
[[]][[[]]]]
[]][[]]]]]]
[]]][][]][[][[][[[]
[][]][[][]][]
[][[[][[[]]]]][]
[]]]]][[][]]
[][][[][]]
[[]]]][]][][][[]]]
[[[[[]][[][]
[]][]]]][[]][]]]][[][][[]
[][[][][]]
[[]]]]][]]]
[][[][[]][]]
[[[][][]][]]
[][[]][]
[][[][][[]][[][]][]]
[[][[]]]
[][[]][[]][]
[[[][][]]]
[[]][[][]]
[[[][][][]]][]
[][][][][]
[][][[][]]
[[][]][[]]
[[[[[][[]]]]]]

I created the midi file using the terrific free software program keykit, using a couple of trivial little programs to generate random Dyck strings and random finite-state strings over the same alphabet, and to map them to keykit inputs. If anyone wants the programs to make stimuli for some real experiments, let me know. But I'm willing to place a bet in advance on how the experiments will come out...
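For anyone who wants to experiment before asking for the originals, here is a hedged Python sketch of what two such generators might look like. It is not the code used for the example above: the function names are invented, the Dyck generator does not sample uniformly, and the "finite-state" generator is just one simple regular pattern (an initial [ followed by random symbols) picked as an assumption.

```python
import random


def random_dyck(pairs: int, rng: random.Random) -> str:
    """Return a random balanced bracket string containing `pairs` pairs."""
    out = []
    depth = 0
    remaining = 2 * pairs
    while remaining > 0:
        if depth == 0:
            ch = "["            # nothing open, so we must open
        elif depth == remaining:
            ch = "]"            # just enough room left to close everything
        else:
            ch = rng.choice("[]")
        depth += 1 if ch == "[" else -1
        out.append(ch)
        remaining -= 1
    return "".join(out)


def random_finite_state(length: int, rng: random.Random) -> str:
    """Return a random string from the regular language [ ( [ | ] )*."""
    if length < 1:
        return ""
    return "[" + "".join(rng.choice("[]") for _ in range(length - 1))


def is_balanced(s: str) -> bool:
    """Counter-based check, used here only to sanity-check the generators."""
    total = 0
    for ch in s:
        total += 1 if ch == "[" else -1
        if total < 0:
            return False
    return total == 0


rng = random.Random(2006)  # fixed seed so a run can be repeated
for _ in range(3):
    d = random_dyck(5, rng)
    f = random_finite_state(10, rng)
    print(d, is_balanced(d), f, is_balanced(f))
```

A random finite-state string will occasionally come out balanced by chance, so real stimuli would need to be filtered with a check like `is_balanced` before being labeled.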

[Some earlier Language Log posts on related topics:

Language in Humans and Monkeys (01/16/2004)
Hi Lo, Hi Lo, it's off to formal language theory we go (1/17/2004)
Cotton-top tamarins: on the road to phonology as well as syntax? (02/09/2004)
Humans context-free, monkeys finite-state? Apparently not. (8/31/2004)
Homo hemingwayensis (01/09/2005)
JP versus FHC+CHF versus PJ versus HCF (08/25/2005)
Rhyme schemes, texture discrimination and monkey syntax (02/09/2006)
Learnable and unlearnable patterns -- of what? (02/25/2006) ]

[Update: Noam Chomsky is quoted here as asserting that "[t]he [Gentner] article is based on an elementary mathematical error" and that "[i]t has nothing remotely to do with language; probably just with short-term memory". It's not clear why he didn't offer similar views on the Fitch and Hauser paper; perhaps no one asked him at the time, or perhaps he liked its conclusions better. I guess it's conceivable that he thinks that Gentner et al. committed "an elementary mathematical error" while Fitch and Hauser didn't, and that Gentner et al. are just studying short-term memory while Fitch and Hauser were studying language learning; but it's hard for me to see how he could hold that view.]

Posted by Mark Liberman at 10:24 PM

Around the water cooler at Language Log Plaza

Dramatis personae: Ben Zimmer, Bill Poser, David Beaver.

BZ:	Anyone familiar with Tiago Tresoldi of Brazil, who's posting Portuguese translations of LL posts on his blog? A quick search finds that he's a developer of an open-source MT program called Traduki.
BP:	Cool. I write Portuguese better than I thought.
DB:	Perhaps that's the original blog, and we're doing the translation?
BP:	Could be. Depends on whether time runs forward or backwards.
DB:	Ahh, a prime example of reducing a hard problem to a much harder one.
BP:	That's how science is done.

[To clarify the context and interpretation for readers who might be confused, this virtual conversation is an exact transcription of a series of emails sent by the people to whom the quotes are attributed; and we're all very pleased to find that Tiago Tresoldi is translating some of our posts into Portuguese on his blog. I mention this last point because Tiago at first misinterpreted this post to mean that we were upset. This is a good example of how affect can be misunderstood in virtual conversations, especially when irony is involved. (Though I suspect that similar misunderstandings take place in real life much more often than is commonly realized -- it's just that we don't scrutinize recordings of our interactions, and so we less often come to be aware that such misunderstandings have happened.)]

Posted by Mark Liberman at 06:19 AM

April 26, 2006

The race to the bottom in science reporting

According to a recent LL post by Ben Zimmer, Lance Nathan feels that the recent "English Language Hits 1 Billion Words" headlines represent "new lows in linguistic reporting". Much as I respect Lance, I have to disagree. Having hit bottom, the folks at the Associated Press broke out the heavy excavation equipment and kept right on digging to come up with this:

While many animals can roar, sing, grunt or otherwise make noise, linguists have contended for years that the key to distinguishing language skills goes back to our elementary school teachers and basic grammar.

Sentences that contain an explanatory clause are something that humans can recognize, but not animals, researchers figured.

Two years ago, a top research team tried to get tamarin monkeys to recognize such phrasing, but they failed. The results were seen as upholding famed linguist Noam Chomsky's theory that "recursive grammar" is uniquely human and key to the facility to acquire language.

But after training, nine out of Gentner's 11 songbirds picked out the bird song with inserted warbling or rattling bird phrases about 90 percent of the time. Two continued to flunk grammar.

More later on the science behind those immortal phrases. (And really, I'm being somewhat unfair to the AP's science writers, as you'll see when we go over the press releases they were working from.)

[There's an excellent summary of the experiments, and some discussion of what they might mean, over at TstT. And some interesting discussion by Chris at Mixing Memory.]

[OK, you can read my promised (serious and far too long) commentary here (but it has music!). David Beaver has a shorter comment, from a slightly different perspective, here.

Why did the quoted AP story bother me so much? My main objection is the notion that this experiment has anything to do with "sentences that contain an explanatory clause". Even if we take Gentner's interpretation at face value, this asserts a kind of semantic content that the experimental materials clearly lacked. Secondarily, it's not a helpful or accurate account of what happened in the experiment to say that the starlings "picked out the bird song with inserted warbling or rattling bird phrases".

It's fair to respond that if I'm going to complain about the AP story, I ought to give an example of how to write about this work in a newswire-story style and at newswire-story length. I certainly haven't done this, since my discussion of the paper is way too long, way too technical and way too skeptical for a news story. Well, as an old colleague used to say, if I only had more time I could write less -- and perhaps write more simply and more charitably, too.]

Posted by Mark Liberman at 10:02 PM

A million words here, a billion words there...

It looks like 2006 is going to be a banner year for misinformed reporting on the English language. Numerous journalists have already swallowed the absurdly specious claim that the English language is going to add its millionth word some time later this year. But doesn't "one million" sound a little paltry? Well, never fear. Today the Associated Press trumpets even bigger news:

"English Language Hits 1 Billion Words"

Do I hear a trillion?

That's how the headline reads on Yahoo! News, but you can find identical headlines on the websites for the Washington Post, the Los Angeles Times, the Boston Globe, the San Francisco Chronicle, Newsday, CBS News, ABC News, Fox News, and dozens of other media outlets. We've already seen ample evidence that news organizations relying on the AP wire very often reproduce the headlines provided to them in an entirely uncritical fashion. But this is not a case of circulating a grammatically questionable construction like "Skilling Calls He and Lay 'A Good Team.'" Here we have editors around the country blithely accepting a laughable assertion.

The article itself is relatively straightforward, belying the ridiculous headline:

A massive language research database responsible for bringing words such as "podcast" and "celebutante" to the pages of the Oxford dictionaries has officially hit a total of 1 billion words, researchers said Wednesday.

Drawing on sources such as weblogs, chatrooms, newspapers, magazines and fiction, the Oxford English Corpus spots emerging trends in language usage to help guide lexicographers when composing the most recent editions of dictionaries.

The press publishes the Oxford English Dictionary, considered the most comprehensive dictionary of the language, which in its most recent August 2005 edition added words such as "supersize," "wiki" and "retail politics" to its pages.

Oxford University Press lexicographer Catherine Soanes said the database is not a collection of 1 billion different words, but of sentences and other examples of the usage and spelling.

So there you have it: it's a lexicographical corpus of texts that has hit a billion words, and like any corpus it contains lots and lots of duplicated lexical items. How unobservant does a headline-writer or copy editor have to be to construe this to mean that the "English language" has hit a billion words? Apparently the good people at Oxford have a corpus that encompasses the entire language! Pretty darn impressive.
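The difference between the size of a corpus and the number of different words in it is easy to see in miniature. The Python sketch below counts running words (tokens) and distinct words (types) in a toy text; the crude tokenizer and the helper name are assumptions for illustration and have nothing to do with how the Oxford English Corpus is actually processed.

```python
import re


def count_tokens_and_types(text: str):
    """Return (number of running words, number of distinct words) in text."""
    # Very crude tokenizer: lowercase, then keep runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return (len(words), len(set(words)))


print(count_tokens_and_types("A rose is a rose is a rose"))  # (8, 3)
```

An eight-word "corpus" with a three-word vocabulary: scale that up and a billion words of text tells you how much has been collected, not how many words the language has.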

Kudos to those news outlets that recognized the AP headline as bunk and provided their own, though they're few and far between:

"Wordy? Dictionary database hits 1 billion mark" (MSNBC/Newsweek)
"Oxford database reaches 1 billion words" (CNews)
"English language database reaches 1 billion words" (AZCentral)
"Oxford English Corpus database of 21st century usage reaches 1 billion words" (San Diego Union-Tribune)

(A tip of the hat to Lance Nathan, who observes that the outrageously inflated headlines paradoxically represent "new lows in linguistic reporting.")

Posted by Benjamin Zimmer at 09:28 PM

Beats Workin'

Now that Tony Snow has officially been announced as the new White House Press secretary, I want to remind our readers that Language Log was ahead of the curve in recognizing his rhetorical accomplishments. But the news coverage has clued me in to something that I didn't know about, namely the web site of Snow's rock band Beats Workin'. Though there aren't any sound samples on the site, I doubt that Snow's band can compete with the bad-boy ghost of Lee Atwater...

Posted by Mark Liberman at 12:33 PM

Accidental spelling at Google; Mary Matalin speaks unfortuitously

Searching on Google a few days ago for unfortuitously -- more below on why I was doing this -- I was startled to get (along with ca. 213 webhits) a query from Google asking if I meant unfortuantely. Yes, unfortuantely. My interest piqued, I googled on unfortuantely (I can't tell you how hard it is to type this word) and got, wow, ca. 486,000 hits. AND NO QUERY if I meant unfortunately. All I can say is that this is an unfortuante situation, especially the lack of querying on unfortuantely. (Unfortunately, by the way, gets hundreds of millions of hits, and, whew, no query.)

Why was I googling on unfortuitously, you ask. Because Johannes Fabian had pointed out to me that Mary Matalin had used the word in an NBC morning news interview on 4/20/06 and he couldn't figure out what she meant by it. Here's the exchange, as retrieved by John Baker on ADS-L on 4/21/06:

COURIC: And this shift, Mary, can--can people conclude from this shift that--that the White House is very worried about the upcoming midterm elections and about the Republicans losing control?

Ms. MATALIN: Well, the White House and the Hill is conscious of their reality. This is a very polarized country right now. There are a number of seats that are unfortuitously competitive because of retirements. There's--the Democrats have--have done a good job in recruiting. They have not done a good job in preparing any sort of policies or an agenda. They don't have any vision. So what this comes down to in the fall, as in all elections, are a choice--and we have to make our--the choice of voting for us very clear and the catastrophic consequences of voting for a Democrat.

(Try not to focus on the glitches in the speech of someone speaking both passionately and off the cuff. Focus on "unfortuitously competitive".)

MWDEU has a fairly long article on fortuitous and its development from the meaning 'by accident, by chance' to 'by fortunate accident, by lucky chance' (the meaning that Baker reports as his own) all the way to 'fortunate, lucky' (a usage that people have been complaining about from Fowler's time on, possibly because it is so widespread). Also a shorter article on fortuitously. Given this background, unfortuitous ought to mean either 'not by accident' (this would be the historically defensible usage), 'not by fortunate accident' (which Baker suggests would refer to something that is both unfortunate and not by chance), or simply 'unfortunate'. And unfortuitously would be the adverbial version of this.

Unfortuantely, by the time we get to unfortuitously, the historical meaning seems to have vanished from the web; not one of the Google webhits has the word clearly being used to mean 'not by accident'. In fact, they can all be seen as merely conveying 'unfortunately'. As for Mary Matalin's use, that's my best guess now, though she might be understood as saying that the Democrats' success at recruiting candidates for the seats vacated by retirements is what makes those contests competitive, so that the competitiveness results not from the accident of retirements but from the intentional acts of the Democrats. If so, then I'd take her to be conveying the 'both unfortunately and not by chance' meaning.

But you see how hard it can be to tease the meanings apart. And Matalin herself is unlikely to be able to reconstruct what she intended to convey in the heat of the interview moment.

Meanwhile, Fabian (whose native language is not English, a fact that causes him to think more about the details of the language than your average anthropologist would) observed to me that he would have expected the negated version of fortuitously to be nonfortuitously, not unfortuitously. As it happens, there's only one legitimate example of nonfortuitously on the web (in a legal decision, where it does, however, mean 'not by chance'), so his expectation is not borne out on the web. There is a possibly subtle point here, though: my first impulse would be to read nonfortuitously as 'not in a manner that is fortuitous' and unfortuitously as 'in a manner that is not fortuitous', though I find the two scopings remarkably hard to distinguish in the real world, and in any case the more I think about it, the more I think that if there is a distinction in semantics here, both words can convey both meanings.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 11:16 AM

Who is the decider?

OK, it's time for our occasional over-interpretation of a cartoon. This one was submitted by Pekka Karjalainen to the Fellowship of the Predicative Adjunct:

Zeno at Halfway There interprets this cartoon in terms of layers of irony:

In his Bizarro cartoon panel for April 22, 2006, Piraro takes a worthwhile shot at couples who inflict their self-written wedding vows on their guests. ... [T]ake a look at the minister's words: “And now, having each recited the vows they have written themselves, we all realize the importance of education.” Whoa, what a burn on the feckless bride and groom! However, ... what exactly is the referent for the participial clause “having each recited the vows they have written themselves”? It's obviously supposed to be the bride and groom, but the actual subject of the sentence is “we,” the spectators and the celebrant at the wedding. Piraro has committed misplaced modification and left a participial phrase dangling.

Zeno thinks that Dan Piraro has committed this error unwittingly, and thus fallen victim to "the Iron Law of Nitpicking", which Zeno expresses this way:

You are never more likely to make a grammatical error than when correcting someone else's grammar.

I'm not so sure.

I can read the cartoon in one of two ways:

1. Piraro imagines that the minister commits this breach of grammatical etiquette unwittingly, while sincerely praising the couple's compositions; or
2. Piraro imagines that the minister is using the dangling adjunct on purpose, as an echoic criticism of the couple's home-made vows.

Under interpretation #1, the minister falls victim to Zeno's "Iron Law"; under interpretation #2, no one does. Under either interpretation, the cartoonist Piraro himself is innocent of linguistic fault. When I first read the cartoon, I interpreted it in mode #2; after some reflection, I switched to mode #1; I didn't think of Zeno's interpretation until I read his blog entry. What do you think?

Zeno ends with a plea for mercy:

In the present instance, it behooves me to keep this post short, thereby reducing the likelihood that I will commit some egregious error therein. When you, Dear Reader, find the inevitable faux pas, please try to be gentle as you denounce my sin in the comments.

Not having the temperament of a nitpicker, I'll just observe that Zeno has independently re-invented the Hartman/Skitt/McKean Law of Prescriptivist Retaliation, which itself was independently discovered by three different linguistic adventurers within a short span of time in 1999:

Jed Hartman ("Words & Stuff", April 20, 1999): "Any article or statement about correct grammar, punctuation, or spelling is bound to contain at least one eror".
Perchprism/Skitt: (alt.usage.english, April 26, 1999): "Any post correcting an error in another post will contain at least one error itself" or "The likelihood of an error in a post is directly proportional to the embarrassment it will cause the poster".
Erin McKean: (Verbatim Magazine, Summer 1999) "Any correction of the speech or writing of others will contain at least one grammatical, spelling, or typographical error."

And Jason Streed of Finches' Wings has observed that a closely-related concept was explored in 1909 by Ambrose Bierce:

In neither taste nor precision is any man's practice a court of last appeal, for writers all, both great and small, are habitual sinners against the light; and their accuser is cheerfully aware that his own work will supply (as in making this book it has supplied) many "awful examples" — his later work less abundantly, he hopes, than his earlier. He nevertheless believes that this does not disqualify him for showing by other instances than his own how not to write. The infallible teacher is still in the forest primeval, throwing seeds to the white blackbirds.

When it comes to language, none of us is the decider, and yet each of us is.

[Note: Zeno's reference to "the referent of the participial clause" is a non-standard use of the term "referent" -- usually that term would apply only to words and phrases that (are taken to) refer, such as nouns and pronouns. In this case, I think that most linguists would prefer to talk about "the subject of the clause" rather than the "referent of the clause". I wasn't going to say anything about this, since it's clear enough what Zeno means, and I'm not a nitpicking kind of person; and for that matter, as a mere phonetician I might have missed a new syntactic theory out there according to which participles refer to their subjects. But since several readers have written in to raise the point... ]

Posted by Mark Liberman at 06:43 AM

April 25, 2006

Arizona knows

What does it mean to KNOW something anyway? The US Supreme Court is currently trying to deal with this as it considers the appeal of Clark v. Arizona, No. 05-5966. The Arizona Court of Appeals affirmed the conviction of a 17-year-old named Eric M. Clark, who shot and killed a Flagstaff policeman in 2000 (see here). He was found incompetent to stand trial and after three years of treatment in a mental hospital, he could no longer have an insanity defense because of the odd statutes in the state of Arizona that bar the defense from using evidence of diminished capacity. This state only allows a defendant to plead "guilty except insane," apparently ignoring the fact that Clark was so mentally disturbed that he thought he was shooting a space alien. At trial he was found guilty of INTENTIONALLY killing a law officer and now that he has recovered his sanity, he has to pay for the crime that he didn't KNOW that he committed.

The M'Naghten rules, followed by most states, say that insane persons do not KNOW the nature and quality of the criminal act and that they don't KNOW that they are doing anything wrong. But these rules appear to be unimportant in Arizona. Now that Clark has been deemed sane, he still has to account for what he did six years ago, when he wasn't. The state's lawyer tried to explain this saying, "the state has discretion to define insanity as it sees fit." He went on to say that based on the evidence in the bench trial, Clark KNEW he was killing a police officer and that he PLANNED the crime in advance. When Chief Justice Roberts asked him about what was so different about barring mental illness as evidence of intent but not barring other evidence, such as failure to be able to understand English, the state's lawyer replied that this question is too complex to ask a judge to decide. It's hard to decide where to begin with this kind of reasoning.

So how do we KNOW that somebody KNOWS something? One might say, "Since I just washed my car, I KNOW it is going to rain," or "I suspect that it will rain," or "I figure it's going to rain." You can't really KNOW it though. When a person really KNOWS something, three steps seem to obtain:

1. One believes it to be true.

2. One has good reason to believe it to be true.

3. There is a substantial probability that it is true.

Everyone agreed that Clark was so deluded that he believed the officer was a space alien (1). A sane person would have good reason to believe it was a policeman (2) and that his uniform or something else would provide probability of this (3). But it's hard to understand how a mentally ill person who believed the victim was a space alien would even get to steps 2 and 3. When the police stopped his car because his loud radio was creating a nuisance, Clark claimed he was trying to drown out the voices in his head. Despite this, the court claimed that Clark INTENTIONALLY PLANNED the murder. Now we somehow add intentionality to knowing.

The court's admission that Clark was "guilty except insane" is tantamount to admitting that he was, indeed, insane. The intentions and plans of insane people are even more difficult to infer than the plans and intentions of sane people. Except in Arizona, where the courts seem to have managed to figure out how to KNOW what people KNOW, PLAN, and INTEND -- even insane people. Amazing. 48 other states do not treat insanity in this way. I wouldn't want Arizona to write my dictionary.

It looks like Arizona could use a large dose of sociolinguistic audience context. Clark clearly intended to shoot a space alien, but there is no reason to believe he would have done this if he hadn't had a twisted sense of reality. In his mind, he shot a space invader, not a policeman. Arizona could also do with a dose of term clarification. Maybe the state can define insanity however it wants, but that sounds a lot like Humpty Dumpty's way to define words: "words mean what I want them to mean, nothing else, nothing more." By judging Clark "guilty except insane" (itself a mind-boggling expression), Arizona falls into the same category as the four states that have totally abolished the insanity defense: Kansas, Utah, Idaho, and (regretfully) Montana. It would probably be better for Arizona to abolish the insanity defense than to pretend that it can KNOW what people are thinking, especially when they are insane.

Posted by Roger Shuy at 06:47 PM

In defense of Kaavya Viswanathan

I have to disagree with my friend Geoff Pullum's comments on the allegation of plagiarism against Kaavya Viswanathan, the Harvard undergrad whose novel How Opal Mehta Got Kissed, Got Wild, and Got a Life, contains passages similar to passages in Megan McCafferty's Sloppy Firsts. I don't think that Ms. McCafferty has a valid suit for copyright infringement, nor do I think that Ms. Viswanathan is guilty of plagiarism.

Let's dispose of copyright infringement first. Even if Ms. Viswanathan had copied 24 passages literally from Ms. McCafferty's work she would not be guilty of copyright infringement because those 24 passages constitute a tiny fraction, less than 1%, of both novels and there is no sense in which they are of any particular importance. That falls well within the amount of copying permitted by the Fair Use doctrine. I doubt very much that any court would decide in favor of Ms. McCafferty on this point. Moreover, Ms. McCafferty would receive no actual damages because Ms. Viswanathan's borrowing cannot be said to have reduced the market value of Ms. McCafferty's work. I am not a lawyer, much less Ms. McCafferty's lawyer, but I don't think she's got a case.

Turning now to plagiarism, I agree with Geoff that the similarities between certain passages in the two novels are very unlikely to be due to chance. I don't think that a monkey sat down one day at Ms. Viswanathan's computer and in the course of randomly pecking away at her keyboard typed them in. That means that they are not accidental, but there is a critical difference between not accidental and intentional. The decisive question with regard to the charge of plagiarism is whether Ms. Viswanathan knowingly copied from someone else's work. I submit that there is no good reason to believe that she did and moreover that it is highly implausible that she did.

Let's look first at plausibility. If you think that Ms. Viswanathan engaged in plagiarism, what exactly do you think that the scenario was? It is pretty clear what happens in cases of out-and-out plagiarism. For example, a junior faculty member, under pressure to publish before his or her tenure review, copies someone else's paper and submits it to another journal. The potential gain is clear: an additional career-advancing publication at the cost of a small fraction of the time it would take to write it. The potential cost is also great - loss of a job and possibly loss of a career - but the "author" reasons that the chance of detection is small enough to be worth the risk. Or, a student, lazy or out of time for reasons good or bad, copies someone else's term paper and submits it as his or her own. Here again the potential benefit is clear: a much better grade than if the student were to submit no paper or one hastily assembled. The potential cost is also fairly great - ranging from a failing grade to expulsion - but if the source of the plagiarized paper is well chosen the student may well consider the risk of detection to be small.

In such cases as these, the perpetrator clearly derives a substantial benefit at what he or she believes, often with reason, to be a small risk. Without meaning to justify their actions, one can easily see why a dishonest but rational person would engage in plagiarism. The question is, is there any plausible parallel scenario in which, if we assume, for the sake of argument, that Ms. Viswanathan is dishonest but of normal rationality, she would have plagiarized the passages in question?

I submit that there is not. On the one hand, unlike term papers and obscure academic journal articles, best-selling novels are read by a lot of people, and novels of the same sort are likely to be read by the same people. The risk of detection is therefore fairly high. On the other hand, what had she to gain? Remember, the passages in question are a minuscule part of her book and of no particular importance or salience. What could she possibly have hoped to gain by copying a few passages out of an entire book? With the risk of detection high and the potential gain nonexistent, to believe that Ms. Viswanathan engaged in plagiarism requires us to believe that she was utterly irrational.

What about the putative evidence for plagiarism? The argument consists of the assertion that no one could unintentionally reproduce 24 passages word-for-word. Well, to begin with, that isn't what the evidence is. The evidence is that there are 24 similar passages. The portions that are word-for-word the same are much shorter, the longest being the 14-word example cited by Geoff. Could Viswanathan have unconsciously remembered such little snatches from another novel? Why not? This is easily within the range of human memory. Many people remember comparable portions of songs that they once heard without remembering the rest of the song or who sang it or wrote it or where they heard it. Linguists often remember facts or theoretical claims but cannot, at least off the cuff, remember where they come from. That a person with a strong literary orientation should have little snatches of other people's words floating around in her head is not only plausible, it seems likely. (I note that, although I know nothing about her background, her very name, which means "poetry" or possibly "poetess", suggests a literary orientation on the part of her parents.) I find it entirely plausible that Viswanathan's use of a tiny bit of material from McCafferty's book was unconscious and blameless. I could be wrong, but frankly I will not be persuaded by the bleating of publishers and their lawyers or literary critics - I want to hear from experimental psychologists who actually know something about memory.

I'm writing about this partly because I think that an innocent young woman is being unfairly condemned, but there is a larger issue at stake here too, namely the increasing privatization of our common culture. No creative work, whether scientific or literary, is the exclusive product of any single individual, or even of a large group of individuals, such as a corporation. All such works build on a tradition of thousands of years created by innumerable people, from which they draw ideas, facts, words, and expressions. It is in the nature of culture for people to make use of elements of previous work in composing new ones, whether by reciting the same facts, presenting or disagreeing with the same ideas, pursuing the same themes and plots, or using the same words and expressions. In music one piece is influenced by another, sometimes only in broad matters of style or performance, sometimes in reusing sequences of a few notes.

It is reasonable to demand that people not present as their innovation ideas that they have taken from others, and in some situations, for economic reasons, to impose restrictions like those of copyright law, but we must recognize that fundamentally everyone borrows from others and that this is normal and proper. To deny this is what leads to absurdities like the claim that the heirs to the man who coined the mathematical term googol should receive royalties from Google, or to the increasingly greedy and invasive demands of some segments of the entertainment industry for "protection" of their "rights", the latest of which is the Bush Administration's Intellectual Property Protection Act of 2006 recently introduced in Congress, which "strengthens" (makes more virulent) the Digital Millennium Copyright Act, which a report by the Electronic Frontier Foundation found to restrict free speech and stifle innovation. The idea that people should "own" sequences of 14 words or four musical notes or words pronounced the same as googol is not only ridiculous, it runs counter to the entire history of human civilization.

Disclosures: I went to Harvard. Beyond that I have no connection to either author. Prior to my birth my mother edited children's books at Random House, Ms. McCafferty's publisher. My sister-in-law is an intellectual property lawyer, but we haven't discussed this.

Subsequent related posts by me

Lament for Port and Starboard

Eric Bakovic's comment on my post lamenting the disuse of port and starboard on aircraft suggests that the loss of this terminology results in no loss because it is too unfamiliar and moreover is defined in terms of left and right. This is quite true if the two situations you are comparing are the current one and the current one with instructions to passengers given using port and starboard. However, this is not the correct comparison to make in deciding whether something has been lost. The valid comparison is between the current state of affairs and the hypothetical preceding state of affairs, in which the passengers WERE familiar with the terms port and starboard. On that comparison, something is indeed lost. Although it is true that present-day passengers CAN interpret left and right with respect to the axis of the aircraft rather than relative to the speaker or addressee, nothing forces them to, and we have no evidence that they do. The terms port and starboard have the virtue of being completely unambiguous in this respect.

Before we proceed too far down this path I should point out, in case any of our readers haven't figured it out, that when I, and, I think, other Language Loggers, lament the loss of this or that fine point of language, we are for the most part not being serious: for the most part, we're making fun of language pundits who apparently do think in all seriousness that the sky will fall because people do not adhere to whatever often quite silly idea they have of correct usage. So let me assure you that I don't actually think that flying is significantly more dangerous due to the disuse of port and starboard. Insofar as we're not making fun of the pundits, we're just pointing out some interesting older usages and distinctions.

There is also a point to be made here about dictionary definitions, namely that they are poor evidence regarding psychology, and indeed, much of anything else. Even definitions from good dictionaries are often plainly inadequate if one looks at them closely. One common problem with dictionaries is the circularity of their definitions. In the obvious case, X is defined in terms of Y and Y is defined in terms of X. We often don't notice this in dictionaries of our own native language, but if you have used monolingual dictionaries of languages of which your command is imperfect it soon becomes apparent. In many cases it is less obvious because the chain is longer: X is defined in terms of Y which is defined in terms of Z which is defined in terms of X. Most dictionaries do not define words in terms of a set of primitives, nor are their choices of definition based on any sort of psychological research. The fact that port and starboard are defined in terms of left and right is not a good argument that this is so psychologically. And even if it is true now, it may well not have been in the past, when people may have been more familiar with ships and nautical terminology.
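The circularity described above can be made concrete with a small sketch. Everything in it is hypothetical: the toy dictionaries, the function name find_cycle, and the simplifying assumption that each headword is defined in terms of exactly one other word (real entries lean on many words at once). It simply follows the "defined in terms of" links until it either runs out of entries or arrives back at a word it has already seen.

```python
from typing import Dict, List, Optional

def find_cycle(definitions: Dict[str, str], start: str) -> Optional[List[str]]:
    """Follow 'defined in terms of' links from `start`.

    Returns the chain of words forming a loop, or None if the chain
    ends at a word that has no entry of its own.
    """
    position = {}          # word -> index in `path` where it was first seen
    path: List[str] = []
    word = start
    while word in definitions and word not in position:
        position[word] = len(path)
        path.append(word)
        word = definitions[word]
    if word in position:
        # We came back to an earlier word: report the loop, closing it.
        return path[position[word]:] + [word]
    return None

# Hypothetical toy data: a three-step loop like the X, Y, Z case above.
circular = {"X": "Y", "Y": "Z", "Z": "X"}
# Hypothetical toy data: 'port' defined via 'left', which is left undefined.
nautical = {"port": "left", "starboard": "right"}

print(find_cycle(circular, "X"))     # ['X', 'Y', 'Z', 'X']
print(find_cycle(nautical, "port"))  # None
```

Note that finding no loop in the second toy dictionary says nothing about psychology either; it only shows that the chain of definitions bottoms out somewhere.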

Posted by Bill Poser at 03:54 PM

Probability theory and Viswanathan's plagiarism

I have recently mentioned just how much undergraduate plagiarism disgusts me, and I will not repeat any of those remarks in the context of 19-year-old Harvard undergraduate Kaavya Viswanathan's debut novel How Opal Mehta Got Kissed, Got Wild, and Got a Life, now widely known to have included passages plagiarized from Megan McCafferty's Sloppy Firsts (2001). But let me just point out that at least one of the plagiarized passages was 14 words long.

That may seem short to you, but according to modern estimates of the entropy in ordinary running English text [thanks to Fernando Pereira for information that led me to revise this post on April 26], if you graph the word positions in English text against the number of words that would be grammatically possible as the next word given the last few words of the text, although the numbers vacillate wildly, the average across them all tends to settle in at something approaching 100. If that's right, then at any arbitrary starting point in an arbitrary text, if text were being composed at random, the probability that you will find the next 14 words match some previously designated sequence of 14 words is very roughly in the region of 1 in 10²⁸, i.e., 0.0000000000000000000000000001.
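For readers who want to check the arithmetic, here is a minimal sketch of the estimate. It takes the two figures from the paragraph above (about 100 possible next words at each position, 14 words in the sequence) and assumes, as the estimate itself does, that each word is chosen independently. The names CHOICES_PER_WORD, SEQUENCE_LENGTH and match_probability are illustrative only.

```python
from fractions import Fraction

# Figures from the post; both are rough estimates, not measurements.
CHOICES_PER_WORD = 100   # average number of grammatically possible next words
SEQUENCE_LENGTH = 14     # length of the matching word sequence

def match_probability(choices: int, length: int) -> Fraction:
    """Chance that `length` independently chosen words match a fixed target,
    when each position has `choices` equally likely options."""
    return Fraction(1, choices ** length)

p = match_probability(CHOICES_PER_WORD, SEQUENCE_LENGTH)
print(p.denominator == 10 ** 28)  # True: 100^14 = (10^2)^14 = 10^28
```

Exact fractions are used rather than floating point so that the result can be compared directly with 1 in 10²⁸.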

That number is so close to zero that we don't really need to ask any more. This is evidence of copying. And when there are a dozen other cases of plagiarism from the same source, as The Harvard Crimson has shown there are, the probability plummets to something vastly lower. One could quibble with some of the assumptions behind the application of probability theory here (I'm assuming a novelist is free to choose each word independently from all the grammatically legitimate ones available at that point), but it won't really change the fact that the chances of this being accidental are not just small but nonexistent.

Viswanathan now says "I was very surprised and upset to learn that there are similarities between some passages in my novel, and passages in these books." Give me a break. We're not talking about "similar", we're talking about identical. She claims "any phrasing similarities between [McCafferty's] works and mine were completely unintentional and unconscious." I don't believe her. It's impossible. Nobody memorizes 14-word sequences accidentally and writes them under the delusion that they're original. Nobody accidentally borrows the phrase "and 170 specialty shops later", where any number would have done as well as "170" because it was picked arbitrarily by McCafferty. Nobody comes up with a phrase like "a pink tube top emblazoned with a glittery Playboy bunny" through some unlucky accidental half remembering. Sorry, but I'm not buying it. This is a sorry case of fraud, lying, copyright infringement, and abdication of the writer's intellectual responsibility. It's sickening.

And the notion of the honest Dan Brown getting sued for plagiarism while Kaavya Viswanathan does not really boggles the mind.

P.S. Most of the people emailing me about this are saying they object to the idea that I can read the author's mind with statistics. I'm not saying that. I'm not saying anything about her motivation or state of mind. I'm saying the hard evidence of straightforward copying of text is so extreme that we can regard it as conclusive.

Posted by Geoffrey K. Pullum at 03:25 PM

Full tilde

Jim Gordon recently complained (in an update to a post on pronouncing sauna) about how the New York Times crossword puzzle elides diacritical marks from foreignisms even when this results in a different word in the relevant language. The most egregious example, Jim noted, is the use of "year, in Spanish" as a clue for ANO, even though ano differs crucially from año. Below the jump, a real-world example illustrating the perils of de-tildeing año(s), provided by Matthew Baldwin of The Morning News.

My friend Rebecca is a prosecutor and, whenever I see her, I insist she fill me in on her recent cases. Though most involve routine litigation, she occasionally tells a gem of a tale.

The last time I asked, she told me about the Anus Motion.

"This guy gets pulled over on suspicion of a DUI," she said, "And it turns out that he only speaks Spanish. So the cop radios for a Spanish-speaking colleague. A second officer shows up, reads the driver his rights in Spanish off of a little card that all cops carry, and they administer the breathalyzer test. Sure enough, the guy is soused.

"We figure this case is a slam dunk. But a few weeks later the driver's lawyer submits a motion to have the results of the breathalyzer voided, saying that the defendant didn't understand his rights before we gave him the test. And we're all, like, 'Nuh-uh! We read him his rights. In Spanish, even.'

"But the defense somehow got a copy of the Spanish language card that the officer read from, and noticed that the little squiggle was missing from above an 'n' in the sentence: '¿Tiene veinteuno años?' In English that literally translates to 'Do you have 21 years?' — in other words, this was just a routine question to make sure the guy was an adult. But without the tilde over the 'n', the word 'años' becomes 'anos' — Spanish for 'anus.' [sic: it's Spanish for 'anuses.']

"They're claiming that the driver thought the officer asked 'Do you have 21 anuses', despite the fact that the officer reading the card spoke fluent Spanish and would have pronounced it 'años' anyway. And the defendant said 'si.' We're supposed to believe that the guy genuinely thought he was being asked if he had multiple anuses and answered with an enthusiastic 'yes!'

"The best part is that the defense attorney can't even bring himself to say the word 'anus.' Instead, he calls it 'the back region.' We're going in front of a judge next week, and I'm going to make a point of saying the word 'anus' as many times as I can during the proceeding. I even got them to call the legal brief 'The Anus Motion,' so he won't even be able to refer to it by title.

"What do you think the judge will do?" I asked her.

She shrugged. "Probably throw the case out," she said. "And we'll have to go back and change all the cards."

[Update: Eric Bakovic observes that "the años/anos thing is an old joke among Spanish speakers"...

Speaking of jokes surrounding "ano" ... (from my mom, who loves this kind of joke):

A reporter interviewing newly-elected president of Bolivia Evo Morales asks him about his unusual first name. "I was named after my mother, whose name was Eva," Morales says. "Good thing her name wasn't Ana," says the reporter. ]

Posted by Benjamin Zimmer at 02:11 PM

Apocalypse not now

True, starboard and port are not synonymous with right and left -- but it would be more than a little difficult to find a dictionary in which these (aero)nautical terms are not defined in terms of right and left. Consider the OED's definition of starboard:

A. n. a. The right-hand side of a ship, as distinguished from the LARBOARD or PORT side; the side upon which in early types of ships the steering apparatus was worked. (See LARBOARD note.) Also used with reference to aircraft. Often in the phrases a, on, upon, to starboard.

Note that there's not even a point of reference in this definition, though if you go to the "LARBOARD note", you find:

The side of a ship which is to the left hand of a person looking from the stern towards the bows.

Now why couldn't that also have been in the starboard entry?

I think the uses of right and left on an airplane are simply understood to be relative to someone facing the front of the plane (as most people on a plane are, most of the time). Even if I'm running toward the emergency exits behind me, I think I'd know to turn left if a flight attendant cries out: "Use the exits on the right!" -- and I'm sure I'd have to pause to think if I heard: "Use the exits to starboard!"


Posted by Eric Bakovic at 11:07 AM

The latest in anti-social media

Far from the madding Crowd's ignoble Links: isolatr (beta).

[Obligatory linguistic hook: the -r phenomenon, discussed here and here.]

Posted by Mark Liberman at 10:06 AM

Straw creatures great and small

Apparently the substitution of "straw dog" for "straw man", discussed here a few weeks ago by Ben Zimmer, has become pretty common. According to a recent post by Kate Trgovac at My Name is Kate.ca,

In the last two weeks, I've been in a number of meetings at work where the phrase "straw dog" has been used. As in, "I'm putting this up on the whiteboard as a straw dog. I'd like your feedback."

Kate tracked down Ben's post, and also a post by Jon at Sprachgefühl, who observes that this malapropism for "straw man" has made it into the Urban Dictionary, that reliably unreliable compendium of lexicographic befuddlement.

The use that Kate cites is not exactly covered by the current entries in more serious dictionaries for "straw man", however. The AHD gives these glosses:

1. A person who is set up as cover or a front for a questionable enterprise. 2. An argument or opponent set up so as to be easily refuted or defeated. 3. A bundle of straw made into the likeness of a man and often used as a scarecrow.

But over the past 30 years or so, "straw man" has acquired a new meaning: a tentative or rough-draft proposal, put forward to begin the process of discussion and consultation that will lead to a final version. The AHD's sense 2 ("An argument or opponent set up so as to be easily refuted or defeated") suggests a rhetorical trick, but this new "rough draft" sense refers to an honest and necessary stage of the development process. According to the Wikipedia entry,

A "straw-man proposal" is a simple draft proposal intended to generate discussion of its disadvantages and to provoke the generation of new and better proposals. As the document is revised, it may be given other edition names such as "stone-man", "iron-man", and so on, etc.

The succession of names comes from the requirements document for the programming language Ada. The various stages being Strawman, Woodman, Tinman and Ironman. Later another Ada document, coined the following sequence of men: Sandman, Pebbleman and Stoneman.

Presumably this rough-draft sense of "straw man" was influenced by the term straw vote (or straw poll or straw ballot), which the AHD defines as

An unofficial vote or poll indicating the trend of opinion on a candidate or issue.

In committee work, a "straw vote" is useful to get an idea of how close to consensus the group is, which alternatives have enough support to be worth considering further, and so on. A "straw man proposal" plays an analogous role in helping the members of a group to clarify their ideas and reach a consensus.

As Ben Zimmer explained, the "straw dog" malapropism for "straw man" is primed by the (somewhat enigmatic) title of Sam Peckinpah's 1971 movie Straw Dogs. (If you're interested in the background, the commenters at Language Hat followed up, sinologically speaking, on Ben's discussion of Peckinpah's reference to Lao Tzu.)

Ben suggested that phrases such as "that dog won't hunt" might also be involved in Senator Gregg's statement that "I think that's a straw dog, to be very honest with you, this argument of amnesty". I'd expect that the idiom "stalking horse" would lead to the blend "straw horse" as well, and I'd be right -- a Google search turns up several examples in serious discourse, for example:

(link) This is a straw horse sort of argument. Fundamentally, it suggests the hiring process is a one-way street, and that the applicant has power. In reality, the hiring process is a two way street, involving discrimination on the part of both actors.
(link) I expected an analysis of the difficulties of teaching from a feminist standpoint within the current cultural climate of neo-conservatism and feminist backlash. Instead, I believe Whisnant's article contributes to that trying environment. Whisnant makes a straw horse of all feminisms but her own, lumping together groups as diverse as performativity theorists and liberal feminists, who have defined themselves in part in distinction from one another, and linking both/all with precisely the appropriative corporate tactics they criticize.
(link) Turkewitz complains about 'the false premise underlying the basic anti-copyright position [...] In this formulation, the "public's" interest is exclusively defined as the ability to get copyrighted materials as cheaply as possible, with free obviously being the best (since it is the cheapest) option.' However, this complaint is something of a straw horse.

And of course, if you've got a straw horse, someone is sure to beat it:

(link) Those who keep saying that there is no way of distinguishing between not controlling Ramallah and not controlling Tel Aviv are beating a straw horse.
(link) Not to beat a straw horse, but Rudy & Keith note that "drug administration, gene manipulations or brain lesions could all alter the manner in which the rat contacts the relevant features in the environment rather than the neural mechanisms involved in learning," an opinion echoed by Cain.

Note that all of these malaprops are in formal academic discourse: we intellectuals should not be too quick to throw stones at politicians who produce a novel idiom blend from time to time.

Generalizing from all these straw creatures, the word straw itself has come to mean something like "provisional, put forward for discussion". Examples:

(link) Step 1 of the Management Action Process is the creation of your straw draft MAP Document while step 5 of the Process is the creation of your complete MAP Document based upon your Bottom Up Program Review.
(link) This straw proposal is made by the Office of Clean Energy and has not been reviewed or approved by the Board, the Board President or the Chief of Staff.
(link) The IMO’s approach is to fully stakeholder the straw-plan for market evolution to ensure that the needs and desires of market participants and other stakeholders are known and considered.
(link) The following column of information was created by Turnitin.com to show the attributions for the straw document created to facilitate the development of the UNT academic plan.

[ Here's the OED's citation history for the phrase "straw man":

1594 T. B. La Primaud. Fr. Acad. II. 567 A scarre-crowe to make them afraide, as wee vse to deale with little children and with birdes by puppets and *strawe-men.
1890 FRAZER Golden Bough II. 247 Sometimes a straw man was burned in the ‘hut’.
1896 L. T. HOBHOUSE Theory of Knowl. 59 The straw man was easily enough knocked over by the critic who set him up.
1934 A. WOOLLCOTT While Rome Burns 76, I have often challenged one of these straw-man authorities.
1946 KOESTLER Thieves in Night 328 The authorities..only got the Rumanian captain and his crew, who couldn't give away much as all their dealings had been with straw men under assumed names.
1981 ‘M. HEBDEN’ Pel is Puzzled xviii. 180 He seemed active enough, but there seemed an awful lot lacking in him... Was he really just a straw man?

And for "straw vote/poll/ballot":

1932 C. E. ROBINSON Straw Votes iv. 52 The newspaper or magazine conducting a *straw poll by the ballot-in-the-paper method prints a straw ballot in the publication for a certain period of time before an election.
1944 Chicago Tribune 26 Oct. 12/2 (heading). New deal area lifts F.D.R. in N.Y. straw poll.
1958 Spectator 6 June 722/1 In my own straw poll I found two electors who were going to vote Liberal for the first time.
1978 Nature 6 Apr. 484/3 A straw poll taken three weeks ago at a meeting of faculty professors..voted 23 to 3 against approving the proposal.

There are no straw dogs or straw horses in the OED. Yet.]

[John Cowan points out that

The metaphor in "straw poll" is derived from "straw in the wind"; you can tell which way the wind blows by watching the movement of straws.
As for "strawman", I used to correct it to "trial balloon", which is a perfectly good expression for what is usually meant by "strawman" nowadays -- but I've given up.

And in a demonstration of the prophylactic power of metaphor, no one seems to have been tempted to use the blend "straw balloon". ]

[Grant Barrett writes:

There's one more connotation of "straw" which seems to stem directly from "straw man" sense 1 in AHD4. It's something like "false, questionable, fraudulent."
These two cites are examples: (straw sale) (straw contribution).

]

Posted by Mark Liberman at 08:19 AM

April 24, 2006

Battling blang

David Giacalone at f/k/a is drawing yet another line in the sand:

We campaigned long and hard against the ugly-little word "blog" invoking our duty to a joint language legacy. Earlier this year, we crusaded adamantly to make the word "blawg" obsolete. Today, the f/k/a Gang proclaims its dissent over another spawn of "blog" - the neologism "blang." See New York Times, "Coming to Terms with a Wired Age, Part 2," by Lisa Belkin, April 23, 2006, in which -- perhaps trying to be a bit too hip and youthful -- the Old Gray Lady becomes an accesory [sic] to languicide.

David can rest easy in this case, I predict, because blang is not going to make it. At least, he can rest easy if he wants to -- far be it from me to stand in the way of long and hard campaigns, adamant crusades and proclaimed dissent-- where would the blogosphere be without them?

It's not because it's "ugly-little" that I predict that blang will fail, nor because it's any more of a threat to the English language than blog and blawg were. No, blang will fail because the specific things that Lisa Belkin says that blang is supposed to denote (cutesy invented words like "cybermoment", "cylences", and so on) don't actually exist; and blang will fail because descriptive phrases like "web language" are perfectly serviceable for the relatively rare occasions when someone wants to talk about real instances of this concept; and blang will fail because it's not a striking, evocative or clever blend for "web language"; and finally, blang will fail just because nearly all neologisms do.

Why did blog succeed? For one thing, its referents are relatively concrete and very commonly referenced: people with web logs felt the need to reference "my web log" and "X's web log" and "the growing number of web logs" and so on, many times a day. For another thing, people talking about web logs often felt the need to use the term as a modifier ("web-log design", "web-log software") or as a verb ("I haven't been web-logging very much lately"; "I'm so web-logging that"). All of those uses are facilitated by a compacted form. And finally, blog is a clever blended reduction of "weblog", initially founded on the string-parsing pun "web log → we blog".

The interesting thing about neologisms, as I suggested yesterday in reference to Belkin's column, is not that they threaten our "language legacy". They never have, and they never will, no matter how much they annoy some people. What's interesting about neologisms, at least to me, is that so many people enjoy inventing them or reading about other people's inventions, while so few of these inventions actually make it into general usage.

Posted by Mark Liberman at 08:55 PM

Yet Another Sign of the Apocalypse

We've seen several Signs of the Apocalypse recently, so I thought I'd mention another. I keep thinking of it every time I travel by airplane. Not only my fellow passengers, but flight attendants, who ought to know better, refer to the sides of the vessel using the words left and right. It seems as if I am the only person left on the planet who knows that an airplane is like a ship and has a port side and a starboard side. Sheesh.

There is actually a practical advantage to using port and starboard in some situations, due to the fact that they are not synonymous with left and right. The terms port and starboard are defined with respect to the vessel, whereas left and right are defined with respect to the speaker, or the addressee, or possibly some other person. If the flight attendant tells you to evacuate using the port emergency exits, it is clear which ones to use, but if he or she says to use the left emergency exits there is sure to be confusion. Which left? The flight attendant's? The passengers'? And is it the left of the passengers who are facing forward or the left of the passengers who are heading back to the restrooms?

What is curious about this is that such sources as www.answers.com, dict.die.net, and wordnet mention that starboard is applied both to ships and to aircraft, so it isn't exactly secret knowledge. What I wonder is, do people no longer find it comfortable to treat aircraft as ships, or have they simply lost touch with nautical terminology, so that they no longer use port and starboard at all, in reference to ships or aircraft?

In a fair Darwinian world those of us who retain the more refined terminology ought to have a survival advantage. I guess that ships and aircraft just aren't dangerous enough.

Posted by Bill Poser at 01:35 PM

Hollywood glamour, activist passion, false rhetoric

The May issue of Vanity Fair is the magazine's "first 'Green Issue'". The press release explains that "[t]he May cover features a quartet of eco–power players, capturing Hollywood glamour and activist passion: Robert F. Kennedy Jr., Al Gore, Julia Roberts, and George Clooney, photographed by Annie Leibovitz." The issue features an article by Gore, "The Moment of Truth", which starts like this:

Clichés are, by definition, over used. But here is a rare exception - a certifiable cliché that warrants more exposure, because it carries meaning deeply relevant to the biggest challenge our civilization has ever confronted.

The Chinese expression for "crisis" consists of two characters: 危機. The first is a symbol for "danger"; the second is a symbol for "opportunity."

Senator Gore is right about the cliché part: millions of business pep talks have used this rhetorically-convenient deconstruction of wēi+jī as danger+opportunity. Unfortunately, they're all wrong about the linguistic facts.

As Victor Mair put it, in an essay on pinyin.info more than a year ago,

While it is true that wēijī does indeed mean "crisis" and that the wēi syllable of wēijī does convey the notion of "danger," the jī syllable of wēijī most definitely does not signify "opportunity." ... The jī of wēijī, in fact, means something like "incipient moment; crucial point (when something begins or changes)." Thus, a wēijī is indeed a genuine crisis, a dangerous moment, a time when things start to go awry.

Victor goes on to explain that

Aside from the notion of "incipient moment" or "crucial point" discussed above, the graph for jī by itself indicates "quick-witted(ness); resourceful(ness)" and "machine; device." In combination with other graphs, however, jī can acquire hundreds of secondary meanings. It is absolutely crucial to observe that jī possesses these secondary meanings only in the multisyllabic terms into which it enters. To be specific in the matter under investigation, jī added to huì ("occasion") creates the Mandarin word for "opportunity" (jīhuì), but by itself jī does not mean "opportunity."

Thus wēijī is roughly "incipient moment of danger", while jīhuì is roughly "occasion of incipient moment". These decompositions should not be taken too literally, since such compound words acquire their own particular meanings over time. Looking at additional combinatoric possibilities of jī underlines this point. For us English speakers, it might help to consider the role of the core meaning of script in the modern English words inscription, description, prescription, transcription, ascription, conscription. Prescription can mean "medicine", and has script as a slang reduction. But this hardly licenses us to analyze con+scription as with+medicine, and to use this as a rhetorical device to introduce the idea that reviving the military draft would be a healthy thing for the American body politic.

Anthropogenic climate change is clearly a serious "incipient moment of danger". It's unfortunate that Senator Gore's essay "The Moment of Truth" undermines its own credibility by opening with a moment of false linguistic analysis. The false deconstruction of wēijī was debunked here just about a year ago, under the helpful title "Crisis ≠ Danger + Opportunity". Avoid rhetorical embarrassment, politicians: have your staffers read Language Log.

[Update: several readers have written to point out that most Chinese dictionaries include "opportunity" among the glosses they give for jī (traditional character 機, simplified character 机). Indeed CEDICT, which I linked to in the body of the article, gives the glosses "machine; opportunity; secret". Victor Mair's point, which I've quoted above, is that

Aside from the notion of "incipient moment" or "crucial point" discussed above, the graph for jī by itself indicates "quick-witted(ness); resourceful(ness)" and "machine; device." In combination with other graphs, however, jī can acquire hundreds of secondary meanings. It is absolutely crucial to observe that jī possesses these secondary meanings only in the multisyllabic terms into which it enters.

and that "opportunity" enters lists of glosses for jī only because of the influence of multisyllabic terms such as 机会 jī huì. I'm taking Victor's word for this, seconded by Mark Swofford at pinyin.info.

I guess the question could be put another way: does the notion of "opportunity" enter into the meaning of the compound word wēi jī, either historically or psycholinguistically? Victor believes that the answer is "no". I have no independent basis for making a judgment, but I welcome opinions from scholars and native speakers.]

Posted by Mark Liberman at 06:02 AM

Scrapie in ancient China?

In an interesting letter to Science last year (Wickner 2005), Reed Wickner suggested that there is evidence of the occurrence of scrapie, the transmissible spongiform encephalopathy found in sheep, in China over two thousand years ago, long before its first attestation in Europe in the early 18th century. He observes that the character for the word yang³ "to itch, to tickle", is 痒, and that 痒 is composed of ⽧ "illness" plus 羊 yang² "sheep", and makes a similar observation about another character. Since scrapie makes the poor sheep itchy and as a result they scratch themselves, he suggests that this reflects an ancient Chinese observation of sheep with scrapie. It's a nice point, but as Zhang (2006) and Li and Xing (2006) point out, it doesn't work.

Wickner mistakenly took the character "to itch" to be composed of two semantic units, a type of character known as 會意 in Chinese. (For the various structural types of Chinese characters see: Wikipedia: Chinese character classification.) Such characters do exist, e.g. 明 ming² "bright", which is composed of 日 ri⁴ "sun" and 月 yue⁴ "moon", but over 90% of Chinese characters are of a different type, known in Chinese as 形聲 "phono-semantic compounds". Such characters consist of a radical, which represents some aspect of the meaning of the character, and a phonetic, which as its name suggests is chosen for having, in Old Chinese, a sound similar (but not necessarily identical) to that of the word in question.

For example, the character ⾔ yan² "to speak" is both a character in its own right and the radical underlying several hundred other characters, including:

詩 shi¹ "poem", with the phonetic 寺 shi⁴ "temple"
談 tan² "to converse", with the phonetic 炎 yan² "inflammation"
計 ji⁴ "plan, calculate", with the phonetic 十 shi² "ten"
訓 xun⁴ "to teach, explain", with the phonetic 川 chuan¹ "river"
訕 shan⁴ "abuse, slander", with the phonetic 山 shan¹ "mountain"
詰 jie² "interrogate", with the phonetic 吉 ji² "lucky"

It turns out that 痒 is not the original character for yang³ "to itch". The original character was 癢, which is composed of the radical ⽧ "illness" plus 養 yang³ "nutrition". This phonetic was later replaced, no doubt due to the complexity of the character, by the homophonous "sheep". (This example is parallel to the change in the writing of the Japanese word for "syphilis" that I have previously discussed, except in that case an entire character was replaced with a simpler, homophonous one, rather than part of a character.) The current character, therefore, does not demonstrate an ancient Chinese association between itching and sheep.

References

Li, Ping and Xing, Hong Bing (2006): "Disease but no sheep," Science 31 March 2006, p. 1867.
Wickner, Reed B. (2005): "Scrapie in ancient China," Science 5 August 2005, p. 874.
Zhang, Hong-Yu (2006): "Scrapie and the origins of the Chinese 'itchy'," Science 31 March 2006, pp. 1866-1867.

Posted by Bill Poser at 02:44 AM

April 23, 2006

An overature to the nucular family and the doctorial committee

About a year ago, I looked at some things that aren't eggcorns, but instead arise from some sort of morphological reanalysis that assimilates word structures to common morphological patterns -- in particular, the famous nucular, the well-known doctorial (in many lists of errors, including Paul Brians's Common Errors), and the less-known overature. I've now accumulated more examples of all three types.

Along the way, I came across another type, so far illustrated only by the verb fellatiate, discussed in my first posting on the vocabulary of toadying. Otherwise, we have the overature type, with -ature for -ture; the nucular type, with medial -ul-; and the doctorial type, with -i- before a final Latinate suffix beginning with a vowel.

First, the overature. Here there are quite a few, beginning with aperature for aperture, suggested to me by Coby Lubliner on 5/16/05:

I feel like opening up the larger aperature and closing the smaller one before proceding. ... For now waveguide opening is wall-of-microwave oven aperature. (link)

But also fixature, mixature, and strucature (I stopped searching at this point, since I was getting hits on almost everything I tried):

Dominic Heutelbeck is a fixature in the miniature painting community. (link)

Law of Partial Pressures (Dalton's Law): The total pressure exerted by a mixature of gases is the sum of the partial pressures of the individual gases. (link)

John Mills: "So i think 52 million it's there, we have to look at it and re-strucature county governement." (hint)

(The hits for fixature include many that are substitutes for fixative, but there are still plenty that are clearly substitutes for fixture.)

On to the nucular family. Here we have perculate - perculation - perculator, esculate - esculation - esculator, nuptuals (pointed out on ADS-L by Alison Murie on 3/3/06 and discussed at some length thereafter), jubulant (offered by Victoria Neufeldt on ADS-L on 3/7/06), and simular(ity). A few examples (from the enormous numbers that can be Googled up):

Also this perculator graces the counter with its beautiful old-fashioned design ... Farberware Cordless 12-Cup Perculator is the best of the past with the ... (link)

Now the violence will esculate as you go blow for blow attack for attack and you have lowered yourself to the lowest common denominator and therfore have no ... (link)

Wedding Advice You Must Have Before Planning Nuptuals
A tiny fascinating book you must read before planning a wedding. The compiled experience from many years as a wedding planner. (link)

Atop his warhorse, he trotted up through Haraguchi Kenichi's men (all of them bloody, dirty, and oddly jubulant) ignoring the men as he passed by. (link)

I suppose I might be able to use a less accurate algorithm and use a simularity matching algorithm like Levenshtein distance on the smaller set. (link)

Finally, the doctorial committee: mischievious (in Brians), grievious (in Brians), and intravenious(ly), all with -ous; pastorial (in Brians), and pectorials, with -al, like doctorial; galiant (reported on ADS-L by Matthew Gordon and discussed at some length thereafter, since it might be a blend of gallant and valiant), with -ant; and similiar(ity), with -ar. A few examples:

Mischievious cartoons from the CartoonStock directory - the world's largest on-line collection of cartoons. (link)

My sin, guilt, grievious errors, or hatreds associated with the knots, that lay deep within my conscience. (link)

He was an intravenious drug user and a homosexual. He was a great guy, and I will never forget him. (link)

Pastorial Care. The primary principle of the College Charter states that:. "The needs of the children and their learning shall be paramount." (link)

Other muscles used include abdominals, back, shoulders and arms (including deltoids, pectorials and biceps). (link)

The 7th and last match was a galiant effort by the little Moxies, ... Despite a galiant effort by Teresa's team, Barb's team display of scrappy defense and ... (link)

Programs Similiar To GeekLog...? (link)

The ADS-L discussion of galiant noted that gal(l)iant was a fairly common version of this word in the 19th century, but it seems unlikely to me that the modern examples are merely continuations of this usage.

Throughout these examples, we see reshapings of Latinate words to conform to common morphophonological patterns in the language. No doubt there are plenty more out there. New ones come by every few weeks.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 01:41 PM

Myopia in applied linguistics

This may sound like the grumblings of an old man, but sometimes when I look back over what I had a hand in creating, I'm led to tears about what has happened since. Over three decades ago, at a meeting of the International Association of Applied Linguistics (AILA) in Montreal, Bernard Spolsky, Dick Tucker and I spent a couple days creating the American Association of Applied Linguistics. Three problems stimulated our thinking. First, unlike most other developed nations, the US at that time didn't have such an organization, which was something of an embarrassment for our country. Second, we noticed the growing distance between applied and theoretical linguists that we thought needed repair. We hoped that by holding annual meetings of applied linguists in conjunction with the LSA, we could bring these two closer together. We believed that both could learn from and nurture each other rather than continuing to grow more and more apart. Third, we believed that applied linguistics involved considerably more than language learning, teaching, and testing. In any case, at that time the TESOL organization was handling those topics very well. We believed that the horizon of applied linguistics should include many other topics, involving such areas as medical communication, law, diplomacy, business, and advertising.

Within weeks of the AILA meeting Spolsky produced a constitution, I was strong-armed into being the second president, and the new AAAL was up and running. In our first year we had only about a hundred members but we grew rapidly over time. Then, as leadership changed, so did the original mission.

After only a few years, it was decided that AAAL would no longer meet with LSA. Too many members were not interested in linguistic theory and complained that theoretical linguists had little interest in applied concerns. They believed that the noble experiment of bringing theory and application together was simply not working. I lamented this publicly, arguing that it might take some time for this to happen, but to no avail. Since then, AAAL has met in conjunction with or parallel to other applied linguistics organizations, such as TESOL.

But even more disheartening to me is that AAAL is still relatively silent about the founders' third original goal -- that of expanding the scope of applied linguistics to areas other than language learning, teaching, and testing. Evidence of this silence can be seen in a recent publication called Directions in Applied Linguistics (Multilingual Matters 2005), which was reviewed on the Linguist List on April 21 (here). This book's stated aim is to give "insights into the nature and scope of applied linguistics presenting plurality of views, interests and styles" (p. 4).

The first chapter, called "Perspectives in Applied Linguistics," contains a very general account of the need to work in an interdisciplinary context, but the author is not specific about any particular field of application. This is as close as the book gets to the founders' third goal. The other 16 chapters deal with such topics as language policy, language education, sharing community languages, elementary school language education, non-native teachers of English, foreign language education, English for academic purposes, second generation English speakers, teacher's perception of errors, the writing of ESL students, contrastive rhetoric research, style in non-native writing, examples of L1 and L2 prose, cross-cultural variation in turn-taking in the classroom, language planning, and discourse planning. These are all worthy and important topics for language learning, teaching, and testing but they are, unfortunately, the same old conventional topics of applied linguistics.

If this book claiming to present the directions of applied linguistics accurately represents what it says in its title, the field hasn't expanded its horizons and scope much in the past three decades. The third goal of the founders of AAAL (expanding the scope of applied linguistics) appears to be meeting the same fate as the second goal (reducing the distance between applied and theoretical linguistics). I no longer attend the annual meetings of AAAL, partly because the papers pretty much reflect the topics of this book and partly because I look back on what could have been and am discouraged at what seems to me to be the myopic vision of applied linguistics in the US.

Despite what one might think, based on the above-cited book and from the papers at AAAL meetings, it's not true that no progress is being made in applying linguistics to some other areas of life. Great strides are being made in language and law, medical communication and, to an extent, business communication. But somehow this doesn't seem to fall under the aegis of applied linguistics these days, at least as far as most applied linguists are concerned. A pity.

Posted by Roger Shuy at 01:22 PM

New ideas and new words

We've often mentioned examples of the strange view that speakers of a language with "no word for X" are therefore unable to grasp the concept of X (see below for some examples). Perhaps the most efficient refutation of that view is the enduring popularity of word-invention contests such as Bob Levey's Neologism Competition, Barbara Wallraff's Word Fugitives, and so on.

These exercises are fun partly because wordplay is fun, but the main attraction (I think) is the conscious and public scrutiny of familiar but previously unexamined concepts. For example, Lisa Belkin's Life's Work column in the NYT jobs section for 4/23/2006, "Coming to terms with a Wired Age, Part 2", features the coinage

Cylences — The long gaps in phone conversation that occur when a person is reading e-mail or cybershopping at the same time.

Like most neologisms, this one will probably never graduate from the game into real life, although Belkin implies that it's already a feature of subculture usage. She describes the source as a list of "names for many ... new concepts" sent to her by Eve Fox:

She calls it "Blang," as in "Web language," and says it is spoken by "Web wraiths" — Tolkienesque creatures (i.e., most of us) who feel chained to their computers day and night.

I'm skeptical of this implication, since none of the examples of "Blang" that Belkin cites can be found in any quantity on the web. "Cylences", in particular, returns a Google count of zero this morning, so that any Web wraiths who use it must be operating on a spectral plane that Google doesn't index.

Another "Blang" word is schoogle, which Fox (and Belkin) want to mean "A popular pastime, consisting of Googling the names of old classmates". A Google search for {schoogle} finds 11,400 pages, which seems promising -- until we look at the examples, and find that they all seem to interpret schoogle as a reference to Google Scholar (e.g. a 2004 message from Eric Hellman on the Web4lib list with the title "Welcome to the Schoogle Era"). On the first ten pages of hits, I was unable to find any examples of the usage that Belkin cites -- again, those web wraiths are hiding out pretty effectively.

Anyhow, Fox's proposed Blangish meanings for cylences and schoogle are familiar concepts to me, and I expect to you as well, even though we had (and still have) no standard terminology for naming those concepts. This situation is normal and apparently unproblematic, as indicated by the fact that neologistic seeds so rarely sprout and prosper in the culture at large.

We're quietly proud of two neologisms associated with Language Log, {eggcorn} and {snowclone}, which seem to be making their way into general use. The development of useful terminology can be a genuine help to rational inquiry -- both as a label for discussion of examples and explanations, and as a stimulus to clarification of the associated concepts. We think that these two neologisms are succeeding because they're examples of that process. But the associated concepts were familiar to many people, at least in a rough form, before the words were invented -- that's a crucial part of why the words have spread.

Some LL "No word for X" posts:

No word for robins (11/16/2004)
Arctic folk at loss for words again (11/23/2004)
It's like a glimmer on the horizon (12/3/2004)
No word for sex (3/12/2005)
No word for "lazy hack parroting drivel"? (4/1/2005)
Crisis ≠ Danger + Opportunity (4/29/2005)
Football in Navajo, anyone? (9/23/2005)
The miserable French language and its inadequacies (9/30/2005)
Snowclone blindness (11/19/2005)
"60 Minutes" doomed to repeat itself (12/24/2005)
Ayn Rand, linguist? (3/15/2006)
Ayn Rand psychologizes a trope (3/19/2006)
Whorf in a bottle (5/5/2006)
No word for thank you (5/6/2006)
No concept of the future, no yuccas either (5/11/2006)
Does anyone have a word for this? Probably not (12/2/2006)
Solving the world's problems with linguistics (12/17/2006)

Posted by Mark Liberman at 01:13 PM

Prior artwork

It must be hard to be a copy editor. I especially admire the ability to focus effectively on the form as opposed to the content of texts. Distracted by meaning, I tend to read right past flocks of missing function words, wrong inflections, and even derivational substitutions like "substitution" for "substituted". For catching the many errors in my Language Log posts, I rely on the kindness of readers, since my attitude towards blogging is similar to the "first rule of Italian driving" explained by Raul Julia (as Franco) in the movie Gumball Rally: "[Franco rips off his rear-view mirror and throws it out of the car] What's-a behind me is not important."

But copy editors also need to understand the content of the stuff they're working on. The hardest part of this, I think, is understanding (and respecting) the use of ordinary words as terms of art. And of course it's the failures that readers notice, just as I noticed a mistake in John Markoff's 4/16/2006 NYT story "In Silicon Valley, a Man Without a Patent".

Markoff's story describes Geoff Goodfellow's 1982 efforts to start a wireless email service. The news hook is the recent $612.5-million payout from the Blackberry company Research in Motion to NTP, the holders of apparently bogus patents on wireless email that were granted to Thomas J. Campana Jr. a decade after the (economic) failure of Goodfellow's attempt.

The editorial mistake is a missing space:

"I think there is a potential ethics issue," said Mark A. Lemley, a Stanford professor who specializes in patent law. "The basic key is the attorneys have the obligation to disclose everything they know about his prior artwork and make him available as a fact witness." [emphasis added]

The spelling "prior artwork" implies a constituent structure like this

[prior [art work]]

and a reference to the compound word artwork, defined by the AHD as

1. Work in the graphic or plastic arts, especially small handmade decorative or artistic objects. 2. An illustrative and decorative element, such as a line drawing or photograph, used in a printed work, such as a book.

But Professor Lemley is not talking about decorative or artistic objects or illustrative and decorative elements. He's talking about the fact that Goodfellow's work (and publications) in the 1980s are "prior art" in the legal sense, perhaps invalidating NTP's patents, which NTP's lawyers would be ethically required to disclose. As Merriam-Webster's Dictionary of Law explains, the legal term "prior art" refers to

: the processes, devices, and modes of achieving the end of an alleged invention that were known or knowable by due diligence before and at the date of the invention
also
: the knowledge or description of such processes, devices, or modes (used chiefly in patent law)

Thus Lemley meant "prior art" to be a compound word used to modify "work", or at least combined into a higher-level compound with "work", in a structure like this one:

[[prior art] work]

Goodfellow's work may well have been "prior art" relative to the NTP patents, but Goodfellow himself never attempted to patent the idea of wireless email, because of a different concept in patent law (and common sense notions of equity as well): obviousness. Goodfellow explains:

"You don't patent the obvious," he said during a recent interview. "The way you compete is to build something that is faster, better, cheaper. You don't lock your ideas up in a patent and rest on your laurels."

In principle, the U.S. Patent and Trademark Office agrees:

Even if the subject matter sought to be patented is not exactly shown by the prior art, and involves one or more differences over the most nearly similar thing already known, a patent may still be refused if the differences would be obvious. The subject matter sought to be patented must be sufficiently different from what has been used or described before that it may be said to be nonobvious to a person having ordinary skill in the area of technology related to the invention.

However, in practice, the question of what is "nonobvious to a person having ordinary skill" is a tricky one. From Goodfellow's point of view (which I share), once you have email, then the simple idea of accessing your email via a different kind of device is obvious. Unfortunately, the USPTO (or rather, its examiners) have gone very far in the direction of allowing "inventors" to lock up whole areas of technological innovation by simply running through the cross-product of services and devices and communications channels, and patenting the idea of offering service X on device Y via channel Z -- often without any concrete contribution to the technical problems that may be involved. This is not invention, it's blind combinatorics.

Anyhow, my guess is that Markoff transcribed Lemley's phrase correctly, as "prior art work" (or "prior-art work"?), but a copy editor mechanically applied a rule about which compounds are to be written solid, and substituted "prior artwork", violating the intended constituency. As evidence that this spelling is a general policy when art and work really form a unit, I note that the NYT index finds (since 1981) 4,732 examples of "artwork" vs. 625 examples of "art work". Similarly, Google news this morning has 5,220 instances of "artwork" vs. 441 of "art work".
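
For anyone who would rather see those counts as proportions, here is a minimal sketch of the arithmetic. It uses only the four numbers quoted above; the helper name solid_share is my own label, not anything from the NYT index or Google. On these figures the solid spelling accounts for roughly 88% of the NYT hits and 92% of the Google News hits.

```python
def solid_share(solid: int, spaced: int) -> float:
    """Fraction of hits written solid ("artwork") among all hits for either spelling."""
    return solid / (solid + spaced)


# Counts as quoted in the post: (hits for "artwork", hits for "art work").
counts = {
    "NYT index (since 1981)": (4732, 625),
    "Google News": (5220, 441),
}

for source, (solid, spaced) in counts.items():
    # Report the solid spelling's share as a percentage with one decimal place.
    print(f"{source}: {solid_share(solid, spaced):.1%} written solid")
```

Raw hit counts like these are rough, of course, so the percentages are only an indication of house style, not a measurement of it.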

Posted by Mark Liberman at 10:12 AM

April 22, 2006

When is a phrase not a phrase?

... when it's just two words that happen to be next to each other. There were apparently almost 400 letters to Vanity Fair about their controversial Hollywood issue cover (and portfolio) two months ago. Explaining the practical reasons why they couldn't address more of those letters in this month's issue, the editors write:

Is there a sweeter phrase in the English language than "space prohibits"? Not today.

Not any day, eds. -- "space prohibits" is at best a noun phrase subject plus a verb that usually takes both a noun phrase object and a prepositional phrase headed by from. Here the editors are using the verb intransitively for effect; we can all basically infer what should follow, something like: "... us from addressing more of your letters."

Still, this highlights a small problem often encountered with constituency tests, a standard way (at least in introductory linguistics courses) of determining the phrasehood of a string of words (in a given sentential context). One of these is the Stand-Alone test, whereby "[t]he ability of a string of words to stand alone as a reply to a question is an indication of their being a constituent". Applying this test to "space prohibits", we get something like the following:

Why didn't you print my letter, V.F.? - Space prohibits.

Which is fine, for the same reason as before: the verb is being used intransitively for effect.

The good news is that "space prohibits" passes none of the other standard constituency tests (though see below). Even though Wikipedia says that "if a sequence of words we want to analyze passes one of the tests, this is sufficient to prove the constituency of this unit", I'm sure all syntacticians would agree this is too permissive; in my view, constituency tests are arrayed on a scale of reliability, and the Stand-Alone test is somewhere near the bottom -- good enough to confirm what you already suspect, but not good enough on its own.

Another weakly-reliable test is the Coordination test; it's more reliable as a test for the kind of constituent or phrase you have, but even then it's got some problems. Note that "space prohibits" can be made to pass this test:

[Space prohibits], and [our editors forbid], us from addressing more of your letters.

But, as indicated by the commas, there need to be intonational breaks around "and our editors forbid" for this to work, unlike a more typical use of the Coordination test:

Space prohibits us from addressing [more of your letters] and [any of your hate mail].

I have to admit that I'm usually disappointed by the discussion of (the reliability of the various) constituency tests in introductory textbooks ... anyone know of a text that goes into this sort of thing?

Posted by Eric Bakovic at 11:59 AM

Loose sallies of the mind

Dr. Johnson characterizes the sort of thing we turn out here at Language Log Plaza:

Johnson had further opportunity to expand and aerate his thoughts in his periodical writings. Throughout the time he was at work on the Dictionary he turned out essays. He needed an income, and journalism for him was an easy way of earning one. His second Dictionary definition of 'essay' ('a loose sally of the mind; an irregular indigested piece; not a regular and orderly composition') gives an indication of how he viewed this kind of writing. (Henry Hitchings, Defining the World: The Extraordinary Story of Dr. Johnson's Dictionary (NY: Farrar, Straus and Giroux, 2005), p. 115)

Hey, we work HARD for those fees!

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 11:12 AM

An Eats, Shoots & Leaves moment

Lane Greene sent in another example where final punctuation has apparently been copied into the body of a phrase in order to indicate emphasis, as in Best. Day. Ever. This time, though, the meaning is inverted:

NO GUEST WORKER AMNESTY!
becomes
NO! GUEST WORKER AMNESTY!

Posted by Mark Liberman at 06:29 AM

April 21, 2006

Adventures in celebrity onomastics

When Tom Cruise and Katie Holmes announced the birth of their daughter on Tuesday, celebrity-watchers were eager to find out what to call TomKat's offspring (besides TomKitten, of course). The couple's publicist revealed that the baby's name is Suri, further explaining that the name means 'princess' in Hebrew and 'red rose' in Persian. Given the immense scrutiny the couple has gotten, it was no surprise that even this offhand comment stirred up some controversy.

Actually, the Persian derivation is relatively uncontroversial. Various websites on Persian names list Souri/Suri (transliterations of سوری) with the gloss 'red rose.' And a check of the 1892 Steingass Persian-English Dictionary does indeed find an entry for sūrī, defined as "a beautiful red rose of an odoriferous and exhilarating flavour." It's apparently a shortened form of guli sūrī or gol-e suri (transliterations of گل سوری), where suri modifies the noun gul (or gol) meaning 'rose' in particular or 'flower' more generally. An article in Encyclopedia Iranica explains that gol-e suri is one of several names for the red rose in Persian poetic usage:

The rose has had a predominant place in classical Persian poetry ... where it is sometimes called gol-e sork or sork-gol "red flower," and gol-e suri (suri "red"; cf. Kurd. sō/ur, Pashto sur, Baluchi so/uhr, etc., cognates of Mid. Pers. suxr > Pers. sork, Av. suxra-, all meaning "red"), probably to avoid confusion with gol "flower" in general or to stress redness (because not all roses are red).

(As a side note, the Persian term for 'rose-water' is gul-āb, which was extended to various drinks sweetened with syrup or sugar. The word was borrowed into Arabic as julāb, eventually becoming julep in English via Spanish and French. So, as the American Heritage Dictionary note for rose helpfully explains, "it is etymologically correct to drink a julep while watching the Run for the Roses.")

If TomKat had just offered the Persian 'red rose' derivation via their publicist, they would have been in the clear. But the purported meaning of 'princess' in Hebrew has led to much debate among speakers of the language. One news report claims that Suri has only two possible meanings, neither of which is anything like 'princess':

According to Hebrew linguists, Suri has only two meanings — one is a person from Syria and the other "go away" when addressed to a female.

Hebrew expert Jonathan Went says, "I think it's fair to say they have made a mistake here. There are variations of the way the Hebrew name for princess is spelled but I have never seen it this way."

The AP reports that "bemused Israeli TV and radio presenters have debated the word's origins" since the announcement of Suri's birth, but they quote an expert who provides a plausible explanation for the publicist's comment:

"Nobody here has ever really heard of it," an announcer on Israel's Army Radio said during a discussion Thursday. The Yediot Ahronot newspaper agreed in its half-page splash on the celebrity birth.
"We seem to have learned a new Hebrew word — and from Tom Cruise, no less," said a Channel 2 TV anchorman.
Avshalom Koor, who has for years presented TV and radio spots on the intricacies of Hebrew, said Suri was a derivation of Sarah — the name of Biblical patriarch Abraham's wife — as pronounced by some Central European Jews.
"Suri is a pet name for Sarah," Koor told Army Radio. "The Ashkenazi (Jews) of Poland and Hungary pronounce it Suri."
In ancient Hebrew, Sarah is the feminine form of "Sar," or lord. In modern Hebrew, the word means a Cabinet Minister.

Roger Friedman of Fox News put forth another explanation for how Suri might have been derived from Sarah:

To get the name Suri, you actually have to subscribe to Kabbalah, a very distant offshoot of Judaism.
Suri would really be Sarah, except Kabbalah — as it is now taught to celebrities — is all about taking letters and making new words out of them. ... Suri is derived from Sarah mathematically.

Friedman gives no source for this theory, and it seems a little improbable that a Scientologist couple would go the Kabbalistic route. In any case, Suri does seem to be some sort of alteration of Sarah, usually defined as 'noblewoman' or 'princess.' The happy couple could very well have consulted a book like The Comprehensive Dictionary of English & Hebrew First Names, which gives Suri as one of many variants of Sarah, all glossed as 'princess.' That hasn't stopped journalists and bloggers from finding alternate meanings for the word in various languages: 'pickpocket' in Japanese (Times of London), 'pointy nose' in the southern Indian language of Todas (AP), an epithet for Lord Krishna (Gawker), a breed of alpaca (Tabloidbaby), and so on and so forth.

This should keep people occupied until the child of Brad Pitt and Angelina Jolie is born. Word has it they're going to pick a Namibian name. That should be interesting, since according to this article, "in the Namibian context every name should have a meaning or convey a certain message." Names from Namibian languages include Gadoes ("a name given by the Nama to a baby girl born during a drought situation when people were moving in search of grazing for their livestock"), Axarob (a baby who was "very thin at birth"), and !Khaeb-Khoeb (a baby "born when there was a stranger in the village"). If they have a girl, they could give her a name much like her mother's: Angaleni, "a name given to a girl to tell the people to be on the alert." I think the people are already on the alert with this baby, though.

[Update, 4/23/06: A Reuters article suggests some Israelis are still a bit hung up on the polysemy of suri in Hebrew, particularly the sense of 'go away' noted above:

"I really don't know what they were thinking when they chose this name. It's a term that denotes expulsion, like 'Get out of here'," said Gideon Goldenberg, a linguistics professor at the Hebrew University of Jerusalem. "It's pretty blunt."
Yaron London, a cultural commentator for Israel's Channel 10 television, had this rhetorical question for Suri's proud parents: "Why didn't you just go back to your ancestors' language, and call the kid 'Scram Cruise'?"

The article does note that Suri as a nickname for Sarah, though "all but unknown in Israel," is still attested. For instance, there's Jerusalem journalist Surie Ackerman, whose given name is "a formalized version of a nickname given by fellow ultra-Orthodox Jews in her native United States."

Reuters also gives some further alternative meanings for Suri, such as the name of a Nubian tribe and 'sun' in Sanskrit. Languagehat pitches in with Hausa 'anthill,' Pushtu 'large sack,' and Hindi (from Sanskrit) 'wise, learned.' (Commenters on the LH post have been adding examples from other languages.) As Steve from Languagehat writes, "When she gets old enough, she can take her pick."]

[Update, 4/27/06: Mississippi Fred MacDowell of On the Main Line debunks Roger Friedman's Kabbalistic explanation, clarifying that Suri is indeed an unremarkable Yiddish diminutivization of Sarah.]

[Update, 4/30/06: Further light on the origins of Suri as a diminutivized form of Sarah is shed by Ben Sadock at Positive Anymore:

There is an isogloss that runs roughly along the Ukraine/Belarus border, north of which the name is Sore and south of which it is Sure, but who says 'Suri'? The answer? Americans and Israelis, who have adopted the English and Israeli Hebrew custom of making diminutive forms of names ending in /i/. Among Hasidim, in fact (most of whom speak a southern dialect of Yiddish), this new diminutive ending has almost entirely replaced the older Yiddish diminutive suffix /-l/ with names. Thus Suri joins a large group of Suris in Brooklyn and Bnei Brak. I find this funny.]

Posted by Benjamin Zimmer at 10:45 PM

The Provenzano Code

Unless you've been living in a cave you've probably heard that Bernardo Provenzano, suspected of being the capo di tutti capi of the Sicilian mafia, was recently captured after eluding the police for over forty years. What hasn't been emphasized in many of the news stories is the fact that his capture was due to his lack of knowledge of linguistics.

According to this account at the Discovery Channel, Provenzano was caught after police intercepted messages between Provenzano and other members of his organization written in a code that they were able to break. Details can be found at http://www.bernardoprovenzano.net (in Italian).

Actually, in technical terms, they weren't using a code but a cipher, a variant of the Caesar Cipher, said to have been used by Julius Caesar. The Caesar Cipher is one in which each letter of the alphabet is mapped onto the letter three places farther down. In the version used by Provenzano, each letter is first replaced by the corresponding ordinal, to which three is then added. Here is the resulting cipher:

Code	Letter	Code	Letter
4	a	15	n
5	b	16	o
6	c	17	p
7	d	18	q
8	e	19	r
9	f	20	s
10	g	21	t
11	h	22	u
12	i	23	v
13	l	24	z
14	m

Here is the cipher text of a bit of a message from Angelo Provenzano to his father:

1012234151512 14819647415218

and here is the decryption:

10 12 23 4 15 15 12	14 8 19 6 4 7 4 15 21 8
 g  i v  a  n  n  i	 m e  r c a d a  n  t e

(Mr. Provenzano misspelled Giovanni.)
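
For anyone who wants to try this at home, here is a minimal Python sketch of the scheme as described above. The function names (encrypt, split_codes, decrypt) are my own, and the 21-letter alphabet string is an assumption read off the table. One detail that the sketch relies on: because the one-digit codes run from 4 to 9 and every two-digit code begins with 1 or 2, a run of digits written without separators can still be cut up in only one way.

```python
# Italian 21-letter alphabet, as in the table above (no j, k, w, x, y).
ALPHABET = "abcdefghilmnopqrstuvz"
SHIFT = 3  # added to each letter's 1-based position in the alphabet

# Letter -> number and number -> letter tables: a=4, b=5, ..., z=24.
ENCODE = {letter: index + 1 + SHIFT for index, letter in enumerate(ALPHABET)}
DECODE = {code: letter for letter, code in ENCODE.items()}


def encrypt(plaintext: str) -> str:
    """Replace each letter by its number, keeping spaces between words."""
    return " ".join(
        "".join(str(ENCODE[ch]) for ch in word)
        for word in plaintext.lower().split()
    )


def split_codes(digits: str) -> list:
    """Cut an undelimited digit string into codes in the range 4..24.

    Two-digit codes (10..24) start with 1 or 2, and neither 1 nor 2 is a
    valid one-digit code, so reading left to right is unambiguous.
    """
    codes = []
    i = 0
    while i < len(digits):
        width = 2 if digits[i] in "12" else 1
        code = int(digits[i:i + width])
        if code not in DECODE:
            raise ValueError(f"invalid code {digits[i:i + width]!r} at position {i}")
        codes.append(code)
        i += width
    return codes


def decrypt(ciphertext: str) -> str:
    """Decrypt space-separated words of undelimited digits."""
    return " ".join(
        "".join(DECODE[code] for code in split_codes(word))
        for word in ciphertext.split()
    )


print(decrypt("1012234151512 14819647415218"))  # givanni mercadante
```

Running encrypt on "givanni mercadante" gives back the digit string quoted above, which is a quick check that the table and the message agree.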

Ciphers of this type have been around for a very long time and are known to be easy to break. If you suspect you're dealing with a shift cipher, you can check without all that much work since there are only N of them for an alphabet of N letters, and one, the one with shift N, doesn't conceal anything. The class of monoalphabetic substitution ciphers, in which each letter is replaced by some other letter, is much larger: for an alphabet of size N, there are N! of them. The number of different substitution ciphers on an alphabet of 21 letters is 51,090,940,000,000,000,000 (over 51 quintillion). That is far too many to break by brute force without a computer. Even with a computer able to check 1,000,000,000 possibilities per second it would take 1618.974 years. This kind of problem is easily parallelizable, meaning that many computers can work on it at the same time, so you can cut the time by using more machines. That only gets you so far though. If you put 1,000 computers to work it would still take 1.619 years. It is worth noting, too, that Italian makes the task easier since it has only 21 letters. For an alphabet of 26 letters like that of English, there are 403,291,500,000,000,000,000,000,000 (over 403 septillion) monoalphabetic substitution ciphers. With those same 1,000 computers trying them all would take 12,779,535 years.
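
The arithmetic is easy to check. Here is a small Python sketch of it; the function name years_to_try_all is mine, and the sketch assumes a year of 365.25 days, which is the figure that reproduces the running times given above.

```python
import math

# Assumption: one year = 365.25 days.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def years_to_try_all(alphabet_size: int, keys_per_second: float) -> float:
    """Years needed to try every substitution cipher on the alphabet."""
    return math.factorial(alphabet_size) / keys_per_second / SECONDS_PER_YEAR


# Exact counts of monoalphabetic substitution ciphers.
print(math.factorial(21))  # Italian, 21 letters
print(math.factorial(26))  # English, 26 letters

# One machine at a billion keys per second, then a thousand such machines.
print(round(years_to_try_all(21, 1e9), 3))
print(round(years_to_try_all(21, 1e12), 3))
print(round(years_to_try_all(26, 1e12)))
```

Python integers are exact, so math.factorial prints every digit rather than a rounded value.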

In point of fact, monoalphabetic substitution ciphers are easy to break if you've got a reasonable amount of cipher text. The reason is that the frequencies of the letters in natural languages vary widely. Here are the frequencies of the characters in an English version of Machiavelli's The Prince:

Rank	Percent	Count	Code	Character
1	17.213	29,900	0x20	SPACE
2	10.417	18,095	0x65	e
3	7.357	12,779	0x74	t
4	6.121	10,633	0x6F	o
5	5.993	10,411	0x61	a
6	5.755	9,996	0x69	i
7	5.619	9,760	0x6E	n
8	5.328	9,255	0x68	h
9	5.014	8,710	0x73	s
10	4.619	8,024	0x72	r
11	3.129	5,435	0x64	d
12	2.714	4,715	0x6C	l
13	2.247	3,904	0x63	c
14	2.129	3,698	0x75	u
15	1.918	3,331	0x66	f
16	1.895	3,291	0x6D	m
17	1.624	2,821	0x0A	LINEFEED
18	1.623	2,819	0x77	w
19	1.501	2,607	0x2C	,
20	1.474	2,560	0x79	y
21	1.357	2,357	0x70	p
22	1.315	2,285	0x62	b
23	1.253	2,177	0x67	g
24	0.883	1,533	0x76	v
25	0.398	692	0x2E	.
26	0.379	659	0x6B	k
27	0.212	369	0x3B	;
28	0.175	304	0x78	x
29	0.088	153	0x71	q
30	0.075	131	0x6A	j
31	0.059	102	0x7A	z
32	0.033	57	0x2D	-
33	0.024	41	0x3A	:
34	0.014	25	0x27	'
35	0.008	14	0x2A	*
36	0.006	11	0x28	(
37	0.006	11	0x29	)
38	0.006	11	0x3F	?
39	0.003	6	0x5D	]
40	0.003	6	0x5B	[
41	0.002	4	0x31	1
42	0.002	4	0x22	"
43	0.002	3	0x35	5
44	0.001	2	0x34	4
45	0.001	1	0x32	2
46	0.001	1	0x38	8
47	0.001	1	0x33	3
48	0.001	1	0x30	0
49	0.001	1	0x36	6

As you can see, after the space character the letter e is the most frequent by a substantial margin, followed by the letter t. The details of the rank ordering will vary with the nature and amount of text that you analyze, but the first few are pretty stable. If you guess that the most frequent letter of a cipher text is e and that the next most frequent letter is t, you will be right most of the time, and once you have these two, you won't find it too difficult to figure out the rest. Of course the letter frequencies in very short messages may be rather far off so it helps to have a reasonable amount of cipher text.
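
A table like the one above, and the first guess it suggests, can be sketched in a few lines of Python. This is not the unihist program, just an illustration using the standard library; the function names char_frequencies and guess_e_and_t are hypothetical, and the sample strings are toy data chosen to keep the example short.

```python
from collections import Counter


def char_frequencies(text: str) -> list:
    """Return (character, count, percent) rows, most frequent first."""
    counts = Counter(text.lower())
    total = sum(counts.values())
    return [(ch, n, 100.0 * n / total) for ch, n in counts.most_common()]


def guess_e_and_t(ciphertext: str) -> dict:
    """Map the two most frequent cipher letters to 'e' and 't'.

    Only alphabetic characters are counted. This is a first guess to
    start from, not a full decryption.
    """
    letters = [ch for ch in ciphertext.lower() if ch.isalpha()]
    top_two = [ch for ch, _ in Counter(letters).most_common(2)]
    return dict(zip(top_two, "et"))


# Toy plaintext and the same text with every letter shifted three places.
for ch, n, pct in char_frequencies("eee tt a"):
    print(repr(ch), n, f"{pct:.3f}")

print(guess_e_and_t("hhh ww d"))
```

On a message this short the guess works only because the sample was rigged; with real text the counts need a few hundred characters before the top ranks settle down.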

If only Mr. Provenzano had known a little about linguistics and mathematics, he could have chosen a much more secure form of encryption.

PS: Those of you wishing to calculate numbers of substitution ciphers and the like yourselves, and wondering how to do it, should check out R, the Free statistical system that I used. I obtained the frequency counts using the unihist program from my unidesc package, after first using the Unix utility tr to convert the entire text to lower-case.

Posted by Bill Poser at 04:27 PM

Is a cow a negotiable instrument? Can a woman be a "reasonable man"?

Simon Musgrave writes:

Relevant to the Language Log posts on the status of cows are two sections from a favourite book of mine, A.P. Herbert's Uncommon Law (London: Methuen, 1935). Case #32 in this book reports the actions (heard together) Board of Inland Revenue v. Haddock and Rex v. Haddock, where Mr Haddock attempts to pay his tax bill by writing a cheque on the side of a cow, and causes a public disturbance when he delivers the cheque. Various arguments are given as to whether the animal constitutes a negotiable instrument, the presiding judge ruling that it certainly does and that the Collector of Taxes erred in not accepting it. Case #20 reports argument from the Court of Appeal in which Mr Haddock (again!) suggests that motor cars have the legal status of wild beasts -- and finds sympathetic judges.

I gather that the 66 cases covered in Herbert's book were each originally the subject of one of his columns in Punch, which ran under the heading "Misleading Cases in the Common Law". According to the Wikipedia entry on Herbert, "[o]ver his lifetime he published sixteen collections of the Misleading Cases".

Simon continues:

Many of Herbert's wonderful fantasies are aimed at particular provisions of British law from the period in which he worked, but others are of more general interest. For example, Case #1 investigates whether the common law allows for the existence of 'the reasonable woman' and Case #8 considers whether it is libellous to call someone 'highbrow'. My particular favourite is Case #58, in which a bright young lawyer argues that, as people receiving a salary from the state, no judge is eligible to hear income tax cases, as all are liable to a conflict of interest.

What is apparently a digital version of Herbert's essay "The Myth of the Reasonable Man", representing Case #1, can be found here. The anonymous presenter tells us that

The Reasonable Man is reproduced and cited in many modern British and United States legal books as it illustrates a key axiom of the common law — that many jury decisions are centered around the concept of the 'reasonable man'.

Without this concept, trespass and the law of torts (for example negligence, or bad faith breach of contract) and many other legal cases could not be decided. It should be noted that any apparent "sexism" in the essay should not be taken seriously. Even when it was written, in the early half of the twentieth century, its serious message was intended to be accompanied by a waggish sideswipe at the attitudes of an older era, and the peculiarities of legal definitions as they developed in the common law. This should not be interpreted as an endorsement of neolithic attitudes.

I'm skeptical of the view that the "apparent 'sexism' in the essay should not be taken seriously". Here's the legal background as given in the essay:

In this case the appellant was a Mrs. Fardell, a woman, who, while navigating a motor-launch on the River Thames collided with the respondent, who was navigating a punt, as a result of which the respondent was immersed and caught cold. The respondent brought an action for damages, in which it was alleged that the collision and subsequent immersion were caused by the negligent navigation of the appellant. In the Court below the learned judge decided that there was evidence on which the jury might find that the defendant had not taken reasonable care, and, being of that opinion, very properly left to the Jury the question whether in fact she had failed to use reasonable care or not.

The jury found for the plaintiff and awarded him two hundred and fifty pounds damages. This verdict we are asked to set aside on the ground of misdirection by the learned judge, the contention being that the case should never have been allowed to go to the Jury; and this contention is supported by a somewhat novel proposition, which has been ably, though tediously, argued by Sir Ethelred Rutt [the appeals judge] ...

The essay takes a few digs at the notion of "reasonableness" and at everyday hypocrisy:

Devoid ... of any human weakness, with not one single saving vice, sans prejudice, procrastination, ill-nature, avarice, and absence of mind, as careful for his own safety as he is for that of others, this excellent but odious character stands like a monument in our Courts of Justice, vainly appealing to his fellow-citizens to order their lives after his own example.

I have called him a myth; and, in so far as there are few, if any, of his mind and temperament to be found in the ranks of living men, the title is well chosen. But it is a myth which rests upon solid and even, it may be, upon permanent foundations. The Reasonable Man is fed and kept alive by the most valued and enduring of our juridical institutions -- the common jury.

Hateful as he must necessarily be to any ordinary citizen who privately considers him, it is a curious paradox that where two or three are gathered together in one place they will with one accord pretend an admiration for him; and, when they are gathered together in the formidable surroundings of a British jury, they are easily persuaded that they themselves are, each and generally, reasonable men.

So far, so good. But the essay (or the judge's opinion -- it's not clear to me whether Herbert is channeling the judge or merely quoting him) concludes that the stereotypes exempting women from reasonableness are to be endorsed.

To return, however, as every judge must ultimately return, to the case which is before us -- it has been urged for the appellant, and my own researches incline me to agree, that in all that mass of authorities which bears upon this branch of the law there is no single mention of a reasonable woman.

It was ably insisted before us that such an omission, extending over a century and more of judicial pronouncements, must be something more than a coincidence; that among the innumerable tributes to the reasonable man there might be expected at least some passing reference to a reasonable person of the opposite sex; that no such reference is found, for the simple reason that no such being is contemplated by the law; that legally at least there is no reasonable woman, and that therefore in this case the learned judge should have directed the jury that, while there was evidence on which they might find that the defendant had not come up to the standard required of a reasonable man, her conduct was only what was to be expected of a woman, as such.

It must be conceded at once that there is merit in this contention, however unpalatable it may at first appear. The appellant relies largely on Baxter's Case, 1639 (2 Bole, at page 100), in which it was held that for the purposes of estover the wife of a tenant by the mesne was at law in the same position as an ox or other cattle demenant (to which a modern parallel may be found in the statutory regulations of many railway companies, whereby, for the purposes of freight, a typewriter is counted as a musical instrument).

And there is our old friend, the typewriter as musical instrument for purposes of railway rate regulation! But the context is an argument that women are not members of the class of rational beings. The essay continues:

It is probably no mere chance that in our legal text-books the problems relating to married women are usually considered immediately after the pages devoted to idiots and lunatics. Indeed, there is respectable authority for saying that at Common Law this was the status of a woman. Recent legislation has whittled away a great part of this venerable conception, but so far as concerns the law of negligence, which is our present consideration, I am persuaded that it remains intact.

It is no bad thing that the law of the land should here and there conform with the known facts of every day experience. The view that there exists a class of beings, illogical, impulsive, careless, irresponsible, extravagant, prejudiced, and vain, free for the most part from those worthy and repellent excellences which distinguish the Reasonable Man, and devoted to the irrational arts of pleasure and attraction, is one which should be as welcome and as well accepted in our Courts as it is in our drawing-rooms -- and even in Parliament.

The odd stipulation is often heard there that some new Committee or Council shall consist of so many persons 'one of which must be a woman': the assumption being that upon scientific principles of selection no woman would be added to a body having serious deliberative functions. That assumption, which is at once accepted and resented by those who maintain the complete equality of the sexes, is not founded, as they suppose, in some prejudice of Man but in the considered judgments of Nature.

I find that at Common Law a reasonable woman does not exist. The contention of the respondent fails and the appeal must be allowed. Costs to be costs in the action, above and below, but not costs in the case.

Here the notion that married women are members of the natural class that includes idiots and lunatics is endorsed, on the grounds that "[i]t is no bad thing that the law of the land should here and there conform with the known facts of every day experience", and that the courts should welcome "[t]he view that there exists a class of beings, illogical, impulsive, careless, irresponsible, extravagant, prejudiced, and vain, free for the most part from those worthy and repellent excellences which distinguish the Reasonable Man, and devoted to the irrational arts of pleasure and attraction".

Is this just "a waggish sideswipe at the attitudes of an older era, and the peculiarities of legal definitions as they developed in the common law"? Well, it's certainly waggish, and there are peculiarities a-plenty on display, but the attitude of indulgent paternalism seems all too sincere.

[Update: John Cowan writes:

I think -- nay, I am reasonably sure -- that you have just added yourself to the long and honorable list of those who have been deceived into thinking the Misleading Cases are actual law cases that Herbert simply reported on. You speak of "cases covered", of "the anonymous presenter", of an "essay", and finally profess yourself uncertain whether Herbert is "channelling the judge or merely quoting him".
In fact, Herbert made up the case, the judge, his opinion, Sir Ethelred Rutt (the attorney for the appellant) and all. His intent was the old-fashioned one of amusing and instructing through light-hearted satire; as always with satire, some will take it quite seriously. It is also a tribute to his literary art that his concocted judicial opinions sound quite convincing in their slide to the entirely daffy conclusion (as in this case). They have in the past appeared in the newspapers as fact (notoriously so in the case of the negotiable cow), and why not on Language Log?
I do urge you to read them: the "Port to Port" case, in which the Admiralty Court reluctantly decides that, on a flooded highway, as between a motor vehicle keeping cautiously to the left and an oncoming rowboat audaciously keeping to the right, the latter has the right of way, is a particular favorite of mine.
(I had second thoughts about sending this at all; perhaps you realize all this all too well and are simply being too subtle for me -- but I decided to send it anyway, just in case.)

I was not being subtle -- in fact I was entirely taken in, though also puzzled about the identity of the essay's authorial voice.]

Posted by Mark Liberman at 09:38 AM

Ali G in the land of colorless green ideas

If you've had enough of linguists talking about Ali G (the fake purveyor of Jafaican), why not watch Ali G talking about linguistics? A YouTube video has been circulating with Ali G, aka Sacha Baron Cohen, chatting in his inimitable way on linguistic topics with none other than Noam (or "Norman") Chomsky. And apropos of our recent discussion of Multicultural London English, Ali G wants to know the linguistic future of his four-year-old cousin Sanjeev, who's got a Bangladeshi mum and a dad from Staines (represent, West Side). He's shocked to learn that Sanjeev might grow up bilingual, because... oh, just watch the video.

It's unclear whether Chomsky is in on the joke, as he plays the straight man throughout the inane questioning — though he does crack a smile by the end. It's possible that Cohen set this up before his Ali G persona was widely known in the U.S., or perhaps Chomsky's just not a fan of the HBO show. (The interview apparently never aired on HBO but was included as part of the bonus material on the Season 2 DVD, released last September.)

Posted by Benjamin Zimmer at 06:00 AM

Croatian an endangered language?

There's an article in the New York Times about two upcoming literary festivals that will highlight endangered languages. What struck me was the languages mentioned:

Kuranko	305,000 speakers
Basque	660,000 speakers
Croatian	6,215,000 speakers

My first reaction was to giggle at the choice of languages with so many speakers as examples of endangered languages. Carrier has no more than 1,000 speakers, almost all of them middle-aged or older. Some of the native languages here in British Columbia are down to 20 speakers, or six or even one.

On reflection, though, what they are saying is actually not so crazy. Languages like Carrier are a lot closer to extinction than Basque, Croatian, or Kuranko, but there is reason for concern about these larger languages. In a world in which a small number of major languages with English at their head increasingly dominate the media, entertainment, business, and academia, there are fewer and fewer domains in which smaller languages are used and more and more pressure to shift to one of the major languages.

Posted by Bill Poser at 01:23 AM

April 20, 2006

Are we an it or a they?

Brian Weatherson's remark about Far From the Madding Gerund, quoted by Mark earlier today, provides a lovely example of the tangles that can sometimes result from the interplay of (i) subject-verb number agreement, (ii) pronoun-antecedent number agreement, and (iii) the semantics of noun phrases denoting collections of people or human institutions:

Language Log is having a book published of their best posts for the last few years.

What are we according to him? Are we an it who is, or are we a they who are?

Don't get me wrong: I'm not criticizing Brian's syntax. Saying "Language Log are having a book published" would have been perfect and unremarkable in British English but too distinctively British for the American segment of the blogosphere. (Brian is an Australian, and Australian English is closer to British than American, and it is British English that notoriously favors the use of plural agreement with nouns denoting organizations or collections of people, as in "The government are worried". Americans strongly avoid that usage.) But saying "its best posts" would also be too weird to get away with: a mindless singular non-human entity referred to as it can't write posts all on its own without help from a group of human writers, can it?

I think Brian's syntactic compromise here — switching from a singular agreement choice to a later plural pronoun choice — was just about optimal, given the resources the English language makes available. The situation was certainly an awkward one; but one has to do what one can. Human languages are not finely-honed tools for the expression of thought, perfectly designed, optimally suited to their purpose. They evolved. They get by most of the time, as one would expect, but they have ragged edges, and sometimes little corners of them are a bit dysfunctional.

By the way, the LSA's journal Language has just published a big, interesting article on the psycholinguistics of singular and plural agreement with subjects like Language Log and the government; I don't have time to discuss it here right now, but I might later. The reference is:

Bock, Kathryn; Sally Butterfield; Anne Cutler; J. Cooper Cutting; Kathleen M. Eberhard; & Karin R. Humphreys (2006): ‘Number agreement in British and American English: disagreeing to agree collectively.’ Language 82.64-113.

Posted by Geoffrey K. Pullum at 11:14 AM

"That stuff" and "the genre of 'blog'"

Over at Crooked Timber, Brian Weatherson said nice things about Far from the Madding Gerund, without even seeing a copy.

Language Log is having a book published of their best posts for the last few years. Although there won’t be anything new in this, it should be a fun record of what has long been to my mind one of the best academic blogs around. [...]

When I started blogging it was with the hope that it would genuinely be an alternative publishing source. That is, it would be a place where I put things that were finished pieces, but which wouldn’t, couldn’t or shouldn’t end up in traditional print journals. But in fact it has turned into a repository for transient thoughts, not a publishing place. Language Log has, to a large extent, gone the other way.

In response, Lauren Squires at Polyglot Conspiracy defended us from the charge of excessive fit and finish:

I don’t think [Language Log has] largely gone the way of a “publishing place” any more than most (academic/political) blogs. They still write on things that just kind of pop up as interesting that aren’t researched uber-scientifically (Google’s one of their fave research tools), and they write colloquially and personally. Especially considering their academic field, this still feels QUITE different from published writing. And it’s also very internet-centric in a lot of ways.

But Lauren, astutely, raised another concern:

What’s most difficult to process about this transition of blogs to books, for me, is how to deal with the inherently linked-up nature of blogs. That’s part of what people like about them, part of what also makes them so interesting: they are so interconnected with other online content. I’m curious to see how that shakes out in print in the Log’s book (which I will definitely be buying, or at least investigating in the store).

[...]

So this is what I’m most intrigued to see, and Pullum doesn’t mention it in the book announcement: how will hyperlinks, references to other blogs, etc. be treated? If you take away that stuff, is it really worth printing? What I mean is, may as well you just call them essays, rather than posts? To what extent does that stuff (such technical terminology I’m using here, I know! “that stuff”!) define the genre of “blog,” and what’s special about blog material (what makes it worth reprinting in another medium and another market) otherwise?

About a year ago, when we were still thinking about whether and how to turn some LL posts into a book, Tom Sumner (editor of The Informed Citizen Series at William, James & Co.) brought up the same issue:

To make a book like this as much fun as the blog is, [you should] remove some links and references to other posts. Footnotes or other glossing should be kept to a minimum, and the posts should be as self-contained as possible.

You can see that Tom is being polite here, but he's politely suggesting something that sounds very much like actual work. This seemed disturbingly unbloglike, at least to me, for whom blogging is entirely recreational. My response was:

The links in our posts are of three kinds:

1. links to the things we're talking about -- newspaper stories, journal articles, other people's blog entries, software or services, and so forth.

2. links to other Language Log posts, put in so that we don't have to summarize them and/or to get people to read them.

3. other informational links, essentially to help readers who want to learn more, and (speaking for myself) as aids to memory for my own future reference.

I think that we probably do want to keep crucial references of type 1, though they might be given in parentheses or square brackets rather than footnotes.

Most links of type 2 could be deleted, except where the content is required to understand what is being said. We could either omit posts requiring such links, or provide a short summary of the linked material. Where the referenced posts are also being reprinted, a cross-reference could be provided.

Nearly all references of type 3 could be deleted without problems.

In this connection, though, I was intrigued by the way that David Foster Wallace's piece on talk radio (in the most recent Atlantic) was laid out -- in effect it has footnotes, but the scope of the footnotes is indicated with colored backgrounding, and the footnotes are actually given as sidebars in boxes with the same colored backgrounds.

This may be more readable than conventional footnotes. It's certainly more readable than the interminable endnotes that DFW uses in some of his other writings. And without question it looks hip. Of course, I imagine it's expensive in typographical and printing terms, since it requires special layout and multiple-color printing.

But something of this kind, if it could be done without much extra work or expense, might be a good way to present some of what blog links are good for.

I was talking about "Host", by David Foster Wallace from the April, 2005 Atlantic. The internet version uses mouse-over colors and curious little pop-up windows -- which in my opinion don't work as well as the typography used in the paper magazine, described in my note to Tom. You can get a slightly better idea of the typography from this .pdf of page 5 of Wallace's article, taken from the copy which I downloaded at the time (since I'm a subscriber, of course, as you also should be). And the .pdf is still not as easy on the eyes and the mind as the paper version, which used colored backgrounds rather than colored outlines to link marginalia with phrases in the main text.

Anyhow, Tom started from the general idea of marginal notes keyed to the text, and found a way to do it that works very well, I think, without the indulgence of expensive multiple-color printing. You can check out his solution in the .pdf of the first 20 pages of chapter one that's available on the William, James & Co. web site for the book. It works even better in the context of the physical object, in my opinion, though you'll need to get a copy in order to see if you agree. And the beauty part is that Geoff and I didn't have to rewrite anything.

The resulting book retains the flavor of a blog -- or at least of our blog -- while taking on some of the advantages of a book. You can read it in the bathtub, for example. And as it turns out, I'm old-fashioned enough that well-printed words bound into a book acquire some mysterious extra oomph for me, even in rooms without plumbing. Paging through the blog entries in book form, I keep asking myself, "wait a minute, did I write that?" I've never reacted that way to reading the printed version of stuff that I wrote for print in the first place.

More than once, the feeling has been so strong that I've checked, and yes, I did write that. Somehow all those little pieces of crystallized conversation morphed into a book. I feel like one of those people who becomes an author by telling stories to a ghostwriter.

[David Foster Wallace, a self-confessed "snoot", has taken some lumps in this blog. But he (or his editor at The Atlantic) had a good idea about how to render links (or footnotes) more readable in print. So as Ali G said to Sir Rhodes Boyson, "Respect, man. Respect."]

Posted by Mark Liberman at 07:18 AM

Heated words about "sauna"

In the Apr. 14 installment of Jef Mallett's comic strip "Frazz," the title character (an enlightened school janitor) argues over the proper pronunciation of the word sauna with Caulfield (a young student at the school).

In the first panel, Frazz "corrects" Caulfield's pronunciation of sauna, though we don't yet know how since the word is spelled the same way in both speech balloons. The second panel elucidates the distinction Frazz is trying to make by way of pronunciation spellings: Frazz explains, "It's pronounced sow-na," presumably indicating /ˈsaʊnə/ to match the pronunciation of sow meaning 'female hog,' not sow meaning 'plant seeds.' He continues, "You said saw-na," suggesting a pronunciation of /ˈsɔːnə/ with a first syllable like the word saw. (I've represented the vowel in saw with the IPA symbol for an open-mid back rounded vowel, but American pronunciations can differ quite markedly from this, particularly among speakers with the cot-caught merger.) When Caulfield stakes a laissez-faire position on pronunciation in the third panel ("It's just sounds"), Frazz pretends to agree. Then he gives Caulfield a taste of his own medicine in the final panel by intentionally mispronouncing his name as "cow field," shifting the /ɔː/ in the first syllable of Caulfield to /aʊ/ and vocalizing the /l/ for good measure. Hoisted on his own petard, Caulfield objects to the mispronunciation and presumably learns a valuable lesson about the perils of permissivism.

But wait... among Americans, who pronounces sauna as /ˈsaʊnə/? This was the question raised by Washington Post columnist Gene Weingarten in his online chat of April 18. The "Frazz" strip was ripe fruit for Weingarten, combining two of his favorite topics: daily comics and language use. (The previous week's chat contained discussion of Jesse Sheidlower's piece in Slate about the controversy over the New York Times crossword puzzle using the word scumbag, with forays into the putative offensiveness of dork and schmuck.) Weingarten launched this opening salvo in his chat:

I now direct your attention to last Friday's Frazz, which contains a perfectly good gag, well told, and beautifully drawn, as Frazz always is. Does anyone notice a small problem with this cartoon, namely that ITS ENTIRE PREMISE IS WRONG? Every source I have consulted pronounces the word, foremost, SAW-na. Some say it is also SAH-na. Most don't even list SOW-na. We are all patiently waiting for Jef Mallett to explain himself, as we are sure he will.

Since Weingarten and Mallett are old friends, it wasn't long before the cartoonist himself weighed in:

Jef explains himself: Gene checked every source for the pronunciation of sauna EXCEPT for the people who invented the damn things, and use them the right way, and sell and maintain them, and ... and eat lutefisk on purpose. I'm not sure that last bit helps in the credibility department, but hey.
Nordics of the world, stick up for me. In my experience, this is actually kind of a sticking point for Scandinavians, with whom I share some heritage whenever it's convenient or flattering.
In my experience, which I seem to be compiling at an alarming rate, I'm also finding that it's a good idea to run to the dictionary and check even those "facts" that seem obvious to me. That, or stop drawing a comic strip that asks its readers to do it every so often.
I stand humbled and chastised. I promise to avoid such inexcusable lapses from here on. Because my only alternative is to draw a crass, crude comic strip for the simpler folk, and I promised ["Pearls Before Swine" cartoonist Stephan] Pastis I wouldn't horn in on his territory.
The Finns and Swedes and Norwegians are still encouraged to give Gene hell and salvage a little bit of my day, though. I'm off to flog myself with birch branches.

Mallett is quite ready to fall on his sword, accepting Weingarten's appeal to lexicographic authority. But perhaps he should have put up more of a fight, since Weingarten overstates his case about the accepted pronunciations of sauna. True, most if not all dictionaries list /ˈsɔːnə/ as the primary pronunciation, but most also include /ˈsaʊnə/ (or something close to it) as a secondary choice. This is true of the major collegiate dictionaries from Merriam-Webster, American Heritage, Oxford American, and Random House. The Oxford English Dictionary lists the pronunciations as (ˈsɔːnə, ǁˈsɑuna). According to the OED's special characters page, the double-pipe preceding the secondary pronunciation (charmingly labeled "tramlines") represents an "alien status marker," so it appears to indicate the proper Finnish pronunciation before being nativized into English phonology.

The /ˈsaʊnə/ pronunciation did get support from Scandinavians and other sauna snobs, much to Weingarten's disbelief:

Porcupine, S.D.: The Frazz cartoon only proves that Mallett is a serious sauna junky. The Finns, who invented the things, pronounce it SOW-na, and since it's a Finnish word (or rather, a Suomi one, to use the Finnish word for Finnish), that's probably technically correct. Doesn't make the joke any better to anyone who hasn't sat through a long lecture on the history of saunas by a Finn, though, I'll give you that much.
Gene Weingarten: But that's ridiculous. If we cared how things are pronounced in other countries, we would say "Osterreich," instead of Austria, and pronounce everything the way the Brits do. The Brits invented English.

SOWNA!: I can't believe I'm finally seeing this in a public forum! As a Finn who married an Irishman (there has to be a joke there), it took me years to teach him to say sauna correctly.
Thanks Jef!!!!
Gene Weingarten: Good god, people. Sowna is not correct, it is simply Finnish.

Mallett's right about this pronunciation peeve being a "sticking point" for those of Nordic descent. See, for instance, this discussion among "Yoopers" (those hailing from Michigan's Upper Peninsula), many of Finnish or ~~other~~ Scandinavian ancestry:

* Oh yeah - - nuttin like a nice hot SOW'-nuh when yer chilled to da bone by swimmin too long in da river or da Big Lake. Slide over, kiddies, and make room for ole Toivo.

* (Toivo's right, it's sow-na, not saw-na)

* And yes....Don't ever forget...it is SOW-NA

* People out in the west will argue forever on that pronunciation of sauna. They say SAW NAH. And when you tell them it's Sow-Na, they repeat SAW NAH. They tell me I'm wrong. Then, I get my swedish secretary to say it for them.

* As a former inhabitant of the Left Coast, not only did I forever hear the name pronounced wrong, I was in constant argue with the majority of people who considered a "hot" sauna about 120 degrees F, when a true yooper knows you don't even get good "laulua" until you exceed 160 degrees.

* Cracks me up/ticks me off how people here in Sconie are so adamant that it is saw-na and how people (even of finnish heritage) that say sow-na are wrong. Try asking one what they call the minnow looking fishing lure (Made in Finland). Amazing how angry pronunciation can make these people!!

Yes, it is amazing how angry pronunciation can make people, especially when there are two conflicting claims to authority: in this case the authenticity of the Finnish-style pronunciation on the one hand, and the standard English nativization of the word (as recognized by all major dictionaries) on the other. For the /ˈsɔːnə/ crowd, the /ˈsaʊnə/ variant sounds plain wrong, even when the Finnish origin is explained. Meanwhile, the embattled minority sticking to /ˈsaʊnə/ seem to treat the standard nativized pronunciation (obviously modeled on such forms as fauna) as an affront to their Scandinavian-American identity. Call me a loosey-goosey latitudinarian, but I think there's plenty of room for both variants without people getting too steamed about the difference.

[Update #1: Emailed comments are arriving fast and thick here at Language Log Plaza. First, Nicholas Sanders (among others) points out my sloppy usage of Scandinavian to encompass Finns:

Just one thing - Finland is not actually part of Scandinavia. Nordic yes, Scandinavian no!

Sorry about that — I was a bit misled by Jef Mallett's use of the term. But it's fair to say that the pronunciation of sauna is a sore spot for those of Finnish descent and those of Scandinavian descent.

David Williams also catches the Scandinavian goof and raises a question about pronunciation spelling:

However hot and bothered the Fins might get about getting sauna right, it's nothing compared to the scandal caused by calling them Scandinavians, as you and the quoted others seem to do in your recent post. OED will back me up on this, but seems equally restrictive on "Nordic", also used in your post, which I take to include Fins and Estonians as well as Scandinavians. OED also seems to deprive Icelanders of a natural class.
BTW, I think there's an interesting point to be made about amateur phonetic spelling in that cartoon. The first time I read it I heard SOW as "to plant", and so didn't get the joke at all, even wondering if anyone actually said SOH-na. Even looking at it now I still have to think hard to hear the sound meaning "lady pig". Presumably for the author the default hearing is reversed. For me the unambiguous phonetic spelling [other than IPA] is actually SAU-na, which is how it's actually spelled.

Roger Shuy writes:

I now take great pride in my Finnish pronunciation of that word ever since my Finnish friends pounded it into me during my visits there. I guess they take pride in it too — one thing they have over us snooty Americans. I hate to admit this but I can't think of many pronunciations that make me feel that I really "know" something that my listeners don't. But this is one. Better than Latin words even.

The post also generated responses about other attempts at reproducing "authentic" pronunciations of loanwords into English. Melissa Fox writes:

At least those pronouncing the word 'sow-na' (I don't have IPA in my e-mail, alas) have a leg to stand on (a couple of legs!) when they claim that's how it's pronounced in the language of those who invented it. A friend of mine says 'haggis' with the 'a' vowel of 'father', which is just silly; granted, the correct low-front 'a' in Scots is more back than the equivalent vowel in most North American dialects (my friend is from North Carolina, but if he were from Minnesota he'd no doubt say 'haygis'), it's still a recognizably different vowel than the low-back 'a'. 'Hahgis', indeed. I finally asked him where on earth he learned to pronounce the word that way, since no Scot I've ever known has said anything of the kind, and he said Oh, that's just how I pronounce it. (One can only think of Alice saying to Humpty-Dumpty, "But 'glory' doesn't mean 'a nice knock-down argument'!")

Bruno von Wayenburg chimes in from the Netherlands:

Your Language post about sawna vs sowna reminds me of a story by a Dutch newspaper correspondent in London, who was chronicling his blending in with the British. He met with uncomprehending (or even disgusted?) stares when he pronounced the famous Dutch name 'Van Gogh' the Dutch way (with plenty of throaty friction in the g's), instead of 'Van Go'.
It would feel very awkward to apply the accent of the foreign language you are speaking to a name from your own language. But apparently, the (dare I say) 'correct' pronunciation still sounds sort of snobbish to English ears, even if they know you and Vincent are countrymen.
Later on, he noticed that the same thing goes for English French like 'deja vu' or 'je ne sais quoi'. Pronouncing these in his best (Dutch) schoolboy French would raise eyebrows. You're supposed to say Dayzha Voo and Zhe ne say quah, except if you're a snob, he concluded. I wonder if a French speaker would get away with it. Or does my correspondent just hang around with ubercritical (glad I don't have to pronounce that) journalists too much?

And from Jim Gordon:

Your comments begged a question (or an entire discussion, perhaps), though: If we should respect the original form of a word borrowed from another language, does that respect have to continue forever? When we take ownership after some period of time, are we allowed to impose an English or American pronunciation? And if so, why and when should we shift to the version preferred by the original owners? E.g., look at the names of national capital cities such as Yangon or Mumbai. Contrast those with antennas, octopuses, repertories, lox and gefilte fish, and anything that evolved from a precursor language. You could even stray into looking at the effect on English of arbitrary "romanization" systems used to transliterate Chinese or Arabic or other languages.
And since you mentioned the NYT crossword, one of my cherished peeves is their willingness to use foreign words without accent marks that make a significant difference. The prime example is "year, in Spanish." /Ano/ is different from /Año/.

Finally, another recent comic strip ("FoxTrot," Apr. 19) continues the theme of pronunciation pet peeves:

One wonders if the precocious character Jason has been reading Going Nucular by our own Geoff Nunberg.]

[Update #2: Two readers write in with surprisingly similar comments on the Finnish-Swedish sauna connection. From Bertilo Wennergren:

There are several mentions of Swedes insisting on the "sow-na" pronunciation of "sauna" in English. Being a Swede myself I'm totally mystified by that, since in Swedish the word is "bastu". (It's not really relevant now how "bastu" is pronounced since that's a completely different word.)
As far as I know Swedes living in Finland say "bastu" as well.
The mistake of including Finland in Scandinavia has already been mentioned, but this seems to be another weird confusion that goes more or less in the other direction: treating Swedes as if they would speak Finnish.

And from Ken Arneson:

Just an FYI, the assumption that a Swede would have an opinion about the correct pronunciation for "Sauna" is incorrect. The Swedish word for sauna is "bastu".
Now, how in the world Swedish ended up with that word for it, I have no idea. But the Swedish language seems to have a strong aversion to importing Finnish words. Why? The Wikipedia entry for Finland-Swedish says:

Swedish as spoken in Finland is regulated by the "Swedish department" of the "Research Institute for the Languages of Finland". There is an officially stated aim that Finland-Swedish should remain close to the Swedish spoken in Sweden, thus the Swedish department strongly advises against loanwords and calques from Finnish.
So as a result, in Swedish, Helsinki is called "Helsingfors", Turku is called "Åbo", sauna is called "bastu", and Nokia is called "Ericsson".
(that last one's a joke...)

This article provides some fascinating history on the Finnish sauna, the Swedish bastu, and other Nordic baths.]

Posted by Benjamin Zimmer at 01:31 AM

April 19, 2006

Four subjects of a book review

Geoff Pullum recently announced the forthcoming appearance of the very first book ever published by Language Log (here). This is great news indeed, but the blessed event forebodes an obvious next step, which heralds the equally forthcoming appearance of the dreaded book review. At this very moment some snarling critic is lurking out there, ready to promote his/her own career by writing a scathing and clever criticism that will show the reading public that he/she is intellectually, morally, and ethically superior to the authors of this slug-a-bed collection of strange essays with a funny name. To be perfectly certain that whoever writes this forthcoming review of Far From the Madding Gerund fully understands this important task, and in keeping with the recent trend of listing four subjects about everything, I offer four things that any good book reviewer really ought to do:

1. After giving only slight mention to the theme or point of the book, move immediately to your own theory, connect it somehow to the book, and show how superior your own points are by comparison. Don't fall into the trap of thinking that readers want to know what's actually in this book in the hope that it might help them decide whether to buy it. They don't want to know. They don't want to buy it. All they are interested in is what you, the clever reviewer, have been waiting up to this moment to proclaim.

2. Cite all of your own works that you possibly can in your review. And don't forget to repeat these in your "references" at the end. This shows that your work is quantitatively superior. For example, six citations to yourself outnumber the lone book that you review.

3. Ignore what the authors claim to be the purpose and scope of their book. You know very well that they should have had a different purpose and scope. And you know what this is. So tell the readers in no uncertain terms. They'll respect you for this.

4. Point out lots of trivial errors in the book. Any typographical errors that you can find will show that you are a vastly superior scholar. If the authors misquoted something or cited a wrong date, this can be a gift from heaven. Finding these is what book reviewing is all about.

Posted by Roger Shuy at 11:06 PM

McClellan's mangled sentences: where are they?

Michael Wolff's recent Vanity Fair article about the just-removed White House ex-spokesman Scott McClellan insists that his performances have been characterized by "mangled sentences, flat-footed evasions, and genial befuddlement." I'm a syntactician, not a detective or a clinical psychologist, so I concentrated on the first charge: I turned to the article looking for an example or two of these alleged "mangled sentences". I read the article with care, and found several other references to language, but not a single example that suggested any sentence-mangling at all, nothing even close. What McClellan says is dull and hackneyed; but all of it that Wolff quotes seems to be grammatical.

Why do people say these things about language that they simply can't back up? Another case of it's all grammar, I suppose. McClellan talks; that's language; the press representatives don't believe him or don't understand him; therefore his grammar must be to blame. Whatever grammar is.

Of course, it is possible that McClellan does commit major sentence mayhem in spontaneous speech (if you never do, then you get to cast the first stone). I'm just wondering why Wolff was unable or unwilling to quote even a single example of syntactic mutilation, in an article where supposed linguistic ineptitude was one of the major themes.

Posted by Geoffrey K. Pullum at 07:46 PM

L-erba' temi tal-poeżija

In case you've ever wondered what the four subjects of poetry are in Maltese, Antoine Cassar at Triq il-Maqluba has them here.

A Japanese version can be found at Mielikkiの日記.

Posted by Mark Liberman at 09:34 AM

MLA Language Map enters new territory

Back in June 2004, the MLA website rolled out an interactive language map of the United States, displaying the number of speakers per county or zip code for 37 languages, based on 2000 census data. The site originally used the misleading term "density" to refer to these statistics, even though the numbers given were for total speakers of a language, not the proportion of speakers to the local population. After this was pointed out on Language Log, the MLA changed the wording from "density" to "number of speakers." David Goldberg of the MLA's Foreign Language Programs further promised that "an anticipated expansion of the site will include a reflection of actual density of speakers." Well, as the Chronicle News Blog reports, the anticipated expansion has finally arrived. Not only does the new improved site generate percentage-based maps for different languages, it has a whole host of enhancements, including a Data Center with statistics for more than 300 languages searchable all the way down to the municipal level.

The percentage-based maps currently can be obtained only by county, not by zip code, but it's still enough to make a big difference compared to the previously available maps on the site. Compare these two maps for Spanish speakers in Texas, the first coded by number of speakers and the second by percentage of speakers:

One nice feature of the new Data Center is the ability to see local statistics for any recognized language or language group, combined with census data on speakers' age ranges and knowledge of English. So, for instance, if you look up the list of languages spoken in Jersey City, NJ, you can click on any language, say Gujarathi, and get this "language snapshot":

MLA vice president Michael Holquist was quoted by the Chronicle News Blog as saying that the project demonstrates how the United States, "with the exception of Papua New Guinea, is the country in the world with the greatest diversity of languages." I'm not sure how they're measuring linguistic diversity, but I doubt the U.S. comes in second on any reliable scale. According to tabulated data from Ethnologue, the U.S. ranks fifth in terms of total languages with 311, behind Papua New Guinea (820), Indonesia (742), Nigeria (516), and India (427). And the United States is not especially diverse according to another scale, Greenberg's diversity index, which calculates the probability that any two randomly selected people have different native languages. Ethnologue gives that probability as 0.353 for the U.S., good enough for 124th place out of 218 countries, sandwiched between Serbia-Montenegro and Paraguay.
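For readers who want to see what that second scale actually measures, here is a minimal Python sketch. It assumes the usual formulation of Greenberg's index, one minus the sum of the squared shares of each native language (which treats the two people as drawn with replacement). The function name and the toy speaker counts are illustrative assumptions, not Ethnologue's data or code.

```python
def greenberg_diversity(speaker_counts):
    """Probability that two randomly chosen people have different native languages.

    speaker_counts: number of native speakers of each language in a population
    (hypothetical input format; each person is counted under exactly one language).
    """
    total = sum(speaker_counts)
    if total <= 0:
        raise ValueError("need at least one speaker")
    # The chance that both people speak language i is share_i squared;
    # summing over languages gives the chance that they match.
    same_language = sum((count / total) ** 2 for count in speaker_counts)
    return 1.0 - same_language


# Toy populations (invented numbers, for illustration only):
print(greenberg_diversity([100]))             # one language: no diversity
print(greenberg_diversity([50, 50]))          # two equally sized languages
print(greenberg_diversity([25, 25, 25, 25]))  # four equally sized languages
```

The point of the sketch is that the index rewards evenness as well as sheer number of languages: a country with hundreds of languages can still score low if one language dominates, which is presumably why a long language list does not by itself put the U.S. near the top.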

[Update: Ben Sadock points out that the Census Bureau's own mapping tool is "harder to use than the MLA's interface" but "infinitely more manipulable." He recommends: "go play around on the Census Bureau's website, and you'll never be satisfied with the MLA's mapping tools again, even if they do monopolize mappable language data."]

Posted by Benjamin Zimmer at 07:52 AM

Grand theft bovine; or, when is an antelope not a document?

Cows might not be motor vehicles, but Bill Poser's argument for that proposition prompted Heidi Harley to remind us about her discovery, last year, that "there are genuine laws of the land according to which cows and cars do form a natural class, namely the class of items which you can be charged with 'Grand Theft' for stealing one of".

According to the California Penal Code, section 487:

Grand theft is theft committed in any of the following cases:
[...]
(d) When the property taken is any of the following:
(1) An automobile, horse, mare, gelding, any bovine animal, any caprine animal, mule, jack, jenny, sheep, lamb, hog, sow, boar, gilt, barrow, or pig.
(2) A firearm.

[Given the care with which "horse, mare, gelding" and "hog, sow, boar, gilt, barrow, or pig" are enumerated, it seems odd that "automobile" is all on its terminological lonesome: what about SUVs, pickups, panel trucks and the like?]

Anyhow, this all reminded me of a lovely passage in Eben Moglen's "Anarchism Triumphant", which I've quoted before and will quote again:

No one can tell, simply by looking at a number that is 100 million digits long, whether that number is subject to patent, copyright, or trade secret protection, or indeed whether it is "owned" by anyone at all. So the legal system we have ... is compelled to treat indistinguishable things in unlike ways.

Now, in my role as a legal historian concerned with the secular (that is, very long term) development of legal thought, I claim that legal regimes based on sharp but unpredictable distinctions among similar objects are radically unstable. They fall apart over time because every instance of the rules' application is an invitation to at least one side to claim that instead of fitting in ideal category A the particular object in dispute should be deemed to fit instead in category B, where the rules will be more favorable to the party making the claim. This game - about whether a typewriter should be deemed a musical instrument for purposes of railway rate regulation, or whether a steam shovel is a motor vehicle - is the frequent stuff of legal ingenuity. But when the conventionally-approved legal categories require judges to distinguish among the identical, the game is infinitely lengthy, infinitely costly, and almost infinitely offensive to the unbiased bystander. [emphasis added]

I've been meaning for years to ask Professor Moglen whether the typewriter-as-musical-instrument example is taken from a real case, and if so, which one. A similar concern with necessary and sufficient conditions for category-membership underlies the traditional quasi-proverb "if my grandmother had wheels, she'd be a wagon" (or perhaps a bicycle), used by Scotty in Star Trek III.

And any discussion of the natural class of unnatural categorizations should mention Borges' celebrated discussion of "El Idioma Analítico de John Wilkins" -- discussed in Language Log here -- as well as Suzanne Briet's widely-cited analysis of when an antelope is a document. As Michael Buckland explains:

One individual, who had, for years, been involved in discussions of the nature of documentation and documents, addressed the extension of the meaning of "document" with unusual directness. Suzanne Briet (1894-1989), also known as Suzanne Dupuy and as Suzanne Dupuy-Briet was active as a librarian and documentalist from 1924 to 1954 (Lemaître & Roux-Fouillet 1989; Buckland 1995).

In 1951 Briet published a manifesto on the nature of documentation, Qu'est-ce que la documentation, which starts with the assertion that "A document is evidence in support of a fact." ("Un document est une preuve à l'appui d'un fait" (Briet, 1951, 7)). She then elaborates: A document is "any physical or symbolic sign, preserved or recorded, intended to represent, to reconstruct, or to demonstrate a physical or conceptual phenomenon". ("Tout indice concret ou symbolique, conservé ou enregistré, aux fins de représenter, de reconstituer ou de prouver un phénomène ou physique ou intellectuel." p. 7.) The implication is that documentation should not be viewed as being concerned with texts but with access to evidence.

Briet enumerates six objects and asks if each is a document.
Object --- Document?
Star in sky -- No
Photo of star -- Yes
Stone in river -- No
Stone in museum -- Yes
Animal in wild -- No
Animal in zoo -- Yes
There is discussion of an antelope. An antelope running wild on the plains of Africa should not be considered a document, she rules. But if it were to be captured, taken to a zoo and made an object of study, it has been made into a document. It has become physical evidence being used by those who study it. Not only that, but scholarly articles written about the antelope are secondary documents, since the antelope itself is the primary document.

I haven't been able to find the original of Briet's "Qu'est-ce que la documentation?" on line, but an English translation is here, and the relevant passage is this:

In our age of multiple and accelerated broadcasts, the least event, scientific or political, once it has been brought into public knowledge immediately becomes weighted down under a "veil of documents" (Raymond Bayer). We admire the documentary fertility of a simple originary fact: for example, an antelope of a new kind has been encountered in Africa by an explorer which has resulted in the capture of an individual that is then brought back to Europe for our Botanical Garden [Jardin des Plantes]. A press release makes the event known by newspaper, by radio, and by newsreels. The discovery becomes the object of an announcement at the Academy of Sciences. A professor of the Museum mentions it in his lectures. The living animal is placed in a cage and cataloged (zoological garden). Once it is dead, it will be stuffed and preserved (in the Museum). It is loaned to an Exposition. It is played on a soundtrack at the cinema. Its voice is recorded on a record. The first monograph serves to establish part of a treatise with plates, then a special encyclopedia (zoological), then a general encyclopedia. The works are cataloged in a library, after having been announced at publication (publisher catalogs and the French National Bibliography). The documents are recopied (drawings, watercolors, paintings, statues, photos, films, microfilms), then selected, analyzed, described, translated (documentary productions). The documents which relate to this event are the object of scientific sorting (fauna) and of ideological sorting (classification). Their ultimate conservation and utilization are determined by some general techniques and by sound methods for assembling the documents--methods which are studied in national associations and at international Congresses.

The cataloged antelope is an initial document and the other documents are secondary or derived.

It was surely a modern disciple of Briet who wrote in the Onion last year about Google Purge.

"Our users want the world to be as simple, clean, and accessible as the Google home page itself," said Google CEO Eric Schmidt at a press conference held in their corporate offices. "Soon, it will be."

[...]

"Thanks to Google Purge, you'll never have to worry that your search has missed some obscure book, because that book will no longer exist. And the same goes for movies, art, and music."

[...]

"Book burning is just the beginning," said Google co-founder Larry Page. "This fall, we'll unveil Google Sound, which will record and index all the noise on Earth. Is your baby sleeping soundly? Does your high-school sweetheart still talk about you? Google will have the answers."

Page added: "And thanks to Google Purge, anything our global microphone network can't pick up will be silenced by noise-cancellation machines in low-Earth orbit."

[...]

Although Google executives are keeping many details about Google Purge under wraps, some analysts speculate that the categories of information Google will eventually index or destroy include handwritten correspondence, buried fossils, and private thoughts and feelings.

The company's new directive may explain its recent acquisition of Celera Genomics, the company that mapped the human genome, and its buildup of a vast army of laser-equipped robots.

[Update: Bill Poser protests that

According to the California code cows and cars do NOT form a natural class. They BELONG to a natural class, but they do not FORM one since they are a proper subset of the elements of the class of things the theft of which is grand theft.

Right, there's all that stuff (in other clauses of the code) about avocados, shellfish, credit cards and other West Coast flora and fauna.]

[Update #2: Joe Gordon offers a legal opinion:

On a whim, I did the Westlaw search
typewriter /s "musical instrument"
and WL returned 24 results.
Many simply included both in a listing of the form of "equipment, including but not limited to..."
One had the ominous lines "...11, 1974, about 7:30 p. m., the defendant and his companion, Clifford Kam (Kam), entered Floyds of Hawaii, a retail musical instrument and typewriter repair store located in Kailua, Oahu. The defendant proceeded towards the backroom of the store where the owner, ..." (State v. Napeahi, 57 Haw. 365, 556 P.2d 569, Hawai'i, Nov 12, 1976)
(you just know that didn't turn out well for the owner, the defendant, and probably the typewriter)
Not one opinion returned by Westlaw involved the categorization problem, but I sympathize, really I do. In law school our administrative law prof introduced us to HLA Hart in the context of the question of what a vehicle was. "No vehicles in the National Park" - do snowmobiles count? Vehicles, we were told, have a hard chewy center (okay, that's not how he taught it, it's how I remember it), with a progressively vaguer and fuzzier penumbra (again, my phrasing) where related but not necessarily quintessential instantiations of the concept hover. Is a vehicle motorized, does it have wheels, does it carry passengers, is it a car, do trucks count, how about a bicycle. Etc. and etc.

Another example of the affinity between legal scholars and linguists.]

Posted by Mark Liberman at 05:52 AM

April 18, 2006

All that and talk about Fight Club

The last week's e-mail to Language Log Plaza brought two fairly recent snowclones, both based on very specific originals, "The first rule of Fight Club is, you do not talk about Fight Club" and "(be) all that and a bag of chips". As usual, the variants include some that stick close to the original and some that stray from the model.

Talk about Fight Club. On 4/11/06 Ann Burlingham, owner and manager of Burlingham Books in Perry, New York, wrote that she had been planning to use the slogan "the first rule of book club is, you don't talk about book club" in the store, but discovered it had already been taken. In fact, she got 821 Google hits for "first rule of book club". And found this in Wikipedia:

The Onion -- The satirical newspaper ran an article parodying Fight Club titled "The First Rule Of The Quilting Society Is You Don't Talk About The Quilting Society"

The original, from the movie Fight Club (1999), is given by the IMDB as:

Tyler Durden [played by Brad Pitt]: The first rule of Fight Club is - you do not talk about Fight Club. The second rule of Fight Club is - you DO NOT talk about Fight Club.

(There are other rules, but no more of this form.)

I get 867 raw webhits on <"the first rule of" "is that you do not talk"> and 671 on the contracted version <"the first rule of" "is that you don't talk">.

Sticking close to the model with "X Club(s)/club(s)" are variants like: DXBetaClub, rhetoric fight club, Employed Club, Milk and Honey Club, Blast Club, Blast Clubs, suicide club, Job Club, Bollywood Club. A bit further out are instances of "the first rule of X..." where X denotes an organization (Chandler's Guild, the organization, IT support), a game or contest (Mornington Crescent, King of the Mountain, DEATHBALL!!!!, Roomba cockfighting), or an activity (podcasting, partying with AG and I, Google Party). Still further out are various geeky Xs, for instance: Unicode, Design Vigilantism, Mac text editors, LiveJournal, HamletWeb. No doubt there are still more remote examples.
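Anyone who wants to go variant-hunting in their own pile of text could try something like the following. It is a rough Python sketch, assuming the template is "the first rule of X is (that) you do not/don't talk about X" with the same X in both slots; the pattern and the function name are my own illustration, not the search that produced the counts above.

```python
import re

# Hypothetical template: the second X must repeat the first (named backreference).
FIRST_RULE = re.compile(
    r"the first rule of (?P<x>.+?) is\b[\s,:-]*(?:that\s+)?"
    r"you (?:do not|don't) talk about (?P=x)",
    re.IGNORECASE,
)


def snowclone_slot(text):
    """Return the X that fills the template, or None if the text does not fit."""
    match = FIRST_RULE.search(text)
    return match.group("x") if match else None


print(snowclone_slot("The first rule of Fight Club is - you do not talk about Fight Club."))
print(snowclone_slot("the first rule of book club is, you don't talk about book club"))
print(snowclone_slot("The first rule of chess is to control the center."))
```

Requiring the repeated X keeps the net tight, so it will catch the close variants but miss the further-out ones that paraphrase the second half; loosening the backreference would trade precision for reach.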

Though the movie Fight Club is clearly the trigger for the spread of the snowclone, the expression "the first thing about X is that you don't talk about X" (or close variants of it) is likely to have been around for some time before the movie, probably used in a wry way by speakers or writers independently. There's really no point in trying to trace the pre-Fight Club history of the expression. Things are different with the other snowclone I was offered this week.

A bag of chips. On 4/18/06 came e-mail from Erin McKean of OUP, relaying a blog entry by Jenny Palmer on the snowcloning of the predicative idiom "(be) all that and a bag of chips" '(be) wonderful, (be) hot stuff' (often used ironically, apparently). I am so not with it that I hadn't even been aware of the idiom. But it's been around for a few years -- the Urban Dictionary entry for it has one contributor identifying it as a 90s saying, though this site is scarcely to be taken as an authority on anything -- long enough to get used as the title of a novel (by Darrien Lee, published in 2002), the name of a rock group (formed in 2002), and the title of a poem (by Nordette Adams, in 2004).

Urban Dictionary has the variant "bag of potato chips", but Palmer found much, much more: 31 variants in the first 100 hits pulled up by a Google search on <"all that and a">. There are some very close to the original, with "bag of X", where X denotes something edible, especially (as Palmer notes in her analysis, which I'm following fairly closely here) a snack food: Terra Chips, Fritos, Gummy Bears, frijoles, Tostitos, Pistachios, Chomby Chips (treats for Chombies -- check out "Chomby"), pretzels, crisps. Or with other Xs: microchips, Crips, dicks, self-loathing, antisemitism (note phonological play in the first three). Somewhat further out are examples with "a Y of X", where Y denotes a container, measure, or serving and X denotes the thing contained, measured, or served. Again, X often denotes something edible: bowl of grits, slice of toast, side of slaw, napsack [sic] of chips, cup of coffee, bottle of rum, plate of chips, side of bacon. But occasionally not: pair of tap shoes. Still further out, but with the "Y of X" structure: hideous reminder of our insignificance.

The outliers lose the "Y of X" structure, in favor of simpler NPs. Some of these denote edibilia: ham sandwich, Frito pie. Others lose even that connection to the original: tarot deck, handbag, mustache, new toothbrush.

Neither Palmer nor I has any idea about the source of the original "(be) all that and a bag of chips". But it probably has a traceable history, since it's not at all an obvious figure for conveying positive evaluation (whether straight or ironically). Its effect has to be calculated from the context in which it's used and an assessment of the user's intentions, and can't be easily divined just from the form of the expression -- in contrast to uses of the "first rule... don't talk" snowclone.

[Ben Zimmer has now unearthed piles of examples from the newsgroup alt.rap from 1992-93. The very first of these, from 1/7/92, is: "Naughty By Nature was up next. The brothers were all dat and a bag of chips, pretzels and Doritos." In addition to plain "chips" examples, there's quite an assortment of other variants: bag of Cheetoes [sic], bowl of government $cheese, bag of grits, cherry on top, Bowlful of Jelly, pair of black boots, can of tomato soup, slice or [sic] warm banana bread with some butter, bagel with cream cheese, bowl of grits (several times). Such a profusion of variants suggests that the figure had been around for a while before 1992, or that it had radiated rapidly from its source, possibly in a rap.]

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 02:25 PM

The Emperor's Clothes

A few days ago in suggesting reliable sources for information on linguistic classification I described Merritt Ruhlen's approach as "unreliable at higher-levels due to its reliance on an unsound and subjective approach to linguistic classification". Those who have some familiarity with the debate over what sort of evidence is necessary to establish linguistic relationships no doubt assumed that I was referring to the fact that Ruhlen is an advocate of an approach dubbed "mass lexical comparison" that is known to be unsound. That is indeed part of what I meant, but it isn't the whole story.

There are two aspects to classifying languages. The one that attracts most popular interest is showing that languages are related in the first place. The second, harder, task is deciding exactly how they are related, that is, working out the family tree. This is called "subgrouping".

There are a number of ways of doing this, and on-going debates as to which way is best. The classical approach requires reconstructing the proto-language, working out the sequence of changes by which each of the daughter languages is derived from the proto-language, and structuring the tree in such a way as to minimize the number of changes that must have happened independently. In other words, this approach groups languages, and intermediate proto-languages, on the basis of shared innovations. It is essentially the same approach as that of biological classification. Other approaches, which fall generally under the heading of "lexicostatistics", are based entirely on lexical replacement. What all of these approaches have in common is that they provide an objective basis for claiming that language A is more closely related to language B than to language C. These techniques also have in common the fact that they all depend on establishing phonological correspondences among the languages.

"Mass lexical comparison" consists of setting down lists of words that one considers to resemble each other in sound and meaning and declaring that such similarities could not be due to chance and so must reflect common descent. That's it. No phonological correspondences are established. No reconstruction is done. That's why I prefer to call it "superficial lexical juxtaposition" (SLJ). Even if you believe that this method works (which it doesn't), what basis does it provide for taking the second step and working out the family tree? You can't use any of the usual techniques, even lexicostatistics, since they all rely on phonological correspondences, so there must be some other technique that proponents of SLJ use for subgrouping.

You'd think so, but you'd be wrong. Nowhere in the writings of people like Merritt Ruhlen or Joseph Greenberg will you find an exposition of an objective technique for subgrouping. The closest things you'll find are tables listing words from several languages, where a few of the languages are closely related and can be seen to be obviously different from the others. In such easy cases the subgrouping may be obvious on inspection, but much of the time it isn't so obvious. The fact of the matter is, SLJ-ers have no technique for subgrouping.

The absence of a technique for subgrouping explains why SLJ-ers don't present evidence for their subgroupings. In a book like Ruhlen's Guide to the World's Languages you don't expect to see the evidence, but if you look at primary works by SLJ-ers you'll find that there isn't any evidence there either. To take a really prominent example, consider Joseph Greenberg's Language in the Americas (LIA), which purports to demonstrate that all of the native languages of the Americas other than those belonging to the Eskimo-Aleut and Na-Dene families belong to a single language family dubbed "Amerind". LIA also claims that Amerind consists of 11 subgroups and gives a detailed family tree.

The controversy about LIA has been devoted almost entirely to whether Greenberg is justified in claiming that almost all the languages of the Americas are related, but at least he gives an argument of sorts for that, albeit one that is badly flawed. What is the evidence for the 11 subgroups of Amerind, and more generally for his family tree? There isn't any. Really.

NO EVIDENCE WHATEVER IS PRESENTED.

None. Nada. Zilch.

Nowhere in LIA, nor anywhere else in Greenberg's oeuvre, will you find any evidence or argument for his subgrouping of Amerind. For each of the 11 putative subgroups Greenberg gives a list of "etymologies" similar to the one he gives for Amerind as a whole, but at best what that shows is that the languages included in each of his subgroups are related to each other. It provides no evidence whatever that the languages in one of his subgroups are more closely related to each other than to languages in other subgroups.

The upshot is that even if you believe SLJ-ers' claims about remote linguistic relationships, you shouldn't believe their claims about subgrouping because they have no technique for carrying out subgrouping and offer no evidence in favour of the subgroupings that they present. It's all a scam.

That's what makes it so ironic that Luca Cavalli-Sforza claims that mainstream historical linguists are unable to classify languages because they don't recognize that there are degrees of relationship. As I pointed out previously, his claim is entirely unfounded - he actually admits that he has no evidence for it and is just repeating what people like Ruhlen tell him. In fact, it is the SLJ-ers who have no method for determining degree of relationship and hence are incapable of classifying languages. Cavalli-Sforza has got it backward.

This doesn't mean that Ruhlen's Guide is useless. To a large extent, if you ignore the claims about remote relationships you can still use it since it is largely a compendium of work that other people have done. The difficulty is that where he has to choose among competing classifications you can't have much confidence in his choice because you have no idea on what basis he would choose. Since he has no method for classifying languages, on what basis will he evaluate the proposals of others? That means that you have to rely on your own knowledge of the reliability of the people he cites or look up the references yourself.

Posted by Bill Poser at 01:44 AM

April 17, 2006

Harry Potter and the Madding Gerund: Secrets of the Language Log Code

I have a secret to reveal to you. There is no point in trying to conceal it any longer. No point in laboring to bury the secret in a chain of clues that will take several chapters for you and a sexy cryptanalyst companion to solve. Because at this weekend's Wordstock Festival, a big book fair in Portland, Oregon, the cat will be up, the jig will be out of the bag: Language Log is making its first venture into print.

Yes, print, the medium where Dan Brown is king and J. K. Rowling is queen.

You see, some time last year an interesting offer arrived at Language Log Plaza. Tom Sumner, managing editor at the great publishing house of William, James & Co. in Wilsonville, Oregon (increasingly the epicenter of the world publishing industry now that Manhattan is so over), informed Mark and me that he wanted to put together a print collection of some of our Language Log posts from 2003 to 2005. And, modest and shy though we are (what, our little contributions?), Mark and I ultimately succumbed to his blandishments.

Naturally, Mark and I see ourselves as being basically of the post-print era. Cyber-aware, web-initiated, silicon-sinewed, HTML-savvy guys. Books, for us, are those old, musty things bound in calf skin that line the walls of the studies of older scholars. Books are for geezers. We do most of our research through web browsers, like everybody else now. We belong to the 21st century, not the 14th. We are much too modern for books. Oh, sure, we do keep a few books around, but just a few that we care about. Roughly four or five thousand each, I estimate, looking at the bookshelves in his Philadelphia apartment, and recalling the shelves in the Santa Cruz house and office that I will be returning to this July. The Cambridge Grammar; Syntactic Structures; a first edition of Dracula; that sort of thing. But mostly we are postbiblic.

However, Tom is a book guy, and he was persuasive. He traveled to Philadelphia and bought lunch for Mark at a trendy restaurant called Pod where food is all Asian and the lighting is all weird and you emerge from lunch feeling sort of shaky having agreed to things you can't quite recall. And Tom corresponded with me a lot and flattered my ego (extraordinarily easy to do, I discovered: just a few smooth strokes, and I am yours and will follow you anywhere, like a little dog). Tom got his way.

Tom turns out to be a great editor. He not only ferreted out all the little tiny slip-ups (we make a few) and fixed them, and tracked down all the cross-references and verified them; he made many key decisions about the book. He made selection recommendations; he did the sequencing and the breaking into thematic chapters; he did a beautiful page design (sidenotes rather than footnotes); he even researched the cover color with marketing people. (We had no idea: it seems tests have shown that, other things being equal, folks who do wholesale book-buying prefer light-colored covers to dark-colored covers. Go figure. For some things you really need a professional, don't you? We went with light.)

In the end, Tom chose the book's title as well. So many great ideas were rejected: (1) The Language Log Code; (2) Harry Potter and the Secret of Language Log; (3) Language Log and the Attack of the Giant Internet Worm; (4) The Unbearable Lightness of Language Log; (5) The Hitchhiker's Guide to Language Log; (6) Men are from Mars, Women are from Venus, and Language Log is from a Whole Nother Galaxy; and a denumerable infinity of other mad brainstormings we could have thought up if we'd had more time to spend on fun. Instead of any of these, Tom simply chose a title from among the titles of the posts included. To my amazement, he chose a post title that Mark had made up, rather than one of mine! (I whined and sniveled and kicked the furniture for a while; but I pulled out of it after a day and a half.) I never would have thought of the one he picked, but Tom found that people in the book trade love it: he picked Far from the Madding Gerund (you saw it here). To be more precise (since there is a subtitle), the book is actually called Far from the Madding Gerund and Other Dispatches from Language Log.

Starting May 1 it goes on sale. You can actually buy it on Amazon right now. And of course if you're in Oregon you can go to Wordstock and see advance copies on the stand shared by William, James & Co. and Franklin Beedle & Associates, where we are assured there will be show specials. (We're not sure quite what that means, but at the very least I imagine you should expect free blueberry snowclones, golden eggcorn pins and cufflinks, champagne, a brass band, and throngs of attractive, intelligent, sexy young people hanging around.)

So, from this weekend on, we are out there in the book marketplace. I'm a bit nervous, actually, to be competing with Dan Brown on his home turf. It's all very well for me to say that Dan can't write an effective simile to save his life. But he can sell books like a monster, can't he? Will we overtake him? His Amazon.com sales rank, for the new paperback version of The Da Vinci Code, is in the region of 5, and Language Log, as you know, champions the use of empirical evidence and quantitative methods. Things are now about to become distinctly empirical. Will we reach number 5 in Amazon's sales rankings? Or will Dan kick our sorry butts? What will you, the public, decide on? Is it to be The Da Vinci Code, which comes in either paperback, hardback, or a special illustrated luxury edition with fine art reproductions? Or Far from the Madding Gerund, which comes in pale blue with a picture of an eggcorn on the cover?

Time will tell. Still ahead of us, once we get popular and famous, are first the cruel reviews (it would be just so killer poetic if we got reviewed by Dan Brown, wouldn't it?), and then the lawsuits charging that we ripped off the ideas in our posts from Baigent and Leigh or some Russian loony...

Thus we move with a certain trepidation out of the blogosphere and into our new future, as essayists in a medium that came into its own in the late Middle Ages — not so much back to the future, more like forward into the past. Go forth unto your bookstore, and choose wisely.

Posted by Geoffrey K. Pullum at 09:18 PM

Doctors return to their senses

The New York Times reports that third-year students at the Mount Sinai School of Medicine are required to take a course in art appreciation (here). Other med schools have also been offering such courses in an effort to help future doctors improve their powers of visual observation. This seems particularly important in an age of managed medical care that makes it difficult for doctors to notice much about the twenty patients a day to whom they can give only about 15 minutes each. So training in using their five senses, from art or anything else, is a promising development. Patients are complex human beings, a lot more than bodies, and it's important to see the details of their physical problems as well as the details of their social, psychological and cultural context, the same way one sees a painting or sculpture. But seeing is not the only sense to be enhanced in medical students. What about some more training in hearing and speaking?

I once heard a physician say that 95% of success in a medical diagnosis comes from getting accurate information during the medical interview. Information comes from seeing, of course, but also from hearing what the other person has to say. And hearing is related to talking — asking the right questions based on what is heard. Way back in 1971, Dr. C. P. Kimball advocated this in his article in Annals of Internal Medicine (74:137):

The physician speaks a strange and often unintelligible dialect. He calls everyday common objects by absurd and antiquated terms. He speaks of mitral commissurotomies, pituitary insufficiency, and reality feedback. This world is peopled with cirrhotics, greensticks, and hebephrenics. The professional dialect creates a communication gap between physician and patient that is generally acknowledged by neither. Increased specialization refines the physician's particular dialect, and he becomes much like the computer, tolerating only the imprint of words that fit into the programmed languages.

In the past thirty or so years others, mostly linguists, have echoed and amplified Kimball's statements, but since there is little room for more courses in the already overcrowded curricula of medical schools, only small progress has been made in training doctors to communicate effectively with their patients. And managed care has reduced whatever opportunities even the best physicians might have. In fact, some doctors are dropping out completely. I've lost two of my own doctors in the past five years. One left the profession entirely and the other accepted a more normal 9 to 5 life at a hospital. One problem is the fixed-choice medical interview form, which has to be zipped through as quickly as possible.

Another communication barrier stems from the difference between middle class doctors and working class patients, who often have their own very different words for things, such as "sugar" for "diabetes." In a study I conducted years ago a doctor asked a patient if she ever had an abortion. She replied, "No," even though her chart indicated two of them. I spoke with her privately after her interview and she explained that to her an "abortion" meant deliberately getting rid of the fetus. In her case, she miscarried naturally (‘The Medical Interview: Problems in Communication’, Primary Care, September, 1976).

So here's another voice in the wilderness calling for the medical profession to expand its thinking beyond seeing details to hearing them and learning to talk to patients in ways they can understand. And allowing some time for them to talk would help too.

Posted by Roger Shuy at 12:27 PM

Patriots Day, Patriot's Day, and Patriots' Day

Happy Patriots Day, Patriots’ Day, or Patriot’s Day to all Language Log readers. Today is a holiday in Massachusetts (my state of residence for the 2005-2006 academic year while I'm enjoying a fellowship at the Radcliffe Institute for Advanced Study at Harvard University here in Cambridge; but Language Log does not observe holidays, so I am at work as usual). Today's holiday commemorates the battle in nearby Lexington and Concord which took place on April 19, 1775, and also the famous midnight warning ride of Paul Revere and William Dawes across this state. It is the day of the Boston Marathon (I will be running, of course, as a representative of the Language Log senior staff*). And like every other day, it is a day for us to reflect a little upon the topic of linguistic correctness. Because while neighbourhoods.net says "Patriots Day" and the Boston Athletic Association says "Patriots’ Day", holidayorigins.com says "Patriot’s Day". Which is correct?

Wikipedia is correct: it notes all three spellings. The thing is, you see (oh dear, the purists are going to hate me), all three phrases are fully grammatical, so the only thing that's at issue is which of the three phrases is standardly used to refer to the day in question, and the answer is that all three of them are in common use.

Why are they all grammatical?

Patriots Day uses the plural noun patriots as an attributive modifier in a singular noun phrase with the head noun day, as in weapons cache or activities center.
Patriot’s Day uses the genitive singular noun patriot’s as the determiner in a singular noun phrase with day as head, as in my MTV, or in Jeeves's description of his profession, gentleman's gentleman.
Patriots’ Day uses the genitive plural noun patriots’ as the determiner in a singular noun phrase with day as head, as in workers' pay or ladies' room.

Grammar sets the bounds of what is possible in the language (and discovering what the grammatical rules are involves a very tricky empirical investigation concerning which we have to be fallibilists: linguists must recognize that at any given stage they may not yet have identified the rules correctly, though that doesn't mean it can't be done). People choose which of the grammatical phrases and sentences in (their variety of) English they will deploy for what purposes. Probably between a billion and two billion people will use English at some time today. They don't all agree on precise details (to say the least). They won't agree on which of the three phrases just discussed is the real name of today's holiday (apostrophe use is a very shakily grasped aspect of English spelling and grammar anyway, and of course phonetically the three phrases are exactly the same, so the matter only emerges in writing).

One other thing (and this was added later): What I just said is not in any way in conflict with the observation that one group of speakers of English have a rather special status: the legislators of the Commonwealth of Massachusetts. And Jim Gordon has pointed out to me that what they said (General Laws of Massachusetts, Part I, Title II, Chapter 6, Section 12J) was this:

Section 12J. The governor shall annually issue a proclamation calling for a proper observance of April nineteenth as Patriots’ Day, in commemoration of the opening events of the War of the Revolution and the struggle through which the nation passed in its early days.

If you'd like to adopt that version, with Patriots’, on the grounds that it has been enshrined in law, that is very definitely a sensible usage decision to make. But if you think that choice was somehow mandated by English grammar, and the other two options are linguistically or logically mistaken and the people who use them are ignorant fools, you're wrong. And you're also wrong if you think government sources can be relied upon for grammatical consistency. Michael Greene points out to me that the Internal Revenue Service has a page announcing the extra day's grace that attributes it to "Patriot’s Day"; and I notice that the legend "N-2006-23, Patriots’ Day Filings and Payments (PDF K)" appears in a link further down on the same page! Trust no one; not even the IRS. Come to think of it, especially not the IRS.

* Some of the people who have been emailing me seem quite unaccountably to have drawn from this passing remark the inference that I will be doing a 26-mile run today. I am baffled at how they could have read this into what I said. I simply said "I will be running", and added that I am a representative of Language Log's senior staff. I didn't say "I will be an official participant in the Marathon." I merely meant that later on I will be running over from my Brattle Street office to the post office on Mt Auburn Street to mail something to Mark (today is not a Federal holiday, so the post office is open — though the Feds do recognize us Massachusetts residents as being on a holiday, so we, unlike you, actually have until tomorrow to submit our tax returns). Congratulations on running a marathon, indeed! Please try to read more carefully, all of you.

Posted by Geoffrey K. Pullum at 10:26 AM

How much do those red and blue jellybeans predict about linguistic ability?

Narly Golestani, Nicolas Molko, Stanislas Dehaene, Denis LeBihan and Christophe Pallier ("Brain Structure Predicts the Learning of Foreign Speech Sounds", Cerebral Cortex, published online April 7, 2006) trained 65 native speakers of French to distinguish Hindi dental vs. retroflex consonants. Then they MRI-scanned the brains of the 11 fastest and 10 slowest learners, and concluded that

... left auditory cortex WM [white matter] anatomy ... partly predicts individual differences in an aspect of language learning that relies on rapid temporal processing ... [and] a global displacement of components of a right hemispheric language network ... is predictive of speech sound learning.

Ben Zimmer posted on Language Log on 4/7/2006 about the use of the word linguist in the headline of the BBC's report on this work: "Linguists 'have different brains'". It's amusing to note that at some point between then and now, the BBC changed the headline to read "Polyglots 'have different brains'". Anyhow, the BBC story ends with a striking quote from Dr. Golestani:

"We can start to make predictions regarding whether people will be good at something or not based on their brain structure," she said, "or diagnose clinical problems."

This is immediately followed by a very pretty pair of brain images that reinforce her point impressively:

As Baruch Grazer at Deep Weeds put it,

Apparently, linguists have a pair of red jelly beans in their brains. Non-linguists are, sadly, forced to muddle along with a blue jelly bean pair.

Now, Dr. Golestani's forecast of a modern phrenology will doubtless come true to some extent, though there is plenty of room for controversy about how far we'll be able to go in predicting performance from local features of gross neuroanatomy. But the juxtaposed picture may, I think, have given many readers a misleading idea of how accurately the jellybeans in your brain can now be used to characterize your talents.

One of the issues emerges in this quote from the Golestani et al. paper:

In order to better characterize the VBM ["voxel-based morphometry"] result in HG [Heschl's Gyrus], we manually labeled (segmented) the left and right HG of the 21 subjects on anatomically normalized images using previously defined criteria (Penhune and others 1996). In cases where there were multiple transverse gyri, or when there was a single gyrus divided by a sulcus intermedius (SI) that extended to at least half of the length of HG, we included only the most anterior gyrus or gyral subregion, respectively, regions most likely to encompass the primary auditory cortex (Rademacher and others 2001). The rater was blind to group (faster vs. slower) and hemisphere, and the labeling was performed twice. HG volumes were significantly correlated across labelizations (left: r = 0.81, P < 0.001; right: r = 0.78, P < 0.001), providing evidence for labelization reliability. We found that the left but not the right HG is larger in faster compared with slower learners ... [emphasis added]

Since Heschl's Gyrus is primary auditory cortex, it's plausible that people with radically different abilities to learn new phonological distinctions might show an anatomical difference there. But what, you may ask, is this "labelization reliability" business?

Well, different people's brains are at least as different as their faces are, not just in size but also in shape. If we want to compare the size or shape of my nose and your nose, we need to define what we mean by "nose" and how we're going to measure its size and shape. We'd also want to know how reproducible the data from different nose-measurers is; and we'd want to be sure that the nose-measurers did not have any particular expectations about what they should find in a given case. This is not an entirely fanciful example -- as Darwin explains in his autobiography, the dimensions of his nose nearly cost him his position as naturalist on the voyage of the Beagle:

Next day I started for Cambridge to see Henslow, and thence to London to see [Captain] Fitz-Roy, and all was soon arranged. Afterwards, on becoming very intimate with Fitz-Roy, I heard that I had run a very narrow risk of being rejected, on account of the shape of my nose! He was an ardent disciple of Lavater, and was convinced that he could judge of a man's character by the outline of his features; and he doubted whether any one with my nose could possess sufficient energy and determination for the voyage. But I think he was afterwards well satisfied that my nose had spoken falsely.

If we want to compare the size of your Heschl's Gyri and my Heschl's Gyri, the same requirements apply; and Golestani et al. have clearly met them. They used "previously defined criteria" for morphological segmentation of MRI images; they had each brain segmented twice by different raters, who were not told whether they were looking at the brain of a fast or slow (retroflexion) learner, or at the right or left cerebral hemisphere. This is all excellent practice, as would be expected from such eminent researchers.

The striking thing, though, is that the results for the different raters have rather modest degrees of correlation: r=0.81 in the left hemisphere, and r=0.78 in the right hemisphere. What does this really mean? Well, one way to look at it is that the squares of these numbers (r²=0.66 in the left hemisphere and r²=0.61 in the right hemisphere) tell us how much of the variation in measured volume was caused by the identity of the brain being measured: 61-66% of the measured variation was due to the actual anatomy of the brain being measured, while 34-39% was noise due to measurement error (or really, measurement uncertainty). This is not a great start to the project of predicting ability from the size of brain regions: almost 40% of the measured variation is noise, even before we look at abilities at all.
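For anyone who wants to check the arithmetic, here is a minimal sketch in Python that squares the two reported correlations (0.81 and 0.78) and takes the remainder. The function name variance_split is my own invention for illustration, and the reading of 1 - r² as "noise" is simply the interpretation used in the paragraph above, not a calculation taken from the paper.

```python
def variance_split(r):
    """Return (r squared, 1 - r squared) for a correlation coefficient r."""
    shared = r ** 2        # share of variation common to the two labelings
    noise = 1 - shared     # the rest, read above as measurement uncertainty
    return shared, noise

# The two inter-labeling correlations reported by Golestani et al.
for r in (0.81, 0.78):
    shared, noise = variance_split(r)
    print(f"r = {r:.2f}   r^2 = {shared:.2f}   1 - r^2 = {noise:.2f}")
```

Rounded to two places, the squared values come out at 0.66 and 0.61, with 0.34 and 0.39 left over.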

To give you an idea of how much measurement uncertainty a correlation of r=0.78 represents, that's exactly the correlation between the (hypothetical) measurements of body weight (in pounds) for ten people given by two different scales in the table below:

	Person 1	Person 2	Person 3	Person 4	Person 5	Person 6	Person 7	Person 8	Person 9	Person 10
Scale 1	130	112	140	160	182	175	195	197	210	209
Scale 2	100	135	123	181	190	166	170	228	167	250
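The correlation for this made-up table can be verified with a few lines of Python. This is only a sketch: the list names scale_1 and scale_2 and the helper functions are mine, and the numbers are just the hypothetical weights from the table, with the Pearson coefficient computed straight from its textbook definition.

```python
from math import sqrt

# Hypothetical weights (pounds) for the ten people in the table.
scale_1 = [130, 112, 140, 160, 182, 175, 195, 197, 210, 209]
scale_2 = [100, 135, 123, 181, 190, 166, 170, 228, 167, 250]

def mean(values):
    return sum(values) / len(values)

def pearson_r(xs, ys):
    """Pearson correlation: covariance over the product of the spreads."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

print(mean(scale_1), mean(scale_2))           # 171.0 171.0
print(round(pearson_r(scale_1, scale_2), 2))  # 0.78
```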

Although the mean values for the two scales are the same, and the correlation between their results is highly significant, the reported weights are different enough that you'd wonder if the scales might be defective, or if the research assistants reading them might not have been paying attention. That would be the wrong conclusion about the Golestani et al. gyrus-size estimates: I'm sure that the brain-anatomy raters were highly skilled and did an excellent job in applying the criteria they were given. The thing is, though, this sort of segmentation of brain anatomy is apparently not now a task that different experts can agree on to a very exact degree.

[Another way to look at these numbers is to compare them to the test-retest correlations for the SAT, which are said to be between 0.86 and 0.9 (Donlon 1984), or the test-retest correlations for the Stanford Binet, which are reported as between 0.85 and 0.95 (Thorndike and Hagen 1977). The correlation of AES ("automated essay scoring") systems with the scores of human raters is reported to be "generally between .70 and .90 and often between .80 and .85". ]

In the published paper, Golestani et al. give a scatter plot of the HG sizes -- overall volume, and volume of grey matter (neuronal cell bodies, "GM") and white matter (neuronal interconnections, "WM"). As you can see, the distributions overlap pretty extensively even in the case (left hemisphere white matter) where there is a statistically significant difference in means:

In fact, the result that I found most interesting was not the reported quantitative difference, but rather a qualitative difference, a difference in shape:

Finally, we explored group differences in the gross morphology of LHG by examining the frequency of duplication or splitting of HG in the left hemisphere in slower and faster learners. We found that in the group of faster learners, 6/11 individuals had either a duplicate or a split LHG, whereas in the group of slower learners, only 1/10 individuals had duplicate or split LHG.

We aren't told how often the raters agreed on whether the gyrus was duplicated or split, but I'd assume that this is somewhat more intersubjectively stable than the volume estimates.
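To get a rough feel for how much weight a split of 6 out of 11 versus 1 out of 10 can bear, here is a small Python sketch of Fisher's exact test on those counts. This is my own back-of-the-envelope check, not an analysis taken from the paper: the choice of test, the variable names, and the "no more probable than the observed outcome" rule for the two-sided value are all assumptions.

```python
from math import comb

# Counts quoted above: subjects with a duplicated or split left HG.
faster_split, faster_total = 6, 11
slower_split, slower_total = 1, 10

total = faster_total + slower_total        # 21 subjects in all
split_total = faster_split + slower_split  # 7 with a duplicate or split LHG


def hypergeom_prob(k):
    """Chance that exactly k of the split cases land among the faster
    learners if split status were unrelated to learning speed."""
    return (comb(faster_total, k) * comb(slower_total, split_total - k)
            / comb(total, split_total))


observed = hypergeom_prob(faster_split)
k_values = range(max(0, split_total - slower_total),
                 min(faster_total, split_total) + 1)

# One-sided: outcomes at least as lopsided toward the faster learners.
p_one_sided = sum(hypergeom_prob(k) for k in k_values if k >= faster_split)

# Two-sided: every outcome no more probable than the observed one
# (small tolerance so the observed outcome itself is always included).
p_two_sided = sum(hypergeom_prob(k) for k in k_values
                  if hypergeom_prob(k) <= observed * (1 + 1e-9))

print(round(p_one_sided, 3), round(p_two_sided, 3))  # 0.043 0.063
```

With only 21 subjects, the difference sits right around the conventional 0.05 threshold, on one side or the other depending on which version of the test you prefer.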

Given the emphasis on prediction of learning performance, there's a table that I would have liked to see in the paper: nine columns for each of the 21 imaged subjects, one giving the quantitative performance on the phonetic learning task, and eight giving the right and left HG grey and white matter volume estimates from the two raters. This would give us a genuine basis for seeing how accurately learning can now be predicted from (this aspect of) anatomy. A complete table of all the relevant features for the 21 subjects would be even better, enabling us to use cross-validation techniques to estimate the likely performance of various prediction models on new subjects. Certainly any researcher in the machine learning area would (I think) want to look at the problem that way.

We could also start to compare the predictive power of neuroanatomical measurements with the traditional way to "make predictions regarding whether people will be good at something or not", namely aptitude tests.

Posted by Mark Liberman at 12:20 AM

April 16, 2006

Russian loony to sue Da Vinci Code author

Almost unbelievably, a Russian art historian now proposes to take our Dan to court for ripping off ideas found in The Da Vinci Code. According to The Times Online:

Mikhail Anikin, a Leonardo da Vinci expert in the Hermitage museum's Western European art department, said he would give Mr Brown one month to apologise and give up half his revenues from the book or he would take him to court in Russia and the US to seek all his earnings from the novel.

Anikin says he shared his opinion that the Mona Lisa is an allegory for the Christian church with someone from Texas who said they knew Dan Brown. That's about it, really. That's the basis for the suit. The prosecution rests.

What is it with these litigious nutballs? Is Anikin really preparing to drop a couple of million dollars on his lost cause, like Baigent and Leigh? Right at the start they were warned not to make fools of themselves, but they went ahead with their doomed lawsuit anyway. So Random House slaughtered it, vindicating Dan completely. And straight away this loony out of St Petersburg comes gunning for him.

I say we need to start a defense fund for Dan Brown right now. Let me remind you that we are talking about a man with limited resources: according to Forbes magazine, he made just $76.5 million from June 2004 to June 2005. There are quite a few CEOs in the USA who get much more than this. And they have corporations and CFOs to protect them when trouble comes. Dan is out there on his own, an independent contractor, a sensitive literary man, facing all these jealous nobodies bringing suit because they think their own pathetic little observations entitle them to a piece of the action ("Hey, I wrote an article about Opus Dei in 1999 so I deserve a few million too!").

So send your contributions (used USA banknotes or open-payee money orders, by ordinary first class post with no tracking number, please) to: Dan Brown Defense Fund, c/o Geoff Pullum, One Language Log Plaza, Philadelphia, PA 19104-6024. Our legal department here at the secret society of Language Log will ensure that the money is used appropriately.

Posted by Geoffrey K. Pullum at 08:04 AM

Is a cow a motor vehicle?

An interesting decision of the Ohio Court of Appeals for the 11th District recently came to my attention. A couple were injured when their car struck a cow. The owner of the cow had no insurance, so they filed a claim with their own insurance company for coverage under its uninsured motorist provision. The company refused to pay, so they sued. The trial court ruled against them, and they appealed.

The point of contention was whether a cow is a motor vehicle. The court cites the American Heritage Dictionary's definition: "a self-propelled, wheeled conveyance that does not run on rails". The court correctly observes that:

a cow is self-propelled, does not run on rails, and could be used as a conveyance; however, there is no indication in the record that this particular cow had wheels. Therefore, it was not a motor vehicle...

On this basis, buttressed by precedent to the effect that a horse is not a motor vehicle, it affirmed the decision of the Court of Common Pleas that the couple were not entitled to compensation.

I think that the Court of Appeals made the right decision, but for the wrong reasons. The American Heritage Dictionary's definition is wrong. On the one hand, it is not necessary for a vehicle to have wheels in order to be a motor vehicle. In my judgement, and I believe that of most people, vehicles such as snowmobiles, tanks, and bulldozers are motor vehicles even though they lack wheels. Furthermore, adding wheels to a cow would not make it a motor vehicle. On the other hand, not all self-propelled vehicles are motor vehicles. A sled is not a motor vehicle, even though it is self-propelled. What makes a vehicle a motor vehicle is, not surprisingly, its reliance on a motor. The reason that a cow is not a motor vehicle is that it has no motor.
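The contrast between the two definitions can be spelled out as a pair of predicates. The Python sketch below is purely illustrative: the `Thing` class, the function names, and the true/false properties assigned to each example are my own assumptions about how one might encode the argument, not anything found in the court's opinion or the dictionary.

```python
from dataclasses import dataclass


@dataclass
class Thing:
    name: str
    self_propelled: bool
    wheeled: bool
    runs_on_rails: bool
    conveyance: bool
    has_motor: bool


def dictionary_motor_vehicle(t):
    """The dictionary wording quoted by the court:
    a self-propelled, wheeled conveyance that does not run on rails."""
    return (t.self_propelled and t.wheeled and t.conveyance
            and not t.runs_on_rails)


def motor_based_motor_vehicle(t):
    """The alternative argued above: a vehicle that relies on a motor."""
    return t.conveyance and t.has_motor


# Property values below are assumed for the sake of the illustration.
cow = Thing("cow", True, False, False, True, False)
cow_on_wheels = Thing("cow on wheels", True, True, False, True, False)
snowmobile = Thing("snowmobile", True, False, False, True, True)
car = Thing("car", True, True, False, True, True)

for t in (cow, cow_on_wheels, snowmobile, car):
    print(t.name, dictionary_motor_vehicle(t), motor_based_motor_vehicle(t))
```

Under these assumptions both tests reject the cow, but only the dictionary version would admit a cow fitted with wheels and exclude a snowmobile, which is exactly where the two definitions part company.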

Posted by Bill Poser at 03:09 AM

April 15, 2006

Jury instructions up close

I've always wanted to be selected as a juror in a law case but I could never pass the selection tests. It's probably because I've participated in so many trials as an expert witness. Knowing stuff can work against you in this strange world. Linda Seebach sent me an article by Arnold Kling (Cato Institute) called Law and Order that appeared in TCSDAILY on April 14 (here). It's a very well written account of the author's recent experience as a juror in a murder case. He has some interesting things to say about how the system works from the stage of jury selection to the deliberation room. The defendant was a 17-year-old who, after a high school football game, upon demand, gave his baseball bat to an 18-year-old friend, who used it to hit and ultimately kill the victim. During his interrogation, the defendant confessed to his role in the killing.

The confession interview was videotaped and played to the jury, but a previous interview with him was not recorded. Information obtained by the police in the first interview must have led them to do a second one, but there was no way for the jury to learn what was said in the first, leaving unanswered the question of whether he might have been coerced or misled into confessing. Was the following confession statement voluntary? Police would do themselves a favor by taping the ENTIRE interaction with suspects, as noted in my recent post (here).

Kling didn't focus on this but he had some devastating things to say about the instructions given by the judge to the jury before they went to the deliberation room:

Part of the problem was the way the instructions were formatted. The instruction about the charges used phrases such as "the defendant caused the death," but the defendant was not the one who hit the victim with the bat. Instead, the defendant was accused of "aiding and abetting," so that you had to say that what he did "aided and abetted" the causing of death. The instruction for interpreting "aiding and abetting" was on a separate page, which forced you to go back and forth between the pages, mentally cutting and pasting, in order to parse the instructions...
The user-unfriendly nature of the instructions made our deliberations more protracted and difficult than they would have been otherwise. It added to the strain of what was already a stressful situation. I could not help wondering why the instructions were written this way, and by the end of our deliberations, I had three hypotheses, or possible explanations.

1. The judge does not understand the needs of jurors, and he does not know how to write clear, user-friendly instructions.

2. The judge wants us to deliberate for days and have difficulty reaching a verdict. He was secretly cackling to himself sadistically as he wrote the instructions ("Bet they go back and forth at least three hours on that one. Oh, ho--this one should tie them in knots.")

3. The instructions are subject to input and negotiations from the attorneys in the case. This would cause the instructions to wind up looking like a committee document. Memoranda produced by committees are characteristically ambiguous, and ideas that are supposed to be logically connected can become physically separated in a collective editorial process.

Here Kling echoes what linguists have been saying for years. Robert and Veda Charrow started it off back in the 1970s with research that showed that jurors were generally confused and even misinformed by jury instructions (Columbia Law Review, 79:7: 1306-1374). Many others since have followed up on this topic, including Peter Tiersma's recent work with the California Judicial Council's Task Force on Jury Instructions. Judges worry about giving jury instructions that might get the case overturned on appeal, but one must wonder why it is that jurors can't be told what confusing instructions really mean. At least part of the problem stems from law's apparent need to use legal language with non-lawyers even when they don't understand it. The concepts of register change and participant perspective don't seem to occur to them. Tiersma's book, Legal Language (Chicago, 1999) details this issue at length. It's a very important book.

Kling also worried about the meaning of voluntary as it relates to the Miranda warning, particularly since the prosecution argued that the confession was given voluntarily. He then offered his own jury instruction suggestions for this:

For the defendant's statement to be considered voluntary, you must be satisfied that

1. The defendant clearly and completely understood the charges. The defendant does not need to understand why the state is making its accusations or the possible consequences of conviction, but the defendant does need to grasp the nature of the crimes that are given in the accusation.

2. The defendant made an intentional decision to speak in his own defense without the aid of an attorney.

User-friendly clear language? Understandability? Intentional decision making? Our court system could do a lot better.

Posted by Roger Shuy at 04:39 PM

fibs and cats

If poets have discovered the Fibonacci sequence, perhaps they'll move on to other sequences. I suggest the Catalan numbers, known to mathematicians for increasing so quickly, and to computational linguists from a famous paper on syntactic ambiguity by Ken Church and Ramesh Patil in which it is shown that as the number of prepositional phrases in a sentence increases, the number of possible parse trees and their corresponding representations grows as the Catalan numbers. The sequence begins: 1, 1, 2, 5, 14, 42, 132, 429. Here is what is presumably the first cat:

1	A
1	cat
2	like this
5	is hard to get right
14	Catalan numbers increase so quickly this must be all
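For the curious, here is a short Python sketch that reproduces the opening terms of the sequence in two ways: from the standard closed form, and by directly counting the binary bracketings of a string of items, which is the kind of structural ambiguity at issue in parsing. The function names are illustrative, and the bracketing counter is a generic toy, not the grammar from the Church and Patil paper.

```python
from functools import lru_cache
from math import comb


def catalan(n):
    """n-th Catalan number via the closed form C(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


@lru_cache(maxsize=None)
def binary_bracketings(leaves):
    """Count the binary trees over a sequence of `leaves` items
    by trying every possible top-level split point."""
    if leaves == 1:
        return 1
    return sum(binary_bracketings(i) * binary_bracketings(leaves - i)
               for i in range(1, leaves))


print([catalan(n) for n in range(8)])
# [1, 1, 2, 5, 14, 42, 132, 429]
print([binary_bracketings(k) for k in range(1, 9)])
# [1, 1, 2, 5, 14, 42, 132, 429]
```

A sixth line of the poem would need 42 syllables and a seventh 132, which is why the cat above stops where it does.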

Posted by Bill Poser at 02:37 PM

Hungry for constraint

0
1   You
1   like
2   to read
3   Language Log
5   because you trust us
8   to write the sort of thing you like.

That's a fib: a poem of a type invented by Gregory K. Pincus (another GKP, but it's not me; he's a screenwriter and aspiring children's book author in Los Angeles) in which each line after the second contains as many syllables as the last two lines added together, so that the successive syllable counts follow the Fibonacci sequence. (Technically, the first line should be taken to be a blank one, so it has a syllable count of zero. The only stipulated syllable count is for the second line, which must contain just one syllable, but from then on, it all follows the rule made famous in Dan Brown's The Da Vinci Code: 0+1 = 1, 1+1 = 2, 1+2 = 3, 2+3 = 5, 3+5 = 8.)
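The rule is simple enough to state as a few lines of Python. This is only a sketch of the syllable-count rule as described above; the function name is my own, and the lines of the poem are paired with the counts by hand rather than by any automatic syllable counter.

```python
def fib_syllable_counts(n_lines):
    """Syllable counts for the first n_lines of a fib: a blank line (0),
    a one-syllable line (1), then each line is the sum of the previous two."""
    counts = [0, 1]
    while len(counts) < n_lines:
        counts.append(counts[-2] + counts[-1])
    return counts[:n_lines]


poem = [
    "",
    "You",
    "like",
    "to read",
    "Language Log",
    "because you trust us",
    "to write the sort of thing you like.",
]

for count, line in zip(fib_syllable_counts(len(poem)), poem):
    print(count, line)

print(fib_syllable_counts(7))  # [0, 1, 1, 2, 3, 5, 8]
```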

According to The New York Times [Friday April 14, page B31; couldn't find a web location], fibs are catching on all over the Internet, and more than a thousand have been written. I didn't want Language Log to be the last blog to publish one.

The Times article quotes Annie Finch, a poet who teaches at the University of Southern Maine and has written on formal poetry, as saying:

Poets are very, very hungry for constraint right now. . . . Poets are often poets because they love to play with words and love constraints that allow the self to step out of the picture a little bit. The form gives you something to dance with so it's not just you alone on the page.

I like that phrase "hungry for constraint". We grammarians love constraints too, of course. Figuring out what exactly the constraints of a given language are is a big part of our job description. What's different about constraints in formal poetry is that instead of being inherent in the language and unconsciously obeyed, the constraints are completely arbitrary, and the poet adopts them just to see what results when you try to use language in a way that complies with them (and at the same time also complies with at least most of the inherent constraints, of course — allowing for poetic license to violate some).

Personally, I don't have a lot of interest in the results of adopting arbitrary constraints. But I am always thrilled when I discover a new genuine, inherent constraint. I remember the day in 1997 that I discovered the key syntactic constraint on doubling adjectives to signal intensification (It's a huge, huge problem). You might have thought, once the existence of such doublings was pointed out, that wherever an adjective like huge occurs you can double it; but that is not true. There is a very important constraint on what else has to be true in the sentence in order for it to be permissible. Nobody had ever noticed it before. Perhaps you can identify it. The answer can be found on page 561 of The Cambridge Grammar of the English Language, and to the best of my knowledge no one had ever previously given a description of the special condition. In fact I can't find any other grammar that even notes the possibility of adjective doubling at all.

I wonder how long it will be before I discover another constraint inherent in the grammar of ordinary Standard English that no one had previously noticed. I look forward to it. Hungry for constraints.

Posted by Geoffrey K. Pullum at 12:23 PM

The four subjects of whatever

My post on William Matthews's "Four Subjects of Poetry" and Roger Shuy's follow-up on "The Four Subjects of Linguistic Analysis" have led to some similar lists in other areas. (Well, some bloggers picked the meme up more directly from Scott Simon's interview with Edward Hirsch on Weekend Edition...)

Kerim Friedman at Savage Minds explained "The Four Subjects of Anthropological Research":

These people are really, really, oppressed, but look! They have agency!
Identity is political and transcends national boundaries.
These people used to have a tradition, but they’ve adapted it to better fit with their current lifestyle and now it is a different tradition.
There are no signifieds, only an endless chain of signifiers representing the illusion of self resulting from desire-as-lack.

Ann Bartow at Feminist Law Professors added (under the clever title "Petite Fours") the four subjects of law review articles:

Congress passed a really dumb law.
The courts are doing stupid things that Congress could fix with a really good law.
Both legislatures and courts should start drafting and interpreting laws with an eye toward enhanced economic efficiency.
I’m bored with law, except as it is described in literature.

And Jim Miles at Out of the Jungle described "The Four Topics of Law Library Scholarship":

We surveyed one of our research classes and they want us to emphasize print sources.
We surveyed one of our research classes and they want us to emphasize electronic sources.
A current management theory, summarized in ten pages or less, applies to law libraries.
Technology will change everything, but there will always be a need for law libraries.

Guinness (?) at Pimpgnosis contributed "The four subjects of sociological enquiry":

It turns out that when you have money, that’s really great for you, and when you don’t have money? Dude, that like sucks. Just like it sucks to be black in America. Or a woman pretty much anywhere.
Oh, and by the way? Being black is not the same as being poor. Or a woman. Or gay. Not the same thing at all — no way, no how. Every kind of oppression is like a snowflake, you know? Unique and shit.
You know that after-school special that said you can be anything you want to be? Total fucking lie. Well, not _totally_ a lie. But pretty much. There’s like this thing called “structure,” see, and sometimes things happen to us whether we want them to or not.
Bishop Berkeley totally didn’t get it. The world and everything about it — including point #3 — is inside your head. Dude! That’s like, hella trippy when you think about it.

From Aloysius at Catymology, "The four subjects of catblogging":

I was feeling rotten today, and then I played with my cat, and it made me feel, you know, less rotten.
I may be losing my (a) hair, (b) figure, or (c) mind, but my cat is cuter than your cat.
Look at that cute cat (a) in a bag (b) in a sink, or (c) in, on top of, next to, or underneath any inanimate object not usually associated with a cat.
The world is going to hell in a handbasket, and it's all the fault of (a) those snarky Republicans (b) those snotty liberals, or (c) reality TV, but I don’t care cause my cat loves me.

[Update: Matt at No-sword contributes "The four subjects of writing on Japan":

Japanese people sure do bow and smile a lot, and their language is quite different from English. What are they hiding?
Check out this mysterious cartoony sex toy I found! (Page translated by BabelFish.)
My Japanese lover and I have come to stay at an onsen by a quiet, picturesque lake, but they have a face like a porcelain mask that prevents me from reading their emotions.
Geisha were totally not prostitutes. Not.

]

Posted by Mark Liberman at 11:55 AM

Freedom of Speech?

Here's a novel interpretation of "freedom of speech". At Northern Kentucky University Wednesday some students destroyed an anti-abortion display. Their teacher, literature and language professor Sally Jacobsen, is quoted as saying:

I did, outside of class during the break, invite students to express their freedom of speech rights to destroy the display if they wished to.

Interfering with other people's speech is freedom of speech? According to Jacobsen's web site her specialty is British women's literature. She should have studied Orwell.

Posted by Bill Poser at 07:44 AM

What "Multicultural London English" sounds like

A couple of days ago, I posted about "Multicultural London English", discussed in the press as "Jafaican" (or sometimes "Jafaikan"). None of the stories included any sound clips, and so I asked for suggestions about how to find something more authentic than Ali G, my only previous point of reference for this way of talking.

Abnu from Wordlab recommended Apache Indian's recent remake of Desmond Dekker's great 1969 reggae hit Israelites. "Apache Indian" is the stage name of Steven Kapur, who was born in Birmingham of East Indian ethnic background, and has pioneered what his website calls the "fusion of Reggae, Raggamuffin, Hip Hop and Bhangra". As this audio sample indicates, Apache's performance dialect (at least in this example) is transparently "fake Jamaican", and therefore the term "Jafaican" is a reasonable description. But he started out in Birmingham, and so this seems to be part of a broader cultural fusion that is not limited to London. (Dekker's original might be the most prolific source of mondegreens ever, by the way.)

Jennifer Tillotson sent a recorded passage from a BBC Radio 4 interview with some London-area schoolgirls about female violence. She explained that she lives "at the other end of the country" and therefore "know little of how the youth down there speak nowadays".

However, I had to hit the record button when I heard these North London girls being interviewed by BBC Radio 4. I have no idea what their ethnic background is, but that's kind of the point, innit!

These young women (for example here) aren't speaking Jamaican, fake or otherwise, but they aren't speaking Cockney either. So I'm guessing that these are some variants of the "multicultural London English" that Sue Fox and her colleagues are talking about. (I'll freely admit ignorance of British dialectology; if you can characterize these accents more accurately, please let me know. The whole passage is available as a 3MB mp3 file here. I don't know when this segment was broadcast; if I find out, I'll substitute a link to the Radio 4 archives.)

Steve from Languagehat wrote to register a complaint about that phrase "multicultural London English":

Do you really think that's a better term? To me it sounds dry and misleading: what on earth does "multicultural" mean in a linguistic sense? "Jafaican" may be too narrow, but it's punchy and memorable and at least gives a nod in a meaningful direction.

Steve's got a point. But if the sociolinguists' claims are correct, the emerging dialect is a blend of West Indian English, East Indian English, and some other things as well, all grafted onto a Cockney and/or Estuary-English substrate. "Multicultural" is a plausible mnemonic for that kind of masala.

The trouble with "Jafaican" is that it seems to have started life as a sort of British version of "wigger": that is, a somewhat insulting way to refer to the culture of white or Asian kids who decide to act (and talk) black. Thus "Jafaican" isn't a reasonable term for the young Cockneys from Tower Hamlets that (according to Sue Fox) are starting to copy some vowel features and some lexical items from neighboring Bangladeshis. On her account, at least as filtered through the news stories, there are no Jamaicans, fake or otherwise, anywhere in that particular picture. And likewise, the speech of the girls in that BBC Radio 4 interview is not "fake Jamaican", even if some Jamaican features are in the mix.

Still, I have to agree with Steve that "Multicultural London English" is about as unlikely to become a popular piece of terminology as "African-American Vernacular English" was. And "Jafaican" feels like a lexicographical winner, even if it's misleading.

[Update: David Donnell writes:

Thanks for contextualizing this dialect for me.

For the past year or so I've been enjoying a CD called "Arular", by the artist M.I.A. (real name Maya Arulpragasam). Apparently the CD has won a number of awards and is fairly widely known by now.

M.I.A. is a young Londoner, originally from Sri Lanka, with family members who are/were Tamil Tigers, anti-gov't rebels back in the old country. (My wife is a South African with Sri Lankan roots, so we had some "cultural" interest in the artist.)

It is precisely the lingo that I find so engaging, and "Multicultural London English" or "Jafaican" would seem to describe the dialect pretty well.

Listen to, for example, "Pull Up the People".
[you can navigate to it via flash on her website -- myl]

Another interesting case -- linguistically. In other dimensions, though I don't know much about the who-done-what-to-whom in Sri Lanka, it's hard for me to get enthusiastic about a London-based artist who features a bundle of dynamite with a lit fuse on her web site. The content of "Pull Up the People" promotes the volatile metaphor of adolescent anger = religiously-inspired bombings = ecstatic music. With all respect to David, to multicultural London youth, and to the Tamils in Sri Lanka, this is topical and edgy, but isn't it also a little, you know, immoral?

slang tang
that's the that m.i.a. thang
i got the bombs to make you blow
i got the beats to make you bang

yeah me got god, and me got you
everyday thinkin bout how me get through
everything i own is on i.o.u.
but i'm here bringing y'all something new

you no like the people
they no like you
then they go set it off with a big boom
every gun in a battle is a son and daughter too
why you wanna talk about who done who?
what you wanna talk about?

slang tang
that's the that m.i.a. thang
i got the bombs to make you blow
i got the beats to make you bang

pull up the people, pull up the poor...

i'm a fighter, fighter god
i'm a soldier on that road
i'm a fighter, a nice nice fighter
i'm a soldier on that road

bring me the reaper
bring me a lawyer
i'll fight i'll take 'em on
you treat me like a killer
i ain't never hate ya
i'm a soldier on that road

i'm a fighter, fighter god
i'm a soldier on that road
i'm a fighter, a nice nice fighter
i'm a soldier on that road

]

Posted by Mark Liberman at 06:49 AM

April 14, 2006

Congratulations to Dan Brown

I hope all Language Log readers have already learned that Michael Baigent and Richard Leigh have failed in their ridiculous plagiarism suit against Dan Brown's publisher (Random House, which is also their own publisher — go figure!). Baigent and Leigh alleged that their Holy Blood, Holy Grail was the source for Brown's megaseller The Da Vinci Code, and they deserved a piece of the action. Don't be too surprised that I was rooting for Dan.

Yes, I know, you're going to point out that I have been utterly beastly about his writing, again and again and again and again and again and again and again and again. Totally beastly. I know. I hate myself, OK? But the things I said (all of them true) were almost one hundred percent devoted to his unintentionally hilarious phraseological bungling. Not his honesty.

Certainly (if I may break a lifelong rule by understating a little), Dan is not a master of fine English prose. Nonetheless, his plots barrel along nice and fast (the whole of the action always takes place within a 24-hour span), and though they may range between the mildly implausible and the utterly incredible, he hasn't been ripping his ideas off from second-rate pseudohistorical books about Jesus. He makes his stuff up; he doesn't rip it off. Unreliable as the research underlying his books often is, I have no doubt that it is honestly done by Dan and his wife.

Baigent and Leigh may be madly jealous that Dan made a multi-hyper-megaseller out of some shreds of speculative history of theirs and others', but that doesn't mean Dan's reading of their sloppy myth-making book amounts to anything like grounds for a plagiarism case.

Let me tell you, I know plagiarism, and I don't approve. My philosopher partner Barbara showed me a truly staggering case: a student in an advanced undergraduate philosophy of science class had been asked to write (after much preparatory classroom discussion) an answer to a very specific question about how philosopher of biology Helen Longino could maintain that science could still be objective despite also claiming that its development was influenced by the scientist's social milieu. The student began the (totally irrelevant) essay thus:

Helen Longino has written a timely book that fills a critical gap in the existing literature between philosophy of science and the social studies of science. Her exposition of scientific inquiry as a context-laden process provides the conceptual tools we need to understand how social expectations shape the development of science while at the same time recognizing the dependence of scientific inquiry on its interactions with natural phenomena. This is an important book precisely because there is none other quite like it.

Yes, it does sound a bit like a blurb, doesn't it? And that's because it is. The student simply copied and pasted the above from the blurb by Evelyn Fox Keller on the Amazon.com web site. I hope that astounds you as much as it does me. I don't ever want to start thinking of this sort of appalling brainless dishonesty by college students as if it were normal.

Yes, there is plagiarism out there. It is occasionally perpetrated by second-rate history authors, but much more of it is done by lazy students with cotton wool for brains who are so out of touch with reality that they don't realize their professor can spot a suspiciously over-professional phrase in one second, and in about 15 seconds more can induce Google to provide a report on where it came from. Dan Brown is not to be accused of this sort of pathetic intellectual shoplifting. Dan writes his own awkward and clunky prose after reading enough books to have the necessary facts to plant the clues that drive the plot, and he has made his millions honestly. My hat's off to him. I have never liked seeing misuses of the civil law (like intimidatory libel suits) that might tend to trammel free linguistic creativity, and I'm glad Baigent and Leigh lost their shirts (they have been ordered to cover an estimated $1.75 million of Random House's costs, I have read). I hope Dan and his wife celebrated with a bottle of champagne so expensive that I will never even see let alone taste it. They have earned their Dom Perignon. Cheers!

Posted by Geoffrey K. Pullum at 05:15 PM

My name is Hare and I know nothing

A happy Pesach, Paschal Triduum and Easter for all who celebrate. Here's an Easter-related translational oddity just in time for Holy Week.

I've installed antivirus software from Avira, a German company, and I have no complaints about its general usefulness. Occasionally, though, I get pop-up messages suggesting I upgrade from AntiVir Classic to AntiVir Premium. The latest such exhortation, using Eastertime as its "hook," came across as utterly bizarre in its English rendering:

"Now there's the rub", ...

... is still a quite usual comment of some users, although spyware programs already represent a common danger — not only during the Easter time.

We recommend you to cock your ears and protect yourself straight at Easter — with AntiVir PersonalEdition Premium — and spyware won't give you a clip round the ears.

I'm used to seeing strange English on the Web (and in my email spam-trap), but this one had an intriguing mixture of wildly off-the-mark idioms that seemed to have nothing to do with the ad's Easter theme. So I tracked down the original German version on Avira's website:

"Mein Name ist Hase, ...

... ich weiß von nichts", sagt immer noch mancher Anwender trotz der Gefahren, die von Spyware-Programmen ausgeht. Und das nicht nur zur Osterzeit. Kein Wunder, wenn sich diese ungebetenen Gäste dann auf dem Rechner ungestört breit machen können.

Besser, Sie wissen Bescheid und ziehen Spyware-Eindringlingen die Löffel lang - mit der AntiVir PersonalEdition Premium. Am besten gleich zu Ostern.

Now it started to become clear what was going on. The German copywriter was attempting a visual pun, linking the Easter bunny (Osterhase) in the animated graphic with the idiomatic expression Mein Name ist Hase, ich weiß von nichts ("My name is Hare and I know nothing"). Then whoever was charged with creating the English copy must have misread a dictionary of idioms and selected "Now there's the rub" as the English equivalent, even though that has nothing to do with the profession of ignorance in the original expression.

Since I'm not proficient in German, my initial search on the Mein Name ist Hase saying only traced it back to a 1971 hit song by Chris Roberts. I suspected it was older than that, so I called upon a good friend of Language Log, the masterfully multilingual Chris Waigl (author of the Diacritiques blog and keeper of the Eggcorn Database). Chris was as enlightening as always:

Yes, it's a visual pun. The saying/quote "Mein Name ist Hase" is much older, though, and an amusing bit of pop cultural history. The legend goes (and the Duden accepts this as at least not totally off the wall) that in 1854 a fraternity comrade of the Heidelberg student Victor von Hase, son of the theologian and church historian Karl von Hase and originally from Jena, killed someone in a duel and fled, using von Hase's student ID papers. When he was caught and brought to court in Strasbourg, Victor von Hase was prosecuted for assisting a fugitive. When he was summoned before the court, he is said to have given the following statement: "Mein Name ist Hase. Ich verneine alle Gegenfragen. Ich weiß von nichts." (Free translation: "My name is Hase. I refuse all cross-examination. I don't know anything about this." Usually quoted without the middle sentence as "Mein Name ist Hase, ich weiß von nichts," or even just as "Mein Name ist Hase.")

You'd be unlikely to find "Mein Name ist Hase ..." employed as a straightforward profession of ignorance. There has to be at least a little bit of self-irony or jocularity in it. While searching for the exact wording I got reminded that the Bugs Bunny animated clips used to be broadcast in German under the title "Mein Name ist Hase". I was as addicted to them as any child my age, but didn't remember.

So to sum up, the German original has a pun based on the Easter Bunny on the one hand (Osterhase — Hase (der) is hare, actually, while rabbit is Kaninchen (das), but that's not true for the Osterhase) — and the von Hase quote (or the saying derived from it) on the other. Which the translation just drops, and thereby creates some major weirdness in English.

And what about the final bit of weirdness, the "clip round the ears" idiom? Chris explains that this involves a second pun in the German original: die Löffel lang ziehen, literally meaning "pull the [rabbit's] spoons." Rabbit ears are called Löffel in German, alluding to their spoon-like shape. So "give a clip round the ears" does actually bear some resemblance to the original German idiom, but of course the whole point of the pun is waylaid in the translation. Nonetheless, the English copywriter has inserted yet another "ear" idiom ("cock your ears") for good measure.

Mulling over this train wreck of a translation, I started to wonder if the whole thing might have been intentional. After all, if the English copy had been unremarkable, I would probably have treated the pop-up as a minor irritation and closed the window at once. Instead, here I am spending a great deal of time trying to make sense of the eccentric English. I don't know if the ad will move many copies of the premium antivirus software, but it certainly got my attention.

(This is all reminiscent of that classic work of unintentional cross-linguistic humor: English As She Is Spoke, an 1855 English phrasebook for Portuguese students credited to José da Fonseca and Pedro Carolino. It was long assumed that the two authors wrote the book without actually knowing any English or having access to a Portuguese-English dictionary. Instead, the story went, they tried to craft the phrasebook from a Portuguese-French dictionary and a French-English dictionary with predictably preposterous results. As it turns out, José da Fonseca was an accomplished scholar of languages who published perfectly competent French-English and Portuguese-French phrasebooks. Pedro Carolino apparently pirated Fonseca's Portuguese-French phrasebook by translating the French parts into English and then published the work with both of their names on it. The true story, told here and here, was uncovered after the publication of the Collins Library edition in 2002, with sleuthing done by Alex MacBride, then a grad student in linguistics at UCLA.)

[Related posts:

The bird clapper: a new tool in semiconductor fabrication (2/8/04)
Never pronouncing East Thursday? (2/6/05)
Tong-maker the Kong-maker, and other translational follies (2/2/06)
Engrish explained (3/11/06) ]

Posted by Benjamin Zimmer at 01:43 PM

April 13, 2006

Double cousins?

I am not an expert in kinship vocabulary, even in my native language, and the other day I found myself wondering whether either English or Spanish had a word or short phrase for a concept that arises in connection with my sister-in-law and her cousin. Is there a term for a cousin whose father is the brother of your father and whose mother is the sister of your mother (or perhaps, whose father is the brother of your mother and whose mother is the sister of your father)?

This arose when Barbara and I recently spent some time together with my brother Richard and his wife Amparo at an apartment in Spain borrowed from Amparo's cousin Begonia. (Hence my two-week absence from Language Log Plaza.) Richard and Amparo were indulgent to me as they drove me around some of Spain's stunning grandeur and I slept off my jetlag in the back seat of their car, and then after a few days plugged my laptop in and worked on a paper I had to give in England at the (very enjoyable) DELS conference. We learned while we were there that Begonia is Amparo's cousin twice. Amparo's father's brother is Begonia's father; but if he had not been, Begonia would still have been Amparo's cousin, because Amparo's mother's sister is Begonia's mother, and either of those connections would suffice for cousinhood. So what happened was that two brothers chose as wives a pair of sisters; Amparo is a daughter from one marriage in such a pair of marriages, and Begonia is a daughter from the other. So it is clear (I think; I'm not a geneticist any more than I am a cultural anthropologist) that Amparo and Begonia are more closely related than most people are to their cousins. But what do we call people in that relationship? Double cousins?

Now for the updates (posted April 14)...

The answer to what anthropologists call the relationship (information mailed to me overnight by several people, including people that I would have talked to at the water cooler if I had been at Language Log Plaza) is that Amparo and Begonia are bilateral parallel cousins. If I have it right (with a lot of help from my friends, Molly Aplet especially), that is the technical name for the relation that holds between two people if their fathers are brothers and their mothers are sisters. In an earlier version of this update I said that Amparo and Begonia are bilateral cross cousins. That is a closely related notion; it would hold between Amparo and Begonia if Amparo's father and Begonia's mother were brother and sister, and Amparo's mother was the sister of Begonia's father. The term "double first cousins" has also been used in less technical contexts for both of these relationships (bilateral parallel cousinhood and bilateral cross cousinhood), and Jonathan Lundell tells me that the term his family always used was indeed the one I guessed at the outset, "double cousins". (I just knew there would be some kind of everyday terminology for it.) On the genetics question (how close is the relationship?), this discussion by a geneticist describes it as hard, and gives an inconclusive but informative review of the issues. Australian linguist David Nash has pointed out to me that p.9 of Mathematical Population Genetics by Warren J. Ewens (2004; acknowledgments to Amazon.com's Search Inside feature) says the correlation for double first cousins is slightly higher than that between an uncle and nephew, but a fair bit less than between full siblings, citing Fisher (1918), but that may be about phenotypes. Chris Maloof thinks the answer regarding genotypes is "easy to get with even high school biology": he says, "The genetic relatedness of double cousins is just 1/4, the same as an uncle to a nephew . . . 
One cousin shares 1/4 of his genes with his aunt and with his uncle (who are unrelated), so he'll also share 1/4 of his genes with their offspring." That's what we have for you at the moment (Fri Apr 14 13:43:59 EDT 2006).
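
For readers who want to check Chris Maloof's arithmetic, here is a minimal sketch in Python. It uses the textbook recursive kinship calculation on a toy pedigree and assumes that the four grandparents are unrelated and that nobody is inbred. The placeholder names for parents and grandparents (father_A, pat_grandfather, and so on) are mine, purely for illustration, and this addresses only the expected share of genes by descent, not the phenotype correlations that Fisher and Ewens discuss.

```python
from fractions import Fraction
from functools import lru_cache

# Toy pedigree: child -> (father, mother). Anyone who is not a key is a
# founder, assumed unrelated to every other founder. All names other than
# Amparo and Begonia are hypothetical placeholders.
PARENTS = {
    "Amparo": ("father_A", "mother_A"),
    "Begonia": ("father_B", "mother_B"),
    "father_A": ("pat_grandfather", "pat_grandmother"),
    "father_B": ("pat_grandfather", "pat_grandmother"),  # brother of father_A
    "mother_A": ("mat_grandfather", "mat_grandmother"),
    "mother_B": ("mat_grandfather", "mat_grandmother"),  # sister of mother_A
}


def depth(person):
    """Number of generations between a person and the earliest founder."""
    if person not in PARENTS:
        return 0
    return 1 + max(depth(parent) for parent in PARENTS[person])


@lru_cache(maxsize=None)
def kinship(a, b):
    """Probability that a random allele from a and one from b are identical by descent."""
    if a == b:
        if a not in PARENTS:
            return Fraction(1, 2)
        father, mother = PARENTS[a]
        return Fraction(1, 2) * (1 + kinship(father, mother))
    # Recurse through the parents of the person further down the tree,
    # who therefore cannot be an ancestor of the other person.
    if depth(a) < depth(b):
        a, b = b, a
    if a not in PARENTS:
        return Fraction(0)  # two distinct, unrelated founders
    father, mother = PARENTS[a]
    return Fraction(1, 2) * (kinship(father, b) + kinship(mother, b))


def relatedness(a, b):
    """Expected fraction of shared genes (twice the kinship, absent inbreeding)."""
    return 2 * kinship(a, b)


print(relatedness("Amparo", "Begonia"))     # double first cousins: 1/4
print(relatedness("father_A", "Begonia"))   # uncle and niece: 1/4
print(relatedness("father_A", "father_B"))  # full siblings: 1/2
```

Under those assumptions the double cousins come out at 1/4, the same as uncle and niece and half the figure for full siblings, which agrees with the high-school-biology answer above.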

Acknowledgments to: Molly Aplet, John Cowan, Karen Davis, Mark Liberman, Jonathan Lundell, Chris Maloof, Marilyn Martin, David Nash, Ben Zimmer, and abnu at WordLab. Thank you all.

Posted by Geoffrey K. Pullum at 09:16 PM

A brief history of "spaz"

Tiger Woods landed in hot water after he made this comment in a post-round interview with CBS at the Masters Tournament:

I was so in control from tee to green, the best I've played for years... But as soon as I got on the green I was a spaz.

Tiger's use of spaz, an epithet derived from spastic, caused nary a ripple in the U.S., but when it hit British newspapers there was a significant uproar. "Extraordinarily insensitive," said Lewine Mair in The Telegraph. "Woods sure to regret remark," read the headline in The Scotsman. "Some interpreted this as a grievous insult to handicapped people all over the world," said The Independent. "I don't think he meant to be that offensive but it's something nobody in his position should be saying," Paralympian Dame Tanni Grey-Thompson told the BBC.

Tiger quickly apologized, saying through a spokesman that he "meant nothing derogatory to any person or persons and apologizes for any offense caused." But it's doubtful that he realized he had anything to apologize for until the firestorm in the British press. So how did the word spaz become innocuous playground slang in the U.S. but a grave insult in the U.K.?

There's no question that spaz is a shortened and altered form of spastic, a term historically used to describe people with spastic paralysis, a condition that disables the part of the nervous system controlling motor coordination. (The congenital form of spastic paralysis is now commonly known as cerebral palsy.) Spastic and its clipped form spaz (sometimes spelled spas or spazz but always pronounced [spæz], influenced by spasm and spasmodic) eventually developed a contemptuous sense to describe not just those afflicted with spastic paralysis but anyone who lacks coordination or physical competence. In the U.S., a verb form of spaz, also appearing as spaz out, came to refer to losing physical control or simply acting "weird" or "uncool."

It's unclear how long these derogatory senses have been kicking around, since they were evidently taboo from early on and considered unfit for publication. (As the erstwhile Oxford English Dictionary editor Robert W. Burchfield wrote in a note appended to the entry for spastic, the epithet "is generally condemned as a tasteless expression, and is not common in print.") Many people report that spaz, meaning a clumsy or foolish person, was in common use in the mid- to late '50s here in the U.S. In a discussion on the alt.usage.english newsgroup, Joe Fineman (Caltech class of '58) reproduced this journal entry he wrote in 1956, in a section on the language of Caltech students:

SPAZ, n.R (shortened from _spastic_) 1. _Obsolete._ A person lacking in the common social skills & virtues. See TWITCH. 2. v.R To surprise a person in a way that causes him to take some time to react.

(The "R" means "regional or national" — i.e., I was aware at the time that this was not just Caltech slang. The noun was, of course, obsolete only at Caltech, where it had been replaced by the allusive "twitch".)

The term may have already been on its way out at Caltech, but both the noun and verb were catching on in various parts of the country in the late '50s. The earliest print reference cited by the OED is actually for the verb, even though the noun form must have come first:

1957 Hammond (Indiana) Times 6 Nov. B2/6 Jewelers, furriers, and furniture dealers go through similar merchandising tortures whenever Wall Street spazzes.

This usage may have been deemed acceptable by the Hammond Times editors because it doesn't allude directly to someone with spastic paralysis but instead figuratively extends the term to the uncontrolled ups and downs of Wall Street. And when the noun spaz finally began to be used in mainstream print publications in the mid-'60s, it was used in a sense well removed from spastic. Here is the earliest cite in the OED, from film critic Pauline Kael in 1965, along with another cite I found from that year in a New York Times column by Russell Baker:

1965 P. KAEL I lost it at Movies III. 259 The term that American teen-agers now use as the opposite of 'tough' is 'spaz'. A spaz is a person who is courteous to teachers, plans for a career..and believes in official values. A spaz is something like what adults still call a square.

"Observer: America's New Class System," New York Times, Apr. 11, 1965, p. E14
Your teen-age daughter asks what you think of her "shades," which you are canny enough to know are her sunglasses, and you say, "Cool," and she says, "Oh, Dad, what a spaz!" (Translation: "You're strictly from 23-skidoo.")

So by the time Kael and Baker noticed teenagers using spaz, the sense had already shifted to 'uncool person,' without reference to lack of motor coordination. But that doesn't mean the 'clumsy' sense, with echoes of spastic, was no longer in use at the time. The earliest public attestation that I know of for the uncoordinated sense of spaz is the undeniably tasteless garage-rock single "Spazz" by The Elastik Band (Atco #6537, Nov. 1967), included in the box set Nuggets: Original Artyfacts from the First Psychedelic Era 1965-1968. (This is also the earliest example I know of for the double-z spelling of the noun spazz.) The crude but catchy refrain goes:

I said, get offa the floor, get offa the floor, boy,
People gonna think, yes they're gonna think, people gonna think you're a spazz.

It's still baffling how this single ever got released by a major record label, and unsurprisingly it ended up receiving very little airplay. (Besides the dubious use of spazz, DJs were no doubt also wary of the explicit drug reference in the lyrics: "But when you turn around some joker slipped you LSD.")

In any case, the clumsy or inept meaning of spaz remained mostly on the playground until the late 1970s, when it began seeping into American popular culture. In 1978, Saturday Night Live started running occasional sketches starring "The Nerds," with Bill Murray as Todd DiLamuca and Gilda Radner as Lisa Loopner. On two shows that year (Apr. 22 and Nov. 4), host Steve Martin joined in, playing the character Charles Knerlman, or "Chaz the Spaz" as he was known to Todd and Lisa. (A side note: in one of the sketches, "Nerds Science Fair," Chaz the Spaz says to Lisa, "That's a fabulous science fair project... not!" Though this was hardly the first use of "not" for sarcastic negation, it may have laid the groundwork for usage in the "Wayne's World" sketches and movies a decade or so later.) In 1979, a year after the SNL sketches, Bill Murray starred in the summer-camp comedy Meatballs, which featured a stereotypically nerdy character played by Jack Blum called "Spaz."

For someone like Tiger Woods who came of age in the '80s (and who, incidentally, is on record as saying that another Bill Murray movie, Caddyshack, is his all-time favorite), the American usage of spaz had long lost any resonance it might have had with the epithet spastic. This is not the case in Great Britain, however, where both spastic and spaz evidently remain in active usage as derogatory terms for people with cerebral palsy or other disabilities affecting motor coordination. A BBC survey ranked spastic as the second-most offensive term for disabled people, just below retard. (Spaz does not appear on the list, though presumably it was just considered a variant form of spastic.) The BBC attributes the British resurgence of the epithet to publicity in the early '80s surrounding a man with cerebral palsy named Joey Deacon, particularly his appearance on the children's television show Blue Peter in 1981. The word spaz and other variants like spazmo became firmly connected with Deacon among British youth, according to the BBC report.

All of this helps explain the reaction Tiger's comments engendered in the U.K. press. It would be helpful for British golf fans (and activists for the disabled) to know, however, that Tiger grew up with Bill Murray, not Blue Peter, and he was no doubt oblivious to the cultural resonances the term might have had across the Pond.

[Update #1: Some additional insight from Chris Brew:

Growing up as a mildly physically handicapped teenager in British boys' private schools, I can report that 'spaz' and 'spastic' were routine in the same "lacking in common social skills and virtues" sense that they were being used at Caltech. Joe Fineman's gloss is brilliantly exact for how I recall it being used, but it is a while ago and I don't have a journal entry. This would be late 60's and early 70's.

When it crossed people's minds that I actually was a spastic, they were usually surprised and a bit embarrassed by having said something with a sense that they hadn't thought of. Then, depending on testosterone levels, whether they liked me, and how polite they were, they either apologised or didn't. But I knew that they knew that they felt they should have. So it must have been reasonably offensive, but the Caltech sense was there too.

Also, I'd hope to have seen some reference to the wonderful Ian Dury's Spasticus Autisticus. This gives another sidelight on how Brits would react to 'spaz' (lyrics, Dury obituary). This is a deliberately confrontational piece, which was written for the Year of the Disabled, and banned from radio play for its trouble.

I should note that Joey Deacon's appearance on Blue Peter also occurred during the Year of the Disabled (1981). I don't recall any pop-cultural events connected with that commemoration in the States, however.]

[Update #2: Kellen of The Definitive Truth points out that spastic also ranks highly on the BBC's "ranked list of rudeness" that Mark Liberman wrote about. Turns out it's slightly less offensive than twat and piss off, and slightly more offensive than slag and shit.]

[Update #3: Caity Taylor writes:

There used to be a charity in the UK called the Spastics Society for people with cerebral palsy. Because of the offensive use of spastic, spaz and spakka (on Wearside in the north east of England, at least), they had to change the charity's name. It's now called Scope. This hasn't really had the desired effect though: people have merely gained a new insult: scopers. ]

[Update #4: The blogger Interrobang, who has cerebral palsy, shares an anecdote similar to Chris Brew's at The Interroblog (cross-posted here).]

Posted by Benjamin Zimmer at 07:38 PM

And one more...

If you're willing to extend "official" to the state level, you could add Hawai'i to Bill's list of places where indigenous languages are official -- since 1978, Hawai'ian has shared official status with English.

Posted by Geoff Nunberg at 07:22 PM

Police: Dead rapper fired first shot

Until you saw this improbable sounding CNN headline (via Wonkette), maybe you thought that sentences were associated with a single time, as picked out by the verb's tense. But it ain't necessarily so. Though it was an expensive way for the rapper 'Proof' to prove it, a noun phrase, say dead rapper, can be interpreted at a completely different time from the main verb. The firing event apparently took place around 4:30AM on Tuesday at the CCC club in Detroit, while dead rapper first described Proof only afterwards, maybe not long before he was pronounced dead on arrival at a local hospital. While we're at it, first shot also only became an apt description sometime after the shot was fired.

If this sort of thing excites you (temporal semantics, not dead rappers), then get in line for the first editions of Judith Tonhauser's PhD dissertation on temporal interpretation of noun phrases, due to be completed sometime this summer. No pressure, Judith! (As a preview, see e.g. this paper of hers for some relevant, though technical discussion.)

By the way, from a linguistic point of view it looks like Proof's big mistake was allowing himself to be pronounced upon, and if I were you, I'd never let anyone pronounce you anything. Based on Google counts, you're over 5000 times more likely to be pronounced dead than pronounced alive. More optimistically, you have a better than 1 in 100 chance of being pronounced husband and wife rather than dead. But can marriage really merit such a risk?

Posted by David Beaver at 01:09 AM

April 12, 2006

Where are Native Languages Official

A friend asked me where in the Americas indigenous languages are official at the national level and I thought other people might be interested. As far as I know only three countries have a native language as an official language:

Bolivia	Aymara
	Quechua
	Spanish
Paraguay	Guaraní
	Spanish
Peru	Aymara
	Quechua
	Spanish

If you use a broader notion of "indigenous" you could include Haiti, where both French and Haitian Creole are official.

Posted by Bill Poser at 09:16 PM

Pronominal perplexity at the AP

Looks like the Associated Press today had a little bit of what Daffy Duck memorably called "pronoun trouble."

The headline, with Skilling calling he and Lay a good team, appeared that way on the ABC News website, and several other news sites that reproduced the AP story, such as Yahoo! News, Forbes, and Business Week, went with the exact same wording. Other versions of the AP headline must have gone out over the wire, as a Google News search turns up less vexatious variations, including: "Skilling says he, Lay never broke the law," "Skilling testifies that he and Lay never broke the law," and the pronoun-free "Skilling defends Lay At Enron trial."

In standard journalistic usage, we wouldn't expect the nominative pronoun he to show up in the headline, since the coordinate NP he and Lay appears as a direct object complement of the verb call. Like other members of what Beth Levin dubs the class of "dub verbs" (anoint, baptize, brand, etc.), call here takes two verbal complements: he and Lay (those who are called something) and a good team (what they are called). If a personal pronoun with a nominative-accusative distinction (I/me, we/us, he/him, she/her, they/them) shows up in the first complement of this sort of "double object" construction, norms of standard English dictate the accusative form (He branded me a liar, They named her CEO).

Things get a little tricky, though, when the pronoun has a subject antecedent and is coordinated with another NP (in this case Lay). If the headline writer opted for the accusative case, there are four possibilities, depending on the order of the coordinates and whether a reflexive pronoun is used:

(a) Skilling_i calls him_i and Lay a good team.
(b) Skilling_i calls Lay and him_i a good team.
(c) Skilling_i calls himself_i and Lay a good team.
(d) Skilling_i calls Lay and himself_i a good team.

Of these, (d) is likely the most acceptable in terms of standard usage, though it lacks the punchiness associated with headlinese. None of the selections sound particularly euphonious, which may have contributed to the AP writer's non-standard choice of the nominative case in the coordinate object he and Lay. Another possible contributing factor is suggested by Arnold Zwicky's post here last year, "Case nuances." Arnold notes that certain non-standard uses of nominative pronouns in coordinate objects are more acceptable than others. For instance, many people find sentences of type (e) moderately acceptable but reject (f) out of hand:

(e) Rachel wants you and I to...
(f) Rachel likes you and I.

In the above examples, Arnold writes, "the effect seems to have something to do with the fact that the coordinate NP is interpreted as the subject of the VP that follows it." Though there's no VP following he and Lay in the Associated Press headline, there is another complement, a good team. So it's possible that the two complements for call are treated together as a "small clause," encouraging a reading where the first complement is interpreted as the subject of the second, thus taking the nominative case.

I would also guess that the AP headline is the result of some rushed editing. The headline could have initially read, "Skilling says/claims/testifies (that) he and Lay were a 'good team,'" where he and Lay serves as the subject of a subordinate clause. Then an editor could have replaced "says (that)..." with "calls..." without changing the coordinate NP. But it's still telling that the headline was able to go out on the wires that way and was then deemed acceptable by editors at ABC News, Business Week, and elsewhere. The copy editors have unwittingly performed their own mini-experiment on the acceptability of a type of non-standard (soon to be standard?) pronominal usage.

Posted by Benjamin Zimmer at 08:13 PM

Say Anything

While I'm on the subject of Paul J. J. Payack, I might mention a bizarre remark he made in the CBS.com piece I mentioned in my earlier post.

Payack knows he's got skeptics, but he identifies the real enemies of his research as "postmodernists and deconstructionists" who tend to deny that any definition of a word is suitable. Payack's definition: "anything that can be understood. If millions of people are saying 'bling bling,' we'll accept that."

"Postmodernists and deconstructionists" have been called a lot of things, of course -- "anything-goes" relativists, enemies of reason, and America-haters, among other things -- but not even their most ferocious critics would be likely to describe them as defenders of the idea that words have fixed meanings against the popular tendency to use language in novel and creative ways.

But as best I can divine, what Payack means by "postmodernists and deconstructionists" is simply "snooty academics." It's a sign of how successful the cultural right has been in its attacks on the academy that even people who don't have a clue what postmodernism and deconstruction mean can use the words to evoke anti-intellectual stereotypes, even when the positions they're charging academics with holding are the opposite of the ones that postmodernists have been accused of promulgating. (Stanley Fish, meet Mr. Chips.) But then, if millions of people are saying "bling bling"...

Posted by Geoff Nunberg at 08:04 PM

Jafaican

Jonathan Brown at the Independent tells us that "Jafaican and Tikkiny drown out the East End's Cockney twang". Laura Clark at the Daily Mail worries that "'Jafaican' is wiping out inner-city English accents". At LSE (that's "Life Style Extra", not "London School of Economics"), the headline is "Cockney loses out to Jafaican". The Guardian offers to help us "Learn Jafaikan in two minutes". BBC Voices has an interview with one of the linguists responsible for documenting these developments, under the title "A 'nang' new accent".

These are all echoes of an on-going study on Linguistic Innovators: The English of Adolescents in London, by Jenny Cheshire and Sue Fox of Queen Mary College, University of London, and Paul Kerswill and Eivind Torgersen of Lancaster University.

I don't have much to add to what you can learn from the project overview and the publications available on the project website. I'll note that the news coverage focuses mainly on vocabulary, though there is some discussion of the sound changes involved. I'll also note that it's not clear to me whether this is a single trend or a set of related developments -- Sue Fox's BBC interview focuses on the influence of Bangladeshi pronunciations on the speech of Cockney youth, whereas some of the other news articles are mainly focused on influences of Jamaican English, which is very different. The sociolinguists seem to prefer the term "multicultural London English", which is less likely to mislead than "Jafaican" is, but also less likely to be adopted into ordinary usage, I'm afraid. To my disappointment, I've been unable to find any sound clips illustrating the trends -- surely in this multi-media age, someone can produce a few sound clips to back up all the textual verbiage! Write and tell me if you know of any -- or can produce your own, citizen-journalist style! (And I've already got plenty of Ali G clips -- I like Ali G, I'm just looking for some authentic examples.)

I'll also mention something that Sue Fox says in part of her BBC interview: among young people from a Cockney background, it's the boys who are leading the way towards the new multicultural blend, while the girls tend to hang on to Cockney features. This is the opposite of the usual pattern for sound change in progress, suggesting that the social dynamic has some special features in this case.

[Note that the spelling is sometimes Jafaican (Independent, Daily Mail) and sometimes Jafaikan (BBC, Guardian).]

[Hat tip to Abnu of Wordlab]

Posted by Mark Liberman at 05:32 PM

Hackery, Quackery, Schlock

With an apparent million entries in his (or his PR person's) Rolodex, Paul J. J. Payack has once again managed to get media attention for his loopy claim to have determined the exact size of the English vocabulary, this in an article by Christine Lagorio at CBSnews.com.

The article does contain criticisms from language experts and lexicographers like Jesse Sheidlower, whose more extensive debunking of Payack's claims appeared a few days ago in Slate. (See also Ben Zimmer's post of a couple of months ago). But the piece is written in the "evenhanded" he-said-she-said style that journalists fall back on when they're either too lazy or too timorous to check their facts (Sheidlower is described as belonging to the "skeptics camp," for example, as if there were any other). The effect is to leave the reader with the impression that Payack is a participant in a legitimate scientific controversy, rather than simply an opportunistic charlatan. (As I put the point in a "Fresh Air" piece which I'll post after it runs in a week or so, "trying to count the words of the English language is as idiotic an exercise as trying to determine exactly how many socks Americans lost in 2005.") When the media cover stories about global warming or Intelligent Design that way, it's accounted a sign of the brainless irresponsibility of modern journalism; when the subject is language, nobody seems to care. Cue the Bee Gees' "It's Only Words."

Posted by Geoff Nunberg at 04:08 PM

The shape of a spoken phrase

This is a fragment of a work in progress. Jiahong Yuan, Chris Cieri and I have been exploring the ways that speech rate is affected by who you are, what you're talking about, who you're talking to, what language you're speaking, what setting you're in, how you feel about it all, and so forth.

One of the many relevant factors is phrase length: other things equal, shorter phrases tend to be slower. This is mainly because spoken phrases, like musical phrases, unfold in a characteristic way, with a small initial accelerando and a larger final ritard. Short phrases start the final slow-down before they have a chance to speed up; longer phrases have increasingly large proportions of more rapidly-spoken words. As part of an effort to document and model this effect -- so as to be able to normalize it away, among other things -- I did a simple analysis of a small published corpus of English conversational telephone speech. The results are very pretty.
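
To make the measurement concrete, here is a rough sketch in Python of one way to get a words-per-second figure for each phrase. It is not the analysis code used for this post. It assumes a four-column word-timing format (utterance ID, start time, end time, token), treats any stretch between [silence] tokens as one phrase, and counts fillers like "um" as words. The function names are made up for the example; the sample rows are the opening of one Switchboard conversational side.

```python
SAMPLE = """
sw2001A-ms98-a-0002 1.724625 2.273625 [silence]
sw2001A-ms98-a-0002 2.273625 2.927625 um
sw2001A-ms98-a-0002 2.927625 3.221500 [silence]
sw2001A-ms98-a-0002 3.221500 3.661750 yeah
sw2001A-ms98-a-0002 3.661750 3.957625 i'd
sw2001A-ms98-a-0002 3.957625 4.107625 like
sw2001A-ms98-a-0002 4.107625 4.267625 to
sw2001A-ms98-a-0002 4.267625 4.527625 talk
sw2001A-ms98-a-0002 4.527625 4.941625 about
sw2001A-ms98-a-0002 4.941625 5.126125 [silence]
sw2001A-ms98-a-0002 5.126125 5.307625 how
sw2001A-ms98-a-0002 5.307625 5.437625 you
sw2001A-ms98-a-0002 5.437625 5.735375 dress
sw2001A-ms98-a-0002 5.735375 5.901125 [silence]
"""


def parse_rows(text):
    """Turn 'utterance-id start end token' lines into (start, end, token) tuples."""
    rows = []
    for line in text.strip().splitlines():
        _utterance_id, start, end, token = line.split(None, 3)
        rows.append((float(start), float(end), token))
    return rows


def split_phrases(rows, silence="[silence]"):
    """Yield lists of consecutive word rows, cutting at every silence token."""
    current = []
    for start, end, token in rows:
        if token == silence:
            if current:
                yield current
                current = []
        else:
            current.append((start, end, token))
    if current:
        yield current


def words_per_second(phrase):
    """Word count divided by the time from the first word's start to the last word's end."""
    duration = phrase[-1][1] - phrase[0][0]
    return len(phrase) / duration


for phrase in split_phrases(parse_rows(SAMPLE)):
    words = " ".join(token for _, _, token in phrase)
    print(f"{len(phrase)} words, {words_per_second(phrase):.2f} words/sec: {words}")
```

Grouping the resulting rates by phrase length over a whole corpus is the kind of tabulation the effect described above calls for; a handful of phrases like these is far too small a sample to show it.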

The collection that I used is commonly known as Switchboard. It was recorded at Texas Instruments in 1990-1, originally for use in a speaker-identification study, and has been distributed by the LDC since 1992. The transcripts and time-stamps were corrected at Mississippi State a few years later, and the version that I used can be downloaded here. There are 2,438 conversations, or 4,876 conversational sides.

The transcripts are available in two forms. One set of files divides each conversational side into segments convenient for human transcription or reading. Another set of files assigns a start-time and end-time to every individual word. A (fragment of a) conventional transcript of one of these conversations might look like this:

A: Um yeah I'd like to talk about how you dress for work and and um what do you normally what type of outfit do you normally have to wear?
B: Well I work in uh corporate control, so we have to dress kind of nice, so I usually wear skirts and sweaters in the winter time, slacks I guess [noise] and in the summer just dresses.

But the performed phrasing of conversational speech is not notated in a conventional transcript. One easy-to-see symptom of performed phrasing is the introduction of silent pauses, as shown in the word-by-word time-stamped version of A's turn:

sw2001A-ms98-a-0002 1.724625 2.273625 [silence]
sw2001A-ms98-a-0002 2.273625 2.927625 um
sw2001A-ms98-a-0002 2.927625 3.221500 [silence]
sw2001A-ms98-a-0002 3.221500 3.661750 yeah
sw2001A-ms98-a-0002 3.661750 3.957625 i'd
sw2001A-ms98-a-0002 3.957625 4.107625 like
sw2001A-ms98-a-0002 4.107625 4.267625 to
sw2001A-ms98-a-0002 4.267625 4.527625 talk
sw2001A-ms98-a-0002 4.527625 4.941625 about
sw2001A-ms98-a-0002 4.941625 5.126125 [silence]
sw2001A-ms98-a-0002 5.126125 5.307625 how
sw2001A-ms98-a-0002 5.307625 5.437625 you
sw2001A-ms98-a-0002 5.437625 5.735375 dress
sw2001A-ms98-a-0002 5.735375 5.901125 [silence]
sw2001A-ms98-a-0002 5.901125 6.077625 for
sw2001A-ms98-a-0002 6.077625 6.477625 work
sw2001A-ms98-a-0002 6.477625 6.817625 and
sw2001A-ms98-a-0002 6.817625 7.217625 and
sw2001A-ms98-a-0002 7.217625 7.523125 um
sw2001A-ms98-a-0002 7.523125 7.677500 [silence]
sw2001A-ms98-a-0002 7.677500 7.777625 what
sw2001A-ms98-a-0002 7.777625 7.867625 do
sw2001A-ms98-a-0002 7.867625 7.967625 you
sw2001A-ms98-a-0002 7.967625 8.624625 normally
sw2001A-ms98-a-0002 8.624625 8.797625 what
sw2001A-ms98-a-0002 8.797625 9.067625 type
sw2001A-ms98-a-0002 9.067625 9.307625 of
sw2001A-ms98-a-0002 9.307625 9.707625 outfit
sw2001A-ms98-a-0002 9.707625 9.777625 do
sw2001A-ms98-a-0002 9.777625 9.847625 you
sw2001A-ms98-a-0002 9.847625 10.237625 normally
sw2001A-ms98-a-0002 10.237625 10.397625 have
sw2001A-ms98-a-0002 10.397625 10.547625 to
sw2001A-ms98-a-0002 10.547625 10.961250 wear
sw2001A-ms98-a-0002 10.961250 11.561375 [silence]

Similarly, the word-by-word version of B's turn is:

sw2001B-ms98-a-0003 10.166375 10.764125 [silence]
sw2001B-ms98-a-0003 10.764125 11.189250 well
sw2001B-ms98-a-0003 11.189250 11.302250 i
sw2001B-ms98-a-0003 11.302250 11.496375 work
sw2001B-ms98-a-0003 11.496375 11.676375 in
sw2001B-ms98-a-0003 11.676375 11.846375 uh
sw2001B-ms98-a-0003 11.846375 12.326375 corporate
sw2001B-ms98-a-0003 12.326375 12.866375 control
sw2001B-ms98-a-0003 12.866375 13.096375 so
sw2001B-ms98-a-0003 13.096375 13.186375 we
sw2001B-ms98-a-0003 13.186375 13.346375 have
sw2001B-ms98-a-0003 13.346375 13.456375 to
sw2001B-ms98-a-0003 13.456375 13.706375 dress
sw2001B-ms98-a-0003 13.706375 13.946375 kind
sw2001B-ms98-a-0003 13.946375 14.006375 of
sw2001B-ms98-a-0003 14.006375 14.518000 nice
sw2001B-ms98-a-0003 14.518000 15.104500 [silence]
sw2001B-ms98-a-0003 15.104500 15.316375 so
sw2001B-ms98-a-0003 15.316375 15.386375 i
sw2001B-ms98-a-0003 15.386375 15.656375 usually
sw2001B-ms98-a-0003 15.656375 15.946375 wear
sw2001B-ms98-a-0003 15.946375 16.366375 skirts
sw2001B-ms98-a-0003 16.366375 16.861875 and
sw2001B-ms98-a-0003 16.861875 17.614875 sweaters
sw2001B-ms98-a-0003 17.614875 18.066875 [silence]
sw2001B-ms98-a-0003 18.066875 18.216375 in
sw2001B-ms98-a-0003 18.216375 18.286375 the
sw2001B-ms98-a-0003 18.286375 18.578375 winter
sw2001B-ms98-a-0003 18.578375 19.105750 time
sw2001B-ms98-a-0003 19.105750 19.401500 [silence]
sw2001B-ms98-a-0003 19.401500 19.936375 slacks
sw2001B-ms98-a-0003 19.936375 20.009500 i
sw2001B-ms98-a-0003 20.009500 20.405625 guess
sw2001B-ms98-a-0003 20.405625 21.125000 [noise]
sw2001B-ms98-a-0003 21.125000 21.236375 and
sw2001B-ms98-a-0003 21.236375 21.346375 in
sw2001B-ms98-a-0003 21.346375 21.406375 the
sw2001B-ms98-a-0003 21.406375 21.766375 summer
sw2001B-ms98-a-0003 21.766375 21.986375 just
sw2001B-ms98-a-0003 21.986375 22.468125 dresses
sw2001B-ms98-a-0003 22.468125 22.813500 [silence]

Silent pauses are not the only symptom of conversational phrasing, but they're a relatively straightforward and intersubjectively stable indicator. If we define the conversational phrases to be the stretches between silent pauses, in the quoted exchange we get the following (phrase boundaries marked with |):

A:	um | yeah i'd like to talk about | how you dress | for work and and um | what do you normally what type of outfit do you normally have to wear
B:	well i work in uh corporate control so we have to dress kind of nice | so i usually wear skirts and sweaters | in the winter time | slacks i guess and in the summer just dresses

By this definition, the Switchboard corpus contains 519,598 "performed phrases".
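The pause-based segmentation described above can be sketched in a few lines of Python. This is an illustration only, not the program actually used for the analysis: the function name `split_phrases` is made up, the input is assumed to have exactly four whitespace-separated fields per line (as in the excerpts above), and only the two bracketed tokens shown in this post are handled. Following the counts given here, `[silence]` ends a phrase, while `[noise]` is dropped without ending one.

```python
SILENCE = "[silence]"
NOISE = "[noise]"

def split_phrases(lines):
    """Group (word, duration) pairs into phrases delimited by [silence].

    Hypothetical helper: each line is assumed to look like
    'utterance-id start-time end-time token'.
    """
    phrases, current = [], []
    for line in lines:
        _utt_id, start, end, token = line.split()
        if token == SILENCE:
            # A silent pause closes the phrase in progress, if any.
            if current:
                phrases.append(current)
                current = []
        elif token != NOISE:
            # Every other transcribed token counts as a word.
            current.append((token, float(end) - float(start)))
    if current:
        phrases.append(current)
    return phrases

# The first few time-stamped lines of A's turn, copied from above.
sample = """\
sw2001A-ms98-a-0002 1.724625 2.273625 [silence]
sw2001A-ms98-a-0002 2.273625 2.927625 um
sw2001A-ms98-a-0002 2.927625 3.221500 [silence]
sw2001A-ms98-a-0002 3.221500 3.661750 yeah
sw2001A-ms98-a-0002 3.661750 3.957625 i'd
sw2001A-ms98-a-0002 3.957625 4.107625 like
sw2001A-ms98-a-0002 4.107625 4.267625 to
sw2001A-ms98-a-0002 4.267625 4.527625 talk
sw2001A-ms98-a-0002 4.527625 4.941625 about
sw2001A-ms98-a-0002 4.941625 5.126125 [silence]
""".splitlines()

phrases = split_phrases(sample)
for phrase in phrases:
    print(len(phrase), " ".join(word for word, _ in phrase))
```

On this excerpt the sketch yields a one-word phrase ("um") followed by a six-word phrase ("yeah i'd like to talk about"). A real run over the corpus would also need a decision about any other bracketed tokens in the transcripts, which this sketch does not address.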

I categorized each "phrase", in this sense, according to the number of "words" in it. (I called every transcribed token a word, including ums and uhs and partial words and so on, but not including things transcribed as "[noise]".) Printed out as one phrase per line, with the duration of each word following it, the result looks like this.

1 um 0.654
6 yeah 0.440 i'd 0.296 like 0.150 to 0.160 talk 0.260 about 0.414
3 how 0.182 you 0.130 dress 0.298
5 for 0.177 work 0.400 and 0.340 and 0.400 um 0.306
14 what 0.100 do 0.090 you 0.100 normally 0.657 what 0.173 type 0.270 of 0.240 outfit 0.400 do 0.070 you 0.070 normally 0.390 have 0.160 to 0.150 wear 0.414
15 well 0.425 i 0.113 work 0.194 in 0.180 uh 0.170 corporate 0.480 control 0.540 so 0.230 we 0.090 have 0.160 to 0.110 dress 0.250 kind 0.240 of 0.060 nice 0.512
7 so 0.212 i 0.070 usually 0.270 wear 0.290 skirts 0.420 and 0.496 sweaters 0.753
4 in 0.150 the 0.070 winter 0.292 time 0.527
9 slacks 0.535 i 0.073 guess 0.396 and 0.111 in 0.110 the 0.060 summer 0.360 just 0.220 dresses 0.482

Now for each possible phrasal word count, I averaged the duration of the words in each position of all the phrases of that size. (Of course, what I mean is that I wrote a little computer program to do this.)

For example, there were 41,578 phrases of length 3. The words in the first position of these 3-word phrases had an average duration of 0.259877 seconds; in the second position, the average was 0.267758 seconds; in the final position, the average was 0.393747 seconds.
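As a rough illustration of that averaging step (again a sketch, not the original program), the Python below reads lines in the one-phrase-per-line format shown above and averages durations by position within each phrase length. The function names are invented for the example, and the last sample line is made up so that there are two 3-word phrases to average; the other three lines are copied from the listing above.

```python
from collections import defaultdict

def parse_phrase_line(line):
    """Parse 'N w1 d1 w2 d2 ...' into a list of N durations (seconds)."""
    fields = line.split()
    n = int(fields[0])
    durations = [float(x) for x in fields[2::2]]  # every second field after the first word
    if len(durations) != n:
        raise ValueError(f"expected {n} words, found {len(durations)}: {line!r}")
    return durations

def positional_means(phrase_lines):
    """Return {phrase length: [mean duration at position 1, 2, ...]}."""
    sums = {}                  # length -> running per-position totals
    counts = defaultdict(int)  # length -> number of phrases seen
    for line in phrase_lines:
        durations = parse_phrase_line(line)
        n = len(durations)
        totals = sums.setdefault(n, [0.0] * n)
        for i, d in enumerate(durations):
            totals[i] += d
        counts[n] += 1
    return {n: [t / counts[n] for t in totals] for n, totals in sums.items()}

sample_lines = [
    "1 um 0.654",
    "3 how 0.182 you 0.130 dress 0.298",
    "4 in 0.150 the 0.070 winter 0.292 time 0.527",
    "3 oh 0.200 i 0.100 see 0.400",  # made-up phrase, for illustration only
]

means = positional_means(sample_lines)
for n in sorted(means):
    print(n, [round(m, 3) for m in means[n]])
```

Keeping a separate list of totals for each phrase length means the whole corpus can be processed in a single pass, with memory proportional to the number of distinct lengths rather than the number of phrases.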

If we plot these position-wise average durations for phrases from length 1 to length 12, lined up so that all the first-position words are in the same place, we get a plot like this:

Another option is to line up the phrase lengths so that the final positions correspond. This makes the pattern easier to see, in my opinion, since the phrase-final modulation of timing is larger than the phrase-initial modulation:

There's nothing special about the number 12, in this case -- the longer phrases look just as you would expect. I kept the plot to lengths 12 and below to keep the plot from getting too busy.
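The two ways of lining up the curves differ only in how each word position is mapped to a horizontal coordinate. A small Python sketch, assuming nothing about how the plots were actually drawn: the function names and the choice of putting the final word at x = 0 are illustrative, and the durations are the rounded averages for 3- and 4-word phrases reported in this post.

```python
# Position-wise mean durations (seconds) for 3- and 4-word phrases,
# using the rounded averages reported in this post.
MEANS = {
    3: [0.260, 0.268, 0.394],
    4: [0.248, 0.234, 0.267, 0.393],
}

def x_positions(length, align):
    """Horizontal coordinates for the words of a phrase of the given length."""
    if align == "start":
        # First words share x = 1; phrases extend to the right.
        return list(range(1, length + 1))
    if align == "end":
        # Final words share x = 0; earlier words get negative offsets.
        return list(range(-(length - 1), 1))
    raise ValueError(f"unknown alignment: {align!r}")

def curve(length, align):
    """(x, mean duration) points for one phrase length."""
    return list(zip(x_positions(length, align), MEANS[length]))

for align in ("start", "end"):
    for length in sorted(MEANS):
        print(align, length, curve(length, align))
```

With end alignment, the final-word averages (0.394 and 0.393) land at the same x value, which is why the phrase-final lengthening stacks up so visibly in that version of the plot.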

Here's a table of the numbers from the plot.

	[count]	1	2	3	4	5	6	7	8	9	10	11	12
1	151,995	0.452
2	59,260	0.300	0.384
3	41,578	0.260	0.268	0.394
4	35,483	0.248	0.234	0.267	0.393
5	31,891	0.242	0.225	0.238	0.264	0.397
6	28,545	0.238	0.219	0.228	0.236	0.264	0.395
7	25,388	0.238	0.217	0.225	0.227	0.238	0.263	0.397
8	22,386	0.237	0.217	0.222	0.224	0.229	0.236	0.264	0.394
9	19,306	0.237	0.217	0.221	0.224	0.226	0.228	0.238	0.263	0.394
10	16,485	0.238	0.215	0.221	0.223	0.225	0.226	0.228	0.234	0.260	0.391
11	14,155	0.239	0.215	0.221	0.224	0.225	0.225	0.226	0.228	0.235	0.260	0.393
12	12,124	0.238	0.214	0.220	0.224	0.223	0.223	0.227	0.225	0.227	0.234	0.259	0.388
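The table also makes it easy to check the claim that shorter phrases tend to be slower. Since every position in a row is averaged over the same number of phrases, the mean of a row is the mean word duration for phrases of that length. A quick Python check on a few rows copied from the table (the variable names are just for this example):

```python
# Selected rows of the table: phrase length -> position-wise mean durations (s).
ROWS = {
    1: [0.452],
    2: [0.300, 0.384],
    3: [0.260, 0.268, 0.394],
    6: [0.238, 0.219, 0.228, 0.236, 0.264, 0.395],
    12: [0.238, 0.214, 0.220, 0.224, 0.223, 0.223,
         0.227, 0.225, 0.227, 0.234, 0.259, 0.388],
}

# Mean word duration per phrase length, and the corresponding rate.
mean_duration = {n: sum(row) / n for n, row in ROWS.items()}

for n in sorted(mean_duration):
    seconds = mean_duration[n]
    print(f"length {n:2d}: {seconds:.3f} s/word, {1 / seconds:.2f} words/s")
```

The mean falls from about 0.45 seconds per word for one-word phrases to about 0.24 seconds per word for twelve-word phrases, so the longer phrases in these rows are spoken nearly twice as fast on average.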

There's a lot more to say, but for now I'll just end with this. The duration of any particular spoken word depends on many things -- how many syllables are in it, and what kind of vowels and consonants make them up; how emphatically it's pronounced; the dialect and speaker and style of speech; the process of selecting the next word; and on and on. But in this case, we've averaged across all sorts of values for all of these factors in every position of every phrase length, and so what is left to see is the shape of the spoken phrase itself.

Because very large collections of transcribed speech are now available -- Switchboard is small by modern standards -- it's become easy to let the Law of Large Numbers reveal the latent structure of speech. This is a great time to be a scientist.

Posted by Mark Liberman at 12:06 AM

April 11, 2006

Videotaping Interrogations

Today's New York Times has an article about how the Detroit police now plan to videotape interrogations of all suspects in crimes that carry a penalty of life in prison without the possibility of parole (here). This is indeed a welcome sign, because there is widespread suspicion that such interrogations sometimes go far beyond anything proper or humane. When they are videotaped, exactly what was said and done can be verified, which can be helpful not only to defense attorneys but also to the police.

In recent years there have been many criticisms of police interrogation techniques. In 1992 William A. Geller strongly argued for videotaping in his report to the National Institute of Justice (Police Videotaping of Suspect Interrogations and Confessions). In a survey Geller conducted, investigators found that in 1990 about a third of all U.S. police and sheriff departments serving 50,000 or more citizens were videotaping at least some interrogations, primarily in homicide, rape, battery, and drunk driving cases. Departments reporting that they videotaped said that they began the practice to avoid defense attorneys' challenges, to reduce doubts about the voluntary nature of confessions, and to aid a detective's memory when testifying in court about what took place. Every police department surveyed said that it planned to continue the videotaping practice. At first, detectives resisted the idea but most of them eventually came to appreciate it, largely because fewer allegations of coercion or intimidation were made by defense attorneys.

One benefit to videotaping came as something of a surprise. A majority of the departments surveyed reported that the videotaping practice led to improvements in their interrogation techniques. Some even used interrogation tapes as training materials for inexperienced officers.

Notably, the Times article about Detroit's new decision doesn't say exactly how much of the interrogation the police department actually plans to record. Obviously, defense lawyers want the entire interrogation taped, not just the final confession. Some police departments say that it costs too much to videotape everything, opting for recapitulations instead. I've worked on cases in which, after hours of untaped questioning, the police produced a five-minute videotape of the suspect admitting that he had committed the crime. We can never know what was said leading up to such confessions or whether it was coercive or intimidating.

If the Detroit police want to do this right, they'll videotape the entirety of the interrogations, not just the recapitulations.

Posted by Roger Shuy at 03:29 PM

Reliable Sources on Classification

The question of where to find reliable information on language classification came up over at Language Hat recently in the comments on this post about some beautiful new language maps that, regrettably, are based on unreliable linguistic classifications. The problem is that some commonly used sources, such as Charles Frederick Voegelin and Florence Marie Voegelin's Classification and Index of the World's Languages (1977), are badly dated, while others, such as Merritt Ruhlen's A Guide to the World's Languages (second edition 1991), are unreliable for other reasons. Ruhlen's book is generally fairly accurate for the lower levels of classification but is quite unreliable at higher levels due to its reliance on an unsound and subjective approach to linguistic classification.

The best single source of information is the Ethnologue, a publication that lists all of the known languages of the world by region. For each language it provides the classification along with such information as where it is spoken, by how many people, and its endangerment status. The Ethnologue also provides numerous maps showing where languages are spoken, and contains an extensive index of alternative names, since many languages are known by several names. (Because the sponsor of the Ethnologue, the Summer Institute of Linguistics, is an organization whose primary purpose is the translation of the New Testament, entries also indicate the availability of the Bible in the language.)

The classification in the Ethnologue generally reflects the mainstream view of historical linguists. It is occasionally somewhat out of date, and is sometimes criticized for treating what most specialists consider to be dialects as distinct languages, but overall it does a better job than any other reasonably comprehensive publication.

The printed version is accompanied by a CD-ROM and has colored maps. It is also available on-line. There are several indices:

The best way to go beyond the Ethnologue, either to obtain what may be more up-to-date information or to find out more about the reasons for decisions about classification and the problems and controversies in particular cases, is to consult specialized publications on particular areas of the world and language families. Here are some good sources:

Africa: The book African Languages: an Introduction edited by Bernd Heine and Derek Nurse (Cambridge University Press, 2000) contains chapters by prominent specialists on the four major language families of Africa (Niger-Congo, Afro-Asiatic, Khoi-San, and Nilo-Saharan), each of which discusses the problems of classification.
The Americas: American Indian Languages: The Historical Linguistics of Native America by Lyle Campbell (Oxford University Press, 1997) is a comprehensive treatment of the languages of the Americas. In addition to surveys of the classification of the languages of North, Middle, and South America, it contains a discussion of the history of classification of the languages of the Americas, a discussion of the methodological issues that arise in establishing distant genetic relationships, and evaluations of quite a few proposals that are not generally accepted.
For North America, Marianne Mithun's The Languages of Native North America (Cambridge University Press, 1999) contains a general discussion of the history and problems of classification of the languages of North America, and a catalogue of the languages organized by language family. The catalogue includes both a discussion of the relationship of the languages and other information about the language family.
Australia: Claire Bowern and Harold Koch's Australian Languages: classification and the comparative method (2004) is an anthology containing discussions of issues in the classification of the languages of Australia.
East Asia: S. Robert Ramsey's The Languages of China contains a good discussion both of Chinese and of the non-Chinese languages of China. The Sino-Tibetan Etymological Dictionary and Thesaurus project has a useful discussion of the internal structure of Sino-Tibetan. For Chinese "dialects" consult the dialects section of Marjorie Chan's terrific Chinalinks site. The classification of the Austroasiatic languages is discussed here.
New Guinea: William Foley's book The Papuan Languages of New Guinea (Cambridge University Press, 1986) contains a lengthy discussion of the classification of the non-Austronesian languages of New Guinea.

Posted by Bill Poser at 02:34 PM

Unfolding Infogami

A few months ago Mike Pope of Evolving English II brought to our attention an employment website called Jobdango, which grafted the last two syllables of fandango onto job to create its domain name. I described this as a case of cran-morphing, where a segment of a word is reanalyzed as if it were a combinable morpheme preserving some semantic association with the longer word from which the segment is taken (like the cran- in cranberry recombining in cran-grape and cran-raspberry). Now Pope divulges a new cran-morphish domain name from the fevered minds of Web developers: Infogami, a Wiki-like application launched by Aaron Swartz that lets users build their own websites.

Noting that Microsoft used "Origami" as a working name for the project that became "Ultra-Mobile PC," Pope writes:

MS's use is just recasting an ordinary noun as a name. Swartz actually takes the step of decomposing the term. So what's the common semantic, I wonder? Small? Folding? Make cool things out of simple materials? I can't quite pull the instances together.

Swartz illuminates the selection of the name in a narrative about his early brainstorming with Paul Graham, Trevor Blackwell, and a woman he calls "4 of 4":

'When would you have the first prototype done?' 'Well, we'd hope to work on it over the next term so we'd have it ready over the summer.' 'Oh, wonderful, wonderful.' he says. 'What about this name? Infogami? You're going to always have to spell it out.' Paul says. 'Isn't it just origami with info at the beginning?' 4 of 4 asks. 'Well, it's confusing,' Paul says. 'In-FAH-gomee,' Trevor chimes in. 'All the names with blog in it are probably taken,' 4 of 4 says. 'No, you don't want blog in it,' Paul says. 'You want something bigger, something that can face the world. You're not wedded to the name, are you?' 'No, we just picked it so we could stop discussing the name and move on,' I said. 'Oh, good,' Paul says, and moves on.

As is no doubt the case with many startups, a replacement was never found for the original stopgap name. But Graham must have eventually warmed up to the sound of "Infogami," as he wrote about it approvingly in a post about what makes a good startup name:

Infogami is a pretty decent name too. Aaron already had that when we first met him. It can't conveniently be used as a verb, but it looks and sounds good, and has the advantage that it can naturally expand to cover whatever this software evolves into.

Graham observes that nowadays "cool" startups tend to inhabit "decidedly marginal name space" by using peculiar, less-than-obvious domain names. He compares it to "when fashionable people started living in lofts in industrial neighborhoods," where "the features that initially repelled people, like rough concrete walls, have now become a badge of coolness." One example Graham gives of a weird-therefore-cool domain name is Flickr, though as we've seen the substitution of "-er" with "-r" is rapidly losing its cachet of hipness as a flock of Web developers follow Flickr's lead (much as the industrial lofts in Brooklyn's Williamsburg neighborhood are nowhere near as hip as they were a decade ago).

So the use of unusual cran-morphs like -dango for "Jobdango" or -gami for "Infogami" is evidently another route into fashionably offbeat name space. The danger, though, is in deploying a cran-morph that is so unusual that it lacks semantic transparency. Though the -gami ending was striking enough for bloggers to recognize the metaphor of "Web Origami," Mike Pope wasn't the only one to be a bit baffled by what that metaphor was supposed to indicate. As a contributor to the Joel on Software discussion group wondered soon after the startup name leaked out, "Infogami — like origami, except information instead of paper?" And a commenter on Paul Graham's piece about startup names raised a potential cross-linguistic problem:

No biggie, but if you speak Japanese infogami sounds weird, as the "gami" in origami just means paper (kami). The "ori" part means fold. So infogami sounds like info-paper, not info-folding as I imagine was the intention.

That could conceivably be an issue if Infogami makes it big in Japan, but it's not something that would trouble Anglophones. One significant aspect of cran-morphing is that it completely reanalyzes a segment, regardless of what semantic content the segment may have had earlier in its history, whether in English or another originating language. Cheeseburgers and turkeyburgers don't have anything to do with the inhabitants of a burg, just as Monicagate and Plamegate don't have anything to do with gates.

Another possible source of confusion is pronunciation. The connection to origami implies that the name should be pronounced as [ˌɪnfoˈgɑmi] after the typical English pronunciation of [ˌɔrəˈgɑmi]. But Trevor Blackwell was first tempted to pronounce the name as [ɪnˈfɑgəmi], influenced by similarly stressed forms ending in -gamy like monogamy [məˈnɑgəmi]. On the other hand, perhaps that pronunciation would provide an extra semantic boost: you can choose to read it as either 'the folding of information' or 'the marriage of information'!

[Update #1: Sean Palmer emails to say that Steve Ivy was the first to come up with the name "Infogami," back in March 2002.

Also, I was remiss in giving examples demonstrating that -gami had risen to the level of a crantacular combining form. A quick Web search finds plenty of examples of X-gami meaning "(the art of) folding X," such as card-i-gami, diaper-gami, pornogami, penis-gami (yikes), moneygami, and so forth.]

[Update #2: Dan Brown sends along another example of X-gami: baby-gami. (Fortunately this entails wrapping, not folding, babies.)]

Posted by Benjamin Zimmer at 02:22 PM

X is a Y best served Z

According to the first sentence of a story in the New York Times by Richard Siklos, "Post to Daily News: Drop Dead",

In New York's tabloid newspaper war, revenge is a dish best served boldface.

Nathan Bierma writes to suggest that Siklos has produced a clever new variant of the well-established "X is a dish best served Y" snowclone.

The original phrase is "Revenge [or vengeance] is a dish best served cold", so boldface is a nice rhyming substitution. It's even semantically appropriate, since the Post's breathless gossip columns put celebrity names in boldface. (Thus today's Liz Smith lede "'YOU THINK those girls abstain?' That's Sandra Bernhard on the Bush twins, Jenna and Barbara.") And the "revenge" part, in case you've been preoccupied with the Mars Rover's broken wheel, is the Daily News' schadenfreude over the gossip-column extortion scandal at the Post.

According to the Wikipedia:

Revenge is a dish best served cold. - suggesting that emotional detachment and planning ("cold blooded") are best for taking revenge. The earliest well-known example of this proverb in print appears as "La vengeance est un plat qui se mange froid" in the novel Les Liaisons Dangereuses (1782) by Pierre Choderlos de Laclos. The saying exists in many cultures, including Sicilian, Spanish and Pashtun, making its ultimate origin difficult to determine. The modern English wording is attributed to Dorothy Parker. In Star Trek II: The Wrath of Khan (and, in reference, Kill Bill) it is said to be a Klingon proverb and was quoted by Khan Noonian Singh (Original Klingon "bortaS bIr jablu'DI', reH QaQqu' nay"). In comic books it is often associated with Batman's enemy Mr. Freeze.

And there's certainly ample evidence for the popularity of this phrase as the basis for the usual sorts of substitutions. Web searches produce many substitutions for the initial noun and final adjective:

{Vendetta, Justice, Democracy, Seawater, Service, Continuity, Redemption, Truth, Mike Karikas, Tweeness, Schadenfreude, elimination, horror, paranoia, Arch Enemy's music, diplomacy, consciousness, Trampling, stats, skepticism, Payback, Titanium, Ignorance, Intensity, Doom, Comedy, Bed-making, Snark, German electronic music, War, Humble Pie, cronyism, victory, Ethnic insult, Improv, ...} is a dish best served cold.

Revenge is a dish best served {on broken glass, popped, with Happy Penguins, bold, with spam, with a Merlot, online, with ketchup, with postage, cheap, by reply mail, with cabbage, with a stripper, by a musical virgin, slick, smelly, stale, with wine, fried, with a side salad, smack in the f*cking face, with cheese, milky and chocolatey, with a quart of potato salad eaten at one sitting, guava-flavored, ...}

There are also quite a few substitutions for the medial noun alone, though these are mostly simple semantic adjustments or elaborations of the original quote, rather than the re-purposing that typifies a snowclone:

Revenge is a {meal, dinner, plate, drink, platter, dessert, cocktail, beer, tator-tot hotdish, tasty morsel, supper, donut, sweet, meat, repast, fish, cake, ...} best served cold.

Likewise many of the substitutions for the participle served:

Revenge is a dish best {eaten, tasted, savoured, enjoyed, delivered, supped, ...} cold.

And we'll pass over the many examples where it's literally meant that X is best served cold, for X = {Turkey, cheese on toast, Pasta, Steak, Gazpacho, ...}, though this phrasing is probably influenced by the now-famous quotation.

Then there is a penumbra, of indefinite size, of phrases with multiple substitutions:

Disturbing is a dish best served animated
Love is a dish best served in a warm, irresistible atmosphere
Flatulence is a dish best served at 37.5 C
Nostagaligia [sic] is a dish best served squishy
Freedom is a dish best served hot
The truth is a dish best served plain
Clockcleaner's music is a dish best served loud
...comedy is a dish best served black ...
Schadenfreude is a dish best served daily
Blues is a dish best served hot.
Romance is a dish best served spontaneously.
...political insight is a dish best served funny.
Utah's optimism is a dish best served with caution.
... nostalgia is a dish best served without Asian-fusion.
Public radio is a dish best served to those whose palate is mature enough to appreciate it.
...humour is a dish best served cool.
Funk is a dish best served live.
Water skiing is a dish best served at dawn or dusk, when the water is glassy and calm.
Nostalgia Pie is a Dish Best Not Eaten At All.
Five Iron is a dish best experienced live
Satire is a dish best consumed sparingly.
Kama Sutra is a dish best paired with a mango smoothie
Embarassment is a dish best enjoyed alone
Justice is a Pizza Best Served Cold.
History is a lesson best served cold.
envy is a drink best served chilled.
irony is a drink best served on the rocks
Respect is a milkshake best served with two straws.
Reality is a genre best served simple.
quality is a fish best served boned
...

Posted by Mark Liberman at 12:47 AM

April 10, 2006

A fishapod called Tiktaalik

The big news these days in evolutionary biology is the discovery of Tiktaalik roseae, a fossil fish dating back to the Late Devonian era, some 375 million years ago. Tiktaalik is particularly exciting because it represents a transitional stage between organisms with distinct attributes of fish and those with distinct attributes of four-legged land animals, or tetrapods. The genus name Tiktaalik was suggested by elders of the Nunavut Territory in the Canadian Arctic, where the fossil was found. According to the Inuktitut Living Dictionary, tiktaalik is the Inuktitut term for the burbot, a large freshwater fish resembling the cod. It's a lovely, evocative name ("music to my ears," in the words of my brother Carl Zimmer, who wrote extensively about the water-to-land evolutionary leap in his book At The Water's Edge). But one of the discoverers, Neil Shubin of the University of Chicago, apparently couldn't leave well enough alone and introduced another less euphonious moniker for Tiktaalik (and creatures like it) when interviewed by the New York Times about the fossil find:

Tiktaalik, Dr. Shubin said, is "both fish and tetrapod, which we sometimes call a fishapod."

In the press release accompanying the discovery's publication in the journal Nature, Shubin says the researchers "jokingly call it a fishapod," but some bloggers didn't seem to appreciate the humor. "Clearly a crap name," fumed Libertaria of The Bewilderness, adding: "Don't make science less interesting by branding good discoveries with a Soviet consumer brand." John Holbo of Crooked Timber wonders why they didn't go for ichthyopod, before deciding that both fishapod and ichthyopod would more accurately denote "an organism with fish for feet."

Holbo's tongue-in-cheek reading of fishapod assumes that the -pod suffix must attach to a form that describes the feet themselves (e.g. arthropod, lit. 'jointed feet') or counts them (e.g., tetrapod, decapod, myriapod, etc.). But the jocular designation offered by Shubin clearly isn't intended to work like these other neo-Latin combinations. Rather, it's a straightforward blend of the two words fish and tetrapod, iconically representing the neither-here-nor-there transitional nature of Tiktaalik.

I've previously written about political name-blends like Scalito and Camerair (and celebrity name-blends like Bennifer and Brangelina), where the fusing of the two names suggests the creation of a mutant hybrid beast. But the blending or compounding of animal names to indicate hybridity or intermediacy is an old tradition: consider the compound names of such fanciful beasts as the hippogriff ('horse-griffin') or the lycanthrope ('wolf-man'). The giraffe was originally called the camelopard, melding Greek camelos 'camel' and pardalis 'leopard' because it was thought to combine a camel-like head with leopard-like spots. (This was sometimes written as cameleopard, assumed to be a blend of camel and leopard; indeed, giraffes were frequently depicted in early illustrations as if they were camel-leopard amalgams.) In modern times, when new animal hybrids are engineered by interbreeding, they are often given name-blends: the offspring of a male lion and female tiger is a liger, the offspring of a male tiger and female lion is a tigon, the offspring of a male zebra and female donkey is a zedonk or zonkey, and so forth. The earliest such interbred name-blend that I'm aware of is cattalo, a cattle-buffalo hybrid dating to 1888 (now superseded by beefalo).

It's too early to tell whether Shubin's use of the fishapod blend will catch on in popular descriptions of Tiktaalik. Several bloggers (e.g., here, here, here, and here) have independently come up with another name for the fossil: Darwin fish. This refers to a pro-evolution spoof of the "Jesus fish" insignia found on car bumpers across America. Since the discovery of Tiktaalik is already being used as fodder in the battle between evolutionism and creationism, I can see how Darwin fish could spread as a purposefully provocative nickname for the fossil. Tiktaalik has already inspired Darwinian art from Ray Troll with the exhortation to "embrace your inner fish." Troll is also responsible for "The Devonian Blues," a catchy number driving home the evolutionary message that all of us humans are, in a way, fishapods.

Posted by Benjamin Zimmer at 01:10 AM

April 09, 2006

The Four Subjects of Linguistic Analysis

Mark Liberman's post relating William Matthews' Four Subjects of Poetry (here) made me wonder what the four subjects of linguistic analysis might be. It's hard to keep it to four, but there's a humble start after the jump.

1. I've analyzed a whole bunch of language phenomena and what I've found corrects/amplifies/changes completely what the rest of you less enlightened folks have to say about this subject.

2. I've discovered a spanking new language phenomenon and so, ta-da, here it is in all its glory.

3. I've gone to great pains to compare language phenomenon #1 with language phenomenon #2 and I found:
a. one of the two is more accurate or useful or pleasing or relevant than the other one, or
b. the two are either the same or so similar that it doesn't really make any difference.

4. I've discovered that a certain older language issue is still relevant today, so take that, you modern whipper-snappers.

Posted by Roger Shuy at 07:20 PM

The meaning (or not) of links

In the world of weblogs, textual links are either demonstrative, like this, or they're footnote-like provisions of background information that the author thinks might be helpful. (I'm leaving out navigational linkage, which is often quite different.) Those hyperlink conventions are pretty much the ones that developed earlier in other sorts of web text. But over the past few years, newspapers have started experimenting with hyperlinks in their news text; and in my opinion, the results are generally weird.

Several years ago, the New York Times started tagging company names with links to company-associated pages on the NYT business site. But there was no serious attempt to ensure that the tagged names actually have any connection to the company: thus back in March of 2004, I noted a case where "Laura Fluor, a car saleswoman from Monmouth County, N.J" got a link to the Fluor Corporation.

Since then, the Times has either instituted better entity-tagging algorithms, or put humans in the loop: today I can read a dozen stories on line without finding any examples like that one. However, the links are still a little strange.

For example, a story by Joseph Berger on the rising prices of suburban homes ("Homes Too Rich for Firefighters Who Save Them") has three hyperlinks. One is to "Steve Levy" in this context:

Steve Levy, the Suffolk County executive, said the problem went beyond civil servants.

The second one is to "Harvard":

"There are parts of the country, particularly the two coasts, where the price of housing has so outstripped any income gains that moderate wage earners find it difficult to find a decent home in the community where they work," said Nicolas Retsinas, director of the Joint Center for Housing Studies at Harvard and a former assistant federal secretary for housing.

The third one is to "Martha Stewart":

In the town of Bedford, made up of the hamlets of Bedford, Bedford Hills and Katonah, the median household income for its 18,600 residents is more than $100,000, with celebrated residents like Martha Stewart making a good deal more. Volunteers are increasingly coming from outside Bedford's bounds.

Levy is arguably an important figure in the context of the story, deserving the hyperlink equivalent of a footnote. But the references to Harvard and to Martha Stewart are tangential, and the existence of the hyperlinks implicates a degree of relevance that they lack. The links are especially odd, given that many more relevant "named entities" are not given links -- places like Westchester County, organizations like Habitat for Humanity, and so on.

It's pretty clear what's going on. There's an index of Times Topics, which "correspond to the most frequently assigned subject, geographic, organization and personal name headings". Stories are indexed (automatically?) relative to that (finite and fairly small) list of topics. Thus Berger's story on suburban home prices is linked to Harvard, even though the only connection is a quote in the 14th paragraph from someone who works there; and to Martha Stewart, even though she is only mentioned in the 25th paragraph as an example of one of the wealthy residents of the town of Bedford.

Sometimes the Times Topics links seem pragmatically even stranger. Today's 7,800-word NYT Sunday Magazine piece by Jack Hitt on abortion in El Salvador, "Pro-Life Nation", has links on the words and phrases abortion, pregnancy, mental health, Pope John Paul II, U.S. Supreme Court, suicide, ulcer, hepatitis, U.S. Senate, and smoking, although many of these are entirely marginal references, e.g.

The women's prison where convicted murderers are sent is in the outer district of Tonacatepeque. ... Through a small window, I could see an open area crisscrossed by laundry lines and arrayed by different women lying around smoking.

There are no links for many more relevant items in the same story: El Salvador, South Dakota, Roe v. Wade, Opus Dei, Center for Reproductive Rights, Yes to Life Foundation, and so on.

But it could be worse. Following some links in my Sunday morning reading on the web, I happened on Kaelen Wilson-Goldie's review in The Daily Star of Brian Whitaker's "Unspeakable Love", published under the headline "Briton's book gives voice to gay Arabs". It appears that the (Beirut) Daily Star has started selling words in its stories, somewhat in the way that Google sells AdWords relative to users' search terms. I've seen this in the online edition of other publications as well, but The Daily Star doesn't just put perhaps-relevant ads in the margins; it underlines the words and then flashes the ad as a mouse-over event. The first paragraph of the review contains four ad-linked words:

When Salim, a 20-year-old Egyptian, told his family that he was gay, they packed him off for six months of psychiatric treatment. When Ali, a teenager from Lebanon, was discovered to be gay, his father broke a chair over his head and his brother threatened to kill him for tarnishing the family honor. Ali left home and no longer has any contact with his relatives.

And the (mouse-over pop-up) ads for those words start:

*family*	Find Family Practice Opportunities: At M**** M**, we match experienced physicians with respected health care organizations. ...
*Lebanon*	Visiting Lebanon? Find cheap flights and hotel rates ...
*chair*	Find a wide selection of lift chairs, various colors and sizes ...
*home*	Get money-saving tips by taking ***'s home energy survey.

I doubt that "... his father broke a chair over his head..." is really the sort of context in which the company selling lift chairs hoped to find customers.

It's clear that the algorithm is simple keyword match. Thus words are matched inside names, so that the House inside Zico House

Launched in Beirut on Wednesday night with a book signing at Zico House and a party at Walima...

rates an ad that starts "What's your home worth? Thinking of selling your home? Wonder how much it might be worth? You can find out with a free home valuation at ..."

And words are also taken out of idioms, e.g. credit in

To his credit, Whitaker does not shy away from but rather dives into the murky questions surrounding homosexuality in the Middle East.

which yields "Finding the best credit card deal is now easy ..."
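
The two failure modes just described, matches inside names and matches inside idioms, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the ad inventory, the idiom list, and the function names are hypothetical, and the hand-written rules stand in for whatever the Daily Star's ad vendor actually runs (and for the stochastic tagger that a serious implementation would use instead).

```python
import re

# Hypothetical ad inventory: keyword -> ad teaser (teasers from the examples above).
ADS = {
    "chair": "Find a wide selection of lift chairs ...",
    "house": "What's your home worth? ...",
    "credit": "Finding the best credit card deal is now easy ...",
}

# Hypothetical list of idioms in which a keyword has no commercial sense.
IDIOMS = ["to his credit", "to her credit", "to their credit"]

TOKEN = re.compile(r"[A-Za-z']+")


def naive_matches(text):
    """Return every ad keyword that occurs as a word, regardless of context."""
    return [m.group().lower() for m in TOKEN.finditer(text)
            if m.group().lower() in ADS]


def filtered_matches(text):
    """Like naive_matches, but skip keywords inside listed idioms or names."""
    lowered = text.lower()
    idiom_spans = [(m.start(), m.end())
                   for idiom in IDIOMS
                   for m in re.finditer(re.escape(idiom), lowered)]
    tokens = list(TOKEN.finditer(text))
    hits = []
    for i, m in enumerate(tokens):
        word = m.group()
        if word.lower() not in ADS:
            continue
        # Skip a keyword that falls inside one of the listed idioms.
        if any(start <= m.start() < end for start, end in idiom_spans):
            continue
        # Crude name test: a capitalized keyword that directly follows
        # another capitalized token is treated as part of a proper name.
        prev = tokens[i - 1].group() if i > 0 else ""
        if word[0].isupper() and prev[:1].isupper():
            continue
        hits.append(word.lower())
    return hits
```

On the sentences quoted above, the naive matcher fires on House in "Zico House" and on credit in "To his credit", while the filtered one skips both. Both versions still fire on chair in "his father broke a chair over his head", which is a reminder that surface filters of this kind do nothing about pragmatic appropriateness.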

So here's a new application for text-tagging algorithms: not doing information extraction for "data mining" in text, but rather finding textual references that are genuinely appropriate for triggering advertisements.

[If you're thinking of patenting this idea, consider this post to be evidence of prior art, since the implementation of the idea (to some reasonable degree of performance) is a trivial application of the existing technology of stochastic taggers. ]

[Update: Matt Hutson has an example showing that the NYT has not yet worked all the kinks out of their links:

A November 6 NYT Mag article on literary Darwinism mentions Harvard psych prof Stephen Kosslyn. The word "Stephen" is hyperlinked to the Times Topics page for Stephen Sondheim.

]

Posted by Mark Liberman at 05:42 PM

April 08, 2006

The four subjects of poetry

In case you missed it this morning, Scott Simon interviewed Edward Hirsch on Weekend Edition, and together they read William Matthews' Four Subjects of Poetry:

1. I went out into the woods today, and it made me feel, you know, sort of religious.
2. We're not getting any younger.
3. It sure is cold and lonely (a) without you, honey, or (b) with you, honey.
4. Sadness seems but the other side of the coin of happiness, and vice versa, and in any case the coin is too soon spent, and on what we know not what.

Posted by Mark Liberman at 02:21 PM

Quasi-modal be in Family Circus

Back on Thursday, Josh Fruhlinger at the Comics Curmudgeon let loose with

... an extended shout-out to my professional linguist homies over at the Language Log, who have linked to me several times despite my near-total absence of linguistics content), I’ve always found the verb construction Mom’s deploying here pretty stilted and weird. It’s a verb of being governing a negative infinitive, which makes it … well, hell, if I knew that, I’d be writing “I analyze syntax so you don’t have to,” or, you know, the Language Log, instead of this thing. I reached back a decade and rummaged around my half-remembered memories of Latin for a while and came out with the phrase “hortatory subjunctive,” but I don’t think that’s right. Anyway, it does have a certain advantage in that saying “Don’t open it until you get home” would make her look pretty dumb, since he’s already opened it. This way she gets to make a general statement of fact without having to either ignore or explicitly acknowledge the reality of her greedy, gobbly, smarmy little brat of a son.

If Josh were like some people, he would have just randomly gone with "passive tense" or "hortatory dative" or something, but he didn't. Instead, he asked for terminological help, and here it is, two days after the Language Log batsignal went up (which is like a month in blog time) and we haven't responded. Well, none of the syntacticians seem to be on duty here at Language Log Plaza this weekend, so I'll try to fill in. The thing is, I'm a phonetician, not a syntactician, and I'm not entirely sure what this construction ought to be called either.

As far as I know, it doesn't have an established name in traditional grammar. The OED gives sense II 11b. for to:

Expressing duty, obligation, or necessity. (a) with inf. act.: is to..= is bound to, has to.., must.., ought to...

and gives these citations, among others:

1591 SHAKES. Two Gent. II. iii. 37 Thy Master is ship'd, and thou art to post after with oares.
1598 -- Merry W. IV. ii. 128 You are not to goe loose any longer, you must be pinnion'd.
1887 'L. CARROLL' Game of Logic i. §1. 9 What, then, are you to do?

And CGEL distinguishes among six uses of be (p. 113):

i	She was a lawyer.	[copula be]
ii	She was sleeping peacefully.	[progressive be]
iii	They were seen by the security guard.	[passive be]
iv	You are not to tell anyone.	[quasi-modal be]
v	She has been to Paris twice already.	[motional be]
vi	Why don't you be more tolerant?	[lexical be]

The be in iv is called "quasi-modal" because it

has clear semantic affinities with the central modal auxiliaries, and syntactically it resembles them in ... [that] it can't appear in a secondary form: *I resent being not to tell anyone, *The meeting had been to be chaired by the premier. It lacks all the other modal auxiliary properties, however; it has agreement forms, it takes an infinitival with to, it can't occur in a remote apodosis, and its preterites do not occur with the modal remoteness meaning.

So I guess the right terminology would be something like "quasi-modal be with an infinitive of obligation", though this may redundantly ascribe to be and to the responsibility for a single construction type.

[In the CC comments section, Jimmy says that

Thel should have said "you weren't to open those until we got home," which is a usage of the Past Future Perfect Laudatory Declamatory Tense Twice Removed...

and Scipio tries heroically to assimilate it to Latin:

I think what you're searching for is "negative future passive participle", specifically,
"tibi non aperturum est donec domi revenerimus."
"it's not to be opened by you until we shall have returned home."

]

[Jonathan Lundell wrote to object that

No, quasi-modal be in The Hunchback of Notre Dame.

Ouch. ]

Posted by Mark Liberman at 01:06 PM

"Conditional tense" at the NYT

A couple of days ago, Linda Seebach sent me an email about an item at Regret The Error referencing Gawker's citation of a Tabloid Baby post about a letter from an editorial assistant at the New York Times. That sentence suggests why the blogosphere is a paradise for folks interested in tracking the diffusion of information through social networks -- but in fact there's a linguistic point at the end of that chain of references. Details follow...

A 4/3/2006 NYT article by Andrew Jacobs, "On Job with Empress of Celebrity Gossip", observed that

Reporters from Variety, "Entertainment Tonight" and "A Current Affair" might be expected to remain corralled behind a length of velvet rope, but at a recent premiere for "Inside Man" at the Ziegfeld Theater in Midtown, Mrs. Adams [Cindy Adams, gossip columnist for the New York Post -myl] curtly rebuffed a perky film publicist who had asked her to join the salivating pack.

Tabloid Baby argues that this sentence has a false presupposition, inappropriate for the newspaper of record:

A Current Affair was canceled in October 2005, so there’s no way one of its reporters was behind that velvet rope. Additionally, A Current Affair was regularly denied access to celebrity red carpet lines, because it did real tabloid entertainment journalism and refused to regurgitate studio pap and myths, like ET. So you'd figure the Times would at least cop to placing a nonexistent reporter at the scene, no?

We asked the Times for a correction, and were pleasantly surprised when, in small-town, old school fashion, the paper actually responded to our request.

The post goes on to quote the letter from Karin Roberts, Assistant to the Metropolitan Editor:

You are correct in noting that "A Current Affair" has been canceled. However, the article does not say that a reporter for "A Current Affair" was at the premiere of "Inside Man." ...

The first part of [the quoted] sentence is written in the conditional tense; it means that at red-carpet events like the premiere, those reporters would probably stay behind the velvet rope. The second part goes on to describe what happened at this particular premiere.

Tabloid Baby, Gawker and Regret The Error all seem to feel that Ms. Roberts' excuse was inadequate, not to say lame. One of Tabloid Baby's commenters puts it bluntly:

So if we follow this logic through, basically the Times is saying "Yes, our references are outdated, confusing and stale, but they're not technically inaccurate when we make 'em hypothetical. We couldn't possibly be expected to list three viable, current media outlets for this item. We're on deadline you know. Just thank your lucky stars we didn't cite reporter Lou Loudrock from "The Flintstone Free Press."

But the point of Linda Seebach's email was a different one. Roberts' reference to the "conditional tense" is not just a lame excuse, it's also a double misuse of grammatical terminology. What Roberts is referring to is neither a matter of tense, nor a conditional.

We can streamline the original sentence to read "Reporters are expected to remain behind a velvet rope, but Mrs. Adams rebuffed a publicist who asked her to join the pack." If we add the modal preterite might to the first clause

Reporters might be expected to remain behind a velvet rope, but Mrs. Adams rebuffed a publicist who asked her to join the pack.

it adds a tinge of concessive tentativeness, which is a matter of grammatical mood.

As Geoff Pullum wrote recently with respect to a similar terminological malfunction at The Economist, involving a reference to the "passive tense":

Tense is an inflectional category of verbs that has time reference as its primary semantic function. English has two orthogonal tense contrasts: the primary one is present vs. preterite (compare writes with wrote) and the secondary one, marked with the past participle preceded by the auxiliary verb have, which contrasts non-perfect versus perfect. The have can be in either present or preterite, so we get both writes / has written and wrote / had written. Subjects and objects are unaffected by tense changes.

Geoff went on to explain that the distinction between active and passive is a matter of grammatical voice.

None of this matters to the specific point that Ms. Roberts was trying to make. Jacobs' sentence described a general expectation about where journalists should stand, and reported that Cindy Adams insisted on violating it. The reference to particular media outlets was meant to make the exceptionalism more vivid, not to report the presence of any specific reporters.

Nor does the grammatical terminology matter to the bloggers' objection: common sense tells us that it's odd to mention hypothetical reporters who couldn't have been present at the time and place described, even as the subject of a modal concessive clause. If I were to write

Jorge Luis Borges, George W. Bush and Mahatma Gandhi might be in Palo Alto waiting for a bus in the rain, but Geoff Pullum is enjoying the sun in Granada.

it would be reasonable for you to point out that Borges and Gandhi are long since dead, and W doesn't take the bus.

As Geoff said, "the problem of people confusing voice with tense is not a huge danger for the future history of the world"; and we can add that life on earth is also unthreatened by confusing mood with tense and throwing around meaningless references to conditionals. But Geoff's question remains:

... for heaven's sake, if people have absolutely no idea how to use technical terminology of grammar, why do they try ...?

Here's a concrete, positive suggestion: Geoff should create a short remedial course on English grammar, including a brief but clear overview of tense, voice, and mood; and media companies should hire him to give it to interested employees.

Posted by Mark Liberman at 07:51 AM

April 07, 2006

WTF coordination in the bullpen

Here's another gem from Ball Four by Jim Bouton, who clearly has a keen ear for ballplayer-talk. Bouton's no-holds-barred memoir recounts the 1969 baseball season, during which time he pitched for a short-lived expansion team, the Seattle Pilots. An aging knuckleballer, Bouton spent much of the season in the Pilots bullpen hoping to be called in for relief appearances. He describes a game where the starter Steve Barber runs into trouble in the fifth inning (the inning that a starting pitcher needs to complete to get credit for a win), while he and fellow pitcher Fred Talbot wait anxiously in the bullpen.

In the fifth, when Barber walks a couple, the call comes — for me. With two out I'm all set to go in and collect my Big W when Barber, the rat, goes ahead and gets the third out on a pop-up. Says Talbot: "Ah, sit down. No chance now. All you can get is a save or your ass kicked in."

(Even though he missed out on his chance at a win or "Big W," Bouton came in to pitch the last four innings and got a save, not his ass kicked in.)

Talbot provides a lovely in-the-wild example of what Neal Whitman calls "cross-subcategorization complement-complement coordination" — or "WTF coordination," to use the snappier term used by folks around Language Log Plaza. One of the examples Neal gives in his 2004 Language article "Semantics and Pragmatics of English Verbal Dependent Coordination" (available in PDF form here for subscribers to Project Muse) also incorporates an unlike coordination of complements for get:

It makes it tough for him to get [his things done] and [to bed on time].

Each of these examples involves the coordination of a passive "small clause" complement for get: [your ass kicked in], [his things done]. (See Nicholas Fleischer's recent CLS paper for more on get with passive complements.) By coordinating an NP complement ([a save]) with a passive complement ([your ass kicked in]), Fred Talbot's comment resembles another of Neal's examples, this time with the verb need:

He also needs [a refill on his juice] and [his diaper changed].

Neal credits the above exempla to his sister-in-law and his wife, respectively. I wonder if he married into the Talbot family.

Posted by Benjamin Zimmer at 06:13 PM

The discreet charm of French orthography

Francophones can be just as peevish about spelling, grammar, and usage as Anglophones, but at least they can have fun with their linguistic foibles rather than descending into murderous rage. I base that ridiculous generalization on an article in the (UK) Times about the rise of "spelling clubs" (clubs d'orthographe) across France. The clubs are devoted to "dictations" (dictées) in which texts full of difficult words are read aloud, transcribed by participants, and then corrected.

If the article is to be trusted, the clubs are remarkably free of griping or defensive complaints about Anglophone hegemony. "This is about pleasure," says Suzanne Commerçon, who leads dictations in the town of Plouay. "You can see how much everyone is enjoying themselves." Jean-Pierre Jaffré of the French National Centre for Scientific Research ascribes the participants' joie de vivre to the "nostalgic flavor" that dictations provide, reminding older speakers of the "proper French" they were taught as children. "As people grow older, they want to go back to their past," Jaffré told the Times.

Vonick Epaillard of the Plouay spelling club maintains that "the main thing is that you have a great time here." But he doubts that Anglophones could experience the same delight:

"I expect dictations in English are not very exciting, because the only difficulty with English is the accent. In French, we have irregular verbs, complexities with past participles, lots of rules, exceptions to those rules and exceptions to the exceptions. It's a real challenge."

English certainly has no shortage of orthographical and grammatical peculiarities, but a snippet of Mme. Commerçon's dictation correction illustrates the sort of French-specific headache that the spelling clubs find perversely enjoyable:

"There is, of course, a circumflex accent on dûment [duly]," she said, provoking mostly self-satisfied nods. "And there is, of course, no circumflex on éperdument [desperately]," she added, to much dismay among her audience.

Darn that pesky circumflex!

[Update, 4/9/06: Much incisive discussion can be found at millionary wherewhens and No-sword.]

Posted by Benjamin Zimmer at 02:56 PM

Tis strange the mind, that very fiery particle, Should let itself be snuffed out by an Article

What I intended as a throwaway post on H. W. Fowler's analysis of the as an adverb in the the more the merrier construction engendered a ~~torrent~~ ~~inundation~~ freshet of emails. John Cowan pointed out that the article in this construction is in fact the last survivor of the Old English instrumental demonstrative, and Russell Lee-Goldman noted that the construction was hotly debated a century ago and that the analysis of the article as a legacy of the OE demonstrative is defended in a Fall 2005 Linguistic Inquiry article by Marcel den Dikken called "Comparative Correlates Comparatively" (available here), where the determiner is treated as the head of a phonologically null degree phrase. And Justin Mansfield observed that "[Fowler's] analysis is exactly how this construction is done in Latin: e.g. quo plures, eo laetiores 'by how much [we are] more, by so much [we shall be] merrier'," and added that "calquing the grammatical analysis of English constructions off of their Latin parallels [is] a major vice of traditional grammarians."

Well, yes, to be sure. But then Fowler was playing by different rules from modern syntacticians. Where the game now involves sorting through a vast number of analytic alternatives until they can be reduced to the categorical repertory of Universal Grammar, traditional grammarians were obliged to take their categories from off the shelf and constrict or deform the sprawl of syntax until one or another of them could be made to fit. Questions of empirical methodology aside, you have to wonder which exercise provided for a more gratifying display of ingenuity. Our method is clearly better suited to turning out scientists, but you'd have to give theirs the edge when it comes to training lawyers.

Posted by Geoff Nunberg at 12:58 PM

Linguists 'have different brains'

That's the headline for a recent BBC report from the frontiers of neuroscience. Even though "have different brains" is set off in quotation marks (using the peculiar headline-writer's technique of putting a scientific claim in quotes even if it's not quoted material), I would have preferred seeing the scare-quotes around "linguists" instead. You see, it's not professional language scholars who are differently-brained, at least not necessarily. The article describes a study by neuroscientists at University College London finding that fast learners of non-native speech sounds have more "white matter" in the left part of a brain structure known as Heschl's gyrus. To determine this, the researchers trained native French speakers to hear the difference between dental [d] and retroflex [ɖ]. (This is not a phonemic difference in French or English, but it is in a language like Hindi.) The subjects who were fastest at learning the dental-retroflex contrast showed a marked asymmetry of white-matter density between their right and left Heschl's gyri.

You can't really blame the headline writer for using the word "linguists" as shorthand for "people who are adept at learning non-native phonemic contrasts." But it plays into a popular preconception that linguists (the professional kind) are natural-born polyglots, picking up the sound patterns of foreign languages at the drop of a hat. Many linguists do indeed fit this description, but it's not a prerequisite for study in the discipline, especially for fields outside of phonetics and phonology. (This brings to mind dystopic imagery of children undergoing brain scans to determine their career paths. "Look at all the white matter in Jimmy's left gyrus! He's going to be the next Ladefoged!") In any case, learning a foreign language involves a lot more than mastery of the relevant phonemic distinctions, though that obviously helps.

I should note that the earliest recorded usage of the word "linguist" is not far from what the BBC headline writer used. The OED gives the earliest sense as "one who is skilled in the use of languages; one who is master of other tongues besides his own," with a first citation from Shakespeare's Two Gentlemen of Verona:

And partly, seeing you are beautified
With goodly shape and by your own report
A linguist and a man of such perfection
As we do in our quality much want.

The first citation for "linguist" in the sense of "a student of language" comes from John Wilkins in 1641, some 50 years later, and even then there was no clear distinction with the "polyglot" sense. The word "philologist" was long the preferred term for a language scholar — as late as 1922 Otto Jespersen felt compelled to write: "I think I am in accordance with a growing number of scholars in England and America if I apply the word 'linguist' by itself to the scientific student of language (or of languages)." I tend to agree with the Tensor, who feels that professional linguists shouldn't be so annoyed at confusion over the different meanings of "linguist," considering that scholars "hijacked" the word in the first place.

[Update: Mark Liberman investigates the research itself, not just the media coverage, in this post.]

Posted by Benjamin Zimmer at 10:28 AM

Worm grunting

If you happen to be in Sopchoppy, Florida tomorrow, you might want to stop in at the sixth annual Worm Grunting Festival, which offers a unique combination of science and folklore.

The science starts with Charles Darwin's last work, "The Formation of Vegetable Mould Through the Action of Worms, With Observations on Their Habits", which includes a charming account of his experiments:

Worms do not possess any sense of hearing. They took not the least notice of the shrill notes from a metal whistle, which was repeatedly sounded near them; nor did they of the deepest and loudest tones of a bassoon. They were indifferent to shouts, if care was taken that the breath did not strike them. When placed on a table close to the keys of a piano, which was played as loudly as possible, they remained perfectly quiet. Although they are indifferent to undulations in the air audible by us, they are extremely sensitive to vibrations in any solid object. When the pots containing two worms which had remained quite indifferent to the sound of the piano, were placed on this instrument, and the note C in the bass clef was struck, both instantly retreated into their burrows. After a time they emerged, and when G above the line in the treble clef was struck they again retreated. Under similar circumstances on another night one worm dashed into its burrow on a very high note being struck only once, and the other worm when C in the treble clef was struck. On these occasions the worms were not touching the sides of the pots, which stood in saucers; so that the vibrations, before reaching their bodies, had to pass from the sounding board of the piano, through the saucer, the bottom of the pot and the damp, not very compact earth on which they lay with their tails in their burrows. They often showed their sensitiveness when the pot in which they lived, or the table on which the pot stood, was accidentally and lightly struck; but they appeared less sensitive to such jars than to the vibrations of the piano; and their sensitiveness to jars varied much at different times.

It has often been said that if the ground is beaten or otherwise made to tremble, worms believe that they are pursued by a mole and leave their burrows. From one account that I have received, I have no doubt that this is often the case; but a gentleman informs me that he lately saw eight or ten worms leave their burrows and crawl about the grass on some boggy land on which two men had just trampled while setting a trap; and this occurred in a part of Ireland where there were no moles. I have been assured by a Volunteer that he has often seen many large earth-worms crawling quickly about the grass, a few minutes after his company had fired a volley with blank cartridges. The Peewit (Tringa vanellus, Linn.) seems to know instinctively that worms will emerge if the ground is made to tremble; for Bishop Stanley states (as I hear from Mr. Moorhouse) that a young peewit kept in confinement used to stand on one leg and beat the turf with the other leg until the worms crawled out of their burrows, when they were instantly devoured. Nevertheless, worms do not invariably leave their burrows when the ground is made to tremble, as I know by having beaten it with a spade, but perhaps it was beaten too violently.

According to a 4/7/2006 story by Kevin Begos in the Tampa Tribune, the folklore involves "[t]he backwoods craft of rubbing a metal bar against a wood stake [which] produces vibrations that drive earthworms out of the ground. Sopchoppy might be the only place where people make their living harvesting wild earthworms this way, then selling them to bait shops as far away as the Midwest."

"I do have two children, and they can both grunt worms. Absolutely. It's something that I want to hand down to them," Debbie Chane said. "It was something that my daddy handed down to me and his daddy handed down to him. I want them to know a little bit of their heritage and their tradition."

Darwin would have been enthralled. But what I want to know is, what is the scientific explanation for the folkloric secret of worm-calling that eluded Darwin? Was it really a matter of amplitude, as he speculated? Or did his spade produce the wrong vibration spectrum, whacks as opposed to grunts?

Posted by Mark Liberman at 09:46 AM

Adam Kendon on the 'chin flick'

[I wrote to Adam Kendon to ask him about the gesture that Antonin Scalia used recently in Boston. The guest post below is Adam's email response, formatted for posting by me. -myl]

Andrea de Jorio, whose La mimica degli antichi investigata nel gestire napoletano from 1832 is rather comprehensive regarding Neapolitan gesture, describes the gesture known -- since Desmond Morris et al.'s Gestures (London: Longmans, 1979) -- as the 'chin flick' in this way:

"Outside tips of the fingers pointed under the chin and pushed outwards forcefully".

He describes this gesture in his section entitled "Negativa, No", and it is the sixth of the 13 different gestures of negation that he describes. It is preceded by a description of a form of a head negation gesture in which the

"Head [is] raised a little as in pushing it backwards".

He adds "The Neapolitans typically give greater emphasis to this same gesture by adding one of the following, which also have the same meaning themselves, without the head being raised." The two gestures that may be added are a mouth gesture in which the "[l]ower lip [is] moved up somewhat and pushed outward a little, or pulled downwards towards one side", and the so-called "chin flick", his description of which I have already quoted. He provides a drawing of this gesture in his Plate XX1, No. 2.

Thus de Jorio sees the 'chin flick' as a reinforced version of the head negation gesture in which the head is pushed back somewhat. The 'chin flick' as de Jorio described it for Neapolitans in 1832 is still in use today here in the Neapolitan area, and it is used with just the significance that de Jorio ascribes to it: a forceful or reinforced "no". It can be used in many of the various contexts in which one wishes to say "no" or in some other way express a negative.

De Jorio suggests that the head-pushed-back gesture (his negation gesture No. 4) derives from a wish to distance oneself from whatever it is that is proposed -- and of his gesture of negation No. 6, here the 'chin flick', he adds

"...it is clear that with such an action the gesturer wishes to show that he wishes to distance his head from whatever is offered to him or proposed that does not please him. In order to do this with emphasis, the hand, the finger tips or just the nails may be used to push the head back as far as possible."

I have myself captured this gesture in use on video-tape several times, when it is being used within a discourse. That is, it is not being directed to another person but is part of saying something negative in a rather emphatic way.

Today, in a little local trattoria that I frequent here on the Island of Procida (where I am at the moment) I asked two local Procidanians (both at least in their sixties, I believe) if they knew and used the gesture. They were totally familiar with it and explained that it is a way of saying "no" or expressing a negative sentiment in a forceful way, confirming exactly Andrea de Jorio's explanation of it.

[Quotations from de Jorio are taken from my translation, published as Gesture in Naples and Gesture in Classical Antiquity by Indiana University Press, 2000. See pp. 290-291].

If you ask someone from the more northerly parts of Italy about this gesture they are likely to say that it means "I don't care" or "It does not bother me" -- and they do tend to suggest that it is a rather rude gesture. Possibly it can be used in this way also in Naples, but this is not the first meaning that is given to you if you ask about it. Certainly my video-tape recordings -- all recordings of natural spontaneous talk, not gesture elicitation recordings -- which I have never tried to do, by the way -- would confirm that here in the Neapolitan area, in any case, it is best interpreted as a forceful or reinforced negative. And I should think de Jorio's idea that it is a way of adding emphasis to the "head toss" negative is a very plausible derivation of the gesture.

In any case, as far as I know, there does tend to be a difference in how it is used in the Neapolitan zone (more or less coastal Campania, shall we say) and how it is used further north. I know nothing about its use in Sicily or Calabria, however. No one has undertaken any sort of systematic survey of the use of this gesture, by the way (except Morris et al., and their work was pretty 'rough', you might say).

Whether there is more than one form of this gesture, I do not know. The Neapolitans I have asked or have observed don't think so -- the gesture may be done more than once in succession if you are being really very strong in your negation. However, it may be that in the North, where it is said to accompany the expression "me ne frega" (more or less "I don't care about it") and where it might be less acceptable, there are 'reduced versions' in which the finger tips, as it were, rub back and forth under the chin.

Gestures: Their Origins and Distribution, by Desmond Morris et al., describes a Europe-wide survey of twenty different gestural forms, including the 'chin flick'. Morris et al. are responsible for the name 'chin flick'. However, I do not have the book to hand here, and I cannot remember exactly what they have to say about this gesture.

(As to the gesture of negation in which you push the head back, this is still used even today in Southern Italy and Sicily, and is almost certainly very old. It is distributed in those parts of the Mediterranean that were, in antiquity, occupied by Greeks [see Gerhard Rohlfs, "Influence des éléments autochtones sur les langues romanes (Problèmes de géographie linguistique)", Actes du Colloque International de Civilisations, Littératures et Langues Romanes. Bucharest: Commission nationale roumaine pour l'Unesco, 1959/1960, pp. 240-247; and see also Peter Collett and Alberta Contarello, "Gesti di assenso e di dissenso", in Pio Enrico Ricci Bitti, ed., Comunicazione e gestualità. Milan: Franco Angeli, 1987, pp. 69-85]).

This is the best I can do. As to Scalia, he is, after all, Italian-American -- I do not know his Italian background, but he is not, as far as I know, of Neapolitan origin. [Wikipedia says that "Antonin Scalia was born in Trenton, New Jersey. His mother, Catherine, was born in the United States; his father, S. Eugene, a professor of romance languages, emigrated from Sicily at age 15." - myl] His use of what appears to be a version of the 'chin flick' (as described above) seems a little different from the Neapolitan one -- but then, as I say, in regard to language, as well as gesture, you cannot generalize about "Italians" -- you have to be local.

[Guest post by Adam Kendon]

Posted by Mark Liberman at 07:00 AM

April 06, 2006

So in style at the NYT

New York Times editorial, "The Amnesty Trap", 4/5/06, p. A22:

All it [the Martinez-Hagel compromise bill on immigration] would do is give a face-saving assurance to hard-liners that immigrants would suffer adequately for their green cards and allow Republicans to reassure suspicious constituents: this is so not amnesty.

Ah, GenX so! How in style is that?

GenX so -- so-called because it seems to have first appeared in the speech of Generation Xers (in the 80s, with the movie Heathers as a major boost for its spread) -- is recognizable in speech by its characteristic high-rising-falling intonation (which distinguishes it from ordinary intensifying so, even when the intensifier is accented), but can be detected in writing only through its syntactic context: clear cases of GenX so occur in contexts that otherwise are not available for intensifiers -- with dates and similar time expressions ("That is, like, so 1980s", "It was so two years ago"), proper nouns and pronouns ("This is so Iceland", "It's so you"), absolute adjectives ("You are so dead!"), negatives ("It's so not entertaining", "A pizza delivery man who can't find a campus address is so not my problem"), and VPs ("We so don't have a song", "Parker so wanted to be included", "I am so hitting you with the September issue of Vogue!"). There are cases -- like the title of this posting -- that aren't so easy to classify, but the Times editorial's so is a solid example of a GenX use, with a negative.

The thing about GenX so is that, though it's spreading, it's still associated (almost twenty years after Heathers) in the minds of many people with the trendy young, especially young women. Meanwhile, plenty of older folks (like me) find it handy rather than trendy, and use it every so often. Nevertheless, it's still an informal usage, so it's a small surprise to see it in a Times editorial, even in a representation of speech.

But the Times's editorial style is not at all stiffly formal. The editorial writers seem to be aiming at something you might call "relaxed formal" (or possibly "serious informal"), the sort of thing you might expect in essays, on serious subjects, that are meant for a general educated readership. Part of the point is not to be off-putting, and a friendly conversational tone can be helpful. So we get things like article omission in the introductory expressions the thing/trouble/point/problem/... is:

Most Americans -- two-thirds, according to a Pew Research poll this month -- believe that Saddam Hussein had a hand in the Sept. 11 terrorist attacks. Trouble is, no hard evidence of such a link has been made public. (NYT editorial "The Illusory Prague Connection", 10/23/02, p. A26)

Apparently, GenX so is now entrenched enough to count as an everyday colloquialism, like Initial Article Omission. You go, so!

[Note on intensifier so: Astonishingly, some advice manuals -- like the recent Garner's Modern American Usage -- label plain ol' intensifier so 'very' ("I'm so happy to meet you!") as casual, colloquial, or conversational, too informal in style for use in serious writing. This one has been around since Old English. Though it is certainly frequent in conversation (where it rivals, or in some counts, exceeds, very and really), it is not at all rare in serious nonfiction. Perhaps the reasoning of the advice writers is that a usage that's very common in conversation is appropriate only there. But that's fallacious reasoning; surely yes and no and clauses without dependent clauses in them (and huge numbers of other usages) are much more frequent in conversation than in serious nonfiction, but there's absolutely no reason to avoid them in formal writing. What's important is not the relative frequency of a usage in various contexts, but the associations speakers make between the usage and those contexts.]

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 05:02 PM

Meth Stings

The DEA continues its war on meth, now conducting sting operations on Georgia convenience store owners. This time the targets are in northern Georgia, as reported in the April 6, 2006 New York Times (here). As in other similar sting operations throughout the country, the targets are primarily immigrants from India, many named Patel. In the recent Georgia stings, 23 of the 24 targeted stores were owned and operated by Indians.

Undercover officers approach the stores and ask to buy quantities of Sudafed and other products from which meth can be made. The conversations are tape recorded, of course, then used as evidence against the store owners and employees.

In my experience with such cases, there are five major problems with sting operations of this type:

1. The targets' English ability is limited.
2. The law about quantities they can sell is not all that clear.
3. The undercover agents are vague and ambiguous about their illegal intentions.
4. The DEA's transcripts of the tapes are frequently wrong.
5. The DEA Investigative Reports (written reports done after the conversations took place) are often creatively dead wrong.

Let's start with the law. From Title 21, United States Code, Section 802, page 1291:

(C) Offenses involving listed chemicals

Any person who knowingly or intentionally--

(1) possesses a listed chemical with intent to manufacture a controlled substance except as authorized by this subchapter;

(2) possesses or distributes a listed chemical knowing, or having reasonable cause to believe, that the listed chemical will be used to manufacture a controlled substance except as authorized by this subchapter...

Here we have the problem of a convenience store "knowing or having reasonable cause to believe." In other criminal laws, such as fraud, "known or should have known" is used in much the same way. This poses interesting problems for non-native speakers of English, whose inferencing skills fall far short of those of native speakers. One of the common strategies of undercover operations is to drop in hints of illegality, hoping that the targets will pick up on them and say something incriminating. Such hints may be clear to native speakers but they can easily go right by non-natives. Even if the officers manage to be even a little bit explicit, such as by mentioning the words "meth" or "cook," they often say them in ways that are either unclear or difficult to hear.

I don't know the facts in the Georgia cases, but would be surprised if the agents didn't talk the same way they did in a Detroit case I worked on a couple years ago, where the agent dropped his voice to near silence when he uttered "meth" the only time he used it during six conversations. The prosecution celebrated this use of "meth," but the tape wasn't very supportive. The government's transcript reads:

Owner: What are you doing with these? It's none of my business but--

Agent: Makin' money, makin' meth.

Unfortunately for the government's claim, this was a gross mistranscription of the language on the tape, which was:

Owner: What are you doing with these? It's none of my business but--

Agent: Makin' money ma'am, makin' mo--(inaudible second syllable).

It's hard to imagine how mo-- can be understood as meth and the simple count of syllables doesn't add up to "makin' meth."

The agent also said something about "cooking up" two times, both of which yielded no response at all from the store owner, who continued on with her topic, as though she didn't even hear these words. Some of the Georgia defendants said they thought that when the agent mentioned "cooking," he was talking about a barbecue cook-out. In the Georgia cases, the ACLU filed documents including a sworn statement by one of the informants in the sting, in which he claimed that federal investigators were sent to Indian-owned stores "because the Indians' English wasn't good."

What a way to run a country.

Posted by Roger Shuy at 04:22 PM

Onghornlay rofessionalpay

A few days ago, Arnold Zwicky imagined someone calling the Grammar Hotline with a question about Pig Latin solecisms. Now Michael Kaplan at Sorting it All Out has provided a real-life example from the history of Microsoft product development.

You see, we had just shipped Windows Server 2003 a little while ago, a product which had its name changed at the last minute from Windows .NET Server 2003 ... to Windows Server 2003. I never heard from any authoritative source, but some people claimed the last minute name change cost the company a ton of money since the name string was hard-coded in so many places.

So anyway, to avoid this problem in the future (the expense, not the name changes themselves; just ask anyone who worked on Office XP about those last minute name alterations in products!), some work was done to make sure the name was stored in a more central location.

And to test this, there were several early internal builds that, to test this centralization, had the name of the product (at the time Longhorn Professional) translated into Pig Latin, so that any place that had the name unchanged would be considered a bug. ...

Now to why the Language Log post made me think of it -- because the Pig Latinized version of the name was onghornlay rofessionalpay, rather than onghornlay ofessionalpray, as some might expect. :-) ...

It's too bad no one called in to Grammar Lady to get an official answer on the grammaticality of the fake Longhorn name. :-)

Problems like this come up frequently, in fact. They might not make it to the inbox of grammar mavens, but I use similar cases in my intro linguistics course, to help make the point that children's language games are based on their implicit understanding of phonology, not on spelling.
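To make the contrast concrete, here is a minimal Python sketch of the two Pig Latin rules at issue. The function names are hypothetical, the code works on lowercase spelling only, and the treatment of vowel-initial words (just adding "ay") is an assumption, since conventions for those vary. The first function moves only the initial letter, which reproduces the Microsoft test string; the second moves the whole initial consonant cluster, which is what "some might expect".

```python
VOWELS = set("aeiou")

def pig_latin_first_letter(word):
    """Move only the first letter to the end, then add 'ay'."""
    if word[0] in VOWELS:
        # Assumption: vowel-initial words just get 'ay' appended.
        return word + "ay"
    return word[1:] + word[0] + "ay"

def pig_latin_onset(word):
    """Move the whole initial consonant cluster to the end, then add 'ay'."""
    i = 0
    while i < len(word) and word[i] not in VOWELS:
        i += 1
    return word[i:] + word[:i] + "ay"

for w in ("longhorn", "professional"):
    print(w, pig_latin_first_letter(w), pig_latin_onset(w))
```

The two rules agree on "longhorn", where the onset is a single consonant, and differ only on "professional", where the onset is the cluster "pr".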

One example comes from a Spanish game called "Jerigonza" or "Jeringonza," which involves replacing every vowel V with the sequence VpV. About a decade ago, searching the nascent web for "Jerigonza" used to turn up a C program for turning Spanish text into Jerigonza, written by a student at the Universidad Católica de Chile. However, this program applied the Jerigonza rule literally to spelling, and so it didn't behave the way that most Spanish-speaking children would. Because the program applied the V-to-VpV change to every orthographic vowel, it turned the (three-syllable) Spanish word escuela "school" into the (eight-syllable) Jerigonza word epescupuepelapa.

Spanish-speaking children -- who often learn the jerigonza game before they learn to read -- would think in terms of the categories of their natural and internal phonology, rather than in terms of letters of the alphabet. In the letter sequence "ue" in Spanish spelling, the "u" letter is actually pronounced as a glide /w/, which in phonological terms is a consonant rather than a vowel. Thus we might write the pronunciation of escuela as /e-skwe-la/, with three syllables, not four, even though there are four vowel letters in the word. In the same form of quasi-phonetic writing, the jerigonza version would be /e-pe skwe-pe la-pa/, with six syllables, not /e-pe sku-pu e-pe la-pa/, with eight. Most Spanish-speaking children would produce the six-syllable form -- though I've found that there are a few who have learned to apply the jerigonza rules to spelling instead of to sound.
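The difference between the two versions can be sketched in a few lines of Python. This is not the Chilean student's C program, which I no longer have; the function names are hypothetical, and the "sound-based" version is only a rough approximation that treats an unaccented i or u as a glide when it is immediately followed by another vowel letter. It makes no attempt to handle falling diphthongs, stress, or other details of Spanish phonology.

```python
import re

VOWELS = "aeiouáéíóú"

def jerigonza_spelling(word):
    """Apply V -> VpV to every vowel letter, as the spelling-based program did."""
    return re.sub(f"[{VOWELS}]", lambda m: m.group() + "p" + m.group(), word)

def jerigonza_sound(word):
    """Apply V -> VpV, but leave a glide (i or u before a vowel) untouched.

    Assumption: an unaccented i/u directly followed by a vowel letter is a glide.
    """
    out = []
    for i, ch in enumerate(word):
        nxt = word[i + 1] if i + 1 < len(word) else ""
        is_glide = ch in "iu" and nxt != "" and nxt in VOWELS
        if ch in VOWELS and not is_glide:
            out.append(ch + "p" + ch)
        else:
            out.append(ch)
    return "".join(out)

print(jerigonza_spelling("escuela"))
print(jerigonza_sound("escuela"))
```

On escuela the spelling-based rule doubles four vowel letters and the sound-based rule doubles three, which is the eight-syllable versus six-syllable contrast described above.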

[Update: Matthew Stuckwisch writes in to correct the syllabification of my Jerigonza example:

While it is true that Spanish favours CV-CV-CV-CV constructs, in the case of double consonants, they don't always both go to the following syllable. So, escuela would be (using your phonetic orthography) /es-kwe-la/ and the two jerigonza versions would be /e-pes kwe-pe la-pa/ and /e-pes ku-pu e-pe la-pa/ respectively: Spanish speakers tend to have quite a difficult time saying English words like "scram" and "school" without saying "ehscram" or "ehschool" since /sk/ isn't a native construct within a syllable for them.

Yes, that's obviously right. I got the Jerigonza forms from native Spanish speakers, but of course the cited syllabification was my own, and I should have known better.]

Posted by Mark Liberman at 08:32 AM

Sister-girl diva-dom

Yesterday on NPR's News and Notes, Farai Chideya interviewed Cynthia McKinney (along with a couple of lawyers), and then discussed the interview with a group including Jeff Obafemi Carr, host of the radio show Freestyle. Carr contributed a linguistically innovative description of McKinney:

Chideya:	Jeff, I do want to drill down one last thing before we move on to other topics. Why is this blanketing the media in a week filled with political news about lobbyist Jack Abramoff, corruption scandals, Tom DeLay. Why is this -- wall to wall?
Carr:	Well, Cynthia McKinney is a very- very popular figure, she's a polarizing figure, she's got- she's got that it, and- if you want to call it diva-dom, if you want to call it that- that sistergirl kind of nature that makes her both popular, on one end, and a target on the other, and that's why people are drawn to her personality.

I associate sistergirl (or sister girl or sister-girl) with the expression of African-American female solidarity that's on display in the lyrics of "Don't waste your time" by Denise Rich (from 1999, performed by Aretha Franklin and Mary J. Blige):

Sister girl, sister girl
It's much deeper than what you're thinking
When something don't feel the same
Yeah you better believe his love has changed.

or in this passage from Patricia Smith's 1997 review in the Boston Globe of Jill Nelson's "Straight, No Chaser: How I Became a Grown-up Black Woman":

That very first line on the very first page of Jill Nelson's biting and bittersweet rant, "Straight, No Chaser: How I Became a Grown-up Black Woman," was enough to make me put down the book and squeeze my eyes shut in recognition of that feeling, that frustrating transparency, the searing solitude that accompanies the state of being black and female in this world. I waited for the chill to pass, picked up the book again, and smiled at the author's firm, assured, 'bout-to-lay-it-down countenance on the front of the jacket. "OK, sistergirl," I said aloud. "Let's talk."

So I guess "that sistergirl kind of nature" would be the kind of nature that evokes African-American female solidarity, or maybe general African-American solidarity from the female side.

I associate diva-dom with female star quality in pop music, as in this 1992 Billboard note:

Upbeat, electronic dance/popper should play well in shopping malls as well as nightclubs. Sugar-sweet female vocals are synthetically cool and detached. Slick production should help propel Modest Fok upward in the ranks of disco diva-dom.

or this 2001 Houston Chronicle review of the music at the Livestock Show and Rodeo:

Through their 40-year careers Gladys Knight and Patti LaBelle have amassed six Grammys, two stars on the Hollywood Walk of Fame and a body of work that defined soul diva-dom.

Carr's frame "if you want to call it X, if you want to call it Y" seems to be a variant of the commoner "you could call it X, you could call it Y" -- "if you want to call it X [you could], if you want to call it Y [you could]". This invites us to place McKinney's "it" somewhere between "diva-dom" and "that sistergirl kind of nature", a place where it's normal to swat a policeman who's bold enough to try to stop you from bypassing a metal detector. Whether or not you agree with Carr's admiring tone, you've got to admit that it's a vivid description.

[The passage from News and Notes also exemplifies the new use of "drill down" to mean something like "talk more about"; but that's a topic for another post.]

[Update: John Cowan, Language Log's non-resident copy editor, wrote to complain

What's with the hyphen in "diva-dom"?

Granted, the word is somewhat ad hoc, but -dom is productive these days (it apparently went through a period in the 18th century when it wasn't). Is it the Romance root with the Saxon suffix? But the same applies to chiefdom, dukedom, fiefdom, martyrdom, officialdom, and serfdom.

I was misled (or mizzled) into reading the word as "diva-dom[inatrix]", with a different stress pattern.

Well, I hesitated for a moment about this when I dashed off the post this morning. Google has 27,900 for "diva-dom" vs. 13,800 for "divadom"; and searching on LexisNexis and Proquest turned up a fair number of hits for "diva-dom" but none for "divadom", suggesting a collective decision by the world's copy-editors to go with the hyphen. Finally, whenever I see "divadom" I think of "diatom", which is unhelpful.

But John's logic is plausible, so I'll leave the hyphen out if I ever have occasion to use the word again.]

Posted by Mark Liberman at 07:27 AM

April 05, 2006

Much ado about a lot

No, not about Mary Newton Bruder's book Much Ado About a Lot, but about the quantity determiners much and a lot of. It all started when a Stanford Humanities Center colleague, Wendy Larson, asked me about a point of English usage: had I noticed occurrences of much that seemed awkward, but were improved when replaced by a lot of? She had, in student writing. I hadn't, but I instantly saw her point: things like "There was much rain last night" and "Much shrubbery was growing in front of the house" are grammatical for me, but "There was a lot of rain last night" and "A lot of shrubbery was growing in front of the house" strike me as very much better.

The advice literature on English usage mostly sees the difference as one of style/register -- a lot of (also lots of) is specifically informal, casual, colloquial, conversational, spoken -- and there turns out to be something in that, though it's not really right to fix on a lot of as the stylistically marked item of the pair, and there's much more variation in formal writing than some of the advice books suppose. Meanwhile, ESL materials mostly see the difference as determined by syntactic context -- much (also many) is used in questions and negatives, a lot of in positive sentences -- and again there's something in that, but there's also a lot of variation in both sets of contexts.

Where there is variation, just how free is it? For many years, Dwight Bolinger steadfastly maintained that a difference in form always spells a difference in meaning, so that all variation will turn out to be unfree, if you look hard enough. I'm not willing to go that far, but I do believe that almost all putative free variants turn out to be discriminable (on the basis of semantics or discourse function) in certain contexts. As, I think, is the case for much and a lot of.

The first reaction that Larson and I had to examples like "Much shrubbery was growing in front of the house" was that the occurrence of much in them was inappropriately stiff and formal -- a proposal that would predict that with head nouns from the low end of the style/register spectrum, much would be really terrible, while with head nouns from the high end of this spectrum, much would be fine. Both predictions were almost immediately borne out, but only to some extent.

First, much with a really informal noun, like fun. "We had much fun at the beer blast" is pretty dreadful (though not ungrammatical). I undertook a quick Google web search on "much fun", hoping (without much reflection on the matter) that this would be as easy to do as the searches I did on like vs. such as in September 2005. Instant complications. Huge numbers of hits on passives like "Much fun was had by all" (related, I suppose, to the formula "A good time was had by all" -- where, I wonder, do these odd passives come from?), which had to be set aside. Then very large numbers of "much fun" with a degree modifier on the "much": too, as, so, how, that, this, very, pretty, etc. This is a context in which no variation between much and a lot is possible, since a lot cannot be modified by these degree words (while much cannot be modified by quite, the only degree word that occurs with a lot of with any frequency). Excluding these degree words in the search cut things down, but not as much as I'd hoped, thanks to the existence of the jokey proper names "Sew Much Fun", "Two Much Fun", and "Snow Much Fun" (who knew?).
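For anyone who wants to repeat this kind of winnowing on a list of search snippets, here is a rough Python sketch of the exclusions just described. It is not what I actually did (I used search-engine exclusions and then read the hits by hand); the function name is hypothetical, and the word lists are simply the degree modifiers and jokey names mentioned above. It keeps a snippet only if "much fun" is not preceded by one of those words and is not part of the "much fun was had" passive.

```python
import re

# Degree modifiers and jokey proper-name words listed in the paragraph above.
DEGREE_WORDS = {"too", "as", "so", "how", "that", "this", "very", "pretty"}
JOKEY_WORDS = {"sew", "two", "snow"}

# Optionally capture the word before "much fun" and a following "was had".
PATTERN = re.compile(r"(?:\b([\w']+)\s+)?much fun\b(\s+was had)?", re.IGNORECASE)

def is_bare_much_fun(snippet):
    """True if the snippet has 'much fun' without a degree word, pun, or passive."""
    m = PATTERN.search(snippet)
    if m is None:
        return False
    previous = (m.group(1) or "").lower()
    if previous in DEGREE_WORDS or previous in JOKEY_WORDS:
        return False
    if m.group(2):  # the "Much fun was had by all" formula
        return False
    return True

for s in ("you aren't much fun", "Sew Much Fun", "Much fun was had by all"):
    print(s, is_bare_much_fun(s))
```

A filter like this only removes the noise; sorting the survivors into negative, interrogative, and positive contexts still has to be done by reading them.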

In the end I looked at the first hundred remaining items (out of ca. 30,200). There were a respectable number of hits -- given this amount of noise, the exact number isn't important -- a lot of them in negative contexts: "Dead Pornstars Aren't Much Fun", "you aren't much fun", "Not Much Fun", "it can't be much fun being a member of..." But not all: "touring Europe and Korea... and having much fun", "an exciting holiday with much fun and pleasure".

To summarize so far: there seemed to be some style/register effect for much, and also an independent syntactico-semantic effect, generally allowing much in negative contexts. (Quickly, I realized that interrogative contexts also allowed much freely: "Did you have much fun at the beer blast?") Obviously, there's a lot more work to be done here, but this is a start.

Second, much with head nouns from the higher end of style/register spectrum, for instance nominalizations. Here, Larson supplied some examples from her reading. In Jack Turner's article "Green Gold" (on absinthe) in the 3/13/06 New Yorker, she found two such examples, both in positive contexts:

He makes his absinthes from entirely natural ingredients, and there has been much speculation about what those ingredients are. (p. 40)

Alcoholism was not yet properly understood, and there was much confusion about absinthe's physiological effects. (p. 41)

Both of these sound fine to me (and Larson).

(In addition, there's an occurrence of (positive) headless much not long after these:

Hemingway did much to burnish the drink's legend... (p. 42)

I might have preferred a lot here, but much is ok. I take headless much to belong with much in combination with stylistically neutral head nouns.)

To summarize: much seems to be generally congenial with more formal head nouns.

At this point, I turned to MWDEU and its article on lots, a lot, which begins:

Sir Ernest Gowers in Fowler 1965 notes that the Concise Oxford Dictionary labels a lot colloquial but that modern writers do not hesitate to use it in serious prose... Colloquial is the favorite handbook label for a lot and lots: about three quarters of those in our collection use it.

And those handbooks often advise against it, as in this quotation from Morris & Morris, Harper Dictionary of Contemporary Usage (1975), p. 379:

In the senses of "very much" and "a great amount," lot and lots are accepted as Standard by the latest dictionaries. However, they are still considered Informal by many, and a different choice of words is advisable in writing.

and from Trask's (2005) Say What You Mean!, p. 171:

LOTS OF, A LOT OF Normal in spoken English, these expressions still look rather strange in formal writing. Quite a few people are now happy to use these things in formal writing, and write Lots of research has been done, but many readers will find this objectionable. You are advised to write A great deal of research has been done, or, in very formal writing, Much research has been done.

(Notice that Trask treats much as stylistically marked (as formal) AND a lot of as stylistically marked (as conversational).)

On the other hand, the Collins Cobuild English Grammar (1990) assigns no stylistic level to either much or a lot of; Garner's Modern American Usage (2003) seems not to mention the topic at all; and one of the on-line grammar sites (Literacy Education Online) specifically says that a lot "has the same meaning as both many and much and can be interchanged with either one."

Back on the first hand, the Longman Grammar of Spoken and Written English (1999), tells us:

Other determiners specifying a large quantity are a great/good many (with plural countable nouns), a great/good deal of (with uncountable nouns), plenty of, a lot of, and lots of. The last three combine with both uncountable and countable nouns. They are characteristic of casual speech... (p. 276)

Some of the differences in the use of quantifiers may reflect their relative novelty, in historical terms. Those ending in of... are recent developments from quantifying nouns. It is thus no surprise that these are relatively rare, and when they do occur, they are most typically found in conversation, or carry a strong overtone of casual speech when used... (p. 277)

This is the authors' interpretation of their corpus results for quantity determiners, as summarized in Table 4.15 (p. 278), from which the following figures (in occurrences per million words, in conversation, fiction, news writing, and academic writing) are extracted:

	CONV	FICT	NEWS	ACAD
many	400	400	600	1000
much	600	400	200	200
many+much	1000	800	800	1200
a lot of	400	200	200	< 100

Now, it's true that a lot of occurs more often in conversation than in any of the three other genres, and least in academic writing (which would suggest that it's still not entirely congenial with great formality). But many + much (the parallel to a lot of) also occurs more often in conversation than in fiction and news writing. What's really striking about the table, however, is that many occurs most often in academic writing; this is consistent with the proposal that this determiner is in fact stylistically marked, as formal.

Nevertheless, much on its own (separated from many) occurs less often in the more formal half of the style spectrum (400/million) than in the less formal half (1000/million). I'm not at all sure how to interpret this, especially since this pattern is the opposite of the pattern for many (1600/million in the more formal half vs. 800/million in the less formal half). It might follow from differences between many and much with respect to negation, interrogativity, and/or preceding degree modifiers in the two sets of genres. Another topic for further study.
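The half-spectrum totals are just sums over pairs of columns in the table, taking conversation and fiction as the less formal half and news and academic writing as the more formal half. A small Python sketch for checking them follows; the variable and function names are hypothetical, and a lot of is left out because its academic figure is given only as "< 100".

```python
# Occurrences per million words, copied from the table above.
RATES = {
    "many": {"CONV": 400, "FICT": 400, "NEWS": 600, "ACAD": 1000},
    "much": {"CONV": 600, "FICT": 400, "NEWS": 200, "ACAD": 200},
}
GENRES = ("CONV", "FICT", "NEWS", "ACAD")
LESS_FORMAL = ("CONV", "FICT")
MORE_FORMAL = ("NEWS", "ACAD")

def half_totals(word):
    """Return (less formal total, more formal total) for one determiner."""
    row = RATES[word]
    less = sum(row[g] for g in LESS_FORMAL)
    more = sum(row[g] for g in MORE_FORMAL)
    return less, more

def combined_row():
    """Recompute the many+much row of the table, genre by genre."""
    return {g: RATES["many"][g] + RATES["much"][g] for g in GENRES}

print("much:", half_totals("much"))
print("many:", half_totals("many"))
print("many+much:", combined_row())
```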

(In all genres, many + much is more frequent than a lot of, but it's hard to assess the significance of that fact, given that a lot of is inconsistent with a wide range of degree modifiers, as noted above.)

To summarize this tour through the advice books and the Longman Grammar: despite what some of the advice literature says, a lot of isn't strongly conversational/casual in style; instead, there's merely some dispreference for using it in very formal writing. The Longman Grammar data are apparently not consistent with Larson's and my judgments (also suggested by Trask) that much is tilted towards the formal end of the spectrum, as many pretty clearly is.

When we look at on-line ESL materials, formality hardly figures at all. Instead, negativity and interrogativity are listed in source after source as heavily favoring much and many, with a lot of used primarily in positive statements. Some typical advice:

www.learnenglish.be: Much and many are generally used in questions and negatives... A lot of is used in positive sentences...

www.learnenglish.org.uk: Much and many are usually used in negative sentences and questions... Much is not usually used in affirmative sentences. For these we prefer a lot of or lots of.... Much can be used in affirmative sentences when it is preceded by so, too or as...

grammar.free-esl.com: We generally use many and much in questions and negative statements but we use a lot of in positive statements.

www.edufind.com: Note: much and many are used in negative and question forms... They are also used with too, (not) so, and (not) as... In positive statements, we use a lot of...

(The Longman Grammar, p. 275, notes the negation connection, saying of many and much: "They are typically used in negative contexts".)

Formality makes an appearance in at least one of the on-line ESL sites:

www.1-language.com: Much is used with uncountable nouns, and is often used in negative statements and questions. It's uncommon to use much in positive statements... Much and many can be used in affirmative statements, but give a more formal meaning... A lot of is used with uncountable and countable nouns, and is generally used for affirmative statements... A lot of is also used in questions, especially when you expect a positive response. Although it is often said that much and many are used for questions, we usually use them for questions which expect a negative response. For example:
- Do you want a lot of pizza?
I expect you want to eat a lot.
- Do you want much pizza?
This sounds unusual, as though I expect you don't want to eat much.

Lots of can be used in the same way as a lot of, often in informal speech.

Now we have introduced the possibility that there is (sometimes) a meaning difference between much and a lot of, as we'd expect from the rule of thumb that variation is unfree (Bolinger's dictum). Consider

1a. Much office work is tedious.
1b. A lot of office work is tedious.

My intuition is that 1a merely says that a large amount of office work is tedious, while 1b says that a SIGNIFICANTLY large amount of office work (possibly the majority of it) is tedious. Ceteris paribus, 1b is (I think) a stronger claim than 1a.

There's even a possible source for such a meaning difference: a lot of, with its source in a nominal construction, has a secondary accent on lot, while much, like many other determiners, has a tertiary accent; in addition, a lot of has three syllables, while much has but one. So a lot of is significantly heavier phonologically than much. A greater semantic strength for a lot of would then be iconic to its greater phonological weight. (Ok, it's speculative, but I've just started exploring this topic, so now's the time for speculation.)

In any case, I'm hoping that this difference (or some other one) can be parlayed into an account of the difference between 2a and 2b (from the 1-language site), which I feel pretty strongly:

2a. Do you want much pizza?
2b. Do you want a lot of pizza?

And also into a difference I (think I) see in some negative sentences. This one I'm going to have to work up to.

First, distinguish NP-internal negation (hereafter, Int-Neg), in which not is a constituent with an NP it negates, as in 3, from VP-external negation (hereafter, Ext-Neg), in which not is outside the VP of a clause it negates, as in 4 and 5:

3a. Not much shrubbery still had leaves on it.
3b. Not a lot of shrubbery still had leaves on it.

4a. We didn't see much shrubbery.
4b. We didn't see a lot of shrubbery.

5a. There isn't much we can do about it.
5b. There isn't a lot we can do about it.

(I think I detect subtle meaning differences in each pair, but that's not my point here.)

Now, an alternative to clausal negation with a negative auxiliary (in -n't) is clausal negation with independent not, as in 6:

6a. There's not much we can do about it.
6b. There's not a lot we can do about it.

It's well-known that examples like 6 are in fact ambiguous, between an Int-Neg structure and an Ext-Neg structure. I believe that's true here, except that 6a strongly favors the Int-Neg structure ('There's (only) a little we can do about it'), while 6b does not, and might even favor the Ext-Neg structure ('We're unable to do a lot about it', i.e., 'It's not the case that we're able to do a lot about it').

Enough for today. There are plenty of loose ends to entertain yourselves with.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 07:27 PM

How innovative is that!

One more for the baseball files... The game of baseball has provided many obvious contributions to the English lexicon, particularly via metaphorical extensions to other fields of human endeavor — think of the figurative senses of curveball, home run, in the ballpark, lineup, offbase, out of left field, pinch-hit, and so forth. More subtle are turns of phrase that are not specifically related to baseball but may have first been popularized among ballplayers before spreading to wider usage. I came across one such candidate while reading Jim Bouton's classic 1970 tell-all, Ball Four:

[Gary] Bell is a funny man... He's got an odd way of talking. Instead of saying, "Boy, that's funny," he'll wrinkle up his face and say, "How funny is that?"
Or he'll say, "How fabulous are greenies?" (The answer is very. Greenies are pep pills — dextroamphetamine sulfate — and a lot of baseball players couldn't function without them.) [p. 81 of 1990 Wiley edition]

(The passage is also notable for its revelations about "greenies." Now, 36 years after Bouton exposed the rampant use of pep pills in the sport, Major League Baseball is finally testing for amphetamines as part of its new drug policy.)

Nowadays this usage of "How <Adj> is that!" to mean "That's very <Adj>" is quite widespread in American English. Granted, exclamations introduced by how are nothing new: consider such lines from the King James Bible as "How are the mighty fallen!" (2 Samuel 1:19) or "How great are thy works!" (Psalms 92:5). The Oxford English Dictionary traces this use of how all the way back to Old English and also notes the postwar colloquial formation "How <Adj> can you get?" (The earliest citation listed for that expression is "How unconscious can you get?" from Herman Wouk's 1951 novel The Caine Mutiny, but Leonard and Jane Feather's composition "How Blue Can You Get?" predates that by about five years.)

Despite these precursors, the construction "How <Adj> is that!" has a particularly modern ring to it. In a discussion on the American Dialect Society mailing list, Larry Horn noted that the expression tends to have a heavy stress on the final word "that," with a falling intonation at the end. This is best represented as "How <Adj> is THAT" (and not at all well represented by "How <Adj> is that?", which implies a rising intonation at the end of the utterance). The turn of phrase became very popular in the 1990s, a trend that a contributor to the newsgroup alt.usage.english attributed to use on "Seinfeld." But Bouton's comment about his teammate Gary Bell's then-idiosyncratic usage suggests it might have been percolating for two decades before its '90s vogue.

The next earliest cite I've found implies that the locution circulated among ballplayers throughout the 1970s:

Los Angeles Times, Aug 22, 1979, p. E2
"He didn't like the way I ran onto the field," says [Dave] Kingman. "Now, how asinine is that?"

By the mid-'80s it had spread to sports columnists, as in this column by the Washington Post's Tony Kornheiser:

Washington Post, Oct 11, 1985, p. F1
Fourth and goal from the one. Dynastic Green Bay left with one last play, trailing, 17-14. The NFL's most celebrated team coached by its most celebrated man, and the quarterback bops into the huddle taking requests. Now, how cool is that?

Kornheiser could very well have picked up the expression from the players he was covering and helped bring it to wider attention. "How cool is that" doesn't start showing up in the Usenet archive until May 1992, and by the following year similar formations could be heard on "Seinfeld":

George: Hey look at this - boy are you lucky - another spot - right in front of the hospital. In an emergency yet! How lucky is that? Is that unbelievable? How unbelievable is that?
("The Bris," Season 5, Episode 5, Oct. 14, 1993)

(For what it's worth, George Costanza didn't begin working for the New York Yankees front office until "The Opposite," which aired on May 19, 1994.)

Even though the two earliest cites I've found so far are quotes from ballplayers, that doesn't mean the innovation originated in baseball usage, of course. In the ADS discussion, Wilson Gray recalled hearing the expressions "How fine is that?" and "How bad is that?" from fellow GI's as far back as 1959. So perhaps "How <Adj> is that!" first circulated in the Army before escaping to baseball by the late '60s. As searchable databases of digitized sources keep expanding, we may well be able to track the expression's early trajectory with some more precision.

Posted by Benjamin Zimmer at 02:26 PM

Everything is too appropriate these days

So says John Fiore, the actor who plays Gigi Cestone on The Sopranos. But then, luckily, there's Associate Justice Antonin Scalia. According to a story in the Boston Herald:

[Peter] Smith was working as a freelance photographer for the Boston archdiocese’s weekly newspaper at a special Mass for lawyers Sunday when a Herald reporter asked the justice how he responds to critics who might question his impartiality as a judge given his public worship.

“The judge paused for a second, then looked directly into my lens and said, ‘To my critics, I say, ‘Vaffanculo,’ ” punctuating the comment by flicking his right hand out from under his chin, Smith said.
[...]

Yesterday, Herald reporter Laurel J. Sweet agreed with Smith’s account, but said she did not hear Scalia utter the obscenity.

Smith was fired by the Archdiocese for releasing the picture, but he retains his teaching position at B.U.

Scalia sent a letter to the Herald giving his side of the story. He describes a different gesture and a different meaning:

I responded, jocularly, with a gesture that consisted of fanning the fingers of my right hand under my chin. Seeing that she did not understand, I said “That’s Sicilian,” and explained its meaning - which was that I could not care less.

He quotes from Luigi Barzini's The Italians to support his view of the gesture's meaning:

“The extended fingers of one hand moving slowly back and forth under the raised chin means: ‘I couldn’t care less. It’s no business of mine. Count me out.’ This is the gesture made in 1860 by the grandfather of Signor O.O. of Messina as an answer to Garibaldi. The general, who had conquered Sicily with his volunteers and was moving on to the mainland, had seen him, a robust youth at the time, dozing on a little stone wall, in the shadow of a carob tree, along a country lane. He reined in his horse and asked him: ‘Young man, will you not join us in our fight to free our brothers in Southern Italy from the bloody tyranny of the Bourbon kings? How can you sleep when your country needs you? Awake and to arms!’ The young man silently made the gesture. Garibaldi spurred his horse on.” (Page 63.)

But the reporter and the photographer seem to be describing a different gesture, and the published photograph seems to support their description.

In a later article, the Herald asked the cast of The Sopranos to act as language consultants:

“It’s an obscenity,” Joseph Gannascoli, who plays capo Vito Spatafore on the HBO drama “The Sopranos,” said of Scalia’s gesture, which involved flicking his hand under his chin. ...

“It’s not like grabbing your crotch, not that bad an obscenity,” Gannascoli said. “But it’s an obscenity. It’s something you would do after paying a bookie, to your bookie, but not something you would do in church.”

Though the Herald's headline was ‘Sopranos’ stars divided on bawdy body language, in fact the stars quoted seem pretty much in agreement that the gesture is highly disrespectful but not the worst possible gestural obscenity:

“It’s not that bad, but I wouldn’t do it to my mother. No way. Would I do it in church? These days, maybe. It depends if the priest was giving me the hairy eyeball,” said Stoneham native John Fiore, who played Sopranos capo Gigi Cestone.

Fiore did applaud the outspoken jurist for his animated honesty.

“I like when people are casual like that,” Fiore said. “Everything is too appropriate these days.”

In the Philadelphia Inquirer, Alfred Lubrano asked around in South Philly, and got various answers:

"The gesture means, 'I don't care, fuhgeddaboutit,' " said Joe "Bubbles" Scavola, 70, a longtime employee at Esposito's.

[...]

Sonny D'Angelo of D'Angelo Brothers butcher shop said: "My father would go berserk if I used that gesture." To his family's way of thinking, the gesture is obscene. "Really, I've never done that," he said. "And I'm fluent in all the Italian curses."

The Wikipedia article on gesture has been updated to include a section titled "Flipping the fingers out from under the chin", which references the Scalia incident, assuming that the gesture was the chin flick that the reporter and the photographer described, and supporting the view that the meaning corresponds to the obscenity that the photographer reports hearing. That expression -- spelled as "vaffanculo" -- is said to represent a fluent pronunciation of "va a fare in culo", meaning "go take it in the ass".

Leaving aside the contested verbal obscenity, there are at least three gesture-related questions here. First, which gesture did Associate Justice Scalia actually make? Was it "flipping the fingers out from under the chin", or "the extended fingers of one hand moving slowly back and forth under the raised chin"? Second, what are the conventional meanings (and degrees of obscenity or taboo qualities) of these various gestures in Sicilian (or more general Italian) culture? Third, what did Scalia intend the gesture to mean as he made it?

A possible geographical and historical account of differences of opinion about the gesture's meaning is supplied by this May 2004 entry in Google Answers:

Q: Could someone please explain what this hand gesture means? This guy did this hand gesture to a group of my friends and myself the other day - he had his hand underneath his chin (fingers underneath his chin, nails facing up) and was moving or "flicking" his fingers hitting underneath his chin on his neck. He did this quite often and found it rather amusing as did his friends. Just curious what him and his friends meant by this.

A: Here is the most comprehensive explanation I've found: "The Chin Flick gesture, in which the backs of the fingers are swept upwards and forwards against the underside of the chin, is an insulting action in both France and northern Italy. There it means 'Get lost-you are annoying me.' In southern Italy it also has a negative meaning, but the message it carries is no longer insulting. It now says simply 'There is nothing' or 'No' or 'I cannot' or 'I don't want any'. This switch takes place between Rome and Naples and gives rise to the intriguing possibility that the difference is due to a surviving influence of ancient Greece. The Greeks colonized southern Italy, but stopped their northern movement between Rome and Naples. Greeks today use the Chin Flick in the same way as the southern Italians... "

Is the "flick" vs. "back and forth" also a regional difference? Or are there two different gestures, one of which means "fuck you" from Rome to the north, but "fuhgeddaboutit" from Naples to the south? If any readers know the answers to these questions (or to better and more precisely posed questions), please let me know and I'll post them.

[A non-linguistic point: This all started because a reporter for the Herald, Laurel J. Sweet, "asked the justice how he responds to critics who might question his impartiality as a judge given his public worship". That's how the question is described in a Herald article by Marie Szaniszlo, and put that way, it seems remarkably offensive. The question seems to assume that the impartiality of any judge who attends any religious service is in doubt. I find it shocking to think that an American reporter finds that plausible enough to ask about it.

Of course, Sweet might just have been trying to get a rise out of Scalia -- and after all, she succeeded. Or perhaps this description of the question is misleading. But Sweet's own original article refers to "those who question his impartiality when it comes to matters of church and state ... for publicly celebrating his conservative Roman Catholic beliefs". Given what this leads me to think the question must have been, I'm not surprised that Scalia was reduced to some wordless -- or at least non-English -- expression of rejection.

After all, as the McCormick Tribune Freedom Museum wants us to remember:

Congress shall make no law respecting an establishment of religion, or prohibiting the free exercise thereof ...

Of course, it doesn't say that the press shall make no assumptions in these areas. But still, it seems truly bizarre to question whether a judge's participation in public worship is proper.]

[Update: some expert opinion about the meaning of the "chin flick" is here. ]

Posted by Mark Liberman at 12:08 AM

April 04, 2006

Cupcakin' it

Here's a little something for all the new readers sent our way from Baseball Prospectus. In the offseason, the New York Mets acquired All-Star closer Billy Wagner, and on Opening Day manager Willie Randolph wasted no time in bringing Wagner in for the ninth inning of a close game. After the Mets pulled off a 3-2 victory, the AP quoted Wagner as saying:

"Might as well get thrown right into the fire. No use cupcakin' it."

I had never come across this use of the verb cupcake before, but its meaning was immediately obvious from the context. Wagner meant there was no point in trying to breeze through his assignment or get by with little exertion. There's a long tradition of similar dessert-related metaphors in American slang: piece of cake, cakewalk, pudding (meaning 'something easy'), easy as pie, etc. A little searching on Usenet newsgroups and other forums for sports talk turned up various uses of the verb cupcake, very often in reference to a team building up a deceptively good win-loss record thanks to an easy competitive schedule:

nebr.sports.unl, Dec. 29, 1999
BYU, who cupcaked their way to a perfect season in '84, gets it in the karma shorts again by getting whipped by unbeaten Marshall, one year after losing their bowl to unbeaten Tulane.

Buffalo Runners Forum, June 18, 2002
I don't care if Dan Grande cupcakes his way to victory; I'm just wishing that he would throw it down against the big dogs once in a while.

rec.sport.basketball.college, Jan. 6, 2003
Look at SOS [strength of schedule] to see if the school is cupcaking.

KFFL Community, Jan. 13, 2005
Mediocre Indy, Cleveland, and Pittsburgh teams cupcaked their way into the playoffs, feasting on the Jaguars, Texans, and Bengals.

Sportsrant Community, Jan. 29, 2005
and how does a good defense give up what 56 points to KC? thats why Philly basically cupcaked it through the playoffs.

I posted my findings to the American Dialect Society mailing list, and in short order Grant Barrett had crafted an entry for cupcake in his online lexicon of weird and wonderful words, Double-Tongued Word Wrester. Searching on the hyphenated spelling, Grant found a much earlier citation in the newspaper databases, from 1991:

1991 Phil Jackman Baltimore Evening Sun (Md.) (July 24) "But I only used them two days" p. D2: In its last season before joining Big Ten hoops, Penn State is cup-caking it with UMBC, Morgan, Drexel, Miami (O), Buffalo, Lafayette, Cleveland State, Columbia, Toledo, etc. How would you like to be in charge of selling season tickets?

Grant also suggested another possible source for the term besides dessert-type "easy" metaphors: the noun cupcake can also mean 'a homosexual man; an effeminate or ineffectual man,' from an earlier term of affection for a girl or woman. Given all the macho posturing in American organized sports, I can see how this sense of cupcake could have informed the creation of a verb that implies a lack of manly exertion. In baseball alone, there is a great deal of what Jim Bouton in Ball Four called "homosexual kidding among players" — not to mention the type of gender play found in rookie hazing, where new players on a team are dressed in drag (see also Barry Bonds' recent impersonation of Paula Abdul). So it wouldn't be surprising if cupcake as a derogatory epithet for a less-than-manly man could be transferred to the verb form, with an assist from similar lexical items like cakewalk (which also has a verb sense meaning 'to breeze through an easy task').

I came across one outlier with a slightly different sense, in an article where Kansas University basketball player Keith Langford is talking reverentially about his mother:

Topeka Capital-Journal, Dec. 17, 2002
"She's always been my worst critic," Langford said. "She never cupcaked anything, always gave it to me straight."

Here the verb clearly means 'sugarcoat (something)' (i.e., 'make criticism more palatable'). I haven't found any other examples of this usage, so I'm guessing Langford was reaching for sugarcoat but came up with cupcake through a creative mixture of metonymy and metaphor.

Posted by Benjamin Zimmer at 10:44 PM

Invasion of the philosophical jocks

Language Log has been slashdotted several times, linked to by Andrew Sullivan, discussed on NPR, and cited in the pages of The Economist, among other sources of blips in our visit and pageview counts. But as large a blip as any of these has come today from a paid-subscription sports site, Baseball Prospectus, where Kevin Goldstein's Future Shock column today linked to Geoff Pullum's 12/30/2005 post on the American Philosophical Association's Eastern Division meetings.

Goldstein's piece is about "surprise names on opening day rosters". The Baltimore Orioles' roster includes an outfielder named Nick Markakis, whom Goldstein (in an email to me) characterized as "good, but not ready yet". What does this have to do with the American Philosophical Association? Well, not much, really. Goldstein attributes Markakis' roster spot to the intervention of the Orioles' owner, Peter Angelos, in an imaginary conversation whose last word links to Geoff's post:

I get the feeling a conversation happened in the Baltimore front office recently, and it went something like this:

[Phone rings]
Important Front Office Guy: Hello?
Voice On Phone: Hey There! You see my boy Markakis again?
IFOG: [sighs]. Yes sir, Mr. Angelos. He's going to be a good one, sir.
Peter Angelos: He sure is! I can't wait to see him with the big league club this year!
IFOG: Mr. Angelos, please don't get me wrong here. Markakis is an outstanding prospect, clearly our best, and he's had a fantastic spring. But our outfield situation is very crowded, and to keep him we're going to have to not only take at-bats away from a veteran, but go with one fewer pitcher on the roster than we'd like.
PA: Well, I'm sure you'll figure something out to get my boy Nick on the roster for Tuesday. Maybe a bench role!
IFOG: We also feel that Nick needs some more time, sir. He has only 33 games above A ball, and is just 22. No need to rush him, sir. In the minors he could play every day and continue to develop.
PA: Get it done!
IFOG: Yes, sir.
PA: OPAA!

[click]

When I saw all those baseball fans clicking through to Geoff's post, I was at a loss to predict the connection. Kevin was kind enough to send me the segment, even though I'm not a subscriber, and I'll confess that it would have taken me quite a few guesses to get it.

It's more than fair for a baseball writer to find a punch line in one of our posts, given how often we've used baseball references, at least in passing:

"Playing for the Dominican, skiing in Czech, working in Saudi" (3/3/2006)
"Sketchballs" (2/18/2006)
"Football's F-word" (11/29/2005)
"When 'there's' isn't 'there is'" (9/1/2005)
"Stuff" (5/11/2005)
"Historically untracked" (5/11/2005)
"The mystery of #12" (1/21/2005)
"Grice, Pascal, The Times, and Barry Bonds" (4/20/2004)
"Bonds Ties Mays" (4/13/2004)
"Pete Rose and sorry statements of the third kind" (1/13/2004)

Actually, I'm surprised that there aren't a lot more baseball references in Language Log's back list. I can't answer for the others who post here, but my own excuse is the sad trajectory of the post-1993 Phillies, along with the lack of any local equivalent to the Cubs tradition of finding fun in losing baseball.

Judging by the number of clicks it generates, Baseball Prospectus must have quite a respectable number of subscribers. Meanwhile, it seems to me that weblogs and forums and such -- what marketing types have taken to calling "consumer-generated media" -- are growing rapidly in the sports area. Back in September of 2003, Davide Dikcevich complained in Forbes that "Sports blogs ... are few and meager". That's a long time ago, in blog years, and my casual impression is that sports-oriented blogs have been springing up like mushrooms after a wet summer.

Posted by Mark Liberman at 07:22 PM

It's all grammar, one more time

Every so often -- most recently, back in February -- I comment on the fact that most people who aren't trained in linguistics think of "grammar" as embracing everything that is regulated in language, including (among other things) spelling, punctuation, pronunciation, and address terms. But, wait, even some Ph.D.s in linguistics go this route. For instance, the late Mary Newton Bruder, "a.k.a. The Grammar Lady", author of Much Ado About a Lot: How to Mind Your Manners in Print and in Person (reprinted under the title The Grammar Lady: How to Mind Your Grammar in Print and in Person), who earned an M.A. and a Ph.D. in linguistics from the University of Pittsburgh, taught TESOL for some years, and wrote a newspaper column on "grammar" (in a very broad sense) from which this book is derived (following in the wake of a phone hotline and a Grammar Lady website).

The jacket copy describes MNB as "a lover of language and a passionate gadfly" (she Wrote Letters, many of them, to people whose linguistic choices she objected to). Well, her authorial persona is both perky and prickly. Case in point:

Good grammar enhances communication. Not only can bad grammar make it difficult for a particular sentence to be correctly interpreted, but it can also detract from the message by becoming, in itself, a distraction. There is a restaurant where I often go in spite of its list of specials, which I can hardly stand to read since the spelling and grammar are so terrible. A person who didn't know how good this restaurant's butterscotch pie is might not be so forgiving. (pp. 5-6)

Despite her willingness to consume damn good pie in a den of bad spelling and grammar, she's generally unaccommodating, up to the point of willful misunderstanding: to someone who is asked "Can you spell your name for me?", she suggests responding, "Yes, I can. Would you like me to?" (p. 57), and faced with young people using high rising terminals in things like "Hi. This is Jane Doooeeee?" (her representation of the phenomenon), she says she's tempted to ask, "Are you sure?" (p. 59). What an annoying person!

And she pretty much steadfastly refuses to make distinctions; her examples of language gone awry include simple typos (labeled "Typo of the Weak"), ordinary misspellings, word confusions, non-standard forms and constructions, most of the usual shibboleths, and choices she believed to be too colloquial for formal (especially written) contexts (for instance, split infinitives). They're all mistakes, and they're all offenses against "grammar manners" (p. 7).

Just how wide she casts her net can be seen from the lists of "grammar points" at the end of each chapter. Here's the one for chapter 4 (p. 72), on language in social situations:

change in social situation
pronunciation of mauve
phatic language
"you're welcome"
range of "thank you" occasions
answering machine rules
titles + last names
addressing widows
addressing men with the same last name
names ending in s
addressing former elected officials
addressing young girls
addressing young men
mens' last names and number
women changing names
correcting others' language
wedding invitations
plural "you"
seasons' Greetings
apostrophe in name signs

Ah, she anticipates your objections. On addressing widows (p. 68):

Someone even had the temerity to ask if this topic wasn't an etiquette question... On one hand, this is an etiquette question, but it also involves language use and thus falls into my area of interest.

As far as I can tell, things like the conventions for composing double dactyls and knock-knock jokes don't make it into the book, but maybe that's just because she hadn't come across any inept double dactyls or ill-formed knock-knock jokes, and nobody had asked her about these forms. (Let's listen in... Grammar Hotline: "May I help you?" Caller: "Yes. When my brother speaks Pig Latin, he pronounces 'stop' as 'topsay', but I say it has to be 'opstay'. Which of us is right, Grammar Lady?")

I do wish there were some short and punchy label for all the kinds of conventions of language use (as well as labels for the many different types of these conventions), so that "grammar" wouldn't have to serve this purpose and could continue to be used by linguists for the system of regularities connecting the phonetics and semantics of a (variety of a) language. Or maybe linguists should just give up and follow Geoff Pullum and Barbara Scholz in calling this system the "correctness conditions" for a (variety of a) language. Though I worry about how "correctness" would be taken by non-linguists.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 02:49 PM

O to be a particle, now that spring is here

This morning my wife, a native Montanan, told me that she is going to drive "up to Hamilton" to visit a friend. This sounded odd to me, because Hamilton is 40 miles south of Missoula, the place where we live. But it isn't odd in this area, where the geographical directions "up" and "down" relate to the flow of the rivers. Hamilton is upstream. The Bitterroot River flows north between Hamilton and Missoula and all the natives here seem to know this. They notice such things but I'm a product of the industrial East, where the direction of river flow doesn't often reach any level of consciousness. I love to look at rivers but I don't think about which direction they happen to be going. Furthermore, it seems to me that "up" means north and "down" means south, except of course when "up" refers to a place that is higher in altitude and "down" to a lower.

A few years ago I was confused by a woman riding in my car who told me to turn "up" at the next intersection. Since the topography there was flat, I asked her which way was "up." After a good snicker, she admitted that because she has always had trouble with "left" and "right," she had switched to using "up" and "down" instead--except when there was an obvious topographical condition to clarify things.

To add to my confusion, today's sports section of the Akron Beacon Journal contains an article about the opening game between Chicago and Cleveland. The writer noted, "Sunday's game was moved back one day so ESPN2 could make it the showcase for the start of the season." Which way is "back?" To me, it seems to mean later, not earlier. So I remain perpetually mystified by particles in my native language.

Who knows? Maybe particles go wild in the Spring.

Posted by Roger Shuy at 12:06 PM

April 03, 2006

Thanking

Turning sharply from recent Language Log posts about really mean and nasty language, let me direct our attention, if only briefly, to the very positive speech act of thanking. A few years ago I decided that it was high time for me to say "thank you" to a lot of linguists who had taught me in their classrooms, at meetings, and through their books and articles. So I carefully composed and sent letters (not Emails) to many of them. I tried to include specific things about their special acts of kindness to me and to mention the little things about them that often aren't given notice. Today, many of those I wrote to are no longer with us, including Charles Ferguson, Bill Stokoe, Fred Cassidy, William Moulton, and Charles Hockett. Others are still here, but not as active as they used to be, including Gene Nida, Dell Hymes, and Eric Hamp. I deeply regret that I began writing these letters too late to thank many others, including my own mentor, Raven McDavid.

In most cases, the letters initiated a continuing correspondence that lasted until these friends passed away. Among other things, I grew to know them in a new and better light. Most of all, they knew that someone remembered them and cared about them as they approached their imminent deaths. My letter to Fergie reached him while he was suffering greatly during his last days. He was too ill to respond but, after he died, his wife, Shirley Brice Heath, telephoned me to tell me how much Fergie appreciated being remembered that way. She related that he had asked her to read my letter to him over and over again because it comforted him so much during his final moments.

Saying "thank you" (with specifics) to people you honor doesn't take much time and it isn't that hard to do. I encourage linguists to give it a try if they haven't already been doing it. Think of those who encouraged you when you needed it, whose work and life have benefited you, and whose spirit you admire. Then drop them a cheery letter.

I'm sure they'll love it but the best part is that doing this will make you feel good too.

Posted by Roger Shuy at 12:38 PM

Professor Pullman, in the library, with the candlestick

The April 16 issue of Psychology Today has an item about eggcorns, which quotes someone identified as "Geoffrey Pullman, a professor of linguistics at the University of California-Santa Cruz".

At first I thought this must be an example of the Cupertino Effect. However, MS Word's suggested replacements for "Pullum" are "Plum", "Peplum", "Pullet", "Pulley", "Pulled", "Pullout" and "Pull-up". And since the item's author was Mark Peters of Wordlustitude, I think we have to assume an old-fashioned editorial malfunction at Psychology Today.

I'm reminded of research by Mirko Tavosanis of Pisa University, presented at the recent AAAI symposium on Computational Approaches to Analyzing Weblogs under the title "Are Blogs Edited?" Mirko looked at the frequency of common Italian misspellings in Italian weblogs (from blog.excite.it, clarence.com, splinder.it and splinder.com), compared to Italian newspapers (Corriere della sera, Il mattino, La repubblica, L'Unità). For his chosen set of 19 words, the rate of misspelling in blogs was 0.74%, whereas the rate in the newspapers was 0.68%. By comparison, the rate in Italian-language web sites overall, for these same words, was 4.28%. In the Pullum → Pullman case, Google indexes 11 instances of "Geoff Pullman" as against 18,100 instances of "Geoff Pullum", suggesting a base error rate as low as 0.06% for the web at large.
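The Pullum → Pullman figure is easy to check. The sketch below is only an illustration: the function name is hypothetical, and it assumes the rate is the share of misspelled hits among all hits (dividing by the correct hits alone gives the same value at this precision).

```python
def error_rate(misspelled_hits, correct_hits):
    """Share of misspelled hits among all hits (assumed definition of the rate)."""
    return misspelled_hits / (misspelled_hits + correct_hits)

pullman_hits = 11       # Google hits for "Geoff Pullman", from the text
pullum_hits = 18_100    # Google hits for "Geoff Pullum", from the text

rate = error_rate(pullman_hits, pullum_hits)
print(f"{rate:.2%}")  # 0.06%
```

On this reckoning the web at large misspells the name roughly a tenth as often as Tavosanis's blogs and newspapers misspell his 19 Italian words.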

[Update: Stephan Hollah points out by email that amazon.fr has a listing for Geoffrey K. Pullman, author of The Cambridge Grammar of the English Language, among other works.]

Posted by Mark Liberman at 11:26 AM

April 02, 2006

O grammar books, O now your virtues show

I can only join with Bill and Arnold in deploring the teachers who insist both ignorantly and unimaginatively that "the" is an adjective. But I have more admiration for H. W. Fowler, who in The King's English said of the phrase "the more the merrier": "In this phrase 'the' is not the article, but an adverb, either relative or demonstrative: by-how-much we are more, by-so-much we shall be merrier." That analysis took some ingenuity, and has to strike a sympathetic chord for those of us who grew up in a more exuberant age of syntactic analysis, back when "and" could be a verb.

Posted by Geoff Nunberg at 10:07 PM

One more brokeback report

Here at Language Log Plaza, we've been recording the diffusion of the movie title Brokeback Mountain, in variations on and allusions to the title (most recently, here) and in an extraordinary variety of uses of the word brokeback and derivatives of it (my last summary here, with a Western-wear addendum here). And now, for your entertainment, six recent additions to this genre, moving from the original in six different directions -- a real tribute to speakers' abilities to find meaning in and create meaning for the linguistic materials available to them.

(I provide this list at the risk of becoming labeled The Brokeback Guy, the keeper of all things brokeback in the linguistic realm. This could drastically increase the already alarming amount of e-mail I get from readers of Language Log. But on with the show...)

1. Brokeback 'wonderfully, lovingly gay'. I start with what is to my mind the most idiosyncratic of the uses in my files so far, a thoroughly positive and celebratory use of the word, to convey the best aspects of being gay. This from someone who posts to soc.motss under the handle "Bock" and (like a significant number of gay men) just ADORES the movie, to the point of seeing the intensity of the men's love for one another while de-emphasizing all the tough stuff. The thread is titled "Used to be a gay moment now it is a brokeback moment". Early on we get Bock's take on brokeback as conveying 'gay', but without any negative nuances that might accompany the word gay:

To me gay may not sound all that great but brokeback guy, brokeback moment, brokeback anything, could never sound anything but great. (3/20/06)

In reply to criticism he amplified:

... brokeback symbolizes love in every sense of the word. There is nothing negative about the word brokeback. (3/23/06)

This elicited several responses from gay men who found the negative associations of brokeback impossible to avoid, in particular this thoughtful reply by Jack Carroll:

I wonder if you aren't looking at the use of Brokeback from one side only. Seems to me that it also probably has a strong negative message for some people, i.e. - deceitful, perverted, adulterous, etc. etc.

The name Brokeback itself suggests something crippled, and while we gay people may see that as applying to the world enclosing the two protagonists, I have no doubt that for some it is a description of the men themselves - two broken men engaging in their folly in the shadow of the emblematic mountain. (3/23/06)

As many people have pointed out, Brokeback Mountain is not just a (gay) love story, it's also a terribly TRAGIC (gay) love story. So it seems unlikely to me that Bock will find many other passengers on the Ameliorated Brokeback Train with him.

2. Cashback Mountain. In another country entirely is the report in Time magazine (4/3/06, p. 95):

Now that Brokeback Mountain has been outed as a well-marketed, Oscar-winning love story...--instead of a controversial, low-budget, art-house flick--one of the film's supporting players says he wants his due.

This would be Randy Quaid, who's suing Focus Features for $10 million. Rebecca Winters Keegan's story on the suit was printed under the head

And Now, Cashback Mountain

This makes reference to the film, but not in any way to its content. And it strains to fit into the "X-back Mountain" template; the story is indeed about cash, but nobody's giving or getting any cash BACK.

3. Brokeback 'homoerotic, gay'. On 3/27/06, Ned Deily reported in soc.motss a passing reference in Leah Garchik's San Francisco Chronicle column that day:

Watching one man hoist another man in Matthew Bourne's "Swan Lake'' the other night, Renee Gibbons says she was thinking, "Oh my God! Brokeback Lake. What next?''

This is brokeback conveying some combination of 'gay' and 'homoerotic'. Also, to my ear, with a somewhat negative take on the whole thing.

4. Brokeback 'married and in a same-sex sexual relationship'. Like #3, this one makes reference to the content of the film. But now extended from men to women. The reference comes in an appeal (relayed to soc.motss on 3/28/06 by Jess Anderson) by Celina R. De Leon for women willing to be interviewed for a project of hers:

Just like the Down Low phenomenon...--white gay, bi, or non-labeling men who have sex with men while remaining married to their wives is all the talk now thanks to the movie "Brokeback Mountain." But unlike the media's coverage--this is not a male phenomena. Women have been doing this for years, too!

I would like to interview women who are, or were, involved on the Down Low with same-sex relationships or same-sex sexual activities.

... So, if you know of a "Brokeback woman" who may be interested in sharing her story with me, or know of someone who would be a great resource for this story--any help would be GREATLY appreciated!

5. Intensifier brokebackingly. Arthur Plotnik, author of the new book Spunk & Bite: A Writer's Guide to Punchier, More Engaging Language & Style, wrote a commentary piece (which is in fact engaging) in the Los Angeles Times on 3/27/06 about the dearth of intensifiers in current English. (My thanks for pointers to this site by Ron Macaulay, via Elizabeth Traugott, and by Ben Zimmer.) People stick, Plotnik complains, to the "standby American intensifiers: very, really, quite, awesomely, amazingly, incredibly, totally, definitely, tremendously, extremely." It's time to remedy the intensifier shortage, he maintains, concluding:

... hard times call for hard measures -- ballistically, backbreakingly, brokebackingly, boy-is-this-intensified-now hard.

It's not entirely clear what work brokebackingly is doing in this inventory. Maybe it's just a rough synonym of backbreakingly 'extremely', without any negative affect or allusion to the movie.

6. Brokeback the homophobic slur. Finally, thanks to e-mail on 3/29/06 from Katie Thomas, I've been able to read part of the NewsMax.com transcript of an exchange between actor Alec Baldwin and radio hosts Sean Hannity and Mark Levin, during a 3/26/06 radio interview with Baldwin by WABC's Brian Whitman. Hannity and Levin phone in. First Baldwin and Hannity trade insults, then Levin joins in and there's a general free-for-all, which resolves for a while into Levin and Baldwin one-on-one, culminating in Levin's riposte:

And you know what you are? You're 'Brokeback' Alec.

I'm not sure what the prosody was -- whether "Alec" was a vocative (as in "You're moronic, Alec") or a predicate noun (as in "You're Moronic Alec"). The transcript punctuates it like the latter, but I suspect it was the former.

In any case, I saw this first without the context, and entertained the possibility that "Brokeback" here was just a generic insult. But no, it's step 4 in an exchange of homophobic (and anally oriented) slurs between Baldwin and Levin (following immediately on Levin's insulting Baldwin's intelligence):

(1) BALDWIN [to Hannity]: And who's that - who's your little cabin boy there with you.

(2) LEVIN: I'm not a cabin boy, butt-boy.

(3) BALDWIN: What are you doing there, cabin boy? ... I now dub you Sean Hannity's cabin boy.

(4) LEVIN: And you know what you are? You're 'Brokeback' Alec.

They're just taking turns calling each other "fag(got)" (who takes it up the ass), without using the direct vocabulary. An everyday example of straight guys engaging in mutual name-calling by impugning each other's sexuality (and hence masculinity and hence overall worth). We're a long way from #1 here.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 05:17 PM

"Thinking specifically about the F-word..."

To round out a week of posts on profanity (most recently Roger Shuy's droll April Fool's Day spoof), let's consider a new Associated Press poll on the subject conducted by the market research company Ipsos. As the AP reports, nearly three-quarters of Americans questioned said they encounter people using profanity in public frequently or occasionally, while two-thirds said they think people swear more than they did 20 years ago. These statistics are enough to provoke the AP to ask: "Are we living in an Age of Profanity?" To which the only possible response seems to be: "Who the fuck knows?"

Apologies for actually using profanity in a discussion of profanity. That wasn't an option for either the Ipsos pollsters or the AP reporters, who were compelled to refer only to "profanity" and "swear words" without getting very specific. The super-meta opening sentence of the AP article highlights the necessary indirection: "This is a story about words we can't print in this story." In the graphic accompanying the article, jaunty cartoon figures are shown exchanging speech balloons full of avoidance characters, mixing in skulls and lightning bolts among the usual typographical mishmash. The text of the article only alludes to one profanity in particular, and it's given the favored metalinguistic label of "the F-word" (characterized as "the gold standard of foul words"). I thought that the distancing F-label was pretty safe to use in the print media these days, but the article includes a parenthetical note: "For the record, we needed special dispensation from our bosses just to say 'F-word.'" (They run a tight ship over at the AP.) Elsewhere in the story, one of the regular folks brought in to bemoan the rampant spread of profanity opts to use the effing avoidance strategy: Darla Ramirez, a 40-year-old housewife from Arlington, Texas, says she hears "people talking about their F-ing car, or their F-ing job. I'll hear it walking down the street, or at the shopping mall, or at Wal-Mart."

Unlike the BBC-sponsored survey ranking the perceived rudeness of 28 selected terms, the F-word is the only one that the Ipsos pollsters specifically asked about. (Accompanying the Ipsos press release is a PDF file with the topline results for the "Associated Press Profanity Study.") After four questions about how often respondents use and encounter "profanity or swear words," the study poses one of the great survey questions of all time: "Thinking specifically about the F-word, how often do you personally use that word in conversations?" Assuming the demographic questions were asked first, that would leave this as the final question of the survey. I'm guessing they positioned it last in case people with delicate sensibilities responded in shock, horror, or uncontrollable laughter.

A whopping 35 percent of those questioned claim that they never drop the F-bomb in conversation. Another 37 percent say they use it a few times a month, about once a month, or a few times a year. (Another 1 percent are "not sure" of how often they say it.) But we didn't see reverse-spin headlines like "Genteel Americans eschew F-word: only 27 percent use 'gold standard' of profanity on a regular basis." Instead the AP reported that "a healthy 64 percent said they use the F-word." This better fit the overall thrust of the article, focusing on the big majorities who say they hear public profanity "frequently" or "occasionally," and those who say people swear more often than they did two decades ago.
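To see how the same topline numbers support both headlines, here is a small sketch of the arithmetic. The category labels are my own shorthand, not the wording of the Ipsos questionnaire; the percentages are the ones quoted above.

```python
# Reported shares of respondents, in percent (labels are shorthand, not Ipsos wording)
shares = {
    "never": 35,
    "occasionally": 37,  # a few times a month, about once a month, or a few times a year
    "regularly": 27,     # the "regular basis" group in the reverse-spin headline
    "not sure": 1,
}

# The four groups account for everyone questioned
assert sum(shares.values()) == 100

# The AP's "healthy 64 percent" lumps occasional and regular users together
users = shares["occasionally"] + shares["regularly"]
print(users)  # 64

# The genteel reading lumps the never-users with the occasional ones instead
print(shares["never"] + shares["occasionally"])  # 72
```

Either grouping is arithmetically honest; the choice of which categories to add up is where the spin comes in.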

How useful are the Ipsos/AP statistics from a sociolinguistic standpoint? I'd say they're just about worthless. First of all, self-reported evaluations of speech patterns are notoriously unreliable, let alone evaluations of taboo items that the questioners themselves can't even refer to without avoidance strategies! Also, what counts as a "profanity" or "swear word"? Besides "the F-word," that's left up to the judgment of the individual respondents, since specific lexical items are of course deemed unmentionable. And finally, what does it mean that 67 percent of those questioned think that people swear more now than 20 years ago? I haven't seen any other public survey like this in the past, but I'd wager that just about every generation in modern American history has thought that profanity was on the rise. "Swear words" are particularly susceptible to the Recency and Frequency Illusions discussed in this space by Arnold Zwicky (e.g., here, here, and here). It's the same hell-in-a-handbasket degenerationism that informs perceptions of non-standard usage in general. In the case of profanity, the perceived increase in conversational usage may be shaped by a gradual easing of taboo restrictions in recent decades — at least in certain public contexts not bound by, say, the AP Style Guide or the whims of the FCC.

One context in which those usage taboos have been eased is in our hallowed dictionaries. But don't tell that to Irene Kramer, a grandmother in Scranton, Pa., who told the AP that she is bothered by what she overhears from the nearby high school:

"What we hear, it's gross," says Kramer, 67. "I tell them, 'I have a dictionary and a Roget's Thesaurus, and I don't see any of those words in there!' I don't understand why these parents allow it."

Kramer must not have bought a dictionary since the early 1970s, when major collegiate dictionaries like Merriam-Webster's and American Heritage began including "fuck" and various other taboo items. (And please don't expose her to HDAS or Cassell's!) Lexicographers have long taken the descriptive approach to "obscene" or "profane" vocabulary, recognizing that every age is, more or less, the Age of Profanity.

Posted by Benjamin Zimmer at 01:17 AM

April 01, 2006

The LMU: A New Formula for Measuring Effective Writing

Language Log Plaza has been the source of a couple of recent posts concerning rude and disagreeable English (here) and (here). In stark contrast with these rants about how bad things are these days comes some good news about rudeness as a promising tool for diagnosing language learning. Linguists at Orizen Technical University in Canada have isolated a new way to assess language fluency--students' ability to use rudeness effectively. This discovery challenges the long-standing notion that mean length of utterance (MLU) is the most useful indicator of language ability.

The researchers began their study with the belief that a student's ability to write English functionally matters far more than how long their utterances turn out to be. "New learners can write long sentences that produce a high MLU score without saying anything worth reading," reports the Director of the project. "We find that more advanced language learners have shown a highly developed sense of meanness in their texts."

The research team, which included a classroom teacher, first investigated various speech acts, excluding agreeing, requesting, giving opinions, etc. before hitting on those speech acts that came closest to reflecting real fluency--the students' ability to communicate effectively their wrath, meanness, ill-temper, rudeness, insults, and disdain.

Thus, the researchers came up with the length of mean utterance (LMU) to replace the mean length of utterance (MLU), suggesting that it should be used in future studies of written language fluency. According to this study, students who remain focused longer on meanness and rudeness invariably display the greatest progress in their ability to produce effective written texts.

Methodology:
Finding it difficult to simulate real life rudeness through experiments (after all, the research was done in Canada), the research team rejected controlled experimentation in favor of a more ethnographic approach. From students in the English teacher's class, they collected writing samples from ten adolescent language learners during the 2005 school year.

Results:
A total of 480 really mean utterances were recorded, distributed as follows:

1-2 word utterances 10%

3-5 word utterances 40%

6-10 word utterances 40%

over-10 word utterances 10%

At first the team members were puzzled by this perfect bell-curve distribution, so they set about to discover which students fit into each of these four categories. Not surprisingly, the most effective language learning correlated with the length of the writers' mean utterances (LMU). Closer examination of the data is illustrative, including the following examples:

1-2 word utterances:
You jerk!
Bitch!
Hell no!
Drop dead!
Baloney!
Crap!

3-5 word utterances:
Go to hell!
Man, you're really stupid!
This is utter nonsense!
Get out of here!
Like you bore me silly!
This is crap!

6-10 word utterances:
All year long you've said absolutely nothing to convince me.
I've been more impressed by a blank wall.
Your dress looks like a raggedy brown potato sack.
I'll never take your courses again in my life.
This year was one of the dullest in human history.
Here's that damn who/whom rule again.

Over 10 word utterances:
This is, beyond question, one of the most boring courses I have ever taken in my life.
There are few, if any, teachers at (name of school deleted), or any other school, who could qualify as well as you for the title of Miss Emptyhead of 2006.
Whatever else might be said, this research study sucks worse than anything ever concocted by human beings and it ought to be outlawed, if possible, in the future.

These sentences made it patently clear to the research team that students who produced 3-5 word LMUs had not yet mastered English properly. Those who wrote longer, really mean sentences, however, developed their insults thoughtfully, elaborated on their basic nastiness, gave rude specifics, used ugly comparisons, and even created some syntactic embedding.

It was also noted that neophytes used exclamation marks while experts did not, a result that the research team plans to examine next. "We will really need to explore this!" explained the director.

Discussion:
The discovery of the length of mean utterance (LMU) as an indicator of written language fluency holds great promise for future studies of language acquisition. Not surprisingly, the research project was amazingly well received by the subjects.* Even beginning level students displayed positive attitudes toward the program, especially when told that they were free to use any expletive they wished.

*It should be noted that some of the example sentences above were found in the course evaluations.

Posted by Roger Shuy at 08:03 AM

Language Log

April 30, 2006

Literary shoplifting

Out-of-time, out-of-body seeing and hearing

Punctuation tip's

Courtleye makere saith copyinge was withovten entente

Kaavyagate Update

Arabic machine translation from Google Labs

Negation day in the news

April 29, 2006

Can you speak in rhinoceros?

Wild? I was livid!

Ma Ferguson, the apocryphal know-nothing

More water cooler chat from Language Log Plaza

The multilingual anthem

Bird (syntax) flu

April 28, 2006

Nationalism in all its star-spangled glory

Separating species with bullets

Starlings linguists language loggers readers follow commented on the work of studied are damn smart!

April 27, 2006

Starlings

Around the water cooler at Language Log Plaza

April 26, 2006

The race to the bottom in science reporting

A million words here, a billion words there...

Beats Workin'

Accidental spelling at Google; Mary Matalin speaks unfortuitously

Who is the decider?

April 25, 2006

Arizona knows

In defense of Kaavya Viswanathan

Subsequent related posts by me

Other Language Log posts on this subject:

Lament for Port and Starboard

Probability theory and Viswanathan's plagiarism

Full tilde

Apocalypse not now

The latest in anti-social media

Straw creatures great and small

April 24, 2006

Battling blang

Yet Another Sign of the Apocalypse

Hollywood glamour, activist passion, false rhetoric

Scrapie in ancient China?

References

April 23, 2006

An overature to the nucular family and the doctorial committee

Myopia in applied linguistics

New ideas and new words

Prior artwork

April 22, 2006

When is a phrase not a phrase?

Loose sallies of the mind

An Eats, Shoots & Leaves moment

April 21, 2006

Adventures in celebrity onomastics

The Provenzano Code

Is a cow a negotiable instrument? Can a woman be a "reasonable man"?

Ali G in the land of colorless green ideas

Croatian an endangered language?

April 20, 2006

Are we an it or a they?

"That stuff" and "the genre of 'blog'"

Heated words about "sauna"

April 19, 2006

Four subjects of a book review

McClellan's mangled sentences: where are they?

L-erba' temi tal-poeżija

MLA Language Map enters new territory

Grand theft bovine; or, when is an antelope not a document?

April 18, 2006

All that and talk about Fight Club

The Emperor's Clothes

April 17, 2006

Harry Potter and the Madding Gerund: Secrets of the Language Log Code

Doctors return to their senses

Patriots Day, Patriot's Day, and Patriots' Day

How much do those red and blue jellybeans predict about linguistic ability?

April 16, 2006