Language Log: June 2007 Archives

June 30, 2007

Invisible telepathic parrots

There's been a new development in the BBC parrot-telepathy story. Last year, the reference to telepathy was silently removed; but now the whole parrot has been airbrushed out of the journalistic record.

It all started back in January of '04 ("Parrot telepathy at the BBC", 1/28/2004). A BBC story about the linguistic abilities of N'kisi the African grey parrot said something foolish about his vocabulary, claimed to be 950 words:

"About 100 words are needed for half of all reading in English, so if N'kisi could read he would be able to cope with a wide range of material."

I explained why this was a silly thing to say by showing the start of the story as it would appear to someone with a 100-word vocabulary:

The xxxxxxx of a xxxxxx with an xxxxxx xxxxxxxxxxxx xxxxx to xxxxxxxxxxx with people has xxxxxxx xxxxxxxxxx up xxxxx.
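
The masking itself is mechanical enough to sketch in a few lines of Python. This is only an illustration: the tiny COMMON_WORDS set below is a hypothetical stand-in for a real list of the 100 most frequent English words (which would have to come from a corpus count), and mask_unknown is a name made up for the example.

```python
import re

# Hypothetical stand-in for a "100 most frequent words" list.
# A real list would be derived from corpus frequency counts.
COMMON_WORDS = {
    "the", "of", "a", "with", "an", "to", "has", "up",
    "in", "are", "which", "people",
}


def mask_unknown(text, vocabulary):
    """Replace every word not in `vocabulary` with x's of the same length."""
    def replace(match):
        word = match.group(0)
        if word.lower() in vocabulary:
            return word
        return "x" * len(word)

    # Treat runs of letters and apostrophes as words; leave everything else
    # (spaces, punctuation) untouched.
    return re.sub(r"[A-Za-z']+", replace, text)


print(mask_unknown("The parrot has people up in arms.", COMMON_WORDS))
```

Even with half of the running words preserved, what survives is almost entirely function words, which is the point of the exercise.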

I also noted in passing that the story seemed a bit credulous about N'kisi's abilities in general:

I yield to no one in my admiration for parrots' communicative efforts, and N'kisi does sound like a remarkable fellow ... but you have to wonder what is happening at the BBC when Mr. Kirby writes that:

N'kisi's remarkable abilities, which are said to include telepathy, feature in the latest BBC Wildlife Magazine.

As a mere linguist, I'll leave this one to the experts at the Skeptical Inquirer, but let's just say that throwing in a claim about pet telepathy doesn't do a lot for my confidence in the rest of the story. It's like reading about a hypothetical engineering genius whose remarkable new windmill design and perpetual motion machine are both covered enthusiastically in the latest issue of National Geographic.

The first thing to disappear was the telepathy. In January of 2007, David Beaver pointed out to me that the reference to N'kisi's telepathy had been silently deleted from the article ("BBC's duplicity stuns Language Loggers", 1/15/2007). Consultation of the Wayback Machine showed that I had not hallucinated the original passage. The Wayback Machine also showed that the change had been made without changing the story's time stamp, which in January of 2007 continued to read "Last Updated: Monday, 26 January 2004, 15:27 GMT", although the version of this story captured on 4/24/2006 still contained the reference to telepathy.

Our reaction at the time:

At the water cooler here at Language Log Plaza, Geoff Pullum commented "Wow! The BBC are not just science idiots; they actually fake the record later, and delete things from published material!" I reminded him of the infamous chatnannies affair, but I agree that silent removal of an embarrassing phrase, while retaining a false "Last Updated" banner, is worse. I'm sure that if a government ministry did the same thing with embarrassing predictions about the war in Iraq, or something of the sort, BBC News would (quite properly) be all over the story.

But now, as Neil Golightly informs me, the original BBC story link (http://news.bbc.co.uk/1/hi/sci/tech/3430481.stm) -- which used to point to the story about N'kisi by Alex Kirby, "BBC News Online environment correspondent", datelined 1/26/2004, under the headline "Parrot's oratory stuns scientists" -- now points to an entirely different story, "Animal world's communication kings", by Rebecca Morelle, "Science reporter, BBC News", datelined 5/1/2007.

At the bottom of this new story, there is a little note:

Note: This story about animal communication has replaced an earlier one on this page which contained factual inaccuracies we were unable to correct. As a result, the original story is no longer in our archive. It is still visible elsewhere, via the link below:
'Parrot oratory stuns scientists'

"Factual inaccuracies we were unable to correct"?

Did you mean, "the story was a complete and obvious crock, and we were caught at it and made fun of, but our policy of pretending to be infallible prevents us from explaining and correcting the problems"?

As a pompous euphemism, "factual inaccuracies we were unable to correct" doesn't have quite the compact power of "wardrobe malfunction", but it's still impressive.

Now, in fact, it wouldn't be at all hard to document and correct the factual inaccuracies in that original article. I think that I did a decent job of correcting the suggestion about the value of a 950-word vocabulary; it took Ray Girvan all of 232 words to deal with the whole thing, with useful links; and a minute with Google will find plenty of skeptical discussions of N'kisi's telepathy and/or vocabulary, for example this one by Robert Todd Carroll. The Wikipedia article on N'kisi has links to Carroll's debunking and also to a reply by the psychic researcher responsible for the claims about telepathy, Rupert Sheldrake. Surely if the parrot telepathy article were (say) some government minister's credulous white paper about using the power of prayer to reduce National Health expenditures, the BBC's ace investigative journalists would have been able to "correct" its "factual inaccuracies"?

I learned about the purged article and the "unable to correct" note from Neil Golightly, who explains:

You may be interested in a response I got on the BBC Editors' blog. The News Online editor, Steve Herrmann, had a post up ( http://www.bbc.co.uk/blogs/theeditors/2007/06/conflicting_accounts.html ) re: changing events rendering stories obsolete. I wanted to reference the infamous parrot story to make the point that discredited stories do hang around and not necessarily in a visible context: however when I opened the link I found to my surprise that it went to an entirely different story. I pointed this out in my comment ( http://www.bbc.co.uk/blogs/theeditors/2007/06/conflicting_accounts.html#c1721336 ). Steve acknowledged my point to a certain extent ( http://www.bbc.co.uk/blogs/theeditors/2007/06/conflicting_accounts.html#c1790329 ) and indeed the new story (still at http://news.bbc.co.uk/1/hi/sci/tech/3430481.stm ) now has a disclaimer at the bottom saying "This story about animal communication has replaced an earlier one on this page which contained factual inaccuracies we were unable to correct. As a result, the original story is no longer in our archive. It is still visible elsewhere, via the link below:" followed by an Internet Archive link.

So the original article was first silently purged and replaced by a completely new piece on a similar theme, which doesn't mention N'kisi at all. And the note at the bottom, with a link to a copy of the original article that happened to have been harvested by the Internet Archive, was added only after Neil complained. Here's what Steve Herrmann says about the process:

2. Neil - you are right to point out we should have explained what happened to the original story, and we’re adding a note to the bottom of the new story to do that, along with the link you provided to the Internet Archive. The Newswatch website when it launched had a number of indexes which aimed to increase accountability – the ‘Notes and Corrections’ page was one of them, but was never a detailed or comprehensive list. When we launched the Editors' blog last year, it seemed to us it was a more effective way of achieving the same aims, and we reduced the scope of the Newswatch site to what it is today.

Also, (and in response to JG, Kevin and Dave) as far as corrections in general go it’s our policy to correct anything that’s wrong as soon as we become aware of it. As I said here previously when we make a major change or revision we republish it with a new datestamp, indicating it’s a new version of the story. If there’s been a change to a key point in the story we will often also point this out in the later version (saying something like "earlier reports said..."). Lesser changes, including minor factual errors, corrected spellings and reworded paragraphs - go through with no new datestamp because there hasn’t been a substantive change or update in the story. There may be ways in which, as Neil suggests, we could track all the changes automatically and make them more obvious to readers that way, but at present we haven’t got that ability.

(Apparently the BBC had no copy of the purged article in its own online archive.)

It's certainly a step forward that the BBC News blog "The Editors" exists, and that comments posted there get an honest response, even if that honesty doesn't translate into a transparent correction of errors on the news site itself. I hope that this feedback process will eventually improve the risible state of science journalism at BBC News.

[I should add a few words about why I think this matters.

The public needs accurate information about technical subjects, but the quality of reporting on science and engineering in the popular press is scandalously low. Many science journalists write as if their only goals were cheap sensationalism and various political or personal agendas -- and this is not only true of the tabloids, it applies all too often to the most respected brands in the business. In my experience, BBC News is the worst and most consistent offender among serious English-language media organizations. Its competitors are not much better, but you'd think that an outfit with the BBC's financial and cultural advantages could be ahead of the pack instead of behind it.

I don't especially enjoy playing "gotcha" with journalists, but there's some evidence that being held up to ridicule in the blogosphere has an impact on reporters and editors, not just in individual cases but cumulatively.

Also, the availability of responsible discussion in alternative media offers at least a small contrary force to the surge of misinformation from traditional sources. As a result, those who consult Google or Wikipedia -- with an open-minded and skeptical attitude, of course -- are likely to be better informed than those who rely on sources like the BBC. Perhaps this is the best outcome that we will get, but it's not the best that we could hope for.]

Posted by Mark Liberman at 07:40 AM

June 29, 2007

Revealing speech

I've recently finished reading (the hardback edition of) Richard Dawkins's book The God Delusion. If you haven't yet read the book and are vaguely curious, the first chapter is here. It's already available in paperback in Britain; the U.S. edition of the paperback is expected in September. You can see and hear Dawkins reading the preface to the paperback here.

I didn't expect to find much about language in this book, but there's actually a fair bit: some discussion of the Great Vowel Shift, a reference to the universality of "the underlying deep structure of grammar", and at least a couple of instances in which an appeal is made to stress to clarify an important distinction in meaning (p. 215: the SELfish gene vs. the selfish GENE; p. 364: more things in heaven and earth than are dreamt of in YOUR philosophy vs. more things in heaven and earth than are dreamt of in your phiLOsophy). Among all these bits about language, a couple stand out for me in particular because they are highly misleading (and could have easily been checked); one of these is discussed below the fold, and I'll follow up with the second at some future date.

Let me state first, though, that this isn't a critique of Dawkins's book nor of his abilities as a scientist, and it says nothing about his case for atheism. As regular readers of this blog already know well, there's just a whole lot of misinformation about language out there, and what I say below just shows that even the most intelligent and highly educated among us can get it wrong.

On p. 44 (in Ch. 2), Dawkins devotes two paragraphs to a story from David Mills's book Atheist Universe. Here's the first paragraph, with emphasis added. (Please note that I haven't yet read Mills's book myself.)

David Mills, in his admirable book Atheist Universe, tells a story which you would dismiss as an unrealistic caricature of police bigotry if it were fiction. A Christian faith-healer ran a 'Miracle Crusade' which came to Mills's home town once a year. Among other things, the faith-healer encouraged diabetics to throw away their insulin, and cancer patients to give up their chemotherapy and pray for a miracle instead. Reasonably enough, Mills decided to organize a peaceful demonstration to warn people. But he made the mistake of going to the police to tell them of his intention and ask for police protection against possible attacks from supporters of the faith-healer. The first police officer to whom he spoke asked, 'Is you gonna protest fir him or 'gin him?' (meaning for or against the faith-healer). When Mills replied, 'Against him,' the policeman said that he himself planned to attend the rally and intended to spit personally in Mills's face as he marched past Mills's demonstration.

I'm as disgusted as Dawkins is (and as I presume Mills is) with the intolerance of this police officer (and other members of the police department, as we'll see further below). But what does the officer's speech variety have to do with it? Why use eye dialect -- spelling what the police officer said in such a way that it's clear that he's a speaker of a non-standard variety of English -- and to top it off, include a patronizing translation of the quotation? In fact, why quote the police officer directly in the first place? The only reason appears to be to highlight the way the police officer speaks and to thereby make a(n implicit) connection between his speech variety and his intolerance. Before committing this passage to paper, Dawkins might have asked himself: "Is there no such thing as an intolerant speaker of Received Pronunciation?"

But perhaps Dawkins is just following Mills's lead here. According to Wikipedia, "David Mills was born on January 24th, 1959 and lives in Huntington, West Virginia." This is technically ambiguous (was Mills born in Huntington or not?), but listening to Mills speak (as you can on his website), I would guess he's a native speaker of Southern American English, more specifically the Midland variety which includes most of West Virginia (and which I'm personally very familiar with, having married someone from Louisville, KY). Now, this doesn't mean that Mills himself uses is instead of are or that he pronounces for and against as fir and 'gin, nor does it mean that he doesn't have negative attitudes about people who do speak this way. This brings me to Dawkins's second paragraph.

Mills decided to try his luck with a second police officer. This one said that if any of the faith-healer's supporters violently confronted Mills, the officer would arrest Mills because he was 'trying to interfere with God's work'. Mills went home and tried telephoning the police station, in the hope of finding more sympathy at a senior level. He was finally connected to a sergeant who said, 'To hell with you, Buddy. No policeman wants to protect a goddamned atheist. I hope somebody bloodies you up good.' Apparently adverbs were in short supply in this police station, along with the milk of human kindness and a sense of duty. Mills relates that he spoke to about seven or eight policemen that day. None of them was helpful, and most of them directly threatened Mills with violence.

I'm willing to bet that, like many American English speakers (not just Southerners), Mills uses good in casual conversation in many cases where well is prescribed. (Even if Mills is among those who make it a point to use well where prescribed, I very much doubt he'd say that I hope somebody bloodies you up good is better rendered as I hope somebody bloodies you up well.) So what's up with the "[a]pparently adverbs were in short supply" comment? For one, it's clear that this is entirely Dawkins's contribution to the commentary. Second, the obvious intent of the comment is to say without saying, nudge-nudge wink-wink, that speaking in the way this police sergeant does, mistaking adjectives for adverbs, reveals/reflects his intolerance (because both are reflections of his ignorance).

(Adverb awareness is a double-edged sword. Recall that many prescriptivist types consider a surplus of adverbs to be a bad thing. There seems to be a Goldilocks phenomenon to be investigated here ...)

This all illustrates how many folks -- especially the overeducated among us who are Dawkins's main audience, somewhat ironically -- are willing to accept without question that speech variety and a(n in)tolerant attitude go hand in hand (perhaps, as suggested above, mediated via level of education).

(Incidentally, you can find an .mp3 reading of both of the paragraphs quoted here on Mills's website -- it's an excerpt from the audio book co-narrated by Dawkins and his wife Lalla Ward. Listen closely to Ward pronouncing just the "fir him or 'gin him" quotation. It's a delightfully odd and entertaining mixture of Lalla's own British dialect and what she imagines Southern American English to sound like.)

[ Comments? ]

Posted by Eric Bakovic at 06:20 PM

Planet Springfield

Nancy Friedman, of Away With Words, has posted a list of Simpsons brand names.

Sure, everyone knows about the Kwik-E-Mart, Duff Beer and Costington's, but I bet you'd forgotten about the Palais du Donut. And about Pone, Pelts and Beyond. And the Movementarians. Not to speak of the Happy Earwig Motel!

Posted by Heidi Harley at 01:34 PM

Coinitude

Lynne Murphy wrote to point out a curious choice by Cory Doctorow in BoingBoing yesterday ("Cereal Straws -- powdered sugar-cereal drinking straws", 6/28/2007):

Kellogg's Cereal Straws are straws lined with powdered sugar-cereal dust that kids can drink milk through. It makes the milk taste like the sludge left at the bottom of a cereal bowl. We feed kids gross things, but this reaches new levels of grotitude.

Lynne's comment:

I liked the 'grotitude' coinage (at least as far as I can tell it's a coinage--the google hits for it seem all to be misspellings of 'gratitude').
But I was interested in the first 't' here. Not being very phonologically sophisticated myself, I wondered: is this indicative of the coiner's sensitivity to obscure latinate-English patterns of allomorphy, or is this just a blend with 'attitude'?

I had a look for 'grossitude', and found 554 hits, including a claim of coinage.

In the case of gross/grotitude, I suspect that the s/t association is mediated by the word grotty, which the OED glosses as "Unpleasant, dirty, nasty, ugly, etc.: a general term of disapproval", suggesting that the etymology is a shortened form of grotesque.

Grotty (rhymes with "snotty") is a word that I can recall from my childhood, without any specific episodic associations. I always thought of it as a slightly upscale variant of grody (rhymes with "Jody"), and I never realized that either word had any relationship to "grotesque". In fact, if you'd asked me, I think I would have said grody was obviously related (somehow) to gross. But the OED sez that grody is "U.S. slang", with the etymological note

[In early use groaty, repr. phonetic respelling of GROT(ESQUE a. + -Y. The shift of t to d is accounted for by the phonetic equivalence of intervocalic t and d in U.S. English. Cf. GROTTY a.]

(The reference to "the phonetic equivalence of intervocalic t and d in U.S. English" is imprecise at best -- it's only non-pre-stress /t/ that is a candidate for flapping and voicing. We colonials might pronounce latter and ladder just the same way, but we're not tempted to turn "a tail" into "a dale" or attack into adac.)
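
Stated as a rule of thumb, and only as a toy sketch: a word-internal /t/ or /d/ is a candidate for flapping when it sits between vowels and the following vowel is unstressed. The Python below assumes ARPAbet-style transcriptions in which vowels carry a stress digit (1 for primary stress, 0 for none). The transcriptions and the function name flap_sites are illustrative assumptions, and the sketch deliberately ignores flapping across word boundaries and after /r/ or /n/.

```python
def is_vowel(phone):
    """In this notation, vowels are the symbols that end in a stress digit."""
    return phone[-1].isdigit()


def flap_sites(phones):
    """Return the indices of /t/ or /d/ that are candidates for flapping.

    A candidate is preceded by a vowel and followed by an UNSTRESSED vowel
    (stress digit 0). A /t/ before a stressed vowel is left alone.
    """
    sites = []
    for i in range(1, len(phones) - 1):
        if (
            phones[i] in ("T", "D")
            and is_vowel(phones[i - 1])
            and phones[i + 1].endswith("0")
        ):
            sites.append(i)
    return sites


latter = ["L", "AE1", "T", "ER0"]
ladder = ["L", "AE1", "D", "ER0"]
attack = ["AH0", "T", "AE1", "K"]

print(flap_sites(latter), flap_sites(ladder), flap_sites(attack))
```

On these three transcriptions the sketch marks the medial consonant of latter and ladder and leaves the /t/ of attack untouched, because the vowel after it is stressed.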

The gloss for grody is

Disgusting, revolting, ‘gross’; dirty, unhygienic, squalid; unattractive, slovenly, sloppy. Freq. in phr. grody to the max, unspeakably awful, ‘the pits’.

The OED's entry for grotty has "Hence grottiness n.", with the citations

1984 Financial Times 6 Oct. 15/7 The grottiness of the room in which their under-graduate son or daughter is proposing to spend the next eight months or so.
1988 N.Y. Times 8 Mar. C13/4 ‘Why do you write so much of grottiness?’ asked a radio interviewer of a current poet.

I'm not sure whether the first syllable of grotitude is supposed to rhyme with rot or with rote, but either way, I'll guess that the resonance with grotty/grody was behind the switch from /s/.

As for the -tude part, it's an old hacker thing, I think. The 1993 edition of the Jargon File observes:

Hackers enjoy overgeneralization on the grammatical level as well. Many hackers love to take various words and add the wrong endings to them to make nouns and verbs, often by extending a standard rule to nonuniform cases (or vice versa). For example, because

porous => porosity
generous => generosity

hackers happily generalize:

mysterious => mysteriosity
ferrous => ferrosity
obvious => obviosity
dubious => dubiosity

Another class of common construction uses the suffix `-itude' to abstract a quality from just about any adjective or noun. This is used especially in cases where mainstream English would perform the same abstraction through `-iness' or `-ingness'. Thus:

win => winnitude (a common exclamation)
loss => lossitude
cruft => cruftitude
lame => lameitude

Other examples from the same source include disgustitude, wedgitude, crockitude, and hackitude.
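
Both overgeneralizations are regular enough to write down as string rules. The Python below is a rough sketch covering only the pairs listed in the Jargon File passage. The function names osity and itude are made up for the example, and the consonant-doubling condition is an assumption chosen to produce winnitude. The pattern is not fully regular: wedgitude drops its final e while lameitude keeps it, and this sketch makes no attempt to capture that.

```python
VOWELS = set("aeiou")


def osity(adjective):
    """Swap a final -ous for -osity, as in porous => porosity."""
    if not adjective.endswith("ous"):
        raise ValueError("expected an adjective ending in -ous")
    return adjective[:-3] + "osity"


def itude(word):
    """Append -itude, doubling the final consonant of a three-letter
    consonant-vowel-consonant word (win => winnitude)."""
    if (
        len(word) == 3
        and word[0] not in VOWELS
        and word[1] in VOWELS
        and word[2] not in VOWELS
    ):
        word += word[-1]
    return word + "itude"


for adjective in ("mysterious", "ferrous", "obvious", "dubious"):
    print(adjective, "=>", osity(adjective))

for word in ("win", "loss", "cruft", "lame"):
    print(word, "=>", itude(word))
```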

Like a lot of hackerisms, this one has spread into the mainstream; but I bet that Cory got it from the source.

[And of course, as several people have reminded me, Mark Peters has a whole blog devoted to Wordlustitude. A recent entry deals with supergeniusitude, and he's also covered aggro-goofitude, asscrackitude, ballitude, ball-suckitude, befuckitude, buttmunchitude, buttockitude, cavemanitude, cluster-fuckitude, cohortitude, crackpotitude, crack-whoritude, and many more.]

[Update #2 -- Mr. Verb presents a brief on behalf of latinate derivational morphology, in "More on -t- ~ -ss- alternations".]

[Update #3 -- Joe Stynes writes:

"Grotty", as well as the abstract noun "grottiness", also has the back-formed mass noun "grot", for any substance imbued with grottiness: goo, gunk, muck, sludge, filth. Probably relatedly, Grot was also the company founded by Leonard Rossiter in "The Rise and Fall of Reginald Perrin" for the manufacture and sale of useless rubbishy products.

And Dan Everett adds:

As far as I know grotty really began to catch on after a Hard Day's Night when a man was showing George Harrison new 'fads' in clothing and George said 'They're right grotty'.

]

[Martyn Cornell corrects Dan Everett's field notes:

Considering it's not my dialect, I knew immediately that Dan Everett's transcript of the dialogue from A Hard Day's Night was wrong - George Harrison, as a Liverpudlian, would never describe something as "right grotty", "right" used in that way is Lancashire/Yorkshire, not Merseyside (and it would be pronounced "reet" ...). A quick Google reveals that the actual exchange was:

Simon Marshall: You'll really dig them. They're fab and all the other pimply hyperboles.
George: I wouldn't be seen dead in them. They're dead grotty.
Simon Marshall: Grotty?
George: Yeah, grotesque.
Simon Marshall: Make a note of that word and give it to Susan. (Susan being the "trendy teenager" on Marshall's TV programme ...)

Funny how things get put in natural classes like "slang English intensifier" so that right is mis-remembered for dead.]

Posted by Mark Liberman at 07:42 AM

June 28, 2007

O Grammar, water bag noise!

Are knolls lace two pus, spicily delays won, tree good mime emery oven awed ick seabed third eyesore a tech's blurry Torydom, insane French disco, f-you your say go.

Lace into astir up butter ladle gull whodunit lace into whore murder, Ladle Rat Rotten Hut. Eye canoli proses sit would ma ice clues.

Wait a minute! What I meant to say was...

Arnold's last two posts, especially the last one, triggered my memory of an odd exhibit that I saw at the Exploratorium, in San Francisco, a few years ago.

Listen to a story about a little girl who didn't listen to her mother, Ladle Rat Rotten Hut. I can only process it with my eyes closed.

Unlike most Exploratorium exhibits, this one is not very clearly linked into any actual scientific lesson -- the text at the bottom says something about the 'importance of intonation' to the process of understanding. It's really bigger than that -- as Arnold said with respect to the spellchecking standup routine, it's about the importance of context, both linguistic and conceptual, to language processing.

Update, from grixit:

Analog Magazine, approx 1974, the story, "Come You Nigh, Kay Shuns". A couple of smart aleck word twisting humans use this technique to sneak a message past eavesdropping aliens who only knew English through an automated dictionary.

Update II, from Nathan Austin:

Thanks for posting H. L. Chace's "Ladle Rat Rotten Hut" on Language Log! I was previously unaware of this particular instance of homophonic translation, and it's interesting to learn more about its history.
As I'm sure you are aware, there is a rich history of homophonic translation within the tradition of experimental poetry. But Chace's is the earliest instance I believe I have ever seen. So, thanks!

If you *aren't* aware of poetic play with homophonic translation, you might look at Louis Zukofsky's "Catullus" or David Melnick's "Men in Aida."

Melnick's poem is available here: http://english.utah.edu/eclipse/projects/AIDA/aida.html

He also sends a link to several other Furry Tails, Noisier Rams, and Thongs transliterated into approxihomophones by Chace -- Guilty Looks Enter Tree Beers, e.g.:

http://www.crockford.com/wrrrld/anguish.html

Nathan also pointed out that Language Hat has a couple of posts on homophonic translations, here

http://www.languagehat.com/archives/000821.php

and here:

http://www.languagehat.com/archives/002533.php.

Of course Ladle Rat Rotten Hut isn't a translation, and the meanings of the homophones chosen don't reveal any relationship to the meanings of the words of the original. But the idea of choosing words to mimic sounds which can be processed as other words is common to both.

David Eddyshaw sent along a version which he had punctuated so that a text-to-speech program would read it with convincing intonational patterns; it's below. I tried it out with my Mac's text-to-speech utility (in Stephen Hawking's voice), and it worked pretty darn well. (To hear your Mac say it to you, go into 'System Preferences', click 'Speech'; check the 'Speak selected text when the key is pressed' box and select an unused apple-key combo (I set mine to 'apple-ctrl-shift-s'). Then highlight the text in your browser window, hit your chosen key combination, and listen to the machine read you the furry tail.)

Ladle Rat-Rotten-Hut. Wants-pawn term, dare worsted-ladle gull-hoe lift-wetter murder-inner ladle cordage honor itch-offer lodge-dock florist. Disc-ladle gull orphan worry-ladle rat-cluck wetter ladle rat-hut, in fur disc raisin, pimple cauldron "Ladle Rat-Rotten-Hut."
Wan moaning, Ladle Rat-Rotten Huts-murder colder-inset, "Ladle Rat-Rotten-Hut, heresy-ladle basking-winsome burden-barter an shirker-cockles. Tick disc-ladle basking tudor cordage-offer-groin-murder, hoe lifts honor udder-site offer-florist. Shaker lake! Dun stopper-laundry wrote! Dun daily doily inner florist! Dun stopper peck flaws! An yonder nor sorghum stenches, stopper-torque wet-strainers!"
"Hoe-cake, murder!" resplendent-ladle-gull. Den sea tucker-basking an stuttered-oft.
Honor-wrote tudor-cordage-offer-groin-murder, Ladle Rat-Rotten-Hut mitten anomalous-woof. "Wail wail wail!" set disc wicket-woof. "Effervescent Ladle Rat-Rotten-Hut! Wares are putty-ladle gull goring wizard-ladle-basking?"
"Aroma-goring tumor-grammars" reprisal-ladle gull. "Grammars seeking bet. Armor-ticking arson burden-barter an shirker-cockles"
"Owe hoe!" setter wicket woof "heffer gnats woke!" Butter taught tomb-shelf, "Oil tickle shirt-court tudor-cordage offer groin-murder! Oil ketchup witter-letter an den, ore baw!"
Soda wicket woof tucker-shirt-court tudor-cordage-offer groin-murder. Whinny raft-adder cordage, E picked inner windrow an sore debtor pore-oil-worming, worse lion inner bet. Inner flesh disc abdominal woof lipped-honor bet, paunched-honor pore-oil-worming, an garbled erupt. Den disc ratchet ammonal pud-honor groin-murders nut-cup an gnat-gun, any curdle-dope inner bet.
Inner ladle wile, Ladle Rat-Rotten-Hut a raft attar-cordage-offer groin-murder, an rancor dough-bawl.
"Comb-ink, sweat-hard!" setter wicket woof, disgracing is verse.
Ladle Rat-Rotten-Hut entity-bet-rum an stud-buyer groin-murders bet.
"Owe grammar!" crater-ladle gull-historically, "Wart bag icer-gut! A nervous sausage-bag ice!"
"Battered lucky-chew-whiff doll-ink!" setter bloat-Thursday woof, wither wicket small honors-phase.
"Owe grammar!" crater ladle gull. "Water bag noise! A nervous sore suture-anomalous prognosis!"
"Alder batter day small-ewe-whiff doling!" whiskered-dole woof, ants mouse worse-waddling.
"Owe grammar!" crater ladle gull. "Water bag mouser-gut! A nervous sore-suture bag mouse!"
Daze worry in forger-nut gulls-lest warts. Oil-offer sodden, caking-offer carvers an sprinkling-otter bet, disc curl an hoard-hoarded woof ceased pore Ladle-Rat Rotten-Hut an garbled erupt!
Mural: Yonder gnaw sorghum stenches shut ladle gulls stopper-torque-wet strainers.
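
For anyone who would rather script the reading of the punctuated text above than set up a key combination, macOS also ships a command-line utility called say, which can be driven from Python. What follows is a minimal sketch under stated assumptions: the helper names build_say_command and speak are invented for the example, and the voice name "Fred" is only a guess at a voice that may be installed on a given machine. On a system without say, speak simply reports failure.

```python
import shutil
import subprocess


def build_say_command(text, voice=None):
    """Build the argument list for the macOS `say` utility."""
    command = ["say"]
    if voice:
        # -v selects a voice by name; which voices exist varies by machine.
        command += ["-v", voice]
    command.append(text)
    return command


def speak(text, voice=None):
    """Read `text` aloud; return False if `say` is not available."""
    if shutil.which("say") is None:
        return False
    subprocess.run(build_say_command(text, voice), check=True)
    return True


if __name__ == "__main__":
    opening = "Ladle Rat-Rotten-Hut. Wants-pawn term, dare worsted-ladle gull."
    if not speak(opening, voice="Fred"):  # "Fred" is an assumed voice name
        print("The say utility was not found on this system.")
```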

Posted by Heidi Harley at 09:04 PM

Proofreading entertainment

I posted a little while back on yellow star thistle counting as two words rather than three, entertained the possibility that star thistle had originally been starthistle but had been "corrected" by a proofreader to star thistle, and noted that I was inclined to misread starthistle as start-histle. This elicited some mail about proofreading, all of it entertaining.

First, there's a Taylor Mali video entitled "The Impotence of Proofreading" (actually, Mali SAYS "The the Impotence of Proofreading"), available several places on-line; here's the link to YouTube. It's a comedy routine packed with (spoken versions of) typos of all kinds -- word confusions, letter substitutions, omitted material, extra material, transpositions -- many of them off-color (anal for any, Sale of Two Titties, and one of the lessons of the routine: "There is no prostitute for careful proofreading").

I got pointers to this video from Marilyn Martin and Chris Waterson. Waterson wrote:

I love the fact that I can watch that and completely understand what he's saying? Why is that?! :)

So there's actually a linguistic question here. The short answer is that we use context and background knowledge to interpret what we hear -- to the extent that we fail to notice many speech errors that we encounter -- and that Mali has been careful to provide enough context to help us along.

Then Mae Sander picked up on the word division question and started an exchange with me about automated hyphenation programs and their discontents. First she cited things like

Small boys in kneep-
ants

Team leaders called co-
aches

Then she told a story:

.. in the very early days of text processing, before personal computers, writers typed on typewriters. Their copy went to data-entry clerks who knew a mark-up language and created computer files with teletype machines or DecWriters. Reviewers and copyreaders received output from huge line-printers, which produced formatted copy on wide fan-fold paper in a monospace typeface. Typeset output (including results of automatic hyphenation) was the very last step in the process. The galley proofs arrived from an offsite Linotron typesetting machine, driven by paper tape from the mainframe computer. This is true: the introduction to the manual for a commercial version of one such text processor and its complex procedures contained this sentence:

This product eliminates the need for pro-
ofreaders.

This story was so wonderful that I was dubious about it, but she's now supplied a ton of convincing detail. In any case, pro-ofreaders were clearly not obsolete then. Nor are they now. Though brute-force methods -- really really big dictionaries with possible hyphenations specified -- can improve things considerably, and undoubtedly have.
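
The brute-force approach amounts to a table lookup. Here is a toy sketch in Python, assuming a hypothetical HYPHENATION table that lists the permitted break points for each word; the three entries and the function name break_word are invented for illustration. Given the number of columns left on the line, it breaks only at a listed point and otherwise declines to break at all.

```python
# Hypothetical hyphenation dictionary: each word is split into pieces,
# and a line break is permitted only between pieces.
HYPHENATION = {
    "proofreaders": ["proof", "read", "ers"],
    "kneepants": ["knee", "pants"],
    "coaches": ["coach", "es"],
}


def break_word(word, room):
    """Split `word` at the latest listed break point that fits.

    `room` is the number of columns left on the line, counting the hyphen.
    Returns (head_with_hyphen, tail), or None if the word is not in the
    table or no listed break fits.
    """
    pieces = HYPHENATION.get(word.lower())
    if pieces is None:
        return None
    best = None
    head_length = 0
    for piece in pieces[:-1]:
        head_length += len(piece)
        if head_length + 1 <= room:  # +1 for the hyphen itself
            best = (word[:head_length] + "-", word[head_length:])
    return best


print(break_word("proofreaders", 4))   # no room for "proof-": no break
print(break_word("proofreaders", 6))
print(break_word("coaches", 3))        # no room for "coach-": no break
```

A table like this rules out pro-ofreaders and co-aches, but it would still happily allow a correct break like scar-city, so it reduces the need for human eyes without eliminating it.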

I pointed out a few years back on the ADS-L that even correct hyphenations at line end can be troublesome, and Geoff Pullum posted here about my example:

to obtain what he wanted amid the scar-
city of planned economic life...

Surely someone has made a collection of line-break hyphenations gone awry. And no, I don't want to start one myself.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 05:14 PM

How many ways can we say "fuck off!"?

Guardian cartoonist Martin Rowson bids farewell to Tony Blair:

(Another hat tip to Steve Isard.)

Gordon is, of course, Gordon Brown. And remember that there's no [r] in fork, fur, or fair for (most) British speakers.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 10:41 AM

From A to Zimmer

Benjamin Zimmer, a frequent Language Log contributor, is starting a regular column at OUPblog ("the Official Blog of Oxford University Press"). Yesterday, there was an introduction by Casper Grathwohl -- and this morning, Ben's column goes live.

His opening number, "On the Front Lines of English, from 'Thirdhand Smoke' to 'Newsrotica'", starts like this:

When I told friends that I was taking a job as editor for American dictionaries at Oxford University Press, I started getting emails asking, "So how do I get a word in the dictionary?"

For the answer, you'll have to read the rest of his column.

Posted by Mark Liberman at 08:59 AM

Reduplication reduplication

In response to my post about contrastive focus reduplication, Jeremy Cherfas wrote to draw my attention to an anonymous comment at his Agricultural Biodiversity Weblog. But you'll need a bit of background to get it.

His colleague Poikileus had posted a sad discovery about Chanel No. 5 ("The sweet smell of agricultural biodiversity", 6/27/2007):

I have ... always believed that unless we can connect people emotionally and positively to the cause of agricultural biodiversity, its conservation and use will be a difficult sell. ...

One of the pillars of my belief has long been Chanel 5 perfume.

Isn’t it derived from the Chanel 5 tree, also called ylang ylang? Botanically known as Cananga odorata, ylang ylang is a widely cultivated tree with heavily fragrant flowers, originally from Asia. ... Here in Cali, Colombia, where I live, the tree is common, and releases its sweet scent to perfume the balmy tropical nights. What better example could there be that agricultural biodiversity not only ensures our survival, but adds glamour and excitement to our lives?

That pillar received a devastating blow when a well-meaning colleague recently pointed out to me that Chanel 5 is a blend of entirely synthetic aldehydes, and has been since its launch in 1921. It actually epitomizes the industry’s break from a natural to a synthetic perfume model.

The anonymous comment:

So you might say that Chanel 5 is ylang-ylang but not ylang-ylang ylang-ylang.

This is the first case that I can recall having seen of full reduplication of a fully reduplicated base. Though probably I just haven't been paying attention.

A less quirky but more consequential observation by Jeremy:

Here in Italy, when someone says they are "from Naples" the reply is likely to be "Napoli Napoli, o Napoli?". That is, are you really from the centres of Naples, or merely from the surrounding region. You sometimes, but not nearly as often, hear the same said for Rome.

This raises an interesting question: is contrastive focus reduplication a linguistic universal? I'd guess the answer is "no" -- if nothing else, it'll run into trouble with the morphosyntax of modification in some languages, and there are probably subtler issues as well -- but I don't have any evidence. This suggests a fascinating set of questions in the little-studied field of historical pragmatics: What is the current geographical distribution of CFR? And did it get that way through inheritance, through diffusion, through independent re-invention -- or all three?

[A half a dozen people have written already with various observations about "New York, New York". The usual use of this phrase isn't actually a reduplication, contrastive-focus-type or otherwise -- it's a sequence of two phrasal words, one the name of a city and the other the name of a state, which happen to be pronounced and spelled the same way. (But if you've ever been asked whether you're from New York, New York, or from New York, New York, New York, New York -- that's a different matter. And a big problem for punctuation, though it's easy to say...)]

[Eric Raimy (author of The Phonology and Morphology of Reduplication) writes:

Saw your post on LL this morning. Yep, 'quadruplication' is extremely rare so the ylang-ylang ylang-ylang is very nice. First case of it I've actually seen and I'd actually guess that 'ylang-ylang' is not reduplicated for the speaker who made the anonymous comment.
CFR occurs in Colombian Spanish too. I met a lawyer who came up to the English Language Institute at the University of Delaware who used CFR in English spontaneously one time at dinner. I hadn't used it around him so I figured that he might be transferring it from Spanish and asked him about it. He confirmed that you could do CFR with basically the same pragmatics/semantics in Spanish as I could explain how it worked in English. Most other people who I've 'poked' about CFR in languages other than English give a resounding 'maybe' in that once they hear it in English, then they think they can do it in what ever native language they speak but as you suggest much more work needs to be done to actually answer this question. In any event, CFR appears to be very easy to transmit...

I agree that CFR "feels" like something that's easy to borrow -- though a language that (for example) insists on quasi-verbal inflection for modifiers of nouns might have some trouble with it.

Here's something that I'd like to know about the Spanish (and Italian) examples. In English, my intuition (which I don't know how to test) is that the second copy is the head, and the first copy is the modifier. In Spanish and Italian, it should be the other way around -- is it?]

[Mae Sander writes:

I just heard a stand-up comedy routine on "that double word thing" in which the comedian (Alexandra McHale on "Premium Blend" episode #0709, on Comedy Central) said doubling is for sorority girls as in "I kissed him, but I didn't kiss-him kiss-him." I can't find an online version of this routine, but the punch line was a question asked at a bridal shower: "Well, if you don't know the bride, you must know the groom" and the answer was "I don't just know him, I know-him know-him."

]

[Leslie Katz writes:

There's a town in Australia called Wagga Wagga.
I've not been there myself, but I read a story about it years ago which so tickled my fancy that I've remembered it ever since.
On the outskirts of town, there's a highway sign, giving you three different destinations, each with an arrow pointing in the relevant direction.
It says,
Wagga Wagga East >
Wagga Wagga West <
Wagga Wagga Wagga Wagga ^

Does the last arrow actually go up, or down?]

Posted by Mark Liberman at 07:50 AM

June 27, 2007

A colorless world

Every so often I'm baffled by a graph, table, or other illustration in the New York Times: I can't figure out what the scales are, what the numbers represent, etc. Yesterday I was stumped by an elaborate illustration concerning "genetic differentiation in modern humans", accompanying Nicholas Wade's "Humans Have Spread Globally, and Evolved Locally" in the Science Times of 6/26/07, p. 3. This is a map of the world with icons (outline forms of human beings) on it representing 52 modern human populations, each icon coded for the makeup of the average genome for that population, with respect to five "modern genetic clusters" (Africa, Eurasia, East Asia, Oceania, America). The coding assigns a different shading for each cluster, but I was able to pick out only one shading (for East Asian) as distinguishable from the other four (by being noticeably darker). The illustration was almost completely uninterpretable.

Then I realized what must have happened, and this morning I verified my hypothesis.

The clue was a note pointing to a line on the map: "Blue lines show ancestral human migrations, which formed the basis for modern populations." There are indeed lines on the map, with arrowheads suggesting the direction of migrations. But there are no BLUE lines; the entire illustration is in grayscale. Ah, I thought, this illustration was supposed to be in color.

On the Times website, it is. African is yellow, Eurasian is blue, East Asian is red (and darker than the other colors), Oceanian is green, and American is purple, or at least a purplish blue. Some of the icons have several colors, representing the makeup of the population in question. There's a lot of information in that illustration, but much of it depends on the colors.

Clearly, the illustration was meant to be in color, and there are a number of other colored illustrations in this Science Times, but somehow this one escaped reproduction in color. Well, things go wrong.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 10:43 AM

Reading two pages at once

In yesterday's mail, a note from Barbara Partee:

Our indomitable dept secretary and chief trivia-buff Kathy Adamczyk just sent me a link to a fascinating article in the Guardian about an autistic savant, Daniel Tammet, who is both a mathematical savant and reportedly in some sense a language-learning and language-processing savant ("A genius explains", 2/12/2005). The sentence that stopped me in my tracks was one sentence not about him but about Kim Peek, the real-life Rain Man:

"Peek can read two pages simultaneously, one with each eye."

Can that be???

There's other interesting stuff in the article -- scientists are reportedly interested because Tammet can reportedly introspect about "how he does it". (What they quote sounds entirely metaphorical to me. But what do I know?)

An article about savants that features Kim Peek among others (Darold Treffert and Gregory Wallace, "Islands of Genius", Scientific American 286(6) June 2002) doesn't mention the "read two pages simultaneously, one with each eye" business. However, an online profile of Peek, "The Mind of a Mnemonist", says that

At birth, the doctors discovered that Kim was born with a encephalocele, which is a congenital condition characterized by a herniation of the brain through a fissure in the skull. A later MRI also showed that Kim was born without a corpus callosum (the connecting tissue between the left and right hemispheres of the brain) as well as an absent anterior commisure, and damage to the cerebellum.

Congenital lack of a corpus callosum and anterior commissure would help explain what the same source describes as "the ability to read two pages of literature simultaneously with a 98% rentention rate".

Among the many theories about the etiology of autism, there's one that attributes it to a less extreme form of lack of white matter: "Autism as lack of neurological coordination", 7/31/2004.

[Update -- Randy Alexander wrote to remind us that the title "Mind of a Mnemonist" is an allusion to a famous book by A.R. Luria.]

Posted by Mark Liberman at 06:18 AM

Snowclone of the day

On Monday it was words for cheese ("Cheeseclones", 6/25/2007); today it's philanthropists. This is the last panel from J. Jacques' Questionable Content (#913), sent in by Alain van Hout. Note that in this case, the rhetorical force is not the traditional "just as the Eskimos have N words for snow, so the X (should) have M words for Y". Instead, the idea that the X have M words for Y, by itself, is used to imply an excessive amount of experience with Y.

Somehow Brendan O'Neill resisted the temptation to use this trope in his June 13 Spiked editorial "Welcome to the People's Republic of Bono", whose last paragraph begins:

Bono is a celebrity colonialist. His patronising campaign to single-handedly ‘save Africa’ is actually damaging the continent. It is painting Africa as a pathetic place whose wide-eyed, infantile populations need a loudmouth rock star to fight their corner.

[Tim McKenzie writes:

It mildly surprised me that you made no mention of the cartoon's implication that the entire continent has only one language. "Their language" may have twenty words for anything it likes, but apparently English has no words at all for any distinct parts of Africa.

I interpreted "those poor Africans" as referring to the particular group that Thaddeus spent his time among, not the inhabitants of the continent at large.]

Posted by Mark Liberman at 06:12 AM

Problems with the Wikipedia Logo

There has been quite a bit of controversy over the reliability of Wikipedia. Now, adding insult to injury, people are complaining about errors in the Wikipedia logo, and this has been reported in The New York Times. The reason that it is possible to talk about errors in a logo is that the Wikipedia logo incorporates the word "Wikipedia" written in a variety of writing systems. Some of the spellings are wrong.

One of the errors that has elicited complaints is in the Hindi spelling. Hindi has no /w/, so the consonant /v/ is used. The problem is that in the Devanagari writing system the consonants are written with stand-alone symbols but vowels immediately preceded by a consonant within the same syllable are written with diacritics that attach to the phonologically preceding consonant in various positions. Here, for example, are /va:/, /vi:/, /vu/, and /vu:/. वा वी वु वू. In the first two, the vowel follows /v/. In the second two, the vowel diacritic goes underneath. The short /i/ diacritic actually precedes the consonant that it goes with. In Unicode, U+0935 DEVANAGARI LETTER VA precedes U+093F DEVANAGARI VOWEL SIGN I, but the combination is to be rendered with the /i/ part preceding the /v/ part. Here is the Unicode sequence: वि. If you see a vertical bar to the right with a tail coming off the top and continuing to the right, your browser, like many other rendering engines, is not rendering this properly. Here is what /vi/ should look like:

Here is what it looks like when misrendered, as in the Wikipedia logo:
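The difference between stored order and displayed order can be checked by listing the code points of the string. A minimal R sketch; the helper name code_points is made up for the example:

```r
# List the code points of a string in storage (logical) order.
code_points <- function(s) sprintf("U+%04X", utf8ToInt(s))

vi <- "\u0935\u093f"   # LETTER VA followed by VOWEL SIGN I, i.e. /vi/
code_points(vi)        # "U+0935" "U+093F"
```

The consonant comes first in the stored sequence, even though a correct renderer draws the vowel sign to its left.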

The other problem that has attracted attention is of a different nature. It concerns the Japanese spelling of Wikipedia. What we see in the logo begins like this: ワィ. The first character is the katakana symbol for /wa/. The second symbol is the subscript version of the vowel /i/. The proposed change is to ウィ. Here the second character is the subscript /i/ as before, but the first character is the vowel /u/. What is going on?

The problem is that the sound system of Japanese does not permit any vowel other than /a/ to follow the consonant /w/. We can have /wa/, but not */wi/, */we/, */wo/, or */wu/. Where the morphology creates such sequences, the /w/ is deleted. That is why we have alternations like /kau/ "buys" and /kao:/ "let's buy" with /kawanai/ "does not buy". The stem of the verb "to buy" is /kaw/, but the /w/ disappears before suffixes that begin with vowels other than /a/. Since Japanese does not permit the sequence of /w/ followed by any vowel other than /a/, there is a kana letter for /wa/ but not for the other sequences.
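The alternation can be written as a small rule. The R sketch below is only an illustration: the function name attach_suffix and the division of the forms into stem and suffix (/u/, /o:/, /anai/) are assumptions made for the example, following the description above.

```r
# Attach a suffix to a verb stem, deleting stem-final /w/ before any
# suffix that begins with a vowel other than /a/.
attach_suffix <- function(stem, suffix) {
  if (grepl("w$", stem) && grepl("^[iueo]", suffix)) {
    stem <- sub("w$", "", stem)
  }
  paste0(stem, suffix)
}

attach_suffix("kaw", "u")     # "kau"     'buys'
attach_suffix("kaw", "o:")    # "kao:"    'let's buy'
attach_suffix("kaw", "anai")  # "kawanai" 'does not buy'
```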

The reason that there are characters for sequences of a consonant and a vowel is that the two phonological writing systems of Japanese, hiragana and katakana, are moraic writing systems. That is, they are not based on a segmentation of the utterance into individual sound segments but rather into the units known to phonologists as moras. To a first approximation, a mora is the thing of which a light syllable has one and a heavy syllable has two. For example, a Japanese syllable like /ho/ consists of one mora while both /ho:/, with a long vowel, and /hon/, with a final nasal, each consist of two moras. The basic rule in Japanese is that there is one kana symbol per mora. Thus, /ho/, /ho:/, and /hon/ are written ホ, ホー, and ホン respectively. (Since the Wikipedia logo is in katakana, I will limit myself to katakana here. Hiragana is structurally almost the same.) The second symbol in /ho:/ marks a long vowel, while the second symbol in /hon/ marks a syllable-final nasal.
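The one-symbol-per-mora rule makes mora counting nearly trivial for kana text. A rough R sketch follows; the function name count_moras is made up, and the treatment of small kana is an assumption added for the example rather than something stated above.

```r
# Count moras in a katakana string as one per kana symbol.
# Assumption (not from the text above): small vowel kana and small
# ya/yu/yo merge with the preceding symbol and are not counted.
small_kana <- utf8ToInt("\u30a1\u30a3\u30a5\u30a7\u30a9\u30e3\u30e5\u30e7")

count_moras <- function(s) sum(!(utf8ToInt(s) %in% small_kana))

count_moras("\u30db")        # ホ   /ho/  gives 1
count_moras("\u30db\u30fc")  # ホー /ho:/ gives 2
count_moras("\u30db\u30f3")  # ホン /hon/ gives 2
```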

A consonant followed by a short vowel or the first half of a long vowel or diphthong constitutes a single mora and so is written with a single, unanalyzable character. Thus, in the set: カ /ka/, キ /ki/, ク /ku/, ケ /ke/, コ /ko/ it is impossible to identify a part that represents /k/ and parts that represent /a/, /i/, /u/, /e/, and /o/. One consequence of using a writing system of this type is that you can't necessarily write any combination of consonants and vowels that occur in the language: a separate character must be constructed for each mora, and in particular, for each CV pair.

The restrictions on /w/-vowel sequences are a fairly recent historical innovation. A few centuries ago, Japanese allowed /w/ before every vowel but /u/. Naturally, there were kana characters for these other sequences: ヰ /wi/ ヱ /we/ ヲ /wo/. The first two are no longer used at all. The last is used, but in effect as a morphogram, to write the accusative case marker, which is just /o/.

The problem posed by Wikipedia is then that the phonology of Japanese does not permit the sequence /wi/ and so provides no direct method of writing it. Conservative speakers of Japanese change the /wi/ of foreign words to the disyllabic sequence /ui/, that is, first an /u/, then an /i/. Less conservative speakers familiar with languages like English that have /wi/ may actually pronounce the sequence as a single syllable, but they are still confronted by the problem of how to write it. The traditional approach is to write the sequence using two vowel symbols, as if it were disyllabic. Naively, that would result in ウイ. In fact, I have seen spellings such as this. What the Japanese Wikipedians prefer, however, is ウィ, in which the small subscript version of the /i/ is used. This makes it clear that they really mean /wi/.

What of the erroneous spelling ワィ in the current Wikipedia logo? It looks like this is an attempt to use a different mechanism for writing CV sequences for which no kana letter exists. Another Japanese consonant that is restricted in its combination with following vowels is /f/, which, like /w/, occurs only when followed by a single vowel, in this case /u/. The only kana letter for an /fV/ sequence is therefore フ /fu/. However, foreign words with other vowels following /f/ have long been familiar to Japanese people, e.g. "film". These are written with フ followed by the small subscript version of the vowel, e.g. フィ /fi/, as フィルム /firumu/ "film". ワィ is the result of applying the same principle to /w/.
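The competing spellings are easy to confuse on screen, so it can help to compare them code point by code point. In Unicode, each small katakana vowel sits one code point before its full-size counterpart. A minimal R sketch; the helper name code_points is made up for the example:

```r
# List the code points of a string in storage order.
code_points <- function(s) sprintf("U+%04X", utf8ToInt(s))

code_points("\u30a6\u30a4")  # ウイ: "U+30A6" "U+30A4"  /ui/, two full-size vowels
code_points("\u30a6\u30a3")  # ウィ: "U+30A6" "U+30A3"  /wi/, the proposed spelling
code_points("\u30ef\u30a3")  # ワィ: "U+30EF" "U+30A3"  the spelling in the logo
code_points("\u30d5\u30a3")  # フィ: "U+30D5" "U+30A3"  /fi/, the pattern being imitated
```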

Correction: The original post contained a garbled sentence about the status of /f/, which I have corrected. The original stated that /f/, like /w/, occurs only before /a/. Actually, /f/, like /w/, occurs only before one vowel, but in the case of /f/ it is /u/, not /a/.

Posted by Bill Poser at 03:11 AM

June 26, 2007

The Other Meaning of Bong

Everybody involved in the discussion of the Supreme Court's decision in Morse vs. Frederick seems to think that the only meaning of "bong" is "a kind of water-pipe used especially for smoking marijuana". Actually, there is another meaning, which I at least learned first. A "bong" is a very wide piton.

Some of our readers no doubt don't know what a piton is. A piton is a kind of metal spike used by rock climbers. You hammer it into a crack and use it as an anchor for belaying (catching falls), direct aid (climbing by means of stirrups attached to anchors when the holds are insufficient for free climbing), or to hold the rope when descending. When I learned to climb, pitons were the primary type of anchor. Nowadays, they have largely but not entirely been replaced by other devices. The word is a loan from French piton, but in English it is pronounced [pʰijtan].

Here is an ordinary piton. It is a Chouinard 1/4 inch angle piton, a classic that we might regard as the perfect example.

Here is a bong. This is a fairly small bong.

At the opposite extreme is the RURP, which stands for "Realized Ultimate Reality Piton". A RURP will only fit into cracks whose existence is on the borderline between real and imaginary. To place a RURP, the climber imagines a crack and gently inserts the RURP into it. I once put all my body weight on a RURP inserted into the underside of an overhang. I was lighter then.

Posted by Bill Poser at 10:25 PM

What's that sound?

One more cartoon for today. Zippy contemplates onomatopoeia. And alliteration.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 02:23 PM

Totally phat

Dan Piraro takes on street speech:

Nice play on the stereotypical woman's question, "Does this make me look fat?" (said of clothes). But with the slang adjective phat, originally African American slang, but now in wider use, expressing approval.

The history is complex. Uses of fat to cover 'abundant, desirable, good, etc.' have a long history, but the word we're looking at here seems to have developed as an extension, about 50 years ago, of negative fat into positive territory (such reversals are not uncommon; compare the varying uses of bad), at first by black men to describe attractive, in particular well-endowed, women.

Early on the spelling phat was used to distinguish the new use from the old one. (Compare the recent development of the spellings ghay and ghey to distinguish GAY 'worthless, stupid, inept, etc.' from the older GAY 'homosexual'.) Almost immediately, it seems, phat received backronymic interpretations; in May 2004 I collected the following:

Pretty Hot And Tempting
Pussy Hips Ass Tits
Plenty of Hips And Thighs
Pretty Hips And Thighs
Perfect Hips And Thighs
Perfect Hips, Ass, and Tits
PHysically ATtractive
Pretty Hot Ass 'n' Titties
Pritty Horrish At Times
Pretty Heavy And Tubby
Pretty Huge And Tubby

plus one assertion that the original word was actually phatt, standing for: Pussy, Hips, Ass, Thighs, Tits.

An acronymic derivation is surely incorrect, on a number of grounds -- one being the profusion of "originals" proposed -- but plenty of people are absolutely convinced it must be true, possibly because they were first told of this derivation when they were children.

In any case, phat has now extended from one specific kind of approval to something much more general, along the lines of 'cool, hip' (evaluative slang is remarkably hard to gloss accurately). And it's no longer the exclusive property of African Americans, but is used by white kids (like the one in the cartoon), especially those adopting other features of black street style (like the kid in the cartoon).

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 02:10 PM

Oh, the ambiguity!

When you start looking for ambiguities, you find them everywhere:

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 01:06 PM

The pit of greeting card recursion

Stop me before I thank again!

[Addendum: David Beaver writes to note how close the cartoon came to a nice center embedding variant:

Thank you cards.

Thank you for the [thank you card] cards.

Thank you for the [thank you for the [thank you card] card] cards.

Thank you for the [thank you for the [thank you for the [thank you card] card] card] cards.

And so on, ad infinitum.]
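The embedding pattern in the addendum is a one-line recursion. A small R sketch that regenerates the list to any depth; the function names card_phrase and card_line are made up for the example:

```r
# Build the (singular, lower-case) noun phrase for a given embedding depth.
card_phrase <- function(depth) {
  if (depth == 0) return("thank you card")
  paste0("thank you for the [", card_phrase(depth - 1), "] card")
}

# Capitalize, pluralize, and punctuate to match the list in the addendum.
card_line <- function(depth) {
  phrase <- card_phrase(depth)
  paste0(toupper(substring(phrase, 1, 1)), substring(phrase, 2), "s.")
}

for (d in 0:3) cat(card_line(d), "\n", sep = "")
```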

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 12:54 PM

Three taboo cartoons

From the Taboo Desk at Language Log Plaza, a small assortment of recent cartoons...

First, from Rob Balder's PartiallyClips, a new take on the puzzle of taboo words: taboo colors? taboo musical notes?

(Hat tip to Geoff Pullum.)
[Added in response to e-mail: The cartoon takes words to be the building blocks of a language, analogous to colors in artistic composition and notes in musical composition. There's no question that larger constructs have been found offensive: the tritone interval, Stravinsky's Le Sacre du Printemps, Picasso's Les Demoiselles d'Avignon -- and Rushdie's The Satanic Verses considered as a whole. Unfortunately, the point is undercut by Balder's reference to "combinations of notes" rather than to individual notes.]

Then from Guardian cartoonist Steve Bell, in a tribute to the Falklands War of 25 years ago, when Bell developed many of the characters in his strip If...: albatross!

(Hat tip to Steve Isard.)

This will require some background information, much of which can be found on the Guardian site and on the Wikipedia page for If... To start with, Bell's cartoons are scabrously savage and dirty-mouthed, to a degree that, I think, would simply not be possible in a cartoon in a respectable publication in the United States.

The Penguin (full name Prince Philip of Greece Penguin, also known as Pulp, later Lord, Quango) comes from a very reactionary family of Falklands penguins who are also anti-albatross bigots.

The asterisking here is not Bell's current style; his latest collection, If... Marches On (2006), spells out all the words in full.

Finally, a cartoon from Hilary Price's much much gentler Rhymes With Orange: when mathematicians swear.

The careful reader will notice that the mathematician's swearing makes no more sense in print than the non-mathematician's "@&#!"

[Addendum: well, I'm an idiot. Several readers have pointed out that what's written in the thought balloon can be read as "Error!".]

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 12:26 PM

IQ and birth order

For the past few days, the most highly-emailed study on the NYT web site has been Benedict Carey, "Study on I.Q. Prompts Debate on Family Dynamics", 6/25/2007, which reports on "new evidence tying birth order to IQ". (There's an earlier article by Carey, "Research Finds Firstborns Gain the Higher I.Q.", 6/22/2007.) I'm glad to say that the NYT story offers a link to the original study, which is Petter Kristensen and Tor Bjerkedal, "Explaining the Relation Between Birth Order and Intelligence", Science 316(5832) 1717, 22 June 2007. The same issue of Science also has a "Perspectives" piece by Frank Sulloway, "Birth Order and Intelligence". (Not that those links will be much help to those who don't have subscriptions.)

The NYT blogs area also has a lively discussion in which Frank answers readers' questions -- though this topic didn't generate as much interaction as discussions of linguistics do, it's still clearly something that people are very interested in.

Anyhow, last night Mark Seidenberg wrote to ask

I'm not sure what the language link is, but given the amount of statistics you've been gently teaching people via the blog, and your continuing focus on the credulity of the mass media when it comes to reporting science (not to mention the credulity of "Science" the magazine), are you considering doing a post on the iq and birth order story?

Is a 3 point difference in IQ that is statistically significant in a large scale study of this sort also functionally significant? Did someone really say this might represent the difference between getting a B or an A in a class? which class was that? math, history, art? Does anyone remember what has been learned about what IQ tests do and do not measure?

Well, Frank Sulloway's Perspectives piece does give a somewhat artificial argument about admissions to some highly hypothetical colleges:

Critics might still argue that the mean IQ difference documented between a Norwegian firstborn and a secondborn is only 2.3 points. Such a modest difference, however, can have far greater consequences than most people realize. For example, if Norway's educational system had only two colleges--a more prestigious institution for students with IQs above the mean, and a less desirable institution for all other students--an eldest child would be about 13% more likely than a secondborn to be admitted to the better institution (the relative risk ratio), and the odds of a firstborn being admitted would be 1.3 times as great.
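Figures of this size can be reproduced under a simple model, though the quoted passage does not spell out its calculation, so everything in the sketch below is an assumption: normally distributed scores with a standard deviation of 15, equal numbers of firstborns and secondborns, and a cutoff at the pooled mean of the two groups.

```r
# Assumptions: normal IQ, sd 15, firstborns average 2.3 points above
# secondborns, equal group sizes, admission cutoff at the pooled mean.
sd_iq  <- 15
gap    <- 2.3
cutoff <- gap / 2   # pooled mean, measured from the secondborn mean

p_first  <- 1 - pnorm(cutoff, mean = gap, sd = sd_iq)  # firstborn above cutoff
p_second <- 1 - pnorm(cutoff, mean = 0,   sd = sd_iq)  # secondborn above cutoff

relative_risk <- p_first / p_second
odds_ratio    <- (p_first / (1 - p_first)) / (p_second / (1 - p_second))

round(c(p_first, p_second, relative_risk, odds_ratio), 3)
```

Under these assumptions the relative risk comes out near 1.13 and the odds ratio near 1.28, in line with the quoted "13% more likely" and "1.3 times". Note that the two admission probabilities themselves are only about 53% and 47%.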

And Benedict Carey's original 6/22/2007 NYT article embellishes this:

Three points on an I.Q. test may not sound like much. But experts say it can be a tipping point for some people — the difference between a high B average and a low A, for instance. That, in turn, can have a cumulative effect that could mean the difference between admission to an elite private liberal-arts college and a less exclusive public one.

I agree with Mark S. that this is very misleading, not to say a complete crock, and is probably motivated by the desire to make the story artificially attractive to readers. But as Mark asked, what's the Language Log angle?

The most obvious one is personal. And I don't mean "personal" in the sense that I have siblings, I mean "personal" in the sense that Frank Sulloway was my college roommate's college roommate. We lived together with 30-odd other undergraduates in a small cooperative dormitory that later became known as the "Center for High-Energy Metaphysics". So Frank, I've decided that you were right about Max Weber after all; but about this IQ and birth order business, I'm not so sure.

For one thing, I wonder what the relationship is between this research and the famous Flynn effect, a general world-wide rise over time in measured IQ scores, which requires IQ tests to be frequently renormed in order to keep the mean at 100. Thus according to Ulric Neisser, "Rising Scores on Intelligence Tests", American Scientist, Sept-Oct 1997,

The largest Flynn effects appear instead on highly g-loaded tests such as Raven's Progressive Matrices. This test is very popular in Europe; the Dutch data mentioned earlier came from a 40-item version of Raven's test. Using the 1952 mean to define a base of 100, Flynn has calculated average Dutch Raven IQs for subsequent years. The mean in 1982 was 121.10 -- a gain of 21 points in only 30 years, or about seven points per decade. Data from a dozen other countries show similar trends, which seem to be continuing into the 1990s.

So a Flynn effect of 21 points in 30 years translates to a rate that would take 3*30/21 = 4.3 years to generate a 3-point difference. This is probably not relevant to the Norwegian study's findings, since their data comes from tests of army recruits over a ten-year period, from 1967 through 1976; and I presume that there was no relationship between a young man's birth order and the date of his military service (though this is not explicitly discussed).

Still, it ought to give us pause that the birth-order difference was roughly the same as the effect of being born four years later.

More to the point, though, the media are (as usual!) treating these results as if a difference in group averages told us something about each individual member of a group. If you're born first, and your sibling is born second, then your IQ will be about 3 points higher than your sib's IQ, right?

Guess again.

The three-point difference came out of a statistical model of hundreds of thousands of Norwegian military recruits. If the model is correct, then a group of first-borns will have a 3-point IQ advantage over a group of later-borns -- on average. But IQ measurements are normed so that the standard deviation is 15. Let's ask R to generate 10 random "eldest child" values from a normal distribution with mean of 103 and standard deviation of 15:

> round(rnorm(10, mean=103, sd=15), digits=1)
[1] 117.7 121.0 108.4 114.4 77.3 103.3 120.1 83.9 92.5 81.5

And similarly 10 "younger child" values with a mean of 100:

> round(rnorm(10, mean=100, sd=15), digits=1)
[1] 105.2 84.5 96.7 91.7 82.1 90.1 100.3 100.5 104.9 105.7

Well, the older kid "won" the IQ bake-off 6 times out of ten. Is this the way it'll always come out? Not exactly. Let's generate fake IQ data for 1,000 imaginary first-borns and 1,000 imaginary non-first-borns:

firsts <- rnorm(1000, mean=103, sd=15)
others <- rnorm(1000, mean=100, sd=15)

In R, the comparison firsts>others gives us a vector of 1,000 values that is TRUE in those cases where an element of firsts is greater than the corresponding element of others, and FALSE otherwise. We can add up the TRUE values by using the expression sum(firsts>others), and we can turn this into a proportion by dividing it by 1,000:

sum(firsts>others)/1000

The result? Well, I just ran it and got 0.536 -- in other words, the first-born tested with a higher IQ 53.6% of the time. Is that the "true" value, the value we'd get every time? No -- if we run it 10 times,

X <- matrix(nrow=1, ncol=10)
for(n in 1:10){
  firsts <- rnorm(1000, mean=103, sd=15)
  others <- rnorm(1000, mean=100, sd=15)
  X[n] <- sum(firsts>others)/1000
}

we get something like

0.563 0.569 0.554 0.582 0.555 0.569 0.528 0.556 0.538 0.560

This is giving us a plausible range of values, but it suggests that with a sample of 1,000 we can only tell that the answer is somewhere around 55%, plus or minus a few percent. So let's try 10 samples of 10,000:

0.5608 0.5544 0.5525 0.5555 0.5615 0.5534 0.5607 0.5651 0.5463 0.5516

and 10 samples of 100,000:

0.55422 0.55631 0.55587 0.55582 0.55594 0.55717 0.55939 0.55499 0.55892 0.55662
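
The larger runs presumably reuse the same loop with a bigger sample size. One way to package that, as a sketch (the function name win_rate and its arguments are my own labels, not anything from the study):

```r
# Proportion of simulated pairs in which the first-born scores higher.
# n: number of pairs; gap: assumed first-born advantage in IQ points.
win_rate <- function(n, gap = 3, sd = 15) {
  firsts <- rnorm(n, mean = 100 + gap, sd = sd)
  others <- rnorm(n, mean = 100, sd = sd)
  mean(firsts > others)  # mean of a logical vector is the proportion TRUE
}

set.seed(2007)  # arbitrary seed, only so that a rerun gives the same numbers
for (n in c(1000, 10000, 100000)) {
  rates <- replicate(10, win_rate(n))
  # The spread across the 10 runs shrinks as n grows
  cat(n, "pairs: min", min(rates), "max", max(rates), "\n")
}
```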

So if the study's estimate of the first-born's IQ advantage is correct, and all the other assumptions are correct too, it means that the first-born will have a higher IQ about 55.6% of the time. And therefore will lose the IQ contest about 44.4% of the time.
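
The simulation is converging on a number that can also be computed directly, assuming (as the simulation does) that the two scores are independent and normally distributed. The difference of two such scores is itself normal, with mean 3 and standard deviation 15 times the square root of 2, so the chance that it is positive comes straight from pnorm:

```r
gap <- 3   # assumed difference in group means
s   <- 15  # standard deviation of each score

# firsts - others ~ Normal(mean = gap, sd = s * sqrt(2)) under independence
p_first_higher <- pnorm(gap / (s * sqrt(2)))
round(p_first_higher, 3)      # about 0.556
round(1 - p_first_higher, 3)  # about 0.444
```

Independence is an assumption of this sketch, not a finding of the study. Real siblings' scores are correlated, which would shrink the spread of the difference and push the first figure up somewhat.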

Another version of the same problem comes up when we evaluate the statement about "the difference between a high B and a low A". It's true that if there's a precise quantitative cut-off -- say 90% -- between one letter grade and another, then an arbitrarily small difference -- say 89.99 vs. 90.01 -- will make the difference between B+ and A-. This can be the basis of seemingly-endless discussions at grading time in undergraduate courses these days. And from my experience of such discussions, I can say that the factor of "being persistent in trying to persuade faculty to adjust borderline grades" is, alas, worth a lot more than 3 IQ points in determining grade point average.

What about Frank Sulloway's discussion of admissions to his hypothetical two colleges, "a more prestigious institution for students with IQs above the mean, and a less desirable institution for all other students"? Well, let's try a Monte Carlo simulation in R, using Frank's value of 2.3 for the IQ difference between Norwegian first-borns and second-borns. One round might go like this:

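firsts <- rnorm(100000, mean=102.3, sd=15)
seconds <- rnorm(100000, mean=100, sd=15)
m <- mean(c(firsts,seconds))   # pooled mean: the admissions cut-off
sum(firsts>m)/100000           # proportion of first-borns above the cut-off
sum(seconds>m)/100000          # proportion of second-borns above the cut-off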

In this imaginary world, about 53.1% of the first-borns would get into the hypothetical "more prestigious institution for students with IQs above the mean", whereas only 46.9% of the second-borns would.
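
The same figures can be reproduced without simulation. With equal-sized groups the pooled mean sits halfway between the two group means, at 101.15, so each proportion is a normal tail area. A sketch under the same normality assumption, with variable names of my own choosing:

```r
mu_first  <- 102.3
mu_second <- 100
s         <- 15
cutoff    <- (mu_first + mu_second) / 2  # pooled mean for equal-sized groups

# Share of each group scoring above the cut-off
p_first  <- pnorm(cutoff, mean = mu_first,  sd = s, lower.tail = FALSE)
p_second <- pnorm(cutoff, mean = mu_second, sd = s, lower.tail = FALSE)
round(100 * c(first = p_first, second = p_second), 1)  # about 53.1 and 46.9
```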

Now, Frank's hypothetical world is very different from the actual one. Even in countries where university admission is determined strictly by test scores, it's not IQ tests that are used; and the power of IQ to predict scores on the actual admissions tests -- which are intended to test achievement at least as much as aptitude -- is probably modest. If someone knows the correlation of IQ scores with university admissions scores in (say) Japan, tell me and we can add to our imaginary world a model of the generation of admissions-test scores. No doubt we'd find that much of the (hypothetical) 53%-to-47% advantage would be washed out.

In the U.S., obviously, a much wider range of factors enters into admissions decisions. I can't prove that the resulting process removes even more of that hypothetical 53%-47% birth-order advantage, but I'd bet that it does. (If anyone knows anything about empirical relationships between IQ and U.S. college admissions, let me know that too.)

But even in Frank's highly-artificial world of hypothetically IQ-based college admissions, how does 53.1%-to-46.9% translate into his conclusion that

...an eldest child would be about 13% more likely than a secondborn to be admitted to the better institution (the relative risk ratio) ... and the odds of a firstborn being admitted would be 1.3 times as great.

Well, the "13% more likely" part is because 53.1/46.9 = 1.132. As for the part about "the odds ... would be 1.3 times as great", I believe that's because of the amazing magnifying power of the concept of "odds ratio":

In this case, that works out to (.531/.469)/(.469/.531) = 1.282.

So what one might have thought to be a 3% shift -- from 50-50 to 53-47 -- can be spun into an effect of nearly 30%.

There's nothing mathematically incorrect about this, but it strikes me as a very questionable rhetorical tactic in a publication for general readers.
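
To see how the two summary statistics diverge, here is the calculation from the rounded admission rates, .531 and .469. The variable names are my own.

```r
p_first  <- 0.531  # share of first-borns above the cut-off
p_second <- 0.469  # share of second-borns above the cut-off

risk_ratio <- p_first / p_second              # relative risk
odds_ratio <- (p_first / (1 - p_first)) /
              (p_second / (1 - p_second))     # ratio of the two odds

round(risk_ratio, 3)  # about 1.132
round(odds_ratio, 3)  # about 1.282
```

Because the two rates happen to sum to 1 here, the odds ratio is simply the square of the risk ratio, which is why it looks so much more impressive.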

Now, the Norwegian study was large and careful and serious, and it found an apparently genuine effect of birth order, and (most important) it was able to look at the effects of birth order and "social order" (what happens when an eldest child dies young) separately, so as to disentangle possible biological effects of first vs. subsequent pregnancies. But the birth-order effect, though real, is pretty small, and a responsible science journalist would do more to help readers to understand what an effect of this size does and doesn't mean.

[And then there's the test/retest issue... but enough.]

Posted by Mark Liberman at 09:13 AM

Judge loses pants and suit

A few days ago I posted about the DC judge who was suing his dry cleaners for $54 million because they lost his pants. Now he is not only pantless, but he also lost his suit. The story has been reported far and wide, by the Sydney Morning Herald, by BBC radio and, of course, by the local Washington Post. Big, big news! The outcome of this case probably comes as no surprise to anyone, and the judge's interpretation of the dry cleaner's window slogan, "Satisfaction Guaranteed," upon which words the plaintiff based his case, seemed about right to me.

In her 23-page dismissal of DC Superior Court Judge Roy Pearson's lawsuit about the missing half of his business suit, Judge Judith Bartnoff said:

"A reasonable consumer would not interpret 'satisfaction guaranteed' to mean that a merchant is required to satisfy a consumer's unreasonable demands or to accede to demands that the merchant has reasonable grounds to dispute."

The word "reasonable" rears its familiar head in lots of lawsuits (reasonable doubt, reasonable minded, reasonable person). That which is "reasonable" has to be appropriate to the issue. So I suppose Judge Bartnoff meant here that a reasonable person would expect "satisfaction guaranteed" to indicate that customers should be satisfied with equitable offers of replacement value for lost items. In this case, the dry cleaners had offered Pearson $12,000 to settle the case out of court. Now he gets zero, nada, zip. Not only that, he also has to pay the legal costs charged by the dry cleaner's lawyers, common for plaintiffs in lawsuits that they lose.

It would have been far better for Pearson to have taken the cleaner's offer than to lose his shirt along with his pants.

[Language Log, on the other hand, offers even the most unreasonable reader a cheerful refund of the subscription price in case of less than full satisfaction. In fact, we often give double or even triple the subscription price if a reader is especially unreasonable, and higher multiples are available on request.]

Posted by Roger Shuy at 08:43 AM

June 25, 2007

Cheeseclones!

From an article in today's NY Times:

"From maguro to otoro, the Japanese seem to have almost as many words for tuna and its edible parts as the French have names for cheese."

They could so easily have brought out the snow vocabulary, but they knew not to! Progress! But they still wanted to do that vocabulary = interest thing, so they pulled out a new comparison class. At least they didn't try to come up with an estimate of how many cheeses the French have names for.

Update: Apparently, such an estimate has already been made, by no less an authority than Charles de Gaulle. Mark and about twenty others write in to note that CdG said:

"How can you govern a country which has 246 varieties of cheese?"
http://www.quotationspage.com/quote/79.html

No need to make up random numbers. I foresee a contagious meme... "Just as the French have 246 names for cheese, the X have N words for Y." Those exotic Gauls, and their peculiar customs -- fetishizing roosters, closing down the whole country for two weeks every August, existentialism, tongue wrestling, gangs of accordion players -- it can't fail to catch on.

Posted by Heidi Harley at 05:15 PM

Can't we all just get a bong?

The US Supreme Court has just decided that when an Alaska high school principal punished a kid for displaying, off school property, a 14 foot sign saying Bong hits for Jesus, that didn't contravene the kid's right to free speech. Here's something to listen to while you ponder the story:

It was one Ken Starr, yes, that Ken Starr, who lodged the appeal against the kid, and who has now found favor with his Supreme Court buddies. What caught my eye was the fact that Starr's appeal claims:

"'Bong' is a slang term for drug paraphernalia commonly used for smoking marijuana."

So is it slang? We tend to describe something as slang when it is a minority use, and hasn't entered the mainstream, especially, although this is a somewhat arbitrary line, if dictionaries don't list it, or explicitly list it as "colloquial". But on this basis, bong is not slang at all:

OED 2nd ed.

bong, n. Chiefly U.S.

A kind of water-pipe used for smoking marijuana.

1971 Marijuana Rev. Jan.-June 18 Many thanks to Scott Bennett..for the beautiful special bong he made for my pipe collection. 1975 High Times Dec. 11/1 One hit of this weed produces creeping nirvana when smoked in a bong. 1977 Rolling Stone 24 Mar. 81/2 (Advt.), Genuine bamboo bongs with removable bamboo bowls are wax lined and come in two sizes; the one-foot bong..and the two-foot bong. 1978 N.Y. Times 30 Mar. B2/2 Bongs, looking like pot-bellied vases.., give the most concentrated "drag" possible by channeling smoke and preventing its escape into the air. 1979 Christian Science Monitor (Eastern ed.) 21 Nov. B1/1 Bongs, roach clips, coke spoons are as familiar as blue jeans to kids in the US today.

AHD, 4th ed.

NOUN: A water pipe that consists of a bottle or a vertical tube partially filled with liquid and a smaller tube ending in a bowl, used often in smoking narcotic substances.
ETYMOLOGY: Thai baung.

I mean what would you call it? Water pipe just doesn't cut it. There is a subtle distinction between water pipe and bong, in that it is commonly understood that a water pipe need not be primarily dedicated to marijuana, but that a bong is. And that's what got Tommy Chong into trouble a couple of years ago. It's ok to sell a bong, but you'd better not call it a bong. Hence 9 months jail for Tommy Chong. Hence the link, above, to Tommy's freedom song.

By the way, I particularly liked Justice Stevens' (dissenting) comments on the Bong hits for Jesus case, with its deep insight into the nature of legal progress:

"Although this case began with a silly nonsensical banner, it ends with the court inventing out of whole cloth a special First Amendment rule permitting the censorship of any student speech that mentions drugs...."

Posted by David Beaver at 02:21 PM

Let's hear it for praise

We all like to get a bit of praise once in a while. Here at Language Log Plaza we get our share of pats on the back -- along with offers of helpful criticism, of course. The other day I received what I consider to be exactly the kind of encouragement I love to hear. A highly respected, young scholar told me that something I posted had encouraged him to take his research into an area of language that he had never thought about before. Hey folks, it doesn't get much better than this. Back in the days when I was still teaching I thought that's what teaching was all about, but now that I'm no longer in the classroom, I really hadn't expected to hear it any more.

This speech event got me thinking about the value of praise. Telling people how you appreciate them is relatively easy to do, but it isn't exactly a common, everyday event. Why? Maybe because we don't think about doing it that much. We're naturally self-absorbed and too often so engrossed in the competitive stance, trying to show the world that we are doing good work that is even better than that of our colleagues, that we forget how much of our good ideas and attitudes are derived from those who came before us. We're taught to be independent scholars, a good thing in itself, but the degree to which we are ever really independent can be pretty questionable.

For the past decade or so, ever since I retired from the classroom, I've thought a lot about the people that I need to tell how much I've learned from them, especially those who are now in the twilight of their lives. Not just my own professors, but also those whose books I've studied, whose lectures I've heard, whose behavior I've watched, whose styles I admire, and whose attitudes have impressed me. Not just people in my own field either.

So I've been writing letters to them, telling them just how much they've meant to me as I've stumbled through my own academic career and personal life. Doing this feels good and it feels even better when they write back and tell me what it means to be remembered in this simple way. Growing old and retiring from everyday contact with your field can be very frightening to the ego. Our elders are far too easy to ignore so, as an elder myself, I encourage you to take pen in hand and write letters (not emails) to those who helped mold your life, whether they did this knowingly or not (but especially when not, because then it's an even more wonderful surprise), telling them how much you personally appreciate their contribution.

Posted by Roger Shuy at 11:40 AM

Two slow takes from the canopy

To brighten your morning, here are a couple of lovely language-connected threads from Cosma Shalizi.

First, an amusing Q & A about nature and nurture, "...In Different Voices". Here's the key segment:

Q: How would you react to the idea that a psychological trait, one intimately linked to the higher mental functions, is highly heritable?

A: With suspicion and unease, naturally.

Q: It's strongly correlated with educational achievement, class and race.

A: Worse and worse.

Q: Basically nothing that happens after early adolescence makes an impact on it; before that it's also correlated with diet.

A: Do you work at the Heritage Foundation? Such things cannot be.

Q: What if I told you the trait was accent?

A: I'm sorry?

Q (in a transparently fake California accent): When you, like, say words differently than other people? who speak, like, the same language? because that's how you, you know, learned to say them from people around you?

A: Do you have a point to make, or are you just yanking my chain?

Q: Would you agree that accent has all the characteristics I just described?

A: Higher cognitive functions — heritable — class and race — not plastic after adolesence — correlation with diet, hah! — I guess I must.

Q: But would you say that there is any genetic or even congenital component to accent?

A: Not really. Obviously, some congenital conditions, like deafness or defects of the vocal chords, make it hard to impossible to acquire any accent. And I can imagine, though I don't know of anything, that there might be very specific mutations which make it hard to hear a distinction between a given pair of sounds, or easier to learn a specific distinction. But, in general, no, there is no non-trivial genetic component to accent.

Q: Then why were you worried that I was about to start channeling Arthur Jensen?

A: Because those are the sorts of claims usually trotted out by people who want to claim that something is innate, un-plastic, and usually invidiously distributed; sometimes there is a "sadly" to the claims of group inferiority, and sometimes, I think, that "sadly" is even genuine.

(This leaves out sex among the variables with which accent is strongly correlated; but perhaps that would have blurred the joke. People are happy to interpret statistical correlations as essential properties of racial or ethnic groups, or of males vs. females -- and thus in modern terms, as evidence for genetic effects of race or sex. But when you put both factors together, those paleolithic natural-kinds intuitions get kind of confused. Then again, Cosma has already brought in diet, which opens another can of epidemiological worms.)

The Q & A continues in "Those Voices Again", which includes a typically Cosmic thought experiment:

Suppose that our new alien overlords showed up tomorrow, and after demonstrating that resistance is futile, decide to institute a selective breeding program. They tie everyone's tubes just before puberty, and then (say) age 25 everyone is given a test in which they must prove certain theorems about non-Abelian Yang-Mills field theories; those who pass are allowed to breed, those who fail are permanently sterilized. If this persisted for, say, a thousand years, I am quite confident that a randomly selected human being from 3007 will be much more likely to be able to do this than a randomly selected member of the present population.

Slightly more seriously (by which I mean only that it's slightly less funny, since both are highly serious), there's a suggestion about the nature and origins of the Flynn effect.

Next, there's a post "So You Think You Have a Power Law — Well Isn't That Special?", describing a recent paper by Clauset, Shalizi, and Newman, "Power-law distributions in empirical data", arxiv:0706.1062. This follows (and I would guess, results from) a series of weblog ~~rants~~ posts on the same topic. The paper's abstract:

Power-law distributions occur in many situations of scientific interest and have significant consequences for our understanding of natural and man-made phenomena. Unfortunately, the empirical detection and characterization of power laws is made difficult by the large fluctuations that occur in the tail of the distribution. In particular, standard methods such as least-squares fitting are known to produce systematically biased estimates of parameters for power-law distributions and should not be used in most circumstances. Here we describe statistical techniques for making accurate parameter estimates for power-law data, based on maximum likelihood methods and the Kolmogorov-Smirnov statistic. We also show how to tell whether the data follow a power-law distribution at all, defining quantitative measures that indicate when the power law is a reasonable fit to the data and when it is not. We demonstrate these methods by applying them to twenty-four real-world data sets from a range of different disciplines. Each of the data sets has been conjectured previously to follow a power-law distribution. In some cases we find these conjectures to be consistent with the data while in others the power law is ruled out.

The first author, Aaron Clauset, previously posted about this paper under the title "Power laws and all that jazz", with a different abstract:

Three Power Laws for the Physicists, mathematics in thrall,
Four for the biologists, species and all,
Eighteen behavioral, our will carved in stone,
One for the Dark Lord on his dark throne.

In the Land of Science where Power Laws lie,
One Paper to rule them all, One Paper to find them,
One Paper to bring them all and in their moments bind them,
In the Land of Science, where Power Laws lie.

Here's a fun sample figure from the paper:

FIG. 8 The cumulative distribution functions P(x) and their maximum likelihood power-law fits, for the first twelve of our twenty-four empirical data sets. (a) The frequency of occurrence of unique words in the novel Moby Dick by Herman Melville. (b) The degree distribution of proteins in the protein interaction network of the yeast S. cerevisiae. (c) The degree distribution of metabolites in the metabolic network of the bacterium E. coli. (d) The degree distribution of autonomous systems (groups of computers under single administrative control) on the Internet. (e) The number of calls received by US customers of the long-distance telephone carrier AT&T. (f) The intensity of wars from 1816–1980 measured as the number of battle deaths per 10 000 of the combined populations of the warring nations. (g) The severity of terrorist attacks worldwide from February 1968 to June 2006 measured as the number of deaths greater than zero. (h) The size in bytes of HTTP connections at a large research laboratory. (i) The number of species per genus in the class Mammalia during the late Quaternary period. (j) The frequency of sightings of bird species in the United States. (k) The number of customers affected by electrical blackouts in the United States. (l) Sales volume of bestselling books in the United States.

The bottom line:

For most of the data sets considered the power-law model is in fact a plausible one, meaning that the p-value for the best fit is large. Other distributions may be a better fit, but the power law is not ruled out, especially if it is backed by additional physical insights that indicate it to be the correct distribution. In just one case—the distribution of the frequencies of occurrence of words in English text—the power law appears to be truly convincing in the sense that it is an excellent fit to the data and none of the alternatives carries any weight.

For seven of the data sets, on the other hand, the p-value is sufficiently small that the power-law model can be firmly ruled out. In particular, the distributions for the HTTP connections, earthquakes, web links, fires, wealth, web hits, and the metabolic network cannot plausibly be considered to follow a power law; the probability of getting a fit as poor as that observed purely by chance is very small in each case and one would have to be unreasonably optimistic to see power-law behavior in any of these data sets. (For two data sets—the HTTP connections and wealth distribution—the power law, while not a good fit, is nonetheless better than the alternatives, implying that these data sets are not well-characterized by any of the functional forms considered here.)

[...]

Note however that the log-normal is not ruled out for any of our data sets, save the HTTP connections. In every case it is a plausible alternative and in a few it is strongly favored. In fact, we find that it is in general extremely difficult to tell the difference between a log-normal and true power-law behavior. Indeed over realistic ranges of x the two distributions are very closely equal, so it appears unlikely that any test would be able to tell them apart unless we have an extremely large data set.

And there's code (in Matlab and R)! From Cosma's weblog post again:

Because this is, of course, what everyone ought to do with a computational paper, we've put our code online, so you can check our calculations, or use these methods on your own data, without having to implement them from scratch. I trust that I will no longer have to referee papers where people use GnuPlot to draw lines on log-log graphs, as though that meant something, and that in five to ten years even science journalists and editors of Wired will begin to get the message.

I'm less hopeful than Cosma about that last clause -- I mean, this sort of thing involves innumeracy way below the level of recognizing that frequency distributions exist, much less distinguishing among their functional forms. But hope is good.
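
For readers who want a feel for the maximum-likelihood approach before downloading the authors' code, here is a bare-bones sketch in R. It is not their implementation: it covers only the continuous case, takes the lower cut-off xmin as known rather than estimating it, and omits the goodness-of-fit test entirely. The function names are mine.

```r
# Draw n values from a continuous power law p(x) ~ x^(-alpha) for x >= xmin,
# by inverting the cumulative distribution function.
rpowerlaw <- function(n, alpha, xmin = 1) {
  xmin * (1 - runif(n))^(-1 / (alpha - 1))
}

# Maximum-likelihood estimate of alpha for a known xmin:
# alpha_hat = 1 + n / sum(log(x / xmin))
fit_alpha <- function(x, xmin = 1) {
  x <- x[x >= xmin]  # only the tail at or above xmin enters the estimate
  1 + length(x) / sum(log(x / xmin))
}

set.seed(42)  # arbitrary seed for reproducibility
x <- rpowerlaw(10000, alpha = 2.5)
fit_alpha(x)  # should land close to the true value of 2.5
```

A close estimate here shows only that the estimator recovers the exponent when the data really do follow a power law. It says nothing about whether a given real data set does, which is the question the paper's goodness-of-fit test addresses.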

Posted by Mark Liberman at 06:01 AM

June 24, 2007

Nearly and almost

[This is a guest posting by Jerry Sadock, following up on Mark Liberman's two postings about "nearly no".]

These two items are almost synonymous, but not necessarily nearly synonymous. They may well be truth conditionally equivalent, "almost P" and "nearly P" both entailing "not P" (Atlas, Horn, but pace Sadock) and "close to P". Thus "Eric knows almost 500 languages" and "Eric knows nearly 500 languages" would both be false either if Eric knows 501 languages or if Eric knows no more than 400. But there is some kind of difference between them as revealed by the differential goodness of Almost no one was there and Nearly no one was there.

The difference would then have to be a difference in nuance, or connotation, or (more technically) conventional implicature. That difference, it seems to me, has to do with expectations: Nearly n connotes that n exceeds (hence is better than) what was expected or hoped for, while almost n does not conventionally connote any particular desire, hope or expectation, but easily supports a conversational implicature to the same effect as the conventional implicature associated with nearly. So, for example, if I say I have nearly $10 in my wallet I suggest that that's a lot and since it isn't, the sentence is strange in most contexts. Compare Molly has nearly $10 in her piggy bank, a lot for a three year old, perhaps. But if I say I have almost $10 in my wallet, while not a lot for a man of my means, it would be a fine thing to say if we each wanted a tall iced latte and you asked me how much I had on me. As another example, almost tolerable is much more natural than nearly tolerable since the latter suggests that being merely tolerable is better than we could have hoped for. In the right context, however, it's OK. I could describe the noonday temperature in Tucson in July as "nearly tolerable" if it was 101º since I expected it to be absolutely intolerable, 113º say, and something approaching tolerable is better than that.

If this is so there is a reason for the distinction in out-of-context acceptability between examples like Almost no one was there and Nearly no one was there. The first can be a hedged estimate, pure and simple, and hence it's OK. The second suggests that zero attendance exceeds our expectations, which in most contexts is odd. What did we expect, that a negative number of people would be there? But cobble up a context in which zero does exceed expectations, and it becomes OK. This would ordinarily involve scale reversal, because negative quantities exist only in the realm of mathematics. Let's say I've organized a boycott on Humvees. I hope and expect that my campaign will be successful and only a small number will be sold. My expectations would then be exceeded if none are sold and then Nearly no Humvees were sold last month seems fine in that context.

Some of the examples of nearly no on the web clearly show such reverse expectation or hope, like this one:

Nearly none of the known near-earth objects have any chance of hitting the earth, and there are just two or three such objects that researchers can't dismiss yet, said Marsden. (link)

One would hope that one of these babies won't slam into our planet and send us the way of the dinosaurs. But there are a lot of these puppies out there, so the best we could hope for is that only a few stand a chance of closing our chapter for good. None would be a big relief -- the most we hoped for, so to speak. The last clause in the quotation, there are just two or three such objects that researchers can't dismiss yet, is revealing. It shows that there's an active effort to get the number down, hence the scale reversal.

Posted by Arnold Zwicky at 02:02 PM

Names?

A moment of annoyance: I discovered yesterday that Leonard R. N. Ashley's What's in a Name? Everything You Wanted to Know (revised ed., 1996) lacks an index. In particular, there's no index of names.

I can understand why there's no index to the names cited in the book; there are zillions of them. But there's also no index of the names of scholars mentioned in the book, or of topics covered, and those wouldn't have been difficult to assemble and would be useful to readers. Even the bibliographic references are sprinkled throughout the book. It's almost impossible to find anything without leafing through the whole volume.

[Addendum 6/26: Rey Aman tells me that Ashley complained to him long ago "that the publisher was too cheap to spend money on compiling and adding an index." I suspected as much.]

The book has a gee-whiz, falling in love with weird and wonderful words, tone to it that doesn't wear well. (And I'm a member of the American Name Society.) I'd recommend against trying to read it through. And I'd totally disrecommend it to Geoff Pullum, who wanders about Language Log Plaza muttering imprecations against word puzzles and the like.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 01:19 PM

Roeper on recursion

[This is a guest post by Tom Roeper, commenting on my earlier post "The enveloping Pirahã brouhaha" (6/11/2007).]

Mark Liberman’s claim that the Chicago Tribune piece was misleading was quite right---it was even misleading about the content and tenor of the conference. I was given one of the most minimal quotes one can imagine (one word, repeated; no recursion!): “no, no”, with no explanation. Here’s what got left out.

I pointed out, and Terry Langendoen from NSF immediately agreed, that there were different kinds of recursion with important consequences.

First---which I asked Dan about---is the fundamental and universal form (according to Chomsky) of asymmetric Merge, which means you cannot put three words together without recursively merging them and letting one dominate. That’s what happens in compounds like:

bird migration computer manual trouble-shooting guide

which we can understand (perhaps in multiple ways) by this method. If this is correct (hard to see how it could be wrong), there is no more discussion. Every sentence involves recursion.

Perhaps the most interesting fact is that children grasp this asymmetry from the start. They say things like “Mommy shoe” but not *“Mommy Daddy”. Why? Because in “mommy shoe” there is a modification relation, “shoe” is the Head and “mommy” the modifier, but not where coordination is implied (Mommy is not a modifier of Daddy nor Daddy a variety of Mommy).

Second are language particular forms like recursive possessives (John’s friend’s mother’s hat).

In my new book The Prism of Grammar : How Child Language Illuminates Humanism (MIT Press---now out: publication date: May 22, 2007) there is an extensive chapter on merge and another on recursion with evidence that children have difficulty with language particular dimensions. Here’s an example. Only English, but not German, Swedish, or Dutch (which allow a single pronominal possessive), allows recursive possessives. They are not easy for children. Witness Sarah (Childes files):

MOTHER: What's Daddy's Daddy's name?
SARAH: uh.
MOTHER: What's Daddy's Daddy's name?
SARAH: uh.
MOTHER: What is it? What'd I tell you? Arthur!
SARAH: Arthur! Dat my cousin.
MOTHER: Oh no, not your cousin Arthur. Grampy's name is Arthur. Daddy's Daddy's name is Arthur.
SARAH: (very deliberately) No, dat my cousin.
MOTHER: oh. What's your cousin's Mumma's name? What's Arthur's Mumma's name?
SARAH: uh. oh.
MOTHER: Thinking?
[Sarah nods]
MOTHER: And what's Bam+Bam's daddy's name?
SARAH: Uh, Bam+Bam!
MOTHER: No, what's Bam+Bam's daddy's name?
SARAH: Fred!
MOTHER: No, Barney.
SARAH: Barney.
MOTHER: What's his mumma's name?
SARAH: She's right here.
[ points to figure on Sarah's pajamas which have TV characters on them]

Try this out on your four-year-old and you will almost certainly have the same experience. (Some methods are suggested in the Prism book.)

Thirdly there is sentential recursion, which is what all the hot discussion is about. And this seems to be connected to how complicated our thoughts get. As Aravind Joshi in one paper and Bart Hollebrandse and I in another pointed out at the conference, it is in principle possible to get the same effect in discourse, but it seems to be prohibitively difficult. This is the subject of several papers and acquisition experiments (by Bart Hollebrandse, Kate Hobbs, Jill deVilliers and me---to be presented at the GALA acquisition conference in Barcelona in September). Sequences like

John thought the earth was flat.
Fred believed it.
But Bill could not believe that.

[=Bill could not believe that Fred believed that the earth was flat.]

However, note the inexactitude. The last that is much more ambiguous than the sentential counterpart: it could mean [that the earth is flat] or [that Fred believed the earth is flat] or [that Fred believed John thought the earth is flat]. (See also Liberman’s discussion of parataxis in Language Log ("Parataxis in Pirahã", 5/19/2006).) Discourse connections allow too much, permitting any sort of inference; that’s the problem. If one wants to do further deductive reasoning, then only the sentential version is ideal, but the obscure discourse form is rich enough for complex human interactions to be understood. Hollebrandse and I (see forthcoming conference paper) have argued that recursion produces exclusive complementation under embedding.

Bart Hollebrandse is in fact preparing materials to carry out experiments like those used in acquisition research, together with Dan Everett, among the Piraha in Brazil, in hopes that one can find out exactly what linguistic mechanisms they use to approach complexity of this kind.

What real exclusive readings make more efficient can be found every day in the New York Times, where sequences like this can occur:

Gore believed that Bush’s belief that Gore had exaggerated his claim that he authored the internet forced Gore to defend himself against the claim that he was sleazy instead of his forcing Bush to defend himself against the claim that he had not acknowledged that his fund-raising was illegal.

The style is lousy but it gets across a complex set of dependencies that inferences across a discourse probably could not keep straight. [Clarification by Liberman: a search of the NYT archive suggests that this sentence is a hypothetical quotation, invented by Tom. For some real-world examples, see here, here, here, etc., though in fairness, one does not see these every day.]

Recursion is useful, which is not to say that the essence of our humanity depends upon it. Zeroes in math are likewise very useful, much better than Roman numerals for multiplying. Still, the Romans managed to multiply anyway. Their society was still complex without having zeroes.

[Above is a guest post by Tom Roeper.]

[Update -- here is a comment by Dan Everett:

Tom's response on the recursion conference not only accurately conveys my own sense of the way things transpired at the conference, it also shows why his own contributions to discussions of recursion are extremely important. The paper by Bart and Tom at the conference really helped me to think more effectively about the issues.
Part of the cultural influence on Piraha's absence of syntactic recursion is the fact that the Pirahas purposely circumscribe the things they talk about, so that they have a much narrower range of discourse topics (again, by choice). As I understand at least part of what Tom Roeper and Bart Hollebrandse said in their contribution, recursion is a tool that does not define language per se but turns out to be extremely important in organizing complex utterances, a complexity which is at least in part a result of the complexity of the things most societies want to talk about.
At the same time, again as Tom accurately points out, Aravind Joshi and they agreed that it is in principle possible to get the same effects via discourse. This is interesting and, I believe, quite important because it is at the core of my own claim that Pirahas do have recursion in their discourse, but not in their syntax. What would be the source of this recursion? The brain. Humans think recursively. I think that this is a characteristic (though more study is needed) of our species. But what Ken Hale and I, and no doubt others if we look through extant grammars more carefully, have shown is that syntactic embedding is certainly not a universal characteristic of human languages.
Once again, when societies limit their discourse - not through any cognitive shortcomings (!) - because of their isolation or their cultural values (explicit or implicit), then the need for recursion, by my hypothesis, by Roeper and Hollebrandse's hypothesis, and by Joshi's work, is reduced, if not eliminated. Recursion is thus a tool for organizing our thoughts and our speech. It seems always necessary in the former, but not always necessary as a tool for speech. Once again, evidence for this in Piraha is recursion in discourse (without formal marking).
These issues and a large selection of the papers from the recursion conference at ISU will be the subject of a special issue of The Linguistic Review dedicated to recursion which I am guest-editing.
I hope very much that in the near future someone will be able to run Bart's experiments among the Pirahas.

Echoing what Tom said, it seems to me that we should start trying to use terminology more carefully in these discussions. You could have limited embedding without recursion; either recursion or embedding might involve clauses, or alternatively (for example) the various sorts of non-clausal structures involved in complex nominals; right- or left-branching structures are different from center-embedded structures; etc. Thus it would be clearer if Dan wrote "clausal syntactic embedding is not a universal characteristic", etc. Various other sorts of embedding, e.g. of a modified noun as a dependent of a verb, clearly are universal. Whether these embeddings are universally recursive is another question.]

Posted by Mark Liberman at 09:58 AM

June 23, 2007

Word counts

From the 6/25/07 New Yorker, p. 48:

NO COMMENT DEPARTMENT

From the San Francisco Chronicle.

With California Invasive Weeds Awareness Week just around the corner (July 17-23), there are two words every Californian should know: yellow star thistle.

Yes, I know, how silly of the Chron (or its source on invasive weeds): yellow star thistle is obviously three words. Or is it?

Counting "the number of words" in an expression is a tricky business. The New Yorker staff is acting like the word counting software that comes with your word processor: basically, it counts things separated by spaces. That means the algorithm is sensitive to the arbitrariness of English orthography.

English noun-noun compounds, including those whose meanings are in part conventionalized, are written in three ways: solid (doghouse), hyphenated (dog-ear), separated (dog tag). There are some generalizations about which spelling is used for which compounds, but there's a good bit of arbitrariness, and also significant variation. In any case, as far as the system of English goes, for conventionalized compounds the three types are entirely parallel, and a dictionary of reasonable size will have entries for all three. We're looking at "a word" in each case, regardless of how they're written -- granted, a word that has words as its parts, but still in some sense a word.

Dictionaries, AHD4 for instance, do have entries for star thistle (and star anise and star apple and star fruit). And my Peterson Field Guide to Pacific States Wildflowers (Niehaus & Ripper 1976) has the yellow star thistle (Centaurea solstitialis) listed in its index under "star thistle, yellow" (also under "thistle, yellow star", using the head noun thistle of the compound star thistle).

So you could argue that yellow star thistle is in fact a two-word expression: yellow plus the compound noun star thistle.
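The two ways of counting can be put side by side in a minimal Python sketch. It assumes a small hand-made lexicon of separated compounds (the `COMPOUNDS` set below is illustrative only, not any dictionary's actual entry list), and the function names are hypothetical. The first function does what word-processor counters basically do; the second treats a listed compound as a single word.

```python
# Illustrative lexicon of conventionalized compounds written with a space.
# This set is an assumption for the example, not a real dictionary's list.
COMPOUNDS = {"star thistle", "star anise", "star apple", "star fruit", "dog tag"}


def count_by_spaces(expression: str) -> int:
    """Count 'words' the orthographic way: things separated by whitespace."""
    return len(expression.split())


def count_with_compounds(expression: str, compounds=COMPOUNDS) -> int:
    """Count words, treating any listed multi-token compound as one word."""
    tokens = expression.lower().split()
    count = 0
    i = 0
    while i < len(tokens):
        matched = 1  # by default, consume a single token
        # Try the longest candidate first, down to two tokens.
        for n in range(len(tokens) - i, 1, -1):
            if " ".join(tokens[i:i + n]) in compounds:
                matched = n
                break
        count += 1
        i += matched
    return count


print(count_by_spaces("yellow star thistle"))       # 3
print(count_with_compounds("yellow star thistle"))  # 2
```

Solid and hyphenated spellings (doghouse, dog-ear) already come out as one word under the space-based count; only the separated spelling (dog tag) is counted differently by the two functions, which is exactly the orthographic arbitrariness at issue.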

[Yes, solstitialis, suitable for this season of the year. And the pernicious yellow star thistles are in fact blooming on the hillsides.]

[Addendum 6/26: Mae Sander has written with a plausible proposal about how the Chron ended up with "yellow star thistle": the piece originally had "yellow starthistle" -- this spelling can be found in many publications, for instance the University of California Cooperative Extension fact sheet on the plant -- but a proofreader "fixed" the spelling by separating the two parts of "starthistle" (I myself dislike this spelling, because I'm inclined to (mis)read it as "start-histle"). Now, this requires a proofreader who isn't really reading the text for content, but there are such people -- people who would change "an item of data is" to "an item of data are" because, sigh, they change ALL instances of "data is" to "data are".]

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 08:29 PM

What we need to know about international travel

Today's snail post brought a colorful message from our good friends at DHL. Personally, I use Fed Ex and I suspect that DHL just may be trying to lure my business to them. The message informed me that I can get a free, 400-page book called 1,000 Smart Travel Tips. All I have to do is to fill out and return a short survey.

Since 5 of the 1,000 travel tips were included in this mailing (as teasers, I suppose), I'll share them with you, just in case you happen to be traveling in the UK, China, Mexico, Canada, or Germany in the near future.

Tip #22 In Canada, it's considered bad form to talk with your hands in your pockets.

Tip # 121 In Mexico, it's common practice to use the "psst-psst" sound to catch someone's attention.

Tip # 211 In the UK, to signal that something is confidential, tap your nose.

Tip # 278 In China, point with an open hand, not an index finger.

Tip # 341 In Germany, flowers are given in odd numbers (even numbers are considered bad luck).

I don't know about the other 995 travel tips in this book, but based on these teasers, it doesn't look like speaking the language plays much of a role. But at least I know not to order a dozen roses in Berlin, or talk with my hands in my pockets in Vancouver, or try to get a Guadalajaran waiter's attention without saying "psst-psst," or fail to tap my nose when I pass along gossip in Manchester. And I'll never, never point at the Great Wall of China with my index finger.

All that remains unclear is whether the 400-page book says anything about speaking the local languages. All we can do is hope.

[Update] Andrew Clegg writes about Tip #278: "Someone should tell Optimus Prime." see here

Posted by Roger Shuy at 08:10 PM

Additive denotation boogie

That was the subject line of the most poetic spam email to have made it through the filters into my inbox recently.

The body of the message was more uneven in quality:

bock borealis bmw apace. cesium coolheaded alcoa christoffel bayda bagging. adject champ baseboard cooperate calhoun damn counsel detector consignee. brainchild clapboard bruise candlestick desicate baud calcutta.

After a metrically inspired start ("bock borealis bmw apace. cesium coolheaded alcoa christoffel"), there's an awkward stretch of trochaic banality ("bayda bagging. adject champ"). But the end is worthy of the beginning: "candlestick desicate baud calcutta."

Posted by Mark Liberman at 10:46 AM

Arabic proficiency levels

A few days ago, Spencer Ackerman quoted a Q&A ("U.S. Embassy-Baghdad: Y Kant State Kommunikate", 6/19/2007):

There are about 200 Foreign Service Officers in the U.S. Embassy in Baghdad. How many of them do you figure are fluent in Arabic? The question was posed in today's State Department press briefing, and here's the answer:

Question: How many Arabic speakers with 3/3 levels of proficiency are currently serving at Embassy Baghdad?
Answer: We currently have ten Foreign Service Officers (including the Ambassador) at Embassy Baghdad at or above the 3 reading / 3 speaking level in Arabic. An additional five personnel at Embassy Baghdad have tested at or above the 3 level in speaking. A 3/3 indicates a general professional fluency level.

The editors of Foreign Policy commented on their Passport blog ("The State Department's Arabic problem is worse than you think", 6/21/2007):

This is actually more alarming than it sounds. No wonder U.S. Ambassador Ryan Crocker was raising hell.

A 3/3 level of proficiency is virtually useless for conducting serious business in Arabic. The use of the word "fluency" here is deeply misleading: Someone with a 3/3 would not be able, for instance, to do simultaneous translation of a meeting, and would struggle to translate complicated documents. Anything technical, legal, or politically sensitive would not be something you'd want a 3/3 to handle. For that, you'd need someone closer to a 5 or better yet, a native speaker with a large vocabulary and superior writing skills in two languages. Such people are rare, because the amount of investment and time it takes to reach such rarified heights is more lucratively deployed elsewhere.

[Sally Thomason posted some less negative information back in February ("Another view of Americans & Arabic in the Gulf", 2/19/2007), and I've gotten some additional feedback from a couple of readers, which I'll post separately. And you may also enjoy the jokes in a post from 2004, "Iraqi chicken", though I sincerely hope that they are now out of date. Meanwhile, I believe that the information below on the meaning of proficiency scales, and the problem of Arabic languages and registers, remains relevant, whatever the facts and interpretations about proficiency in the Baghdad embassy.]

Those numbered levels refer to the ILR ("Interagency Language Roundtable") Language Proficiency Skill Levels, which rate proficiency on a scale of 0 to 5 for each of five skills, namely speaking, listening, reading, writing, and translation. The six levels are described as 0 = "no proficiency", 1 = "elementary proficiency", 2 = "limited working proficiency", 3 = "general professional proficiency", 4 = "advanced professional proficiency", and 5 = "functionally native proficiency".
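For readers who like to see a scale as a lookup table, here is a minimal Python sketch of the whole-number levels just listed, with a check for the "at or above 3 reading / 3 speaking" criterion used in the State Department answer. The labels come from the descriptions above; the names `ILR_LEVELS` and `meets_level` and the dictionary-based score format are assumptions made for illustration, not anything defined by the ILR.

```python
# Whole-number ILR levels and their labels, as listed above.
ILR_LEVELS = {
    0: "no proficiency",
    1: "elementary proficiency",
    2: "limited working proficiency",
    3: "general professional proficiency",
    4: "advanced professional proficiency",
    5: "functionally native proficiency",
}


def meets_level(scores: dict, minimums: dict) -> bool:
    """True if every required skill is rated at or above its minimum.

    `scores` and `minimums` map skill names (e.g. "reading") to levels.
    A skill missing from `scores` is treated as level 0 (an assumption).
    """
    return all(scores.get(skill, 0) >= level for skill, level in minimums.items())


three_three = {"reading": 3, "speaking": 3}
officer = {"reading": 3, "speaking": 4}  # hypothetical test results
print(meets_level(officer, three_three))  # True
print(ILR_LEVELS[3])                      # general professional proficiency
```

Someone who has tested at 3 in speaking but has no reading score, like the five additional personnel mentioned in the answer, would pass a speaking-only check but not the 3/3 check.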

As you can imagine, there are elaborate testing materials and procedures designed to evaluate these skills in a reliable way.

In the range of skills under discussion, Reading 3 is described in detail as:

Able to read within a normal range of speed and with almost complete comprehension a variety of authentic prose material on unfamiliar subjects. Reading ability is not dependent on subject matter knowledge, although it is not expected that the individual can comprehend thoroughly subject matter which is highly dependent on cultural knowledge or which is outside his/her general experience and not accompanied by explanation. Text-types include news stories similar to wire service reports or international news items in major periodicals, routine correspondence, general reports, and technical material in his/her professional field; all of these may include hypothesis, argumentation and supported opinions. Misreading rare. Almost always able to interpret material correctly, relate ideas and "read between the lines," (that is, understand the writers' implicit intents in text of the above types). Can get the gist of more sophisticated texts, but may be unable to detect or understand subtlety and nuance. Rarely has to pause over or reread general vocabulary. However, may experience some difficulty with unusually complex structure and low frequency idioms.

Reading 4 is:

Able to read fluently and accurately all styles and forms of the language pertinent to professional needs. The individual's experience with the written language is extensive enough that he/she is able to relate inferences in the text to real-world knowledge and understand almost all sociolinguistic and cultural references. Able to "read beyond the lines" (that is, to understand the full ramifications of texts as they are situated in the wider cultural, political, or social environment). Able to read and understand the intent of writers' use of nuance and subtlety. The individual can discern relationships among sophisticated written materials in the context of broad experience. Can follow unpredictable turns of thought readily in, for example, editorial, conjectural, and literary texts in any subject matter area directed to the general reader. Can read essentially all materials in his/her special field, including official and professional documents and correspondence. Recognizes all professionally relevant vocabulary known to the educated non-professional native, although may have some difficulty with slang. Can read reasonably legible handwriting without difficulty. Accuracy is often nearly that of a well-educated native reader.

Speaking 3 is:

Able to speak the language with sufficient structural accuracy and vocabulary to participate effectively in most formal and informal conversations in practical, social and professional topics. Nevertheless, the individual's limitations generally restrict the professional contexts of language use to matters of shared knowledge and/or international convention. Discourse is cohesive. The individual uses the language acceptably, but with some noticeable imperfections; yet, errors virtually never interfere with understanding and rarely disturb the native speaker. The individual can effectively combine structure and vocabulary to convey his/her meaning accurately. The individual speaks readily and fills pauses suitably. In face-to-face conversation with natives speaking the standard dialect at a normal rate of speech, comprehension is quite complete. Although cultural references, proverbs and the implications of nuances and idiom may not be fully understood, the individual can easily repair the conversation. Pronunciation may be obviously foreign. Individual sounds are accurate: but stress, intonation and pitch control may be faulty.

Examples: Can typically discuss particular interests and special fields of competence with reasonable ease. Can use the language as part of normal professional duties such as answering objections, clarifying points, justifying decisions, understanding the essence of challenges, stating and defending policy, conducting meetings, delivering briefings, or other extended and elaborate informative monologues. Can reliably elicit information and informed opinion from native speakers. Structural inaccuracy is rarely the major cause of misunderstanding. Use of structural devices is flexible and elaborate. Without searching for words or phrases, the individual uses the language clearly and relatively naturally to elaborate concepts freely and make ideas easily understandable to native speakers. Errors occur in low-frequency and highly complex structures.

It seems a little exaggerated to describe "3 reading/3 speaking" as "virtually useless" -- I'm sure things would be better if more than 5% of U.S. foreign service personnel in Iraq had that level of proficiency in Modern Standard Arabic -- but the Passport blog goes on to point out a much more serious problem:

What's more, I would assume that the proficiency scale refers to Modern Standard Arabic (MSA), which is what most students of Arabic learn and is the language used in most newspapers and for Al Jazeera's broadcasts. The dialect spoken by Iraqis is very different from MSA and from other Arabic dialects.

A rough (but fairly accurate) analogy would be to see MSA in the role of Latin in 17th-century Europe as the language of formal discourse. As I understand it, the modern Arabic "colloquials" are as different from MSA and from one another as Italian is from Latin or from Spanish. The fact that American diplomats (and soldiers) aren't taught Iraqi (or Egyptian, or Moroccan, or whatever) is a bigger problem than whether they reach proficiency 3 or proficiency 4 in MSA.

For a discussion of these issues from the point of view of contemporary speakers of different varieties of Arabic, see Mohamed Maamouri, "Language Education and Human Development: Arabic Diglossia and its Impact on the Quality of Education in the Arab Region" (1998). On pages 32-42, there's an enlightening discussion of the "linguistic nature of Arabic diglossia".

The "Arab schoolchildren profiles" that Mohamed gives on pp. 27-28 provide a helpfully concrete picture of the situation. Here's one (note that Fusha is an Arabic term for the literary language, including MSA):

Hela is a sixth-grade primary school student living in Tunis. She spends her summers in Nabeul with her grandmother. Her two best friends there are Hiba and Meriem. Hiba lives in Nabeul all year round and is the same age as Hela. Meriem is a year older and lives in La Marsa during the school year. Hela goes to a private school where she started French and Arabic at the same time. She has more than 20 hours of classes in Arabic and about 10 hours in French a week. All the subjects other than French, such as Math and Biology, are taught in Fusha. Sometimes the teacher explains things in Arbi, but the students often have to speak in Fusha. Hela does not like Fusha as much as Arbi, it feels too alien to her. She even likes French better than Fusha. Meriem’s classes are a lot like Hela’s. She prefers French and often uses French words when she’s speaking Arbi. She thinks it makes her sound cool, like an adult. Hiba, on the other hand, didn’t start French until the third grade. Even though she now has the same number of hours of each language as Hela does, she prefers Arabic (both fusha and Arbi) to French and reads more Arabic books.

The three girls play together and watch television. Their favorite shows are Saoussen, which is in Fusha, and Les Schtroumfs, which is in French. Sometimes, when they play, they pretend to be the cartoon characters and try to sound like them. Hiba likes playing Saoussen best, because she doesn’t play well when they speak in French. Meriem prefers Les Schtroumfs because her Fusha is poor. They usually just speak Arbi together. After the summer’s over, Hela and Meriem go back to their homes. They decide to write each other letters over the school year. After the first day of school, Hela runs home to write letters to her friends. She starts to write a letter to Hiba, in Fusha, but feels that this is not a friendly letter. It feels more like homework. She thinks in Arbi, but cannot write what she means, and has to translate. Frustrated, she decides to write to Meriem first. She quickly realizes that her best bet is to write in French, but still struggles with finding the right words to say what she means. Finally, she settles on using Arbi words that she approximates phonetically and finishes one letter. For Hiba’s letter, though, it’s harder for her to do this with Fusha, so she just writes a very short letter and writes some words in French. These solutions work, but leave her feeling unsatisfied. She feels closer to Meriem because she can communicate with her better. She rapidly loses interest in writing to Hiba, though.

Hela's cousin, Farah, grew up in Saudi Arabia. She is the same age as Hela and is in the fourth grade. Farah only speaks Saudi Arabic, Fusha, and English, which she studies at school. She feels that Fusha is strange and silly. Nobody really speaks it there either. When Farah and Hela get together, they can only speak a mixture of their dialect with Fusha. It is very strange for both of them. They hardly ever write each other letters, because they’d have to do it in Fusha, which neither feels comfortable with. Farah feels resentment towards Fusha and reads even less. She doesn’t like music in Arabic as much as English or French music and only reads in Arabic if it is mandatory. Her French continues to improve and her Fusha remains poor. This does not bother her though, because she knows that once she gets to secondary school, Fusha would be much less important and if she wants to be a doctor when she grows up, she will only need French.

And here's an anecdote from p. 33:

Parkinson relates the story of a friend who was a passionate supporter of fusha and who decided to stick to it exclusively in his family in order to give his children the full advantage of having it as a native language. Getting on a busy Cairo bus with this friend and his three-year-old daughter, the two of them, father and daughter, were separated and the yelling that was necessary to reestablish the contact took place in fusha making the entire bus burst out in laughter.

On the other hand (p. 38):

The superiority that Arabs bestow on their heritage language leads to a quasi-general denial of the existence of a home language, in this case colloquial Arabic. Arabs consider in fact that what is spoken at home, and elsewhere in common daily activities, is merely incorrect language which is only acceptable because it deals with lowly functions and topics. There is a prevailing feeling among Arabs that their language is imbued with a natural superiority. This ‘prestige valuation’ of fusha is explained by Arabs as relating to such qualities as beauty, logic, and a high degree of expressiveness. Fusha carries in its own etymology the myth about its eloquence and high degree of correctness. Moreover, Arabs despise the spoken colloquial forms and even deny that they use them because they consider the colloquials they speak as ‘degraded’ and corrupt forms of the language. They give them derogatory names such as barbri “barbarian” or yitkallam bi-l-fallaaqi “he speaks the language of woodloggers.”

This situation makes the task of foreign learners more difficult, since they need to learn to deal appropriately with a very broad range of mixtures of "high" and "low" languages. This is true to some extent in any language, but the range of diglossia in "Arabic" appears to be significantly greater than in most other modern situations. You need to imagine a situation in which "Latin" is used to refer not only to classical and patristic Latin, but also to its spoken descendants French, Italian, Spanish and Portuguese (with none of them having any standard written form).

The sociolinguistic situation in Iraq is different in many ways from the situation in North Africa, but it doubtless remains true that Fusha "feels alien" to most Iraqis, even if they are able to understand it and to speak it to some extent. At the same time, formal settings require formally correct language.

[Update -- Joel Thibault writes:

It may interest you to know that Les Shtroumpfs is well known to English speakers as The Smurfs.

]

[Update #2 -- Lameen Souag writes:

Great post. A minor point though, relating to Mohamed Maamouri's article: "barbri" (bəṛbṛiyya, it would be in Algeria) is not so much "barbarian" as "Berber" - as far as I know, it's only used to refer to North African Arabic (although it's nearly obsolete in that sense in my area), and hence to dialects of Arabic adopted by what were ethnically largely Berber populations. In Moroccan dialects (I think), and commonly in Fusha (eg in ibn Khaldun), it means "Berber". It still has a derogatory origin, of course - the ethnonym "Berber" does probably derive from "barbarian" - but this particular usage probably derives from the ethnic sense, rather than directly from the derogatory one.

]

[Update #3 -- Nathan Wagner writes:

While I don't know what the current situation is, I attended DLI (the Defense Language Institute, which is where the military trains its foreign language speakers) in 1988-1990 where I took the basic Arabic course. In contrast to your claim, everyone took a 47 week basic course in MSA, followed by a 16 week dialect course. As I recall, the dialects then were Syrian and Egyptian (though I was assigned Egyptian and that is the only one I am certain about). While I hope that the dialects have changed since then, I would imagine that the basic curriculum is more or less the same. A brief look at their website isn't very informative, but they do note a 63 week basic course in Arabic, which would correspond to what I took. Interestingly however, there is no actual mention of specific dialect training.

I also don't know what the current situation is. However, as of a few years ago, U.S. military personnel being trained at DLI for duty in Iraq apparently did not learn any colloquial Iraqi, and as far as I know, no significant amount of any other colloquial Arabic either. As I wrote in "Iraqi chicken" (7/15/2004),

Among the interesting points that yesterday's speaker (a captain in the Army reserves) made:
Her DLI (Defense Language Institute) training in Arabic was less useful than she would have wished, mainly because it was in MSA (Modern Standard Arabic), whose relationship to Iraqi Arabic is roughly like the relationship between Latin and Italian. As a result, as she put it, "they could understand me but I couldn't understand them". I've heard that DLI used to teach a number of modern Arabic languages (often called "colloquials" or "dialects"), but stopped some time ago because the military's personnel system couldn't deal with the distinctions. As far as the personnel system was concerned, Arabic is Arabic; but sending someone trained in Moroccan Arabic to (say) Kuwait is like sending a Portuguese speaker to Romania. So DLI decided to stick with MSA, which is the language of formal discourse throughout the Arab world, though it's no one's native language. I don't know whether this is the reason, but it's certainly true right now that DLI teaches MSA rather than the local languages.

In general, I believe that DLI does an excellent job, and the people who run the place and determine the curriculum are certainly aware of all these issues, so some changes have probably been made over the last few years. Without any specific knowledge of what's happened, though, I would guess that providing good course materials and recruiting enough teachers fluent in Iraqi (since few of their existing Arabic instructors were from Iraq) would both have been challenging problems. ]

[We should also note that there are several different varieties of colloquial Arabic spoken within Iraq -- Ethnologue distinguishes Mesopotamian Spoken Arabic (estimate 11.5 million speakers in Iraq, 15.1M in all countries), North Mesopotamian Spoken Arabic (estimate 5.4M speakers in Iraq, 6.3M in all countries), Najdi Spoken Arabic (estimate 900K speakers in Iraq), and Gulf Spoken Arabic (estimate 40K speakers in Iraq). That doesn't count the 3.3M speakers of various kinds of Kurdish, and smaller numbers of speakers of more than a dozen other languages.]

Posted by Mark Liberman at 10:31 AM

BBC approves "shite" and "gobshite" (in moderation)

In case you were wondering, it's apparently okay to call someone a shite, a gobshite, or even a bogshite on the airwaves of Northern Ireland. The BBC Trust received a complaint from a listener of Gerry Anderson's morning show on Radio Ulster after Anderson used all three epithets to refer to his broadcasting colleagues. As The Times reports, the BBC Trust rejected the complaint, though Anderson must now diminish his use of these words:

Asked to rule on the complaint, the BBC Trust editorial standards committee found that the words carried little more offence than “eejit” in Irish slang.
It ruled: “The meaning conveyed by the words ‘shite’ and ‘gobshite’ in the vernacular of Northern Ireland, and in the context of this programme in particular, was different from other parts of the UK in that they did not necessarily carry the same level of offence and aggression and could be seen as a form of comedic banter.” The language was also “appropriate for children listening during school holidays”.
However, Anderson must submit to a quota of colourful language and Radio Ulster must “mitigate the overuse of the words”.
The trust said that the station had “set in place a system that ensured the programme did not use these words in a way that went beyond the audience’s expectation”.

The BBC Trust is careful to say that these words are not very offensive in Northern Ireland, as opposed to "other parts of the UK." I wonder where those "other parts" might be. Shite, in addition to its Irish and Northern Irish usage, is still commonly heard in Scotland (appearing more than a dozen times in the script for Trainspotting) as well as northern England. In southern England, according to commenters on the always enlightening Separated by a Common Language blog, shite is considered a jocular alternative to shit, perhaps due to its association with Irish/Scottish/Northern usage. Shite and gobshite can't be all that bad, since they appear unasterisked in BBC America's British American Dictionary (previously discussed here), as opposed to sh*t (or c*nt or f*ck). And neither appears in a recent study commissioned by the BBC on rude words in British English. I can't speculate about bogshite, as it appears to be a relatively recent innovation (it doesn't appear in the authoritative slang dictionaries like Cassell's and only gets about 140 Google hits). But it looks to be a playful metathesis of gobshite, helped along by the British colloquialism bog meaning 'toilet.'

Gobshite is an interesting case, because even though it is now identified as chiefly Irish slang, it actually has an older documented history in American usage, surprisingly enough. The word has been used at least since 1910 to refer to an enlisted seaman in the US Navy, according to the OED and the Historical Dictionary of American Slang. HDAS editor Jonathan Lighter suggests that the Navy usage of gobshite is derived from another presumably earlier sense meaning "an expectorated wad of chewing tobacco." (Sailors on both sides of the Atlantic have long been associated with tobacco dribble, as illustrated by the related nautical epithets gob and gobby.) No pre-1910 cites for the chewing tobacco sense of gobshite have turned up yet, however — the earliest that Lighter has found dates to 1918, in an unpublished "manuscript glossary of US Navy terms" by Laurence G. Noyes (also cited by Lighter in his 1972 American Speech article, "The Slang of the American Expeditionary Forces in Europe, 1917-1919: An Historical Glossary").

So how did the Irish English pejorative gobshite develop? It has only been attested since 1948, according to the OED's draft entry of March 2002 (which defines the sense as "a stupid, incompetent, or gullible person; a person who talks nonsense or talks incessantly; a loudmouth"). The OED surmises that the Irish English usage may have arisen independently from the US Navy usage, and it may possibly be the earlier of the two senses, even if the citational evidence doesn't yet support that theory. Moreover, the Irish version could represent an assimilation, with a previously distinct word merging into gobshite. Bernard Share's Slanguage: A Dictionary of Irish Slang relates gobshite to Irish English gobshell meaning 'a gobbet of spittle.' Further confusing matters are the various meanings of gob that seem to be at play in the development of gobshite: as a noun gob can mean either 'mass, lump, gobbet' or 'beak, bill, mouth,' and as a verb it can mean 'to spit.' In the current use of gobshite, the gob element is primarily understood as a colloquialism for 'mouth': a contributor to BBC America's online dictionary succinctly defines gobshite as "one who speaks shite out of their gob."

Though the slang use of gob for 'mouth' is largely unknown in the US, it pops up in various expressions that Americans are familiar with. Willy Wonka's Everlasting Gobstoppers owe their name to the British term for what Americans usually call "jawbreakers": pieces of hard candy that "stop" (close) your "gob" (mouth). Then there's gobsmacked 'flabbergasted, astounded,' which is one of those Briticisms that some Americans pick up without quite appreciating the semantic particulars — in this case, the feeling of being astounded is equated with the feeling of being smacked in the mouth. (And I'll be gobsmacked if I don't receive at least a few emails from UK readers pointing out some detail or other where my American eyes and ears have led me astray.)

[Update #1: Lynne Murphy of Separated by a Common Language has this to say about bogshite:

Two untested hypotheses on bogshite:

(a) '(the) bog' is BrE slang for toilet/(AmE) bathroom, so it could be 'toiletshit', (b) Ireland has lots of peat bogs, so maybe it's relate to that — 'shit of the Irish earth', as it were. I'm preferring (b) since the term seems to have more currency in Ireland (I didn't know it before) and because of this quote:
"Fecking smell of turf and other Bogshite."
or this one:
"Trust me, I'm Irish as they come, I just don't sound like a bogshite from Cork."
Just asked Better Half and he said 'it's Northern', to which I said 'isn't it Irish?' and he replied 'Isn't it about the Irish? They're bogshite, they come from the bogs'.

So, that's what I can add (pure speculation), for what it's worth!

Joe Stynes agrees with the second explanation and further elaborates:

I'm sure the "bog" in "bogshite" is from a variety of Irish insults relating to country bumpkins, rustics, rednecks, call them what you will, used mainly by the urban sophisticates of the World City that is modern Dublin. "Bogger" is perhaps the standard of these insults. "Bogwog" I believe had some currency among the British Army. "Bogball" is an pejorative name for Gaelic football popularised by "Hot Press", a music newspaper written by aspiring bohemian-metropolitans in Dublin who presumably prefer soccer or rugby. See also "the mist that do be on the bog". Thus, a "bogshite" would be a gobshite from the bogs. ]

[Update #2: Apropos of shit/shite usage, Jonathan Knibb sends along his "all-time favourite limerick":

IIRC I found it in Kingsley Amis' autobiography, attributed there to Philip Larkin; I can only find one Google hit for it, which unfortunately disagrees, attributing it to Robert Conquest. Oh well:
A usage that's seldom got right
Is when to say shit and when shite;
And many a chap
Will fall back on crap,
Which is vulgar, evasive and trite.

Michael Albaugh emails to point out that Amis and Conquest co-edited several anthologies, which could explain the confused attribution of the limerick.]

Posted by Benjamin Zimmer at 12:05 AM

June 22, 2007

Here it is again without the nouns, verbs, and adjectives

Just as a postscript to Mark's comment in the previous post on the man who claimed modern performers do not use nouns, verbs, or adjectives, let's look again at the quoted passage from William Katz, this time removing the nouns, verbs, and adjectives to get a sense of how he thinks current performers talk. I have very generously kept the pronouns (though The Cambridge Grammar treats them as a special kind of noun) and the auxiliary verbs (though all right-thinking sources treat them as a special case of verbs), and I have classified like as a preposition (an argument could be given that it is really an anomalous adjective) and today as an adverb (it may actually be a noun phrase functioning as a temporal adjunct, like last night). Here is the result. Good luck with understanding it.

My most was with, who'd been, of, to. Was at the, and only was that he not be. I with her by for an. What through was her — she'd as a — and her. During that, she never a, and in — all the today don't. She was a. Like many from, she how to be, and it was of the. I can why her.

And yes, I was about to say something about the modern decline in grammatical literacy and the mendacious pontificating old fools who drone on about people not having proper grammar when in fact they couldn't syntactically analyze their way out of a wet paper grocery sack. But then I realized you would all be yawning and saying "Oh, another rant." So I just decided to leave the above for you to look at the next time someone says something in your presence about nouns or verbs or adjectives or grammar.
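
For anyone who wants to repeat the exercise on another passage, here is a minimal sketch of the filtering step. It assumes the words have already been tagged by hand; the tag names (NOUN, VERB, ADJ, PRON, AUX, DET) and the function name are illustrative choices, not the output of any particular tagger, and the tagging follows the generous decisions described above (pronouns and auxiliaries are kept).

```python
# Categories to delete. Pronouns and auxiliaries get their own tags,
# so they survive the filter, as in the passage above.
DROPPED = {"NOUN", "VERB", "ADJ"}

def strip_content_words(tagged_sentence):
    """Return the sentence with nouns, verbs, and adjectives removed.

    tagged_sentence is a list of (word, tag) pairs, tagged by hand.
    """
    kept = [word for word, tag in tagged_sentence if tag not in DROPPED]
    return " ".join(kept) + "."

# Hand-tagged example: "She was a marvelous conversationalist."
sentence = [
    ("She", "PRON"),
    ("was", "AUX"),
    ("a", "DET"),
    ("marvelous", "ADJ"),
    ("conversationalist", "NOUN"),
]

print(strip_content_words(sentence))  # She was a.
```

The hard part, of course, is the tagging, which is exactly the part the sketch leaves to a human.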

Posted by Geoffrey K. Pullum at 08:00 PM

Talking without nouns, verbs or adjectives

According to William Katz, formerly a writer and "talent coordinator" for the Tonight Show, who has contributed a series of reminiscences to the PowerLine blog:

My most memorable pre-interview was with Jane Wyman, who'd been married, of course, to Ronald Reagan. Reagan was governor at the time, and Jane's only stipulation was that he not be discussed. I spoke with her by phone for an hour. What came through was her wonderful, musical voice – she'd started as a singer – and her clear-headed intelligence. During that hour, she never made a grammatical error, and spoke in complete sentences – nouns, verbs, adjectives, all the stuff performers today don't use. She was a marvelous conversationalist. Like many stars from film's golden age, she knew how to be interviewed, and knew it was part of the job. I can understand why Reagan admired her. [emphasis added]

A few years ago, we learned about Michel Thaler's novel without verbs ("The verbless of the earth", 5/12/2007), and speculated about writing without nouns ("Writing verblessly is so jejeune!", 5/13/2007), and "Writing without adjectives" (5/13/2007).

But without any of them? And now they all don't? Wow!

[Hat tip to David Donnell]

Posted by Mark Liberman at 05:06 PM

Language regulation in the courts

I already knew that the majority of defamation cases today are brought against newspapers, magazines, television, and radio. But among the many things I didn't know about libel law is what I just learned from the June 21 issue of Legal Times -- that nearly ten percent of all libel suits nationwide are filed by judges who charge that they were defamed by the media. Recent examples include Chief Justice Robert Thomas, head honcho of the Illinois state judiciary, who just won a seven million dollar verdict (later reduced to a mere four mill) against The Kane County Chronicle, a small Illinois newspaper. Another defamation case won by a judge is the suit filed by Massachusetts Superior Court Judge Ernest Murphy, who just won a $3.4 million libel suit against The Boston Herald.

In his 1999 book, Legal Language, Peter Tiersma calls defamation "a variety of language regulation that prohibits the uttering of certain types of speech, more precisely, allows those types only in very specific circumstances." He adds, "a public accusation of wrongdoing is a linguistic act that lowers the status of an individual who has violated community norms." (304) Of course, defamation law is more complicated than that, but this isn't a treatise on law.

These two cases have caused a buzz in the law community, which has some reason to suspect that judges who become plaintiffs in defamation cases hold an unfair advantage. For example, Justice Thomas had six current and four former Supreme Court justices testify on his behalf. But when the newspaper's lawyers tried to cross-examine them, the justices invoked the "judicial deliberation privilege" and refused to answer. Then there is the interesting problem of judges judging judges, to say nothing of the fact that the justices who testified on behalf of Justice Thomas were assigned to hear the newspaper's appeal. Also there is the nagging problem that many judges hold elected positions, making them political figures -- and we all know what the media can do to political figures. As for lowering "the status of an individual," the article points out that Justice Thomas was named chief justice by his colleagues after the Kane County Chronicle articles were published.

Oh, the buzz.

So young Language Log readers who are still planning career paths might consider studying to become a judge. There's real money to be made, if you can just get the media to print some language that defames you.

John Cowan adds, "To make things worse, the contest is uniquely one-sided. Judges have an absolute privilege to defame anyone, falsely, maliciously, and irrelevantly...What a judge says from the bench is completely out of the reach of defamation law, no matter how outrageous it is."

Posted by Roger Shuy at 11:36 AM

My hovercraft is full of turtlenecks

According to an editorial in the May 31 issue of Nature, dealing with science and technology at DHS ("The safety catch"),

Although opportunities exist to use technology to improve performance at the margins, much of the work is about the efficient application of simple techniques. Patrolling the borders requires little more than a pick-up truck and a pair of binoculars; managing immigration paperwork plays to the skills of adept clerical staff, not turtlenecked hackers; and patrolling a coastline can be done as well in a 1950s-era cutter as it can in a hovercraft. [emphasis added]

The reference to "turtlenecked hackers" took me by surprise. Nature's editors are certainly following George Orwell's admonition "Never use a metaphor, simile, or other figure of speech which you are used to seeing in print" -- a Google search for {"turtlenecked hackers"} yields their editorial and nothing else. But in this case, they may have followed the will-o-the-wisp of originality into the swamp of reader bafflement. I mean, it's true that I'm not exactly obsessed with clothing styles, but I've been at least a marginal member of the hacker category for a long time, and you'd think I would have noticed the turtlenecks.

We're talking about the "computer enthusiast" sense of hacker, not the "computer criminal" sense for which some people prefer the term cracker. I've never spent any time in the company of computer crackers, so I don't have any idea how they stereotypically dress -- but a search for {turtleneck hacker}, without -ed or quotes, does turn up a 2001 newspaper article "From convicted hacker to dotcom backer" with this passage:

Schmitz comes to the door. He is wearing a huge black suit, a black turtle-neck shirt and a pair of extraordinary black and white shoes that would not look amiss on a golf course. He is carrying a pair of dark glasses and wears one of those super-expensive Breitling watches that can send out an emergency signal if ever he gets into trouble.

However, this seems more like the signature outfit of a dotcom entrepreneur than the characteristic dress of a computer security threat. And I think this is the key to Nature's confusion -- they've been seduced by Steve Jobs, whose turtlenecks have become legendary (e.g. "The man, the myth, the turtleneck: Apple CEO Steve Jobs", 3/6/2006). Jobs is neither a hacker nor a cracker, but a high-tech marketer, though at least he's in the right industry.

And it does seem that the leaders of that industry have embraced the turtleneck. For some sartorial stereotyping hot from the engine-room of cybercreation, there's this play on words from "Valleywag Hotties: Quarterfinals results" (as of 6/21/2007): "In a round of clear winners, one race went turtleneck-and-turtleneck." Neither Steve Jobs nor Jim Buckmaster is exactly typical of the "hackers" you'd hire in place of "adept clerical staff" for "managing immigration paperwork", but at least we've figured out where those turtlenecks came from.

Returning to the passage from Nature, the turtlenecks are not the only jarring bit of technological iconography. I freely admit to ignorance of current directions in coastal patrol technology, but I think of the hovercraft as one of those 1960s visions-of-the-future that turned out not to be such a great idea in practice (even if The Matrix kept the flame alive by making Morpheus a hovercraft captain). Charles Hageman Frey made hovercraft disappointment the theme of an online magazine piece in 2000, "Where's my damn hovercraft?":

Hovercrafts--we were all suppose to have hovercrafts by now but instead we got little computers and moving walkways. Why? ... The main restricting element is having enough space to try to construct one of the machines ... the mere number of people who have enough space to tinker about with a hovercraft device is minimal.... Perhaps it will be soon now that Mr. Gates and some of his silicon valley cronies have gotten themselves decent housing with large garages and huge backyards, but then again they have their millions to count, not to mention they are all nesting with families.

But a 1997 Business Week profile of Alan Shugart, the co-founder of Seagate, suggests that hovercrafts were already discarded as toys in Silicon Valley 10 years ago: "[his] interests include collecting wine and gadgets -- he has a hovercraft he's never used -- and politics, where he ... tried to get his dog, Ernest, on the ballot in nearby Santa Cruz in 1996."

So if Steve Jobs took on the challenge of coming up with insanely great technology for the Coast Guard, I don't think that hovercrafts would be part of the pitch.

All of this highlights the difficulties of trying to choose those concrete and characteristic details that anchor an abstraction in the reader's mind. What do hackers wear? Well, if you've been imprinted by pictures of Steve Jobs keynotes at Macworld, that would be a turtleneck. What would be a high-tech way to patrol a coastline? Well, the hovercraft was the last new type of over-water vehicle to be invented, even if the idea was first tried out half a century ago, and lost technological momentum a couple of decades later.

And in Frey's remark about hovercrafts, he tried to characterize the set of successful high-tech inventors as "Mr. Gates and ... his silicon valley cronies". But Redmond's a long way from San Jose, both geographically and socially. Metonymy's a bitch.

[If you'd like to revisit the Monty Python sketch that inspired this post's title, a transcript is here, and you can watch it here. And I'm not the first person to have substituted something other than "eels" in {"my hovercraft is full of __"}.]

[Update -- David Vinson suggests:

Maybe the "turtlenecked hackers" is not a reference to costume, but to posture (and/or body type), i.e., turtle-necked rather than turtlenecked. Sometimes I am guilty of peering at my own computer screen in a very turtle-necked manner (although I am "geek" at best, not "hacker").

True enough. But this is an occupational hazard of early-21st-century humans in general, including that "adept clerical staff".]

Posted by Mark Liberman at 07:40 AM

June 21, 2007

Prepositions over at Volokh

Those of you interested in the oddities of preposition usage might want to pay a visit to The Volokh Conspiracy, an interesting legal blog at which Eugene Volokh has just posted about the weirdness of English prepositions.

Posted by Bill Poser at 06:01 PM

Caribbean Monetary Notation

One helpful reader of my post on Number Delimitation pointed out that some localization data indicates that in several Caribbean countries, in both English and Spanish, although ordinary numbers are written with the usual grouping into threes, in monetary values only the low group of three is delimited. In other words, an ordinary number looks like this: 123,456,789 but the same number of dollars is written $123456,789. Can anyone confirm this practice? If so, does anyone know its origin?
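
To make the two conventions concrete, here is a minimal sketch of each. The function names are mine, the comma is taken as the delimiter, and non-negative whole numbers are assumed; whether any real locale data actually specifies the second pattern is precisely the open question.

```python
def group_all(n: int, sep: str = ",") -> str:
    """Ordinary notation: delimit every group of three digits."""
    return f"{n:,}".replace(",", sep)

def group_low_only(n: int, sep: str = ",") -> str:
    """Reported monetary notation: delimit only the lowest group of three."""
    digits = str(n)
    if len(digits) <= 3:
        return digits  # nothing to delimit
    return digits[:-3] + sep + digits[-3:]

print(group_all(123456789))             # 123,456,789
print("$" + group_low_only(123456789))  # $123456,789
```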

Posted by Bill Poser at 05:13 PM

The 100 Top Celebrities

Forbes magazine has published its list of the 100 top celebrities. Unbelievably, not a single one is a linguist! At least, not as far as I can tell. I haven't the faintest idea who 44 of them are. But I figure that if they were linguists I would probably know.

Unless you spot a linguist on the list, please don't write to inform me about the 44 I don't know. Many years ago, a bunch of grad students were stunned to discover that I didn't know who Michael Jackson was. Having found out who he was, I cannot say that I considered myself to be better off than before. Subsequent developments have only confirmed this view.

Posted by Bill Poser at 01:23 PM

Banning "rape" in a rape trial

Sean Albright called my attention to a Slate article about a recent gag order given by a district court judge in Lincoln, Nebraska. As the writer puts it, it's a language war. The judge granted a defense motion to ban use of the words, "rape," "sexual assault," "victim," "assailant," and "sexual assault kit" from an alleged rape trial. The prosecution countered with a motion requesting that the words, "sex" and "intercourse" also be banned, but the judge denied this motion, possibly out of fear that the lawyers and witnesses might run out of words to describe what happened. My friend Bruce Lyons, a Ft. Lauderdale criminal defense lawyer, doesn't see a problem here. He pointed out what any good defense lawyer might be expected to say:

A prosecutor who has the facts does not have to rely on words like 'victim' or 'rape.' I don't see what the hullabaloo's about. It seems like someone is looking for a scapegoat.

Lyons also believes a prosecutor should be able to elicit testimony from witnesses, including the persons on whom the act was allegedly committed, without using words like "rape."

Here we have a problem of the conflict between legal language, required at trials, and the language of outsiders to law, especially victims (oops, I used that word) and witnesses, even expert witnesses. Outsiders try to describe what happened, using their own words. They are often oblivious to the requirements of legal language and concepts. "Sex" and "intercourse," for example, are considered legally neutral words. "Rape" and "victim" presuppose conclusions that are to be drawn by the judge and jury, not by the witnesses and lawyers. Saying "the man assaulted me" goes beyond a description of what happened and points to a legal conclusion.

But this distinction is not easily evident to a woman in a trial like this as she tries to testify about what she says happened to her. The judge ruled that she will have to use legal terms, not her own. But it is likely to be difficult for her to say, "then we had intercourse" when what she wants to say is "then he raped me." Here we see a conflict with the required legal register. Physicians should not expect their patients to speak medicalese, but law seems to require that witnesses learn to talk in the legal register. The banning of certain words at trial has its good and bad points but we have to remember that it's part of the court's responsibility to keep trials fair.

Although I can understand both sides of the banned language issue, sometimes it can get very complicated. For example, a few years ago I was about to testify at a murder trial in Virginia. I analyzed some very hard-to-hear tape recordings and prepared a transcript so that the jury could follow my testimony. As it turned out, the prosecution did not even try to transcribe these muddy-sounding tapes but relied instead on the undercover officer's testimony about what was on them. The officer was way off in his interpretation and I was primed to show this. Virginia has some very odd procedures when it comes to expert witnesses. I was isolated in a room next to the courtroom with a guard watching my every move. He even accompanied me to the men's room. After about five hours under guard, I was finally called to testify. As I described my transcript to the jury I was surprised that the judge became furious. It seems that while I was waiting under guard in the witness room, the judge had ruled that the word, "transcript," could not be used in the trial. My problem was that nobody had bothered to tell me this. It was a discombobulating experience and on the spot I had to reconstruct how I could bring out the points I wanted to make without referring to the transcript that I was not permitted to use with the jury. I felt a bit like the woman in the alleged rape trial, but at least she knows about the gag order. I didn't.

Posted by Roger Shuy at 12:11 PM

More on data catalysis

In commenting on Patrick Pantel's "Data Catalysis" paper, I quoted a remark that Fernando Pereira made a few years ago, to sum up the problem of effective access for computational linguists to web-scale data. This was after a talk by Peter Norvig on aspects of Google's infrastructure; Fernando said something like "I feel as if we're particle physicists and you have the only accelerator".

Fernando read Patrick's paper, and laid out on his blog the way that he feels about the problem now. There are two key quotes:

"I'm worried about grid-anything. In other fields, expensive grid efforts have been more successful at creating complex computational plumbing and bureaucracies than at delivering new science."

"Our problem is not the lack of particle accelerators, but the lack of the organizational and funding processes associated with particle accelerators."

I strongly agree with the first of these -- it's why I emphasized that Patrick is trying to create modular and shareable architectures and software, not yet another supercomputer center. And I also agree with the points that Fernando makes about choosing problems, and about the serious mismatch between current research opportunities and current academic models for funding, staffing and research management.

However, I continue to believe that Patrick is addressing an important set of issues. As both Patrick and Fernando observe, the hardware that we need is not prohibitively expensive. But there remain significant problems with data access and with infrastructure design.

On the infrastructure side, let's suppose that we've got $X to spend on some combination of compute servers and file servers. What should we do? Should we buy X/5000 $5K machines, or X/2000 $2K machines, or what? Should the disks be local or shared? How much memory does each machine need? What's the right way to connect them up? Should we dedicate a cluster to Hadoop and map/reduce, and set aside some other machines for problems that don't factor appropriately? Or should we plan to use a single cluster in multiple ways? What's really required in the way of on-going software and hardware support for such a system?
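
The first of those questions is just division, and a minimal sketch makes the trade-off visible. The budget figure below is purely hypothetical, as is the function name, and the calculation ignores everything else on the list (networking, storage, power, cooling, support).

```python
def machines_for_budget(budget_dollars: int, price_per_machine: int) -> int:
    """Number of whole machines a budget buys at a given unit price."""
    return budget_dollars // price_per_machine

budget = 100_000  # hypothetical value of $X, for illustration only

for price in (5_000, 2_000):
    count = machines_for_budget(budget, price)
    print(f"${price:,} machines: {count}")
```

Counting boxes is the easy part; the remaining questions are the ones that need shared experience.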

To some extent, the answers to such questions depend on your local problems and opportunities. (Maybe the key constraint turns out to be power and cooling, for example.) But with some luck, people like Patrick will come up with experiences that others can copy (or avoid, depending on how they turn out), and even with whole designs that can be replicated on various scales.

These are problems worth solving, even if they're not the ones that Fernando lays out in his post.

Posted by Mark Liberman at 07:44 AM

Great moments in antedating

In the search for the early history of common words and phrases, sometimes a discovery that pushes back the documentary record just a few years can be quite momentous indeed. Such is the case with an April 25, 1964 article in the Tucson (Ariz.) Daily Citizen (a recent addition to the Newspaperarchive database), which Sam Clements recently unearthed and reported to the American Dialect Society mailing list. In the article, entitled "Talking Hip In The Space Age," writer Stephen Trumbell surveys the lingo surrounding the then-burgeoning space program. In the midst of all the NASA-talk comes this paragraph:

See anything notable there? Take a look at the penultimate sentence — "'Give 'em the whole nine yards' means an item-by-item report on any project." This represents something of a Holy Grail among word sleuths: a significant antedating (i.e., an earlier citation than what is already known) for the elusive phrase the whole nine yards, meaning 'the full extent of something.'

The whole nine yards serves as a rare counterexample to the Recency Illusion: despite many theories for its origin in the distant mists of time, it has only been documented since the 1960s. (For a roundup of the theories, from Scottish kilts to concrete trucks, see Dave Wilton's Wordorigins.org, Michael Quinion's World Wide Words, Gary Martin's Phrase Finder, and Cecil Adams' Straight Dope, with further coverage in Wilton's book Word Myths: Debunking Linguistic Urban Legends and Quinion's Ballyhoo, Buckaroo, and Spuds.) Previously, the earliest known cite for the whole (or full) nine yards appeared in Elaine Shepard's 1967 book The Doom Pussy, written in 1966 about Air Force pilots serving in Vietnam. One of these pilots, Major "Smash" Crandell, is quoted as using the whole nine yards on more than one occasion in the book. There's some other scattered evidence from the late '60s supporting the idea that the phrase was first popularized in US Air Force circles before spreading to wider usage.

The newly discovered cite from 1964 lends credence to an Air Force origin, since the space program has been strongly intertwined with the Air Force throughout its history. Also, any origin theory specific to the Vietnam War (such as speculation that the "nine yards" had something to do with nine Montagnard hill tribes) now seems unlikely, since the article came out a few months before the Gulf of Tonkin Resolution escalated US troop levels in August 1964.

Unfortunately, the article still doesn't help us figure out what the "nine yards" might have first referred to. I highly doubt that an "item-by-item report" was imagined to take up nine yards of paper, so the phrase had already been transferred to a figurative sense by this point. The search goes on for the original referent, and with the help of ever-growing newspaper databases I think we'll find it one of these days.

[Update: Last year, Arnold Zwicky discussed the whole nine yards as an example of the Antiquity Illusion, the converse of the Recency Illusion.]

[Update, June 26: Using Newspaperarchive (which seems to be expanding its holdings by the day), Sam Clements has found the same article appearing a bit earlier in another paper: San Antonio (Tex.) Express and News, April 18, 1964, p. 11-A.]

Posted by Benjamin Zimmer at 01:08 AM

June 20, 2007

The BBC Admits an Error - Only It Isn't

We have commented quite a few times on the BBC's incompetent coverage of matters linguistic, and more generally its tendency to report dubious science, and on the fact that the BBC declines to acknowledge errors and even reprints erroneous stories. That might incline one to think that the BBC simply never admits error. One would be wrong.

The BBC admitted an error on June 12th. What was this error? It was the fact that

the reporter in the film broadcast immediately before the England v Israel football match in Football Focus (BBC1, 24 March 2007) had referred to Jerusalem as the capital of Israel.

There was no error. Jerusalem is the capital of Israel. Israel identifies Jerusalem as its capital, as do Jews throughout the Diaspora. The Knesset (parliament) sits in Jerusalem. The official residences of the President and the Prime Minister are in Jerusalem. The Supreme Court sits in Jerusalem. The Bank of Israel and various ministries have their headquarters in Jerusalem. The United States recognizes Jerusalem as Israel's capital. It is true that Muslims, wishing to claim Jerusalem for themselves, dispute its role as Israel's capital, and that many countries follow their wishes in not recognizing Jerusalem as Israel's capital, but this is irrelevant to the question of whether Jerusalem is the capital of Israel. Whether Jerusalem should be Israel's capital is a controversial political question; whether it is, is a simple matter of fact.

So, what is going on at the BBC? Time and again they screw up their science coverage and refuse to correct it, but when the Muslim Public Affairs Committee complains they "correct" a perfectly accurate broadcast?!

Update: Not surprisingly, lots of readers have reactions to this post. I'm not going to debate those who want to argue about Israel. This isn't the place, and that isn't the point. The rights and wrongs of Middle Eastern politics simply have nothing to do with the factual question of whether Jerusalem is the capital of Israel. If a country designates a city as its capital and locates its central governmental institutions there, that city is its capital. That is true whether or not you or I like that country's policies or approve of its choice of capital. The choice of capital is not, in international law, up to anyone other than the country itself. While numerous critics of Israel claim that "in international law" Jerusalem is not recognized as the capital of Israel, I have yet to see a single reference to the law, treaty clause, declaration, treatise or other source for the claim that a country's choice of capital is a matter of international law. Quite the contrary, it appears to be settled law that a country's choice of capital is a purely internal matter. Other countries may, as a political gesture, insist on locating their embassies elsewhere, but nothing either in the notion of "capital city" or in international law makes this determinative of the location of the capital.

By the same token, those who do not recognize Israel at all frequently object to the mere mention of Israel and prefer something like "the Zionist Entity". Even if one has some sympathy for their views (which I do not), surely the existence of Israel and the fact that it is called Israel are facts independent of one's political position and it is not objectively an error to refer to Israel.

Some people dispute my statement that the United States recognizes Jerusalem as the capital of Israel on the grounds that the US embassy is in Tel Aviv. I stand by my statement. The Jerusalem Embassy Act of 1995 states that: "(1) Each sovereign nation, under international law and custom, may designate its own capital. (2) Since 1950, the city of Jerusalem has been the capital of the State of Israel". The fact that, for political reasons, the Executive branch has not complied with Congress's stated desire to move the US embassy to Jerusalem does not change the fact that the official position of the United States is that Jerusalem is the capital of Israel.

Posted by Bill Poser at 02:29 PM

Single-X education

I was struck by this sentence from Janine DeFao's "Single-gender education gains ground as boys lag", SF Chronicle, 6/18/2007:

There were three public schools nationwide offering single-gender instruction in 1995 and 262 today, still a small fraction of the country's more than 90,000 public schools, according to Leonard Sax, executive director of the National Association for Single Sex Public Education.

Given the name of his organization, Leonard Sax probably used "single-sex" to describe the schools in question -- and apparently DeFao (or her editor) corrected this to "single-gender". Why?

One obvious hypothesis: she felt that gender is the more appropriate word, based on a distinction that the wikipedia explains this way:

The sex/gender distinction is a concept in feminist theory, political feminism, and sociology which distinguishes sex, a natural or biological feature, from gender, the cultural or learned significance of sex.

But this seems unlikely. As far as I can tell, the 262 public schools that segregate males and females divide their pupils by biological sex, not cultural outlook. I bet that tomboys don't get put in the male wing, for example.

Instead, it seems to be the practice at the Chronicle (and elsewhere in the media) to reserve sex for genital pleasure and its associations, and to use gender for the distinction between males and females.

Overall, DeFao's article uses gender 13 times, including in its headline:

... provided gender training to its entire staff and parents by the Gurian Institute of Colorado...
... trained 30,000 teachers in gender differences and learning...
... gender brain differences remain controversial...
...a review of studies on neurology and gender...
...race and class play a bigger a role than gender...
...brain research also is reigniting interest in single-gender education...
...three public schools nationwide offering single-gender instruction in 1995...
...make it easier for schools to create voluntary single-gender classes...
...author of "Why Gender Matters"...
...to see gender differences in action...
...the only single-gender public school in the Bay Area...
...hard to know how much is attributable to single-gender instruction...

The lexeme sex comes up three times, once in the plural, once in the name of an organization, and once in the legal term of art "sex discrimination":

...the two sexes learn differently...
...the National Association for Single Sex Public Education...
...Title IX -- which banned sex discrimination in schools in 1972...

A quick search of the paper's online site suggests that sex (in the singular) is generally used to refer to genital pleasure, e.g.

A veteran San Francisco police sergeant was charged Monday with having sex with an underage girl, authorities said.
A mother accused of arranging a sex pact to allow her boyfriend to have sex with her 15-year-old daughter while the woman recuperated from surgery was sentenced Monday to 12 to 22 1/2 years in prison.
Two-thirds of parents said they are very concerned about sex and violence the nation's children are exposed to in the media...
Seven convicted sex offenders with profiles on MySpace.com have been arrested ...

The only exception seems to be the phrases "same-sex union(s)" and "same-sex marriage" -- for some reason, these are not corrected to "same-gender" -- except in one quoted context:

Florida forbids "homosexuals" from adopting, Mississippi bans "same-gender" couples from adopting, Utah bans fostering and adoption by all unmarried couples and Nebraska has a policy prohibiting gay people from fostering.

The plural sexes retains the biological-categories meaning, not only in fixed phrases like "battle of the sexes", but also more generally.

As for gender, it's sometimes used for socially-constructed categories:

Freeplay plays fast and loose with gender roles: In Klipp's latest dance, guys grind against guys and girls against girls, and Klipp's ultra-feminine girlfriend, fellow choreographer Sarah Bush, does a solo dance to a song titled "If I Was Your Man."

However, it's also used to refer to the basic biological division:

Bradley, who was on his way to work in Antioch, told authorities he had stopped to stretch his legs when he spotted the gator, whom he called Maria after his granddaughter (although state Fish and Game officials aren't sure of the reptile's gender).

(It's true that sex in alligators is determined by different biological mechanisms than in mammals, but I don't think that the usage is different on that account.)

Historically, sex has been used since the 14th century to denote what the OED calls "Either of the two divisions of organic beings distinguished as male and female respectively; the males or the females (of a species, etc., esp. of the human race) viewed collectively":

1382 WYCLIF Gen. vi. 19 Of alle thingis hauynge sowle of ony flehs, two thow shalt brynge into the ark, that maal sex and femaal lyuen with thee.
1532 MORE Confut. Tindale II. 152, I had as leue he bare them both a bare cheryte, as wyth the frayle feminyne sexe fall to far in loue.

The use of sex to refer to genital pleasure is apparently much more recent -- the OED's earliest citation is to DH Lawrence in 1929:

1929 D. H. LAWRENCE Pansies 57 If you want to have sex, you've got to trust At the core of your heart, the other creature.
1952 S. KAUFFMANN Philanderer (1953) x. 174 Her arms went around his neck and his hand rested on her waist, and they had a brief moment of friendship before the sex began.

On the web in general, sex is holding its own in phrases like "same sex marriage", but seems to be yielding ground to gender in phrases like "single sex education":

	sex	gender	sex/gender ratio
single __ education	134K	53.1K	2.52
single __ school(s)	53.4K	22.8K	2.34
same __ marriage	1.33M	43.9K	30.3
same __ union(s)	623K	18.6K	33.5
same __ couple(s)	442K	561	788
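
The ratio column can be recomputed from the abbreviated counts. Here is a minimal Python sketch; the helper names parse_count and count_ratio are invented for illustration, and it assumes that "K" means thousands and "M" means millions:

```python
def parse_count(text):
    """Convert an abbreviated count such as '53.1K' or '1.33M' to a number."""
    multipliers = {"K": 1_000, "M": 1_000_000}  # assumed meaning of the suffixes
    suffix = text[-1].upper()
    if suffix in multipliers:
        return float(text[:-1]) * multipliers[suffix]
    return float(text)


def count_ratio(numerator, denominator):
    """Ratio of two abbreviated counts, e.g. sex hits over gender hits."""
    return parse_count(numerator) / parse_count(denominator)


# (sex count, gender count) pairs copied from the table above
counts = {
    "single __ education": ("134K", "53.1K"),
    "single __ school(s)": ("53.4K", "22.8K"),
    "same __ marriage": ("1.33M", "43.9K"),
    "same __ union(s)": ("623K", "18.6K"),
    "same __ couple(s)": ("442K", "561"),
}

for phrase, (sex, gender) in counts.items():
    # three significant figures, the precision used in the table
    print(f"{phrase}\t{count_ratio(sex, gender):.3g}")
```

Since the counts themselves are rounded, the recomputed ratios are only good to about three significant figures.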

Apart from such phrases, the media (and perhaps the culture at large) seem to be converging on a split between gender for the biological categories and sex for genital pleasure, with sexes being retained as an optional irregular plural for gender.

[Update -- Simon Tatham writes:

I don't know if I'm unique or in a large majority, but I thought you might be faintly amused to hear that at least one person was briefly confused by the title of this post. When I saw `single-X' my immediate thought was of chromosomes - i.e. I instinctively read the phrase as meaning _boys_-only education. This was close enough to the actual subject of the post that I got most of the way through before realising that in fact you were using the X as a placeholder for either `gender' or `sex', and didn't mean it literally at all!

Actually, I intended "single-X" as a sort of a pun, and I originally made that clear(er) in the body of the text by discussing the biology of mammalian sex in a certain amount of detail. But I didn't have time to finish the discussion and tie it back to the traditional sex/gender terminology, so I wound up postponing the biolexicography for another time, leaving the title semiotically stranded.]

[Randy Alexander writes:

One small idea that may contribute to the reason "gender" is becoming more popular in the context of same-sex education: the phrase "sex education" is included in "same-sex education". It makes sense to want to be removed from that association. "Sex school", would be an even worse association.
There are no phrases "sex marriage", or "sex union", and even "sex couple" would be very unlikely, so "sex" would pose no problem in those constructions.

This makes sense, but there seems to be a broader tendency to use gender as a sort of euphemism for sex, even when there is little chance of misunderstanding. Here's an example from the recent news:

It may be hard for parents to believe, but believe it or not science has developed a way to tell the gender of your baby even before morning sickness kicks in.

Since the test is based on looking for fetal Y chromosomes in the mother's bloodstream, the quality in question here is about as biological as possible.]

Posted by Mark Liberman at 06:59 AM

June 19, 2007

Number Delimitation

In English when numbers are written using numerals the usual convention is to separate the fractional component from the integral component by means of a decimal point and to break the integral component into groups of three digits using commas, e.g. 123,456,789.12. A common alternative is not to separate the integers at all, e.g. 123456789.12. In some languages, the delimiters used are different. In a number of European languages, for instance, the roles of comma and period are swapped, e.g. 123.456.789,12. One occasionally sees other characters used for group separators, including spaces, e.g. 123 456 789.12 and apostrophes, 123'456'789.12.

Whereas in North American English the integral part is broken into groups of three, in some other languages other systems are used. I know of only two. One is grouping into sets of four digits rather than three. This is sometimes found in the Sinosphere, when numbers are written using place notation, where it presumably reflects the fact that named units in Chinese and other languages of the region occur at intervals of 10⁴. In Chinese, for example, we have such units as 一 10⁰, 十 10¹, 百 10², 千 10³, 万 10⁴, 億 10⁸, 兆 10¹² and 京 10¹⁶. Intermediate values are multiples of the preceding unit. For example, 10⁶ is 百万, that is, 100 times 10,000.

The other system is found in the Indosphere. It makes a group of the lowest three digits but thereafter uses groups of two, e.g. 12,34,56,789. This is again probably connected to the distribution of basic units in the spoken language.

Here, for example, are the powers of ten in Panjabi. Note that there are named units for ten, one hundred, and one thousand, but that thereafter there are named units at intervals of 10², that is, two decimal digits.

The Powers of Ten from Zero Through Fifteen in Panjabi
Power	English Numerals	Gurmukhi Numerals	Spelled Out	Transliteration
10⁰	1	੧	ਇੱਕ	ikk
10¹	10	੧੦	ਦਸ	das
10²	100	੧੦੦	ਸੌ	sau
10³	1,000	੧,੦੦੦	ਹਜ਼ਾਰ	hazār
10⁴	10,000	੧੦,੦੦੦	ਦਸ ਹਜ਼ਾਰ	das hazār
10⁵	100,000	੧,੦੦,੦੦੦	ਲੱਖ	lakkh
10⁶	1,000,000	੧੦,੦੦,੦੦੦	ਦਸ ਲੱਖ	das lakkh
10⁷	10,000,000	੧,੦੦,੦੦,੦੦੦	ਕਰੋੜ	karōṛ
10⁸	100,000,000	੧੦,੦੦,੦੦,੦੦੦	ਦਸ ਕਰੋੜ	das karōṛ
10⁹	1,000,000,000	੧,੦੦,੦੦,੦੦,੦੦੦	ਅਰਬ	arab
10¹⁰	10,000,000,000	੧੦,੦੦,੦੦,੦੦,੦੦੦	ਦਸ ਅਰਬ	das arab
10¹¹	100,000,000,000	੧,੦੦,੦੦,੦੦,੦੦,੦੦੦	ਖਰਬ	kharab
10¹²	1,000,000,000,000	੧੦,੦੦,੦੦,੦੦,੦੦,੦੦੦	ਦਸ ਖਰਬ	das kharab
10¹³	10,000,000,000,000	੧,੦੦,੦੦,੦੦,੦੦,੦੦,੦੦੦	ਨੀਲ	nīl
10¹⁴	100,000,000,000,000	੧੦,੦੦,੦੦,੦੦,੦੦,੦੦,੦੦੦	ਦਸ ਨੀਲ	das nīl
10¹⁵	1,000,000,000,000,000	੧,੦੦,੦੦,੦੦,੦੦,੦੦,੦੦,੦੦੦	ਸੌ ਨੀਲ	sau nīl

The terms for 100,000 and 10,000,000 correspond to the lakh and crore of Indian English, though these are probably borrowed from the Hindi लाख and करोड़ rather than the Panjabi forms.

To summarize, the grouping rules with which I am familiar are:

No delimitation of integers, e.g. 123456789
Groups of three, e.g. 123,456,789
Groups of four, e.g. 1,2345,6789
Low group of three, other groups of two, e.g. 12,34,56,789
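
These four rules can be captured by a single parameter: a list of group widths, counted from the low-order end, with the last width repeating. The Python sketch below is only an illustration; the function name group_digits and the widths-list convention are my own assumptions, not an established API.

```python
from itertools import chain, repeat


def group_digits(digits, sizes, sep=","):
    """Group a string of digits from the right.

    `sizes` lists group widths starting at the low-order end; the last
    width repeats for all higher groups. An empty list means no grouping.
    """
    if not sizes:
        return digits
    widths = chain(sizes, repeat(sizes[-1]))
    groups = []
    end = len(digits)
    while end > 0:
        start = max(0, end - next(widths))
        groups.append(digits[start:end])
        end = start
    return sep.join(reversed(groups))


print(group_digits("123456789", []))      # no delimitation
print(group_digits("123456789", [3]))     # groups of three
print(group_digits("123456789", [4]))     # groups of four
print(group_digits("123456789", [3, 2]))  # low group of three, then twos
```

Because the function works on a string of digits rather than on a number, it applies equally to Gurmukhi or other numerals, and the separator can be changed to a period, space or apostrophe.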

One can imagine groups larger than four, or consistent use of groups of two, or even the use of groups that double in size. As a hypothetical example of the latter, one could imagine a system like this: 123456789123,789123,456,789. Here the first group represents ones, the second thousands, the third millions, and the fourth British billions, that is, not 1,000 million as in the United States but 1,000,000 million as in Britain. However, as far as I know, such numerical systems are unattested. Does anyone know of other attested groupings?
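
The hypothetical doubling scheme can be sketched in Python as follows. The function name doubling_groups is invented for illustration, and the rule that each new group is as wide as all the lower groups combined is just one reading of the example: two initial groups of three, then groups of six, twelve and so on.

```python
def doubling_groups(digits, sep=","):
    """Group digits from the right into widths 3, 3, 6, 12, 24, ...

    After the first group of three, each new group is as wide as all
    the digits already grouped, so the boundaries fall at 10^3, 10^6,
    10^12, 10^24 and so on.
    """
    groups = []
    end = len(digits)
    width = 3     # width of the lowest group
    covered = 0   # number of digits grouped so far
    while end > 0:
        start = max(0, end - width)
        groups.append(digits[start:end])
        covered += end - start
        end = start
        width = covered  # next group matches everything below it
    return sep.join(reversed(groups))


print(doubling_groups("123456789123789123456789"))
```

For numbers below a million this coincides with ordinary grouping by threes; the difference only shows up once the six-digit group is reached.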

Update: Reader Isabel Lugo points out that Donald Knuth's proposed -yllion system is similar to the one that I mentioned in which the groups increase in size. Knuth's system is actually a bit more complicated in that it has multiple delimiters. Knuth's inspiration was an ancient Chinese system, but I am confident, in spite of the fact that I don't have Knuth's article to hand, that the system that inspired him did not work exactly the way I described for the simple reason that the use of place notation in the standard Chinese number system is fairly recent. The system that inspired Knuth was almost certainly a non-place-based system in which the values of the named units increased by squaring rather than by multiplication by ten thousand. In other words, the values would coincide with the current standard through 10⁸, but the next named unit would have the value 10¹⁶ rather than 10¹², and the next 10³² rather than 10¹⁶. As far as I know, no Chinese numerical system has ever combined place notation with groups of doubling, or other non-constant, size.

Posted by Bill Poser at 11:08 PM

A style book joke

A reader recently pointed me to his favorite "common spelling error", which he found in a piece by Roy Blount Jr. ("Is the Pope Capitalized?", in his 1982 collection One Fell Soup, p. 84), who got it from Bobby Ray Miller's United Press International Stylebook (1977, p. 29):

burro, burrow A burro is an ass. A burrow is a hole in the ground. As a journalist you are expected to know the difference.

Blount, reviewing four style guides for journalists, commented, "The UPI book has the best joke."

There might be some previous history for the joke; I'm not especially interested in tracing quotations back in time, so 1977 is good enough for me.

The entry surely gets into style guides just for its value as humor: burro and burrow are on many lists of homophones, but not on lists of commonly confused words.

The latest UPI guide (Bruce Cook & Harold Martin, UPI Stylebook and Guide to Newswriting, 4th ed., 2004) has a burro(w) entry (p. 37), but it has only the first two sentences and is missing the zinger "know the difference" sentence.

Plenty of quotations in the years since 1977, plus some paraphrases, as in this entry from the "Condensed Stylebook":

... burro, burrow One's an ass, the other's a hole in the ground and reporters ought to know the difference.

and some embroidery, as in this 2001 piece by John Irvine Ades:

I was not, myself, in the habit of entering the margins of my students papers to make droll comments on their foibles. But I cannot forbear reporting a choice temptation that one of my teachers was led into (despite Matt. 6.13). A student had been asked to write an essay on the subject of what he had done during the summer vacation. This young dude had been to the Grand Canyon, where, he wrote, he had gone down into the Canyon on a burrow. Seeing the supererogatory w, the professor at first steeled himself; but then, seeing an opportunity that might knock but once in a lifetime, he wrote in the margin a slang saying, the gist of which may be more discreetly conveyed by the saucy entry for burro, burrow in The UPI Stylebook: A burro is an ass. A burrow is a hole in the ground. As a journalist you are expected to know the difference.. . . You know, to be honest I don't think I could have resisted, either.

and, finally, versions transported to other contexts, like this one from James Landau on ADS-L, 12/28/02:

An Annapolis midshipman once wrote "Sancho Panza, sitting on his burrow..." The instructor wrote back "a burro is an ass. A burrow is a hole in the ground. As a future Naval officer, you are expected to know the difference."

and this one from the American Language Review in 1998:

Decades ago, Carl Cochran, retired Professor of English at Colby Sawyer College in New Hampshire, taught at Shady Side Academy in Pittsburgh. He received a composition in which one of his students described his summer adventures in Venezuela, where he had worked for Gulf Oil Company. One error kept appearing throughout the paper. The student consistently misspelled the word burro as burrow.

At the end of the essay, Professor Cochran wrote: "My dear sir: It is apparent to me from your spelling that you do not know your ass from a hole in the ground."

It looks like all the fabled student spelling errors have burrow for burro, which is what you'd expect: the more common word for the less common.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 08:38 PM

Blinding us with science

Andrew Sullivan joins David Brooks in concluding that the movement for sex-segregated education has "brain research backing it up". He quotes approvingly from an article in yesterday's San Francisco Chronicle (Janine DeFao, "Single-gender education gains ground as boys lag"), which in turn quotes approvingly from Michael Gurian and Leonard Sax, the main promoters of pseudo-scientific arguments for single-sex education.

Many of their criticisms of current educational policies seem reasonable to me. However, their arguments from neuroscience are exaggerated, not to say completely bogus.

Misleading appeals to the authority of "brain research" have become the modern equivalent of out-of-context scriptural fragments. Andrew Sullivan wouldn't accept a politician's bible quotations uncritically, and he should learn to be just as skeptical of psychologists.

About a year ago, when David Brooks began promoting the single-sex education movement, I looked into some of the "science" presented in the movement's central texts: Leonard Sax's Why Gender Matters, and Michael Gurian and Kathy Stevens' The Minds of Boys. What I found was shockingly careless, tendentious and even dishonest. Their over-interpretation and mis-interpretation of scientific research is so extreme that it becomes a form of fabrication. If you care about such things, read the posts listed below, and then go back and re-read Janine DeFao's article in the SF Chronicle:

"David Brooks, cognitive neuroscientist" (6/12/2006)
"Are men emotional children?" (6/24/2006)
"Of rats and (wo)men" (8/19/2006)
"Leonard Sax on hearing" (8/22/2006)
"More on rats and men and women" (8/22/2006)
"The emerging science of gendered yelling" (9/5/2006)
"The vast arctic tundra of the male brain" (9/6/2006)
"Girls and boys and classroom noise" (9/9/2006)

(A list of links to other relevant posts is here.)

I believe that the pluralistic nature of American education, fostered by local control and the mixture of public and private institutions, is a good thing. Single-sex schools surely deserve a place in the mix, and perhaps it should be a bigger place. And science is obviously relevant to the discussion. But a handful of ideologues like Sax and Gurian are trying to pass their own convictions off as scientifically demonstrated truth. Their opinions deserve to be heard, but their false invocations of scientific authority should be condemned and rejected.

Posted by Mark Liberman at 10:01 AM

Four-letter words

Benjamin Monreal writes:

Your post on Wiiitis brings to mind the late George Starbuck's poem, "Verses to exhaust my stock of four-letter words":

From the ocean floors, where the necrovores
Of the zoöoögenous mud
Fight for their share, to the Andes where
Bullllamas thunder and thud,

And even thence to the heavens, whence
Archchurchmen appear to receive
The shortwave stations of rival nations
Of angels: "Believe! Believe!"

They battle, they battle---poor put-upon cattle,
Each waging, reluctantly,
That punitive war on the disagreeor
Which falls to the disagreeee.

Posted by Mark Liberman at 08:14 AM

June 18, 2007

ISOC, ESOC

A while back I posted about two cases where the pronoun whom is often used for the subject of a clause (against the prescription that who should be used for subjects) and there's some structural motivation for choosing whom. These I labeled ISOC (for "in-situ subject of an object clause") and ESOC (for "extracted subject of an object clause") -- hang on, I'll explain these -- which immediately suggested Shadrach and Meshach in the fiery furnace, although I have no good candidate for the Abednego character.

I suggested at the time that some people might have adopted ISOC or ESOC or both as part of a (non-standard) system for assigning case to the pronoun WHO. Now I've collected some evidence in favor of this idea.

First ISOC, as in

(1) Extra copies will be provided for whomever needs them.

As I said in my earlier posting, here

we have an object clause (usually the object of a P) with WHO as its subject. The pronoun then immediately follows the governor, and could easily be mistaken for its object (even though it's the whole clause that's the object).

So the pronoun picks up its case from its location, rather than from its syntactic function within its clause.

Sentence (1) is adapted from an example in a 1981 article by Maxine Hairston in College English (43.8.794-806): "Not all errors are created equal: Nonacademic readers in the professions respond to lapses in usage". Hairston reported on a study in which she mailed 101 professionals (none of them English teachers) a questionnaire of 65 items, each including "one error in standard English usage", asking them to choose one of three responses for each item: "Does not bother me; Bothers me a little; Bothers me a lot." She got 84 questionnaires back, and grouped the items into six levels according to the ratings on the returned questionnaires: "Outrageous, Very Serious, Serious, Moderately Serious, Minor, or Unimportant".

Notice that I said that the ISOC example (1) above was "adapted from" one of Hairston's -- the only item on the questionnaire testing case choice for WHO. In fact, the item (Hairston's #1) on the questionnaire was not (1), but

(1') Extra copies will be provided for whoever needs them.

Wonderful. Hairston was assuming that the ISOC version is the correct one; indeed, she says that the problem with (1') is "using 'whoever' in a sentence that called for 'whomever'" (p. 797). Her respondents rated this sentence as only a Minor error -- quite possibly because they saw no error in it at all (even though they were told that each sentence contained one error in the "conventions of grammar"), or because they had a twinge about the passive, or thought extra (rather than additional) might be a tad colloquial, or whatever. We'll never know: Hairston didn't have access to the respondents' reasons for their ratings, and we can't even ask her for her opinions, since she died two years ago.

But we do have access to Hairston's opinions about ISOC. Twenty-five years ago, this highly respected professor of English at the University of Texas was in favor of it. I very much doubt she was alone in this view. So, ISOC lives.

On to ESOC. As with ISOC, as I said in my earlier posting,

there's an object clause, but this time its subject has been extracted and now appears at the front of a higher clause. Still, the gap of extraction immediately follows the governor (most often, a V)...

The gap can then be assigned accusative case (by position rather than syntactic function within its clause), and if this case is inherited by the extracted element, we get whom. Here's Roy Blount, Jr. complaining about ESOC, also 25 years ago:

... the most prevalent who/whom mistake -- you see it even in the Times -- is the undue whom, as in, "The Pope listed all those whom he felt would rise from the dead."

In the notation from the earlier posting (with clause boundaries indicated by bracketing, with the extracted element bold-faced, and with the gap of extraction marked by underlining):

The Pope listed all those [ whom he felt [ ___ would rise from the dead ] ]

Blount is known primarily as a humorist, but much of what he writes can be fairly characterized as light essays, often on serious subjects. His reflection on whom comes (on p. 85) from "Is the Pope capitalized?", a review of four style guides for journalists reprinted in his 1982 collection One Fell Soup: Or, I'm Just a Bug on the Windshield of Life (Penguin paperback).

It's not just the New York Times. Reader Chris Lance, who finds ESOC jarring and tends to notice it, blogged in his journal about two examples in Colm Tóibín's The Master and five in Iain Pears's An Instance of the Fingerpost, all in relative clauses. From The Master:

He sent his book on the matter to those in England [ whom he thought [ ___ might initiate a debate ] ]. (p. 79)

... people [ whom I don't think [ ___ ever knew Constance ] ] claim to miss her." (p. 197)

From Fingerpost (supplied to me by Lance in e-mail), in the U.K. Vintage paperback edition:

She had killed a man [ whom she said [ ___ had raped her ] ], but the jury judged this a lie because she had fallen pregnant, which cannot occur without the woman taking pleasure in the act. (p. 147)

I also learned from the keeper that Lord Mordaunt — [ whom I discovered [ ___ was bitterly detested in the town for his lack of extravagance ] ] — was indeed in residence as warden of the castle ... (p. 222)

Grove is pressing his case and is winning over several members of the Fellowship [ whom I assumed [ ___ were on my side ] ]. (p. 243)

The man then pointed out a beggar on the street outside, [ whom he said [ ___ was once a sailor in a Candia ship ] ]. (p. 400)

... the Blundy girl ... spent much time travelling from Burford in the west to Abingdon in the south, carrying messages to sectaries [ whom, he was sure, [ ___ would in due course rise up as one when the murder of Clarendon had thrown the country into turmoil ] ]. (p. 604)

I suspect that these are not inadvertent slips, or hypercorrections at the moment of writing, but how the writers think case-marking of WHO works in object clauses in English. (As I said in my earlier posting, there's a long history of such practices.) In fact, ISOC and ESOC might now be the primary islands of whom use in modern written English, outside of the mainland of P + whom -- that is, object whom with a fronted (rather than stranded) preposition, as in To whom did you give the book? and the student to whom I gave the book.

A final entertainment. Kenneth Ulrich reports in e-mail:

I live in Sweden, where business is often conducted in English. Last year, I attended a presentation, held by a Swede whose English was nearly flawless, on the things that remained to be done in a certain project. Several of the speaker's PowerPoint slides featured a table with two columns: "Action" (that is, what needed doing) and "Who/Whom?".

I first thought, wildly, that I was witnessing an act of political subversion--capitalist deeds formulated in Leninist terms--but then realized that no one was being called upon to do anything *to* anyone else. It finally dawned on me that the point was to include both singular and plural doers: "who" meant one person (corresponding to the Swedish "vem"), and "whom" a group (Swedish "vilka").

Variation in who/whom use in English must be troublesome for speakers of other languages. I can certainly see that speakers of a language that marks number differences on WH pronouns would strive to find such a distinction in English. Very clever, though just wrong.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 02:16 PM

Data Catalysis

I'm back in Philadelphia, after a quick jaunt to Kyoto for ISUC2007. One of the most interesting presentations there was Patrick Pantel's "Data Catalysis: Facilitating Large-Scale Natural Language Data Processing":

Large-scale data processing of the kind performed at companies like Google is within grasp of the academic community. The potential benefits to researchers and society at large are enormous. In this article, we present the Data Catalysis Center, whose mission is to stage and enable fast development and processing of large-scale data processing experiments. Our prototype environment serves as a pilot demonstration towards a vision to build the tools and processing infrastructure that can eventually provide level access to very large-scale data processing for academic researchers around the country. Within this context, we describe a large scale extraction task for discovering the admissible arguments of automatically generated inference rules.

Imagine astronomy if all the large telescopes were owned by private companies and used to develop trade secrets; or particle physics if all the accelerators had a similar socio-economic role. Fernando Pereira used that analogy a few years ago to describe the emerging situation in computational linguistics.

Patrick's idea, as I understand it, is not to create yet another supercomputer center. Instead, his goal is a model that other researchers can replicate, seen as a part of "large data experiments in many computer science disciplines ... requiring a consortium of projects for studying the functional architecture, building low-level infrastructure and middleware, engaging the research community to participate in the initiative, and building up a library of open-source large data processing algorithms".

Posted by Mark Liberman at 07:33 AM

Threeers

Barbara Partee writes:

From this Sunday's Week in Review, Corrections:

A graphic last Sunday about health and safety issues misspelled the name of a new diagnosis for the shoulder pain caused by playing tennis on Wii, the video game console. It is acute Wiiitis, not Wiitis.

My father used to take pleasure when from his office window at the Glenn L. Martin Company (later Martin, then Martin Lockheed, then Martin Marietta, if I've got it straight) in Middle River, Md., he would occasionally see three trains simultaneously passing, and even greater pleasure in describing those happenings as "threeers", challenging us to try to find another word with three e's in a row. But 3 i's -- unimaginable -- I wish he were still alive to see that!! Wow!

[Update -- Arnold Zwicky joined several other readers in wondering why it wouldn't be "three-ers" (with hyphenation, as in "twee-est") or "threers" (with orthographic haplology, as in "freest").]

[Update #2 -- Geraint Jennings writes:

Any old excuse to quote Ogden Nash's Llama:

The one-l lama,
He's a priest.
The two-l llama,
He's a beast.
And I will bet
A silk pajama
There isn't any
Three-l lllama.*

*The author's attention has been called to a type of conflagration known as a three-alarmer. Pooh.

]

[Update #3 -- Jonathan Falk asks:

Can one go Hawaiiing?

]

Posted by Mark Liberman at 06:17 AM

June 16, 2007

Nearly no: a gnarly knot

My post "Why is 'nearly no' nearly not?" struck a chord with readers -- more than 30 of you sent in ideas and/or evidence. Given the number of notes, I need to apologize in advance for not referencing everyone's suggestions in this morning's narrow blogging window, before I go take the 8:50 JR train to Kansai airport.

The puzzle is why nearly seems not to consort well with negatives, as indicated by the following table of Google counts:

	no one	everyone	everyone/no one ratio
nearly	27.6K	1.29M	47.8
almost	1.06M	1.66M	1.57
almost/nearly ratio	38.4	1.29

I considered and rejected one hypothesis, namely that nearly prefers the positive ends of scales.

Many readers suggested that the problem is alliteration: thus Aaron Toivo asked "Could it be a clash of some sort between the two /n/s?" But several readers also provided evidence to undermine this suggestion: thus Aaron offered Google counts from eight vs. nine, which lend no support to the "avoid alliteration" theory:

	eight	nine	eight/nine ratio
nearly	866K	694K	1.25
almost	623K	557K	1.12
almost/nearly ratio	0.72	0.80
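
For anyone who wants to check the arithmetic, the ratios in a table like this are just quotients of the hit counts. Here is a minimal Python sketch that recomputes them from the rounded counts printed in the eight/nine table; the function and variable names are illustrative assumptions, not part of whatever was originally used to build the table:

```python
# Rounded Google hit counts copied from the eight/nine table above.
counts = {
    "nearly": {"eight": 866_000, "nine": 694_000},
    "almost": {"eight": 623_000, "nine": 557_000},
}


def ratio_table(counts, row_a, row_b, col_a, col_b):
    """Return (row-wise col_a/col_b ratios, column-wise row_b/row_a ratios)."""
    row_ratios = {r: counts[r][col_a] / counts[r][col_b] for r in (row_a, row_b)}
    col_ratios = {c: counts[row_b][c] / counts[row_a][c] for c in (col_a, col_b)}
    return row_ratios, col_ratios


row_ratios, col_ratios = ratio_table(counts, "nearly", "almost", "eight", "nine")

# Last column of the table: eight/nine ratio for each adverb.
for adverb, value in row_ratios.items():
    print(f"{adverb}: eight/nine = {value:.2f}")

# Last row of the table: almost/nearly ratio for each number word.
for word, value in col_ratios.items():
    print(f"{word}: almost/nearly = {value:.2f}")
```

Because the inputs are rounded counts, recomputed ratios for other tables may differ slightly in the last digit from the figures quoted.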

And I can supplement this with data from October vs. November, which suggest that if anything, alliteration is sometimes a slightly attractive force:

	October	November	October/November ratio
nearly	273	3.3K	0.082
almost	10.3K	10.1K	1.02
almost/nearly ratio	37.7	3.06

Other readers suggested that "nearly NEG" is not just dis-preferred, it's really entirely outside the norms of the language, with apparent counterexamples involving a different scope for nearly. Thus Steve Carter offered the examples:

"Thomas nearly never got to ride in his hometown."
"Thomas almost never got to ride in his hometown."

and commented

In the first example, it feels to me more ready to be interpreted as "Thomas narrowly escaped being prevented from ever riding in his hometown." -- though it may easily be followed by "As it turned out, he had a long and successful four seasons' riding and became a local hero."

The second example, "almost never," seems to bind more closely, more strongly suggesting "Thomas seldom got to ride in his hometown."

Two observations on the above:

1) It probably doesn't contribute to the original thrust of your post
2) The cause of the above perception is probably the very fact that "nearly never" is seen more rarely than "almost never".

This may well be right -- perhaps nearly has wide scope in a certain fraction of the web's apparent "nearly NEG" examples, making that pattern even rarer than it seemed to be. Unfortunately I don't have time this morning to check.

Joseph Pentheroudakis wrote:

I thought I’d check what happens with similar modifiers like “just about”, “virtually”, and “literally.” I tried them in the context ‘___ everything/everyone/everybody’ and ‘___ nothing/no one/nobody’. Here are the results:

	every(thing/body/one)	no(thing/body/one)	pos/neg ratio
just about	3.318M	97K	34.2
nearly	2.55M	272K	9.3
virtually	2.288M	1.51M	1.5
almost	3.98M	2.78M	1.38
literally	280K	320K	0.8

Two items very strongly associate with positive polarity then (‘just about’ and ‘nearly’), and the rest sort of swim in the same end of the pond, compared to the first two anyway.

That's interesting. And most of the examples of "nearly NEG" in his compilation are from "nearly nothing" (226K out of 272K), which may reflect the fact that Norma Loquendi doesn't seem to object much to "nearly zero":

	zero	one	zero/one ratio
nearly	873K	1.26M	0.69
almost	1.07M	1.27M	0.84
almost/nearly ratio	1.23	1.01

If we eliminate the -thing examples from his table, the split gets even bigger:

	every(body/one)	no(body/one)	pos/neg ratio
just about	1.778M	27K	65.8
nearly	1.43M	46K	31.1
virtually	1.078M	400K	2.70
almost	2.46M	1.56M	1.58
literally	96K	50K	1.92

Alan Wechsler suggested:

I have a theory, though I don't know how to test it. Theory: in any context that implies a numerical assessment, "nearly" means "a number slightly less than". This leads to discomfort in contexts where the numerical measure is zero, because it implies a value less than zero.

Oh, I guess on second thought I do have a test case. I don't like "nearly free", in cases where "free" means "available for no cost". This is hard to count on Google because there are other senses of "free" which accept "nearly" quite nicely: "nearly free from deal", "nearly free of dabblers", and so on.

This makes sense -- except that the counts cited earlier suggest that "nearly zero" is not much of a problem. (Though maybe some of these are cases in which a function is approaching zero from below?)

Ben van Heuvelen suggested a cognitive explanation, with a bit of a theological twist in the tail:

Here's a little conjecture about what's going on:

"Nearly" is a more concrete word than "almost." Both adverbs are used as degree modifiers, but "nearly" entails a slight metaphor, since the adjective and preposition forms of the word ("near") suggest physical proximity. It's impossible to use "nearly" without subtly invoking physical space. For example, my understanding of the sentence "They were nearly happy" is informed by my previous understandings of sentences like "You are near the supermarket."

(True, you can also use "almost" when talking about physical space -- "You are almost at the supermarket" -- but to do so you have to add a preposition. There isn't a metaphor built into the word.)

My theory is that we tend to rely on "almost" when the idea we're conveying is more abstract, something we can't easily picture. "Nearly everyone" is easy to picture (a big crowd), as are the "nearly worthless" things I encounter every day. "Almost no one" is almost impossible to picture (an empty space? a space filled with a couple semi-translucent bodies?), while an "almost priceless" object is a logical impossibility. In the latter two cases, we rely on the more abstract adverb, "almost."

In response to your admonition -- "Don't bother to tell me that priceless is like unique and shouldn't get any sort of degree modification at all" -- why not? It's a relevant question: after all, a word that illogically receives degree modification becomes harder to picture.

Consider the case of "good" versus "evil." I know many people who are mostly (but not entirely) "good." I also know some people who tend more towards "evil." But if I'm describing the latter, I'll probably choose a narrower word (like "thoughtless" or "cruel") rather than "evil," which suggests something absolute. Call me a hopeless optimist, but I think it's much easier to picture someone who's "nearly good" than someone who's "nearly evil." A quick Googling suggests I'm not alone:

	good	evil	good/evil ratio
nearly	144K	2.19K	65.7
almost	178K	39.9K	4.46
almost/nearly ratio	1.24	18.2

Finally, as for the occasional uses of "nearly no one" by "otherwise competent" writers: I think they're probably just enjoying a little alliteration.

This might be right -- but it seems easy enough to me to come up with a concrete visualization of a room with almost/nearly no one in it -- it's a situation very near to that of an empty room, but with a few people sprinkled sparsely around. Isn't the sparse sprinkling of people a concrete-enough visualization of "nearly no one"? And neither I nor the net seem to have any real trouble with "nearly all gone"; "almost all gone" has only twice as many, compared to more than 38 times as many for "almost no one" vs. "nearly no one".

The facts about evil seem similar to whatever's going on with other negatively-evaluated end-of-continuum words like hungry and thirsty and tired. These don't seem to be especially abstract or hard-to-visualize concepts, but they show a somewhat similar pattern, though it's hard to find clear opposing terms to test them against (e.g. full means lots of things besides not hungry):

	full	hungry	full/hungry ratio
nearly	992K	204	4863
almost	1.51M	3.47K	435
almost/nearly ratio	1.52	17.0

So we've placed the pattern in a larger context, but I'm not convinced that we have an explanation yet.

Posted by Mark Liberman at 05:54 PM

Obscure words

I just encountered a word that I don't think I've ever seen before, though it took only a moment to realize what it meant. The word is scacchically.

It is the adverb derived from scacchic "of or pertaining to chess". It appeared in the sentence:

The queen is a piece I recognize, and so is the knight, but what, scacchically, is a "rushdie" and how does it move on the board?

which is a humorous comment on a report to the effect that the Queen has knighted Salman Rushdie. Scacchically means "from the point of view of chess" or "as it pertains to chess".

The adjective scacchic is presumably borrowed from Italian scacchico, since it is otherwise difficult to account for the specific form it takes, including the insertion of the <h> following Italian spelling conventions (to keep the sound of the <c> a hard [k] rather than the soft [tʃ] it would otherwise have immediately preceding <i>).

Posted by Bill Poser at 03:24 PM

June 15, 2007

Semantics and Pragmatics, a new LSA journal

Together with MIT linguist Kai von Fintel (he of Semantics etc.), I'm starting a new peer-reviewed, open access journal to be called Semantics and Pragmatics. We made the official launch announcement at the SALT conference last month, will open for submissions later this year, and will go into production in early 2008. We hope to match other top journals in the area for quality, beat them if we can on submission-to-publication turnaround time, and beat them hands down on price. Our annual subscription rate will be permanently set at half that of Language Log. And vice versa.

Our big news today is that the Linguistic Society of America (LSA) has accepted Semantics and Pragmatics as an LSA affiliate journal, as part of their new eLanguage initiative. We're very excited about this development! You can read the full story, and give us your comments, on our editors blog.

Posted by David Beaver at 07:08 PM

It's interpretational

Just in case you haven't already seen it, here's Senator Mike Gravel's new campaign ad, which is certainly the most original piece of political semiotics on display so far in the 2008 presidential season:

According to Eric Kleefeld at tpmcafe, Gravel's press secretary, Alex Colvin, explained the ad as "an expression of Mike Gravel":

Where he's coming from is that, it's less about him coming across with a heavy political message in this video, as much as it is the message of the impression the viewer will have, looking at him.

What will that impression be? Well, according to Colvin, "it's interpretational". I'm going to remember that answer -- it could be useful in a wide variety of circumstances.

I believe that Gravel's ad is a first in political rhetoric -- but in music, it's been done, in John Cage's 1952 work 4'33". Here's the score:

If you're not familiar with the work, there's a virtuoso performance by Frank Zappa available on a 1993 recording. (In fact, you can listen to samples of five different performances on the amazon.com website.) A video version of a different performance, with Japanese subtitles, is available here.

A few years ago, there was an infamous copyright dispute involving Mike Batt and the John Cage Trust, in which (according to "Silent music dispute resolved", BBC News, 9/23/2002):

Batt, who had a number of hits in the 70s with UK children's characters The Wombles, was accused of plagiarism by the publishers of the late US composer John Cage, after placing a silent track on his latest album, Classical Graffiti which was credited to himself and Cage.

I suspect that this was a publicity stunt, though I have no inside knowledge one way or the other. Gravel is already getting plenty of play for his silent-ad stunt, but perhaps to keep the discussion going for another 15 minutes, his campaign could arrange for the John Cage Trust to threaten him with a lawsuit for unauthorized use of a substantial portion of 4'33" as background music in the ad.

Posted by Mark Liberman at 06:44 PM

Outage this morning

We had another file system issue this morning on the mighty Language Log server. Since I'm still in Japan, we might have been off the air for the weekend; but Partha Talukdar and Chris Cieri responded to an email cry for help and performed some emergency repairs. (Thanks, guys!) I guess it's really time for me to take some steps to find us a more suitable home.

Posted by Mark Liberman at 06:42 PM

Citation Plagiarism?

We have discussed a number of cases of plagiarism here on Language Log, but there is a putative type of plagiarism that we have not yet considered. Plagiarism normally involves either the unacknowledged borrowing of someone else's idea or the unacknowledged borrowing of someone else's words. A third kind of plagiarism is, however, occasionally mentioned, namely the citation of a reference without acknowledging that it came from another source. If author Jones reads a paper by Smith and thereby learns of a paper by Doe and cites Doe without mentioning that he owes the reference to Smith, he has committed this kind of plagiarism, if plagiarism it be.

This type of plagiarism has received some attention recently because of its role in a very public dispute between DePaul University historian Norman Finkelstein and Harvard law professor Alan Dershowitz. Finkelstein, a radical critic of Israel and of Jewish support for Israel, accused Dershowitz, known as a civil libertarian and defense counsel for a number of celebrities as well as for his advocacy of Israel, of not being the true author of his book The Case for Israel and of having used citations from other sources (in particular, from Joan Peters' From Time Immemorial), without acknowledgment. Finkelstein's complaint triggered an investigation by Harvard in which former President Derek Bok concluded that no plagiarism had taken place. The dispute between Finkelstein and Dershowitz has attracted considerable attention, with threats of legal action, comments by public intellectuals, including Finkelstein supporter Noam Chomsky, and an unusual public debate about Finkelstein's tenure case, with critics arguing that his work does not meet the standards of professional historians and is no more than political advocacy and supporters arguing that he is a legitimate historian persecuted for his unpopular stance. Just a few days ago Finkelstein was denied tenure.

The only treatment of citation plagiarism that I am aware of is the brief discussion on pp. 14-16 of Judge Richard Posner's Little Book of Plagiarism. He says that it is a common practice because the consequences are "too trivial to arouse much ire" and because it is very difficult to detect, but leaves open the question of whether it is a venial form of plagiarism or not really plagiarism at all.

References serve a number of purposes:

They provide authority for the statement cited.
They allow the reader to check the accuracy of the citation.
They allow the reader to obtain further information about the cited point.
They point the reader to a potential source of additional references.
They demonstrate that the author is aware of the source.
They give credit to the originator of the idea or words.

The author who obtains a reference via an intermediate source and does not alert the reader to this fact does not thereby fail in respect of any of the above. In all of these respects, the reader obtains precisely the same benefit from the reference. Furthermore, the source of the words or ideas receives the credit for them.

What we cannot tell from such a citation is whether other authors have cited the same work and how the author came across it. So what? Whether other authors have cited the same work is irrelevant (unless the work is a meta-level survey, in which case the citation would be part of the data, not a simple citation at all). There is no reason that the author should provide such information. Similarly, how the author came across a citation is not something in which the reader normally has any interest since it is not relevant to evaluating the author's argument and evidence nor to understanding what the author has to say. It is true that we can't tell whether the author went to great lengths to learn of the existence of the cited work or received the citation on a silver platter, but we don't need to know that, and the conventions of scholarship clearly do not require that the author provide this information. It is in general impossible to tell whether a given citation was obtained from a reference in another paper, by thumbing through journals on library shelves, from a bibliography, from a web search, by word of mouth from a colleague, or by a research assistant using any of these techniques. An author may acknowledge assistance from a research assistant, librarian, or compiler of a bibliography, but this is rarely done for individual citations and, if the assistance was not of an unusual kind or magnitude, he may not acknowledge it at all.

I have assumed so far that the author actually reads all of the references that he cites. What of the case in which he merely recites a reference found in another work? (Some of the charges brought by Finkelstein against Dershowitz were of this type.) Occasionally the fact of the citation is the point and there is no need to look it up, e.g. when the author merely seeks to show that another author was aware of the cited work. In other cases, the original may be unobtainable, or more difficult to obtain than is worthwhile, if, for example, it is relevant only to the history of an idea and the current work is not historical in focus. In this case, the normative practice is to indicate that the citation was found in another work. But what if the author fails to indicate this?

Here again, all of the functions of citation are fulfilled, though in some cases imperfectly. There is no deception as to the origin of the material in the paper. To the extent that the reader regards the author as a reliable interpreter of other work in the field, the reader cannot be as confident of the correctness of what is said if the author does not have first hand knowledge of it and may, for example, be more inclined to check the source himself, but depriving the reader of this information, though a small sin, is not plagiarism. Similarly, a citation of a work that the author has not actually read may mislead the reader into overestimating the author's capabilities, e.g. to understand work in a certain area or to read a certain language, but since this deception is not about the origin of the ideas or words of the text, it is not plagiarism. Furthermore, there is no clear standard in this area, and hence no justification for the reader to rely on such inferences. If, for example, the author commissions a translation of a work in a language that he does not understand, although some authors will mention this and acknowledge the translator, as far as I can tell this is not a routine practice and there is no established ethical requirement to do so.

In my view, then, citation plagiarism is not plagiarism at all. Research assistants, laboratory technicians, systems administrators, programmers, students, librarians, bibliographers, colleagues, other authors, friends, lovers, relatives, pets, plants, ambient deities and suppliers of favorite ingestible substances may well deserve more credit than they receive, but that is a different matter.

Posted by Bill Poser at 01:38 AM

June 14, 2007

Now it's time to play our game

So, yesterday, I asked which of the following four instructions was not like the others:

English: Work into lather with a little water.
German: Mit etwas Wasser aufschäumen.
French: Faire mousser avec un peu d'eau.
Spanish: Producir espuma con un poco de agua.

As one would expect from the sophisticated beings that are you LLog regulars, many people had interesting thoughts about what might make one of these sentences stand out from the crowd. I had one particular, very syntactic, dimension of difference in mind, which a lot of commenters also spotted, but there were (of course) several other possibilities mooted.

Some people pointed out ways in which one or the other language's lexical choices stood out. For instance, peter berry writes:

All except the German have "a little" translated directly.... If the translators had done the same for German it would be "ein bisschen", but instead we have "etwas".

And terrycollman notes:

... the English one uses "lather", not "foam" - a lather, surely, is thicker and heavier than a foam ...

Both perfectly true, and both touching on the thorny problem of lexical translation equivalents: A given word in language A in context a and b may not have a direct translation in language B at all, or may be idiomatic only in context a but not context b. Or language A might split a given semantic field into two subcategories where language B only bothers with a name for the supercategory.

Another kind of difference, more syntactic but also not what I had in mind, noted by jack lecou, among others, is that the word order of the German example stands out. The PP Mit etwas Wasser comes first, and the verb aufschäumen at the end; in all the other languages, the verb comes first, a phrase related to the foam next, and the PP last.

One pretty subtle semantic difference that had definitely not occurred to me, noted by miked (who remarks, "IANALinguist", but who could be), is that, in the English sentence,

...whether the lather already exists could be ambiguous (ie, mix product into an existing lather by using water (say as a solvent) versus using just this product and some water to create a lather).

Given the English sentence out of context, one can imagine that there might have been previous instructions to create an independent lather, e.g. with some soap, and then sprinkle the powder into it and mix it in, incidentally using a bit of water. This interpretation isn't possible for the French or the German, since the 'foam' bit is contained within verbs in both these two cases, and definite reference is famously not possible within lexical items; nor is it available for the Spanish since the instruction is not ambiguous between a producing-foam interpretation and a do-something-to-existing-foam interpretation, because it explicitly uses a verb meaning 'produce'.

People also noted two different kinds of morphological differences. An anonymous commenter thought the stand-out language was:

German, for instructing you to "foam" rather than to "make to foam" "produce foam" or "work into lather."

Andrew referred to the hardworking German verb aufschäumen as a "synthetic causative with an incorporated result", but don't mind him; he's been as overexposed to linguistic terminology as I have.¹

The second-most-identified difference is also morphological, and, maddeningly, caught me a bit off guard. In every language except English, the verb form is not imperative, but infinitive! I'd noticed that for the French and Spanish, enough to change 'imperative' to 'instruction' in the LL version of my original post (but, sadly, not in the Heideas version), but didn't think too much about it. It should have been reflected in my glosses, though. If I'd been glossing the English, the verb would have come out something like, 'work.IMP', and, properly glossed, the French, German and Spanish verbs should have been 'make.INF', 'foam.INF' and 'produce.INF', respectively. (Indeed, the French should have been 'make.INF foam.INF', definitely not, 'make to.foam', which implies that the faire and the mousser are not in the same form.) Since the bare infinitive and the imperative are indistinguishable in English, though, the distinction was not reflected in the glosses. My bad!

Anyway, that's a really obvious way in which the English stands out from the other languages. And it's really quite interesting! I had never really consciously recognized this use of the infinitive in French, though I've read enough French product directions and recipes. And it's certainly interesting that it's common to French, Spanish and German but not English! Commenter Sus speculates:

I think it's because we have the "du vs Sie" problem. If we were to use the imperative in manuals and instructions, we'd constantly have the trouble of having to decide the age group and status of our customers... But I'm just thinking aloud here.

Certainly the availability of grammaticized formal vs. informal forms of address is one way in which French, Spanish and German pattern together and differ from English, so this makes sense as a possible explanation, though I would have thought the natural thing to do in anonymous instructions would be to just use the formal form. A Spanish-speaking anonymous commenter notes:

Recipes and product instructions normally use the infinitive (producir espuma) or a passive voice (prodúzcase espuma). The imperative (produzca espuma) seems very intrusive to me, and definitely feels like a literal translation from American English.

I'll have to check with a buddy who wrote her dissertation on imperatives to find out if this difference has been discussed in the literature.

The difference I was actually fixated on, though, was the one picked up on by the first commenter (and many subsequent ones), gregates, who wrote:

It seems to me it's the Spanish variant which is different, on the grounds that the word for "foam" is the direct object of the verb, whereas in the other three the object is left unspecified, with the context filling in the product itself as what one is to make into foam (up-foam?). So the other three tell you what to do with the product, whereas the Spanish version seems to leave the product out of it.

Well spotted! (If only the LL would fund a prize... but I'm afraid the whole discretionary budget is gone on the sherry in the senior writers' lounge. Some kind of Oxbridge thing. You could have one of my old conference name-tag holders, if you want.)

In fact, in each of the first three sentences, there's an implicit direct object, however they might otherwise differ from language to language. It would be grammatical to insert a direct object noun phrase -- 'this powder', 'it', whatever -- into the sentence in the appropriate spot in all three cases. In the English case, it'd be the object of 'work'; in the German, it'd be the object of the causative verb 'aufschäumen'; in the French, it would be the structural object of the whole clause but the semantic subject of the intransitive 'mousser'. In the Spanish, however, no additional direct object is possible; 'espuma' is the direct object of 'producir'. The reader has to infer that they're supposed to use the powder to produce the foam; syntactically, there's no spot for it in the clause.

The issue of objects and their potential absence has been extensively discussed here at LL in the past, e.g. here, here, here, and here. This 'recipe' scenario is one of the most well-known object-drop contexts, and clearly has its object-eliminating effect in at least three languages in which object drop is normally not too common. It would be interesting to know if this context can eliminate objects universally. Particularly, I'm curious about what happens in this context in polysynthetic languages, where object marking is generally obligatory. I bet you need some object marker even in this context. I'll see if I can find out and report back.

The reason this set of sentences caught my eye in the first place was that several brands of syntactic/lexical semantic analysis (including my own) would assign a broadly uniform syntactic structure to the English, German and French sentences, emphasizing their common causative and resultative natures. It would differ in its surface particulars from language to language, but not much in essentials. As I ran my eye over the different instructions with a particular kind of analysis in mind, comfortably going, mmm, yep, oh, sure!, running into a different structure for the same message brought me up a bit short. That Spanish sentence would get a different kind of analysis.

Comments?

¹ Commenter Sus provides some nice examples indicating that the auf in aufschäumen is one of them there nifty separable prefixes, being pulled apart from the schäumen in the finite forms Ich schäume auf, 'I foam up', du schäumst auf, 'You foam up'...

Posted by Heidi Harley at 06:37 PM

Montana's mountains and creeks in the news

The Language Log Rocky Mountain reporter's office is located in an elegant cubicle right next to Eric Bakovic's desk under the stairwell at Language Log Plaza. He thinks he was stuck there because he represents callow youth (aka Youth and Popular Culture). I suppose the staff in the penthouse offices think that nothing about language is worth reporting from my isolated base out here in Montana. [Sidenote: It's curious though that the management put the youngest logger down here in the basement right next to the oldest one -- just the stuff needed for an age discrimination suit. Hmm.]

One day at the Plaza water cooler, Geoff Pullum commented that nothing interesting ever happens out here in Montana. Well, I'm here to tell him how wrong he is. STUFF happens here -- BIG STUFF. The Missoulian (I'm sure you read this newspaper daily) has reported that one of the state's finest mountain ski resorts is changing its name from Big Mountain Resort to Whitefish Mountain Resort. In the print version the CEO, Fred Jones, said: "Quite honestly, the Big Mountain name has been very confusing outside the region. There's a whole host of big somethings out there." So take that, you effete loggers on our east and west coasts! We know how to rename things here.

But, unfortunately, there are still a lot of places to rename. Big Mountain (it's actually the name of a mountain) isn't the only bland name in Montana. When I retired and relocated to this state, after spending the previous 30 years in Washington, DC, I noticed how very vanilla the place names are out here. For example, I live near a premier trout fishing stream called Rock Creek, not a very exciting name. Montana has 9,442 place names with the word, "creek," attached. Spring Creek is said to be the most common one. Others in the top twenty are Rock Creek, Cottonwood Creek, Ash Creek, Fish Creek, Bear Creek, Deer Creek, Trout Creek, and so on. Generic. Dull, dull, dull. Very seldom do you find more interesting place names, such as Balm of Gilead Creek and Maid of the Mist Creek. But there is a hopeful handful of mysteriously bold and frightening ones, like Damnation Creek, No Business Creek, Starvation Creek, and Poison Creek. For reasons hard to fathom, we also have Octopus Creek, Alligator Creek, and Sauerkraut Creek. Alas, these place names don't give even a hint of the Native American history of the state, even though there are seven Indian reservations here.

So, renaming Big Mountain Resort to Whitefish Mountain Resort indicates clearly how much Montana is really on the move in the onomastics biz... And that's the outpost news for today. Maybe it will get me a new desk lamp.

Posted by Roger Shuy at 05:52 PM

Plagiarism and Copyright

Geoff Pullum's enlightening (and entertaining, as usual) post about the differences between plagiarism and allusion makes it clear that plagiarism is intellectual dishonesty of a type that roils most educators. What he says pretty well matches my experience with my own students who plagiarized. I've also had a little experience with such borrowing in the commercial world, however, and his post stirred up some thoughts about the similarities and differences between plagiarism and copyright infringement.

First of all, plagiarism is a moral and ethical problem while copyright infringement is also a legal offense. But the issues seem to be rather similar. Both plagiarism and copyright cases focus on the uses of words and sentences that are the same as (or very close to) those in the source document. Both involve borrowing of language and ideas that are not one's own without proper acknowledgement. Both require somebody to point out the passages where the alleged borrowing took place. Both involve someone making judgments about how much borrowing is enough to be problematic.

In the few cases of plagiarism I've experienced, the matter was settled by a heart-to-heart talk with the student, followed by a much-lowered grade. Seldom has the issue led to a trial-like hearing in which the students try to defend themselves and bring in a lawyer or expert witness, although this sometimes happens. I've been such an expert witness at two such hearings at Midwestern universities and believe me, they were not anything like copyright infringement trials. Nor were these hearings based on specific, identifiable legal concepts, such as the ones that underlie copyright infringement cases.

At trial, copyright infringement uses the legal concepts of proportionality, substantiality, originality, and substantial similarity. It's not only the amount of previously published material that matters, but also the proportion of the borrowed work in relationship to the original source. Amounts and proportion are not the same things. Even though the borrowed work may be significantly shorter than the original, amounting to as little as 5% of the ideas, words, or other measures of the source document, this 5% can reflect as much as 50% of the source's ideas or other measures. Courts struggle with this issue but sometimes it's pretty clear. For example, I worked on one copyright infringement case in which the publisher of a 111-page book claimed that 100% of the 55 major ideas in that book were found in the defendant's 4-page pamphlet, very impressive proof of proportion without even getting into the specific similarity of the language used.

Substantiality seems to involve quantity -- a lot versus a little. Here we get a bit fuzzy. Using the case I worked on, 100% seems like a substantial amount of the major ideas, but would 10% meet the plaintiff's goal? And how many identical words or expressions are enough to be considered substantial?

The concept of originality has a similarly vague definition: "some degree of creativity, even a minimal amount." Now we have to measure "creativity" and try to figure out what "minimal" means in a given case. Short phrases generally haven't been included under copyright protection, but converting present tense verbs into the past tense and changing passives to actives make a better case. The defendant pamphlet maker in the case above was a master at this. But it's still a judgment call about how much is enough.

Things don't get much clearer with "substantial" and "substantial similarity." Copyright law says that substantial similarity is present in two works when they are compared in their entirety, including both protectable and unprotectable material, especially when the junior user copies not merely the ideas, but also the "expression of the ideas" in the senior's work. Many find it hard to measure "expression of the ideas."

Ideas, words, expressions, and sentences seem to be the major focus in identifying both plagiarism and copyright infringement. Perhaps because I've worked on so few plagiarism and copyright infringement cases, I've yet to see linguistic concepts such as speech acts, speech act sequencing, topics, and topic sequencing analyzed by linguists as evidence of borrowing. If you know of any such cases, please let me know.

Posted by Roger Shuy at 11:59 AM

Why is "nearly no" nearly not?

The other day, I had one of those grammatical WTF reactions that you sometimes get from a bit of text as it passes by. I didn't make a note at the time, so I'm not sure of the source, but I remember the phrase: "nearly no one". This phrase is obviously OK for some people -- Yahoo News turns up these examples in its current index:

Dr. Feustel is the Canadian astronaut nearly no one in this country knows about.
Everyone wishes to be loved, but in the event, nearly no one can bear it.
But nearly no one called him on it.
So many women with my condition suffer alone, alienated from their own friends, family, partners and doctors by having something that nearly no one can fathom, let alone treat.

But when I read things like this, my reaction is "No, no! It's 'almost no one', not 'nearly no one'!"

Now, if I were the typically careless sort of prescriptivist, I'd assume, without checking, that my reaction is how English is and always has been and ought ever more to be, and I'd proceed to compose an argument about why my preference in this case is justified by logic, history and/or moral hygiene.

But instead, my reaction is to wonder what's really going on.

(There's probably a whole section on this in CGEL, or perhaps a treatment in a 10-year-old issue of NLLT — but I'm in a hotel lobby in Kyoto, and until they come out with a digital version, CGEL is not part of my travel kit. So I'll just throw a few observations at the wall of the global linguistic village, so to speak, and see what sticks.)

First, although lots of competent writers apparently see nothing wrong with "nearly no one", my reaction does have some popular support. In Google's general index, we have

	no one	everyone
nearly	27.6K	1.29M
almost	1.06M	1.66M
almost/nearly ratio	38.4	1.29

So with everyone, almost is about 30% commoner than nearly -- but with no one, it's more than 3,700% commoner (38.4 times as common). Why?
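For anyone who wants to check the arithmetic, here is a minimal sketch of how the ratio row is computed from the Google counts. The function names are my own inventions for illustration, and I've expanded the K and M suffixes by hand, so the inputs are only as precise as the rounded counts in the table.

```python
# Hit counts copied from the table above; "K" and "M" suffixes expanded by hand.
counts = {
    "no one": {"nearly": 27_600, "almost": 1_060_000},
    "everyone": {"nearly": 1_290_000, "almost": 1_660_000},
}


def almost_nearly_ratio(row):
    """Number of 'almost X' hits per 'nearly X' hit."""
    return row["almost"] / row["nearly"]


def percent_commoner(ratio):
    """Convert a ratio to 'percent commoner': a ratio of 2.0 is 100% commoner."""
    return (ratio - 1.0) * 100.0


for word, row in counts.items():
    r = almost_nearly_ratio(row)
    print(f"{word}: ratio {r:.2f}, almost is {percent_commoner(r):.0f}% commoner")
```

Note that "N times as common" and "N percent commoner" are different scales: the second subtracts one from the ratio before multiplying by 100.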

My first thought was that it had to do with the positive versus negative ends of a continuum. But that's not right, because both my intuitions and Google's index show that "nearly empty" and "nearly worthless" are fine:

	empty	full	worthless	priceless
nearly	617K	998K	89.6K	572
almost	721K	1.4M	138K	20.1K
almost/nearly ratio	1.17	1.40	1.54	35.1

In fact, the pattern here is in the opposite direction, raising the (additional?) question of why "nearly priceless" is so rare in comparison to "almost priceless" (and please, don't bother to tell me that priceless is like unique and shouldn't get any sort of degree modification at all...).

A better guess seems to be that nearly tends to be uneasy when asked to modify overtly negative words like no, never and none:

	never	always
nearly	66.8K	1.95M
almost	1.32M	9.87M
almost/nearly ratio	19.8	5.06

	none of	all of
nearly	20.1K	1.26M
almost	769K	3.42M
almost/nearly ratio	38.3	2.71
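One rough way to put the three negative/positive pairs on a common scale is to divide the almost/nearly ratio for the negative word by the ratio for its positive counterpart. This is just a sketch of that idea, not a proper statistical test: the name negative_skew is made up for the purpose, and the counts are the rounded Google figures from the no one, never and none of tables, with the K and M suffixes expanded by hand.

```python
# Each entry: (nearly + negative, almost + negative, nearly + positive, almost + positive)
pairs = {
    "no one / everyone": (27_600, 1_060_000, 1_290_000, 1_660_000),
    "never / always": (66_800, 1_320_000, 1_950_000, 9_870_000),
    "none of / all of": (20_100, 769_000, 1_260_000, 3_420_000),
}


def negative_skew(nearly_neg, almost_neg, nearly_pos, almost_pos):
    """Ratio of the two almost/nearly ratios.

    Values above 1 mean the preference for 'almost' over 'nearly'
    is stronger with the negative word than with the positive one.
    """
    return (almost_neg / nearly_neg) / (almost_pos / nearly_pos)


for label, row in pairs.items():
    print(f"{label}: {negative_skew(*row):.1f}")
```

If the guess about overt negatives is on the right track, all three values should come out above 1, though the size of the skew need not be the same for each pair.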

But nearly does modify negatives, in the work of what otherwise seem to be entirely competent writers of English:

Thomas nearly never got to ride in his hometown.
Edward has gone through life protecting himself from the drabness of reality with a shield of stories, nearly none of which Will believes.
In fact, the Pirates have nearly no minor league players ready to contribute.

As usual, the more of these examples I read, the better they sound. But still...

In the whole history of English literature as indexed by LION, "nearly no" occurs once in poetry and once in drama.

The poetic hit is in Richard Hugo's The Right Madness on Skye, 1980:

73 Are we on course again? Good. Isle of Skye, right?
74 This the day of my death. Only feigned tears, like I ordered.
75 Make sure the flowers are plastic. Five minutes, remember,
76 piper and drum. Tell the nearly no mourners remaining
77 I was easy to mix up with weather. The weather
78 goes on. Me too, but right now in a deadly stiff line.

The one hit in drama is from a note To The Reader in the back matter of M. G. Lewis's 1798 The Castle Spectre:

To originality of character I make no pretence. Persecuted heroines and conscience-stung villains certainly have made their courtesies and bows to a British audience long before the appearance of "The Castle Spectre;" the Friar and Alice are copies, but very faint ones, from Juliet's Nurse, and Sheridan's Father Paul, and Percy is a mighty pretty-behaved young gentleman with nearly no character at all.

By contrast, "almost no" occurs in 56 poems, 18 dramas, and 46 prose works.

So to sum up, I'm clearly not alone in feeling that nearly doesn't mix with negatives; on the other hand, there seems to be a minority that disagrees. If you think you know what's going on here, please let me know.

Posted by Mark Liberman at 04:06 AM

Miffled

It's rather unclear what my job is as the only member of the staff at the Youth and Popular Culture (YPC) desk here at Language Log Plaza, but if I had to guess (and I often do), I'd say it was to comment on youth and popular culture, but instead I seem to just be around for the senior writers to accuse me of mischief, to make fun of my speech, to expose my bad puzzle skills, to spy on me while I'm doing research, to accuse me of picking fights, etc.

I should have paid more attention when I applied for the position. When they asked me if I "already owned" a TV, a PlayStation 2, or an iPod, I didn't realize that they were just hoping I had one or both so that they wouldn't have to buy these "luxury items" for my research (or, as Senior Writers Pullum & Zwicky always say to me, "whatever it is that you do in that windowless space we'll call 'an office' to amuse ourselves").*

This wouldn't be so bad if it weren't for the fact that the powers-that-be at LLP wouldn't even give me an allowance for a cable TV hook-up at the office, which combined with the fact that I spend most of my waking hours here means that I miss, oh, I don't know, a good two-thirds to four-fifths of popular culture while it's happening. And this is how I missed the series finale of The Sopranos on HBO two nights ago.

Mind you, I still haven't seen it. But I hear that the writers are taking liberties and just making words up (or perhaps just using them incorrectly). The word that has the Sopranos bloggers a-buzz -- apparently uttered by Tony in reply to Paulie in the series finale -- is "miffled".

I'm not much of a word sleuth, but a little digging reveals that "miffled" has been around for a few years, with various different apparent meanings, some of which coincide with several people's feelings about its meaning in the context of the Sopranos episode -- a blend between "baffled" and "miffed".

Here's the earliest example I've found in my admittedly very superficial digging. It's from August 2001, where it seems to have the meaning of "fiddle".

Tried MP3's and videos, nothing made sound. After that I have miffled around and after uninstalling and reinstalling and rebooting I still can not get a sound even though the Device manager in System properties states that the card is functional.

Here's another example from May 2005, where it seems to mean "mixed".

Yeah that's a miffed Bunny making miffled metaphors.

There's also "Miffled" as at least one person's avatar, the origins of which are explained here in July 2005.

Well my user name goes back years. Its a sloppy cake of all the silly things I say. I have many names must of them around Maffle, Faffle, Smiffle.. etc' But Miffle is from what I say from time to time in a different chatting area, my tummy box is miffled.
A few times here I have said things are miffled aswell!

(I wonder if this is the same person ...)

Of special interest, perhaps, to Language Log readers is this example from March 2006, where it appears to just mean "miffed". (The topic? "English grammar is chaotic".)

That's not an opinion; for better or worse many would-be reformers do naturally base their reforms off their own dialects. There's nothing to get miffled about.

Here's one from July 2006, where it appears to mean "baffled" + "miffed" as in the example from The Sopranos.

I am somewhat miffled by the reaction so far in regards to this new message board structure by yhoo. IMHO it is signifcanlty better with threaded view as I do not have to waste time with post who’s content was worthless.

But just last month, someone explicitly analyzed "miffled" as "miffed" + "muffled":

Z.Stardust: Please try to type in proper English. :) People generally get a little miffled (is that being used in the right context?) with txt talk.

Cold as Ike: You mixed miffed with muffled, star =P. It should be miffed.

Z.Stardust: I personally like miffled better, but thank you. :p

This next one is undated, but is a good self-reflective example.

"Um... After the fight... I don't carry my check book on this flight suit." Quatre sweatdropped slightly. Shannon looked miffled. Yes, miffled.

Just for kicks, I checked out what UrbanDictionary.com had to say. As of this posting there is no entry for "miffled", but two unrelated entries for "miffle" from Aug. 2004 and Oct. 2003.

So did the Sopranos writers make the word up? Maybe, but there's at least one small piece of evidence for its use in roughly the same sense from almost a year ago. In any event, now that it's been broadcast in the middle of a major cable-television event (that I must emphasize I had to miss)**, nobody better try to pass it off as a new word at the Open Mic in Chicago tomorrow.

[ Hat-tip to Kate Davidson. ]

*Note to self: finish extended diatribe about the non-recycled-water fountain just installed in main outdoor lobby of LLP in time for summer.

**Please send financial support to: Young Bakovic's YPC Desk Cable Fund, c/o Language Log, Closet Under the Stairs, Language Log Plaza. Generous donations of $1000 or more are completely tax-deductible and come with a free subscription to Language Log (satisfaction guaranteed).

Posted by Eric Bakovic at 01:58 AM

Confused? Read on...

In today's BBC News online, you can find the headline:

US rejects China currency charge

Why?

Did China accuse the US of messing with its currency?
Did someone find out that low quality American dollars rejected by the US Treasury are being recycled as Yuan?
Is there an excess of positive ions on Yuan bank notes?
Is the US annoyed at the Yuan's overly rapid advance?
Did China accuse the US of not staying up to date?

As it happens, none of the above meanings were intended.

The headline describes a refusal of the US Treasury to accuse China of manipulating trading of the Yuan. Now I'll freely admit that most of the readings I can come up with for the headline are implausible, although it's fun to identify them. (Challenge: how many readings can you find?) But still, in this case and many others it's impossible to identify the correct reading unless you already know the story. While it's obvious that currency manipulation is the crime, it's all but impossible to tell who's accusing who.

Why is it that headlines are so maddeningly ambiguous? Is there an incentive for headlines to be vague or ambiguous in order to draw the reader in? This would run counter to standard claims and prescriptions about what makes good functional language... but that doesn't mean the technique doesn't work.

And is this even restricted to newspaper headlines? When you get down to it, aren't many of our utterances, though dressed up as useful packages of information, in fact just a tease? Nice weather. Is it that time already. Do you have a sister? What are you wearing? Quiet in here, isn't it?

But I digress. I was in the middle of a stock post about ambiguous news headlines, an old favorite of viral emails, intro linguistics classes, and, of course, Language Log (e.g. here). Yet I think few people realize just how common the phenomenon really is. On one single day a few weeks ago (May 24 2007), the BBC News website not only contained the three intentionally groan-inducing headers Bird makes a splash on Bush, NY yellow taxicabs 'to go green', and Genes shed light on fish fingers, but also all of the following apparently 'straight' headlines:

Lack of nurses 'killing Africans'
(I don't get it. Why would we want nurses to kill Africans?)

Polls close in Ireland's election
(Are they really so close? Or did they close?)

Tamil rebels launch naval attack
(Attack on a navy? With a navy? Both?)

US approves pill to stop periods
(Headline writers have obviously been taking it for years.)

Social security for Indian poor
(It's a damn shame that India's social security is poor.)

Wal-Mart to sell Dell computers
(Doesn't Dell already have computers?)

Sleepless record man feeling fine
(Did the sleepless try to record a man feeling fine as part of a cure for their insomnia?)

One wonders if headline writers get paid by the meaning.

Posted by David Beaver at 01:43 AM

June 13, 2007

One of these things is not like the others...

Audience participation segment: Instructions from the back of a container of special Japanese 'silk' facial cleansing powder:

English: Work into lather with a little water.
German: Mit etwas Wasser aufschäumen.
French: Faire mousser avec un peu d'eau.
Spanish: Producir espuma con un poco de agua.

I really only have any actual speaking command of two of the languages on this list (English and French), but given that these are all intended as a direct translation of the English instruction, and what with being able to recognize various cognates in the German and Spanish, and having worked on causative and resultative constructions in English and Italian, and you know, heck, being a professional linguist and all, I felt like I had a pretty good handle on what was going on in these four sentences, syntactically speaking, when I first read them. And it seems to me that one of these instructions is not like the others.

Which one? And why?

Post your thoughts in the comments section here. I'll tell you what my take is tomorrow.

(For the less western-europeanly inclined, I've posted interlinear glosses (not translations) after the jump.)

English: Work into lather with a little water.

German:
Mit etwas Wasser auf-schäumen.
with some water up-foam

French:
Faire mousser avec un peu d'eau.
Make to.foam with a little of water

Spanish:
Producir espuma con un poco de agua.
produce foam with a little of water

Posted by Heidi Harley at 05:21 PM

If they lose your pants, sue

The language and law desk at Language Log Plaza is pleased to report that you can now sue a dry cleaner who loses your pants. A Washington DC administrative law judge was incensed when a local dry cleaning establishment owned by Korean Americans returned his suit jacket minus his pants. The Washington Post says it all here, or most of it at least. The connection of this startling news with language is a bit iffy, but there are at least three words of linguistic interest here: satisfied, willful, and we.

The plaintiff claims that a sign in the shop's window says, Satisfaction Guaranteed. He wasn't satisfied and said he had no choice but to take on "the awesome responsibility" of suing on behalf of the residents of the city. He brought in witnesses who had similar experiences with the cleaners but, to his dismay, they all testified that they'd be satisfied if they were compensated for the value of their lost or damaged garments and if their bill was voided. So their view of satisfaction wasn't quite what the plaintiff had in mind. He's suing for 65 million dollars, a very different definition of what it means to be satisfied.

The plaintiff also charged the cleaners with "willful and malicious conduct." It's not always clear what the Courts mean by willful. Bryan Garner, in his Dictionary of Modern Legal Usage (Oxford U Press 1995) says that time and again willful means

"only intentionally or purposely as distinguished from accidentally or negligently and does not require any actual impropriety; while on the other hand it has been stated with equal repetition and insistence that the requirement added by such a word is not satisfied unless there is a bad purpose or evil intent." (936)

Was this dry cleaner out to get the plaintiff? Or was losing his pants a mere accident or the result of negligence? Will the Courts ever decide on what is meant by willful? Stay tuned.

The Post article also reports that the judge in this case rebuked the plaintiff for referring to himself as "we" during his presentation (he represented himself), saying: "Mr. Pearson, you are not a 'we.' You are an 'I.'" Judge Judith Bartnoff probably thought the plaintiff was using the "royal we." She could have been correct in this, but in an admittedly feeble effort to see this from the plaintiff's perspective, I suppose he could have been using the stylistic mannerism of the "editorial we," which implies a collective rather than individual view for, after all, he took on himself the awesome responsibility of representing half a million DC residents. Naah. It was probably the "royal we."

Posted by Roger Shuy at 01:20 PM

Calling all Chicagoland neologizers

I'm off to the sixteenth biennial conference of the Dictionary Society of North America, which gets underway tomorrow on the campus of the University of Chicago. The program for the conference proper (schedule HTML, PDF) would likely interest only the most diehard lexico-fiends, but there's a new event this year that's open to the public and should be a lot of fun. If you're in the Chicago area and you've coined a word that you think deserves a place in the English lexicon, come to the first-ever New Word Open Mic, where a panel of distinguished judges will select the best neologism. It's all happening Saturday, June 16th, from 4:30 to 5:45 in Breasted Hall at the U of C's Oriental Institute. Erin McKean, organizer of this year's conference, has more details on her Dictionary Evangelist blog.

Posted by Benjamin Zimmer at 08:50 AM

June 12, 2007

Plagiarism and allusion

A few days ago I got an email from John McIntyre telling me about his usage blog on the Baltimore Sun's website. He wanted me to know about the blog in case I wanted to attack it (as "Another damn prescriptivist blog", he suggested, though he says he's only a "moderate prescriptivist"). I had a look, and noticed that he hates many common clichés: phrases like at the end of the day meaning "ultimately", and so on. I didn't have much time for lengthy polite correspondence, so I sent him a brief note composed entirely of annoying clichés: I told him that life is short, and at the end of the day you've still got to get up in the morning, and so on. He mailed back to ask, "Are you saying I'm looking up a dead hog's ass?" I still didn't have any time for chat (heavens, here at UC Santa Cruz the academic year is not quite over yet — I'm still grading finals), so on a hunch I just sent him this comment:

I'm saying your blog is like a hundred clowns with bees in their underpants. I expect its lowering of morale to lead to violence.

What happened then was just what I expected.

Back came a message, within a minute or two, saying: "Ah, another Scott Adams fan." Because, you see, the words in red above are essentially just quoted from Scott's extraordinarily funny Dilbert strip of June 3. And crucially, I had decided that I could be sure John would recollect it and identify it.

That's the subtle line between plagiarism and literary allusion. It's plagiarism if you copy someone's writing and you don't want it to be noticed that you were copying; it's allusion if you do exactly the same but you do want it to be noticed.

If I had hoped Mr McIntyre would not identify the source of my very funny metaphor and would think me responsible for its brilliantly humorous simile, I would not be a brilliantly humorous writer, I would be a dumb and contemptible plagiarist. And if I had thought he would spot the quotation but I was wrong and he did not, I would be in an awkward spot for two reasons: (i) I would have gratuitously insulted someone I didn't even know, and (ii) I would have used someone else's clever humor without admitting it or citing the source, and would thus have put myself in danger of being fingered later as a plagiarist.

But I had judged him right: I took him to be well acquainted with such familiar features of our culture as the Dilbert strip, and I intended him to see that I was quoting, and he did, and I intended him to see that I intended him to see that I was quoting, and he did, and I intended him to see that I intended him to see that I intended him to see that I was quoting, and he did, and... Perhaps it would be simpler if I just cut this (non-vicious) infinite regress short and say that I intended there to be not just recognition of the quote but also mutual recognition of our mutual knowledge state.

I once had to fail ten percent of a 100-student class for plagiarizing. Ten people had copied the same crucial line off the web, without even understanding it, and submitted it as part of their answers to a homework problem set. (Trust me: there were five reasons why it was totally clear that it had been copied letter for letter. The material was technical, and the line in question introduced four things I had not taught, and at one point there was a letter interpreted as an arbitrary label, so any letter would have done just as well, but they had all chosen the same one.) I hauled each one in for the obligatory personal interview in which by university regulations I was required to listen to their side of the story.

They sat there and made their various responses: "I've never copied anything before in my life"; "I didn't realize it was wrong because in the country I come from copying is quite normal"; "I didn't copy anything, I just studied with people who did and I must have picked it up from them unknowingly"; "I can't afford to have a dishonesty charge on my record because I'm applying to law school"; "My dad's a lawyer and I'm going to have you fired!"; "Is there nothing I can do?"; "Do you have any more Kleenex™ brand tissues?"; and so on — as usual, it was pathetic. Then I failed them all and reported them to the disciplinary authorities.

Later I told the whole class very plainly: if those students had quoted the line, given the source, and explained briefly why it was an excellent and very sophisticated solution to the problem I had posed, I would have given them extra credit for finding it and citing it. But they tried to pretend it was their own unaided work. They did not intend me to see it as a quotation, or to google down its source.

And there's another aspect to the matter: If the students had given their source, I would have felt flattered and pleased that they assumed I was smart enough that they couldn't pretend; but instead they insulted me by treating me as someone who was too dull-witted to spot collusion, or too bone idle to use the Google™ search engine to prove it was collusion. They had not just been dishonest, which is a bad enough sin in academia; they had insulted my intelligence, which is an utter no-no to the n^th power with a side order of fries, and will cause me to wreak my awful vengeance. Don't ever insult my intelligence.

On the one hand, plagiarism, with the fairly serious punishments that are generally attendant on it (remember this case). On the other, quotation without troubling to cite the source because of a confidence that the audience will recognize the quotation and interpret it as an obvious allusion. It's a subtle line to draw, perhaps, but no very difficult concept is involved. It's really the same as the difference between wearing a Darth Vader mask because you are dressed up as Darth Vader (and intending to be recognized as someone dressed up as him — not to be mistaken for him), and wearing one in order not to be recognized while you are robbing a bank.

Posted by Geoffrey K. Pullum at 08:16 PM

Urdu Calligraphy Not Quite So Endangered

A few days ago I mentioned a post on an Urdu newspaper in Chennai that was said to be the last outpost of handwritten Urdu. Reader Sydney Mark Heyne has brought to our attention this post, which contends that most Urdu printed works are actually made from handwritten masters and discusses the current state of Urdu printing.

Posted by Bill Poser at 06:51 PM

Secret code sharing

Normally we use language to be forthright and to be understood. Occasionally, however, people use a language code to disguise the meaning of what they're saying. Language is a code in itself, of course -- a formal system of communication shared by its users. But there are also codes within codes, used by people who don't want to share information with outsiders. Such codes deliberately isolate information from others in ways that regularly understood language would not. Codes are used for many reasons, including security (as in times of war), efficiency (as in occupations), intimacy (as in clubs or social groups), or secrecy (as in the prevention of detection). Code sharing, as the term is currently being used, is fine when used within the group that understands it, but when it causes confusion to those outsiders not in-the-know, it can be irritating and troublesome.

Mark Liberman's comments on airline code sharing prompted some memories of past criminal cases I've worked on, when codes were used to prevent any outsiders who might be listening in from understanding what was being said. In the 1983 case of the grandma mafia, three otherwise respectable looking grandmas in the women's clothing business carried out an Asian cocaine operation under the guise of selling "blouses" and "skirts," which turned out to refer to different types and quantities of illegal drugs.

I also thought of a 1988 case in which I was asked by the US House of Representatives, Committee on the Judiciary, Subcommittee on Criminal Justice, to review the 1981 audio tapes in a case involving then Federal Judge Alcee Hastings. Co-defendant William A. Borders was convicted in that case but Hastings was not. Nevertheless, the government thought Hastings' participation in those tapes sounded very suspicious. So they brought impeachment hearings against him. In an earlier post, I described some of my analysis in that case.

The codes used in these two cases were partial and disguised. Outsiders were not supposed to understand their intended meaning. Now the airlines' use of code sharing introduces a somewhat different concept of "code" as well as "sharing." The code seems to be legitimately shared within and between airlines and other businesses that use the term. There's nothing particularly wrong with that. The problem is that those of us not in that business and not yet fully aware of its meaning find it confusing, pretty much the way Mark did (by the way, we hope he made his San Francisco transfer). He calls code sharing "a sort of digital-bureaucratic morass of reciprocal failed reference." That's a pretty good description of an unshared code.

It's hard to locate code sharing in any of the four types of codes I listed in my first paragraph: security, efficiency, intimacy, and secrecy. Surely code sharing isn't the product of any perceived airline dangers and it's doubtful that it's an effort at intimacy, at least not intimacy with the customers. That leaves efficiency and secrecy as the best candidates. It's probably efficient, for the airlines at least, and as for secrecy, well, the secret gradually is coming out.

Maybe this is only one more instance of letting the buyer beware. Maybe passengers are expected to figure this out for themselves. Mark had his computer with him and his post gave us the Wikipedia link defining code sharing. For another one, see here. It's a bit more critical than Wiki. But what we usually don't know is what Mark found out the hard way. When airlines code share, what this really means is that passengers can't get seat assignments on those legs of their flights that the originating carrier doesn't fly -- until they reach the gate of the next leg. It also means that the equipment used on the non-originating flight is not always made clear. The airlines don't share that part with us, at least not openly. And they haven't tried very hard to let us in on their little secret, still another step in the continuing reduction of service we get these days.

Posted by Roger Shuy at 05:41 PM

Universal Communication

What's that? Well, the organizers of the First International Symposium on Universal Communication, to be held in Kyoto on June 14-15, explain it like this:

With the development of communication system and device technologies such as optical and wireless communications and IC/RF tags, the ubiquitous information and communications infrastructure has been spreading rapidly, enabling the use of information network systems “anytime and anywhere.” The rapid progress of broadbandization has also enabled transmission of large-volume information, such as ultra-high-resolution images and stereoscopic images. Moreover, the recent emergence of the new communications technologies such as Web 2.0 has triggered changes in social activities.

This covers the ground, as the song says; the scope is broad enough, anyhow, that they invited me to speak. At the moment, I'm doing my part towards "enabling the use of information network systems anytime and anywhere" by blogging from Philadelphia Airport, as I wait for my plane.

I can tell you that communications between USAir and United Airlines, who operate the planes on the first and second legs of my flight to Osaka, are not yet at the stage you would want to call "universal". In fact, they're in a sort of digital-bureaucratic morass of reciprocal failed reference. The two companies can collaborate on a sort of referential fiction known as "code sharing", whereby I can buy what purports to be a USAir ticket from PHL to KIX via SFO, although the second leg of the journey is actually a United flight (with a different flight number). What they can't do, it turns out, is assign me a seat on that flight. The helpful USAir representatives told me on the phone, starting a month ago or so, that this is because it's actually a United flight; the helpful United representatives told me on the phone, during the same period, that this is because my ticket has a USAir flight number.

Was there any way to avoid this reciprocal failure of indirect reference? Well, I was told by representatives of both companies, I could purchase the two legs of the flight separately, each from the company that actually operates it. Unfortunately, this would roughly double the price.

Further attempts at communication are promised during my scheduled hour-long layover to change planes in San Francisco. I'll let you know how it turns out.

Assuming that they resolve the conundrum and get me on the airplane to Osaka, I'll be spending about 50 of the next 100 hours in one sort of conveyance or another, so my blogging might be a bit spotty. I'm sure that the rest of the Language Loggers will pick up the slack.

Posted by Mark Liberman at 06:39 AM

June 11, 2007

Contrastive focus reduplication in the courtroom

So far today we've had a post about contrastive focus reduplication and another one relating to linguistic evidence in jury trials. In a bit of Language Log synchronicity, today's news contains a wire story that combines these two themes. The terrorism support trial of Jose Padilla, Adham Amin Hassoun, and Kifah Wael Jayyousi is underway in Miami, and the AP article makes it clear that much of the case hinges on semantic arguments, particularly over the term jihad. The prosecutors want to associate the use of the term strictly with "acts of violence by al-Qaida and other Muslim extremist groups," while the defense attorneys have endeavored to show that "Muslims could perform jihad in many ways other than violent conflict."

Similarly, the prosecution has tried to paint the defendants' use of the word "brothers" as indirect evidence of their involvement in a terrorist network:

FBI wiretaps played in court for jurors contain frequent references to "brothers," which prosecutors say means mujahedeen fighters looking for a battle. Defense lawyers contend the term is a common expression among male Muslims.
"There are mujahedeen brothers and brother brothers," said Assistant U.S. Attorney Brian Frazier in one of many arguments about use of words. "There's more context to the word 'brother' than just a Muslim person."

The defense has also questioned how the FBI has translated the intercepted calls from Arabic into English, since the Bureau's translators sometimes rendered "Allah" as "God" and sometimes left it untranslated. Defense attorneys suggest the use of "Allah" is intended to make the speakers sound "sinister," while one FBI translator testified that she chose "Allah" because she thinks "it's a beautiful word." This sounds like a case in dire need of expert testimony from a linguist or two.

[Update, June 13: On his blog Jabal al-Lughat, Lameen Souag eviscerates the prosecution's claim about "brothers," or at least how that claim is portrayed in the AP article. Lameen notes, however, that additional news coverage explains the prosecutors' larger argument, namely that the defendants spoke in code to each other to mask their plans for violent jihad. For instance, the New York Times reports:

Other calls played Thursday and Friday, as interpreted by Mr. Kavanaugh, focused on jihad activities in Ethiopia, Afghanistan and Kosovo. There was talk of "brothers" who had been "married" — code for killed in battle, [FBI agent John T. Kavanaugh Jr.] said — and of interference by "the dogs," or the United States government.
Mr. Kavanaugh also said a reference to "eating cheese" was code for waging jihad. But he said he had no idea what a reference to a "reservation on the female donkey" meant.

A sidebar to the Times article provides further prosecutorial interpretations, such as "go to the picnic" being code for "travel to an area of jihad." So it's really all about secret code sharing, to tie the case to yet another Language Log post.]

Posted by Benjamin Zimmer at 03:20 PM

Teaching the judges

It might be instructive to those who provide expert witness testimony about language and linguistics to consider the current approach of psychologists who serve as expert witnesses in cases involving eyewitness identification. The June issue of the APA's Monitor on Psychology describes psychology's new thrust to educate judges about such variables as lighting, viewing distance, and the way law enforcement conducts lineups, including the way officers' questions suggest answers.

Judges are the gatekeepers for what happens before and during trials. Part of this gatekeeping is to make decisions about whether or not proposed expert witnesses will be permitted to testify. It's a tough job because, in addition to their knowledge of law and the intricacies of a given case, judges have to learn enough about medicine, engineering, psychology, social science, linguistics, and other specializations to enable them to decide whether expert testimony will assist jurors as they decide the cases. Psychology Professor Richard Wise puts it this way:

"The type of legal and judicial training and experience that judges have is not very helpful for them in understanding eyewitness testimony."

Once psychologists succeed in getting their testimony admitted, they then have to present their specialized knowledge to jurors, which is often difficult. But the focus of the Monitor article is primarily about getting past the judge, never easy. Psychology Professor Gary Wells points out that many judges argue that eyewitness information is "within the juror's realm of common knowledge." Judges put the same hurdles before linguists who testify about language issues, especially in criminal trials. If eyewitness information is "common knowledge," how much more do judges believe that jurors have "common knowledge" about the way language works?

Judges also frequently invoke the mantra that the proposed testimony would usurp the function of the jury. Opposing lawyers can be expected to say this during hearings about offers of proof and judges seem to rely on it a lot too. But what is this dreaded usurpation? If the experts propose to reach the ultimate issues in a trial, those of determining guilt or innocence, the accusation is absolutely correct. They should never do this. But experienced experts know how to stay away from such testimony. The proper thing to communicate to jurors is important information about the issue that they would not normally be expected to have -- like factors of lighting, viewing distances, law enforcement's suggestive interview questions, and other things that psychologists provide about eyewitness testimony. Experienced linguistics experts exert the same caution, providing the jury with information about the language in evidence -- like the accuracy of transcripts, the identification and resolution of ambiguity, the difference in frames of reference, the potential meanings conveyed, the grammatical scope of negatives, and many other things that jurors don't normally know about.

Another common reason judges use to exclude expert witness testimony is that the attorney could communicate the same information without using the expert at all. The fact that trial lawyers are not trained in the specialized fields of psychology or linguistics seems to go right by such judges. The "common knowledge" argument seems to be at work here as well.

My favorite judicial objection to experts, however, is that the proposed testimony would go beyond the scope of the claimed expertise. It's hard to know what this really means. It could be another recital about possible usurpation of the juror's function. If not this, it would seem to convey that the judges know enough about psychology and linguistics to enable them to measure the points at which these experts step out of their own fields and into forbidden areas. Again, experienced experts know not to do this, but if they should happen to venture beyond their own fields, competent opposing lawyers will be the first to try to point this out as a way of discrediting the testimony.

Whatever the reasons for excluding experts, the Monitor article seems to be right about one thing -- that the next step for fields that contribute to the resolution of law cases is to better educate judges about what these fields can and can't do to assist jurors as they decide the cases. The past focus has been on educating jurors. Now maybe it's time to help judges understand what we have to offer law.

Posted by Roger Shuy at 02:04 PM

The enveloping Pirahã brouhaha

Back at the end of April, there was a conference in Normal, Illinois, on Recursion in Human Languages. Yesterday, the Chicago Tribune got around to reporting on it, and the article's adrenaline level is so high that it makes you wonder why they waited so long (Ron Grossman, "Shaking language to the core", Chicago Tribune, 6/10/2007):

To get some idea of the brouhaha currently enveloping linguists, occupants of a usually quiet corner of the ivory tower, suppose a high-school physics teacher found a hole in the theory of relativity.

Students of language consider Noam Chomsky the Einstein of their discipline. Linguistics is a very old science, but beginning in the 1950s, Chomsky so revolutionized the field that linguists refer to the time prior to his work as B.C., or before Chomsky.

They may have to add another marker: A.D., after Dan.

This is clearly a Hot Story -- if only it were true. Actually, it's a great story anyhow, but the way that Grossman tells it is, well, kind of misleading. At least, it seems that way to me. I'm not a syntactician, so you should take all of this with a grain or two of salt, but here's a discussion of the issues as I understand them.

Let's start by following the Trib's article a little further:

Daniel Everett, a faculty member at Illinois State University, has done field work among a tiny tribe in the Amazon. He reports that their obscure language lacks a fundamental characteristic that, according to Chomsky's theory, underlies all human language.

The ideas behind it are fairly basic: Some birds squawk and some animals grunt, alerting winged or furry compatriots to danger, but only humans can share complex thoughts.

A Scottish professor illustrated that at a recent gathering with a nursery rhyme: "This is the cat that chased the rat that ate the malt that lay in the house that Jack built."

In those lines, the word "that" is what linguists call a recursive device. Recursion allows humans to link various parts of our experience: to direct others to not just any cat, but to the one that chased the rat.
The device enables humans to pool knowledge and skills, share hopes and ambitions, build sophisticated societies and elaborate technologies.

Everett, however, fired a volley straight at the theory when he reported that the Brazilian tribe he was studying didn't use recursives. "For a long time, I said to myself: 'Maybe if I just hang around the tribe long enough I'll find it,'" Everett said. "But after 30 years, I don't know how much longer I'm going to be able to hang around."

OK, hold that thought. Now read the abstract of Rachel Nordlinger's paper "Spearing the Emu Drinking: Subordination and the Adjoined Relative Clause in Wambaya", Australian Journal of Linguistics, 26(1) 5-29, April 2006. As you do, keep in mind that Ken Hale, whose 1976 paper is so prominently cited, was Chomsky's colleague at MIT for several decades. This includes the period 1972-1975, when I was a graduate student there and Ken was working out the ideas presented in the 1976 paper.

Studies of subordination in Australian Aboriginal languages have been heavily influenced by Hale's foundational paper on the `adjoined relative clause'—a non-embedded, multifunctional subordinate clause type found in Warlpiri and a `large number of Australian languages' [Hale K 1976 `The adjoined relative clause in Australia' in RMW Dixon (ed.) Grammatical Categories in Australian Languages AIAS Canberra: 78–105 at 78]. Since this paper, almost every Australian grammar makes some reference to this clause type, presenting a general picture of structural homogeneity across subordination structures in Australian languages, and leading to the general perception that Australian languages typically don't have syntactic embedding. In this paper I present an analysis of subordinate clauses in Wambaya, arguing that these share many features of Hale's `adjoined relative clause' while still being clearly subordinate. The differences between subordinate clauses in Warlpiri and Wambaya show that complex constructions in Australian languages can be structurally dissimilar while sharing many of the properties of the `adjoined relative clause' type. I argue, therefore, that clause-combining in Australian languages may be more structurally heterogeneous than is traditionally assumed, and that a single analysis for complex sentences across a majority of Australian languages is quite likely inappropriate. This has implications for both the analysis and description of subordination in Australian Aboriginal languages, and for their relationship to the typological literature on subordination more generally. [emphasis added]

So Ken Hale's 1976 paper proposed that "Australian languages typically don't have syntactic embedding", and this idea has been widely accepted for the past 30 years. Ken was saying this sort of thing openly at MIT in 1974 or so, in the classes that I took from him. If this fundamentally challenged the foundations of Chomsky's theory, why weren't there any fireworks? Why isn't Grossman writing about AK instead of AD?

Well, this is partly because Ken Hale was not at all a rebellious or combative sort. But it's mostly because the idea that some languages completely lack clausal syntactic embedding (and thus a fortiori lack recursive clausal syntactic embedding) was perfectly compatible with Chomsky's theories -- until recently.

Through the 1980s and early 1990s, there were versions of Chomskian theory, countenanced within the range of speculation acceptable to his acolytes, in which you could set one or more parameters of the hypothetical Universal Grammar machinery, and get a syntactic system of the type that Hale considered Warlpiri and other Australian languages to have.

This began to change with Chomsky's "minimalist program" in 1995, which jettisoned the idea of any well-defined layer of syntactic structure in between meaning ("logical form") and sound ("phonological form"), and therefore also discarded many of the parametric switches and knobs available in his earlier theories. And the change became important in the early oughts, when he decided that the human language faculty "only includes recursion". [See "JP versus FHC+CHF versus PJ versus HCF" (8/25/2005) for a discussion of some back-and-forth between Chomsky and others on this point, along with links to the original papers if you want every sanguinary detail.]

By the time that the "only recursion" idea became Chomskian orthodoxy, Ken Hale was dead. His response might have been to reconsider his analysis of Warlpiri, I don't know; but instead, other linguists are reconsidering it for him, and the confrontation between the "only recursion" idea and the analysis of Australian languages is being carried out quietly and implicitly in the discussions of these relatively obscure papers. I predict that this discussion will take a while to come to a consensus. For one thing, it's often not easy to decide whether or not structural embedding is involved in a particular class of examples. I explained some of the reasons in a post from last year: "Parataxis in Pirahã", 5/19/2006. And for another thing, there are many specific constructions in many different languages to consider.

Curiously, Dan Everett's path has been exactly the reverse of this reconsideration. In his 1986 paper "Pirahã" (pp. 200-325 in Desmond Derbyshire and Geoffrey Pullum, eds. Handbook of Amazonian languages), he presented a detailed analysis of Pirahã morphosyntax that included many examples of recursive structures. In his recent work, he's reconsidered these ideas and offered new, non-recursive analyses. As he explains in "Cultural Constraints on Grammar in PIRAHÃ: A Reply to Nevins, Pesetsky, and Rodrigues":

As a matter of potential historical interest, I did wonder about my own analysis as early as 1984. Everett (1986) was actually written in 1982 in Portuguese, appearing initially as Everett (1983) and later as Everett (1990). So since I was a Visiting Scholar at MIT during this time, I talked to Chomsky about my idea that there seemed to be very little evidence for embedding of any kind in Pirahã, apart from these –sai examples which I was beginning to question. We discussed it briefly and Noam gave me some ideas for further testing the idea. Mark Baker, writing his PhD under Noam at the time, mentioned to me one day as we were having lunch that Noam was really intrigued by the idea that a language might not have embedding (Mark said something like 'You really got Noam's attention with what you told him about Pirahã', or some such).

[See "Dan Everett and the Pirahã in the New Yorker", 4/19/2007, for some explanation of the context of this rebuttal of a rebuttal.]

So Grossman's analogy -- "suppose a high-school physics teacher found a hole in the theory of relativity" -- is wildly off the mark.

In the first place, Dan Everett is no high-school teacher. He's been a well-established academic researcher for more than 20 years, with numerous widely-cited publications, and previous faculty positions at the University of Pittsburgh and at Manchester University. In the second place, the specific theory that Grossman focuses on ("recursion only") is no theory of relativity. It's a relatively recent idea -- dating only to 2002 in its current form -- and it has never been universally accepted, with prominent rebuttals from Ray Jackendoff and Steven Pinker, among others. (It's true that Dan is challenging the idea of Universal Grammar more broadly -- but without trying to diminish the interest of his arguments, it's fair to say that he's joining a long and distinguished list of challengers, stretching back to the beginnings of the UG idea in the 17th century.)

All the same, there is certainly a brouhaha centered on Dan. See Geoff Pullum's post "Fear and loathing on Massachusetts Avenue", 11/29/2006, for a description of some of the symptoms, and some keen insight into the causes. And the brouhaha continues, including what seem to be some shocking intrigues in Brazil, designed to persuade the government to bar Dan's access to the Pirahã on the basis of trumped-up charges of racism.

[Let me mention in passing that informed readers are also likely to have a quiet chortle over Grossman's description of linguists as "occupants of a usually quiet corner of the ivory tower". If you'd like to get in on the joke, you could read The Linguistics Wars, a description of "the fierce, acrimonious controversies that have rocked linguistics since the 1950s", focused mostly on the Generative Semantics heresy, a form of what those familiar with earlier sectarian strife might call "premature minimalism".]

In other Everett-related news, the CBC radio program "And Sometimes Y" will air a show (on June 23 from 11:30 am to 12 noon) about Universal Grammar in the context of Dan's claims about Pirahã. In addition to Dan, the interviewees include David Pesetsky and Martha McGinnis, two fine linguists whose positions on the issues differ from Dan's in ways that you should find interesting.

Also, an interview with Dan is now up at www.edge.org.

Finally, I'm told that a consortium including the BBC, ARTE, the Australian Broadcasting Corporation, PBS Nova, and the Smithsonian Channel will be doing a film about linguistics, Dan, and the Pirahã (if the Brazilian government authorizes it).

[A list of other Language Log posts on Dan Everett and the Pirahã can be found here.]

[Update -- Dan Everett writes:

I would simply like to emphasize that the popular media's reporting on this controversy has been extremely hit or miss. The New Yorker did a reasonably good job, but I am really weary of pieces like the Chicago Tribune's that puts this all in personal terms. It is bad for me, bad for the field, and potentially bad for the Pirahas, because it makes them look somewhat freakish. I wrote a paper and published it in a peer-reviewed journal. Looking past all of the publicity and vitriol, there is really only one appropriate way to deal with these claims: design experiments, go to the field, and test them. On the other hand, Nevins, Pesetsky, and Rodrigues did do a very good and important bit of work in their long literature-based criticism of my paper and I welcome the opportunity that they afforded me to (attempt to) clear up some things. I think that not only Piraha but the Australian languages Ken Hale worked with, among others, all need to be studied more carefully, with a range of experiments and different methodologies and theories. These are important issues and I certainly never claimed to have the last work on any of them, not even for Piraha. Whenever we publish anything, all of us, we know that if it is important, people will want more data, they will want to replicate the experiments, etc. That is what I expected and that is what should be done.
There are several empirical claims here: Piraha lacks recursion, numbers, counting, etc. And there is a theoretical account I propose: the explanation is cultural. If I am right about Piraha, then this does not mean that, say, Australian aboriginal languages, if they also lack recursion, have the same explanation. I claimed in the Current Anthropology paper that Piraha was the only language known in which the claim was that they had no embedding. Ken Hale did in fact make a similar claim, but he never made a big deal of it and the press didn't pick up on it. But I think too that he was never categorical in saying that these languages lacked recursion, i.e. that the Australian languages he worked on were *finite* languages, which is what I claimed for Piraha. However, I have thought a lot about Ken's claims on non-configurationality and I think that those claims and facts could easily be recast in terms of non-recursivity, rather than non-configurationality. Peter Austin has told me, after listening to me talk about this, that what I say about Piraha seems very similar to what he knows about Australian languages. And Rachel Nordlinger (who was also invited to speak at the Recursion Conference at ISU last month but couldn't come because of scheduling conflicts) has said similar things to me.
Piraha is perhaps unique in the constellation of features associated with it and, if I am correct, the particular cultural explanation offered. But who knows? More fieldwork is needed.
One thing that is clear, though, from the conference at ISU, from Ken's work, from mine, and from the work of many others: Recursion and its manifestations are not well enough understood to support the claim that recursion is the core component of human language.
Finally, let me just say that I sorely miss Ken Hale. I wish so much that he were here now to engage in the debate. I am sure that his calmness, kindness, and brilliance would help all of us to see these issues much more clearly, as he would also pour oil on the waters that have been so disturbed in recent months.

Amen. ]

[Note: You may have to listen to the CBC And Sometimes Y program in real time -- past shows seem to be available for a limited period as RealMedia streams, for example a show on Clichés that I was on a couple of months ago -- but mostly they want you to buy collections of their shows on physical media, in the weird old-fashioned style that some publicly funded broadcast organizations have not yet abandoned. This is enough expense and trouble that I don't imagine that very many people bother, and it's been a mystery to me for years why these organizations don't take the obvious step here, and make individual shows available for download. If someone knows the explanation -- management is too sclerotic to consider new ideas until a few years after they've become routine elsewhere? someone's college roommate ekes out a meager living running the business of selling CDs? -- please tell me.]

Posted by Mark Liberman at 07:15 AM

Contrastive focus reduplication in Zits

A few years ago, there was a paper: Jila Ghomeshi, Ray Jackendoff, Nicole Rosen, and Kevin Russell, "Contrastive focus reduplication in English (the Salad-Salad paper)", Natural Language and Linguistic Theory, 22(2) 2004. The abstract begins:

This paper presents a phenomenon of colloquial English that we call Contrastive Reduplication (CR), involving the copying of words and sometimes phrases as in It's tuna salad, not SALAD-salad, or Do you LIKE-HIM-like him? Drawing on a corpus of examples gathered from natural speech, written texts, and television scripts, we show that CR restricts the interpretation of the copied element to a 'real' or prototypical reading. Turning to the structural properties of the construction, we show that CR is unusual among reduplication phenomena in that whole idioms can be copied, object pronouns are often copied (as in the second example above), and inflectional morphology need not be copied. Thus the 'scope' of CR cannot be defined in purely phonological terms; rather, a combination of phonological, morphosyntactic, syntactic, and lexical factors is involved.

This morning's Zits presents the comic-strip version:

In fact, this is taken directly from one of the Ghomeshi et al. paper's initial list of characteristic examples:

(1)a. I’ll make the tuna salad, and you make the SALAD–salad.
    b. LIKE-’EM-like-’em? Or, I’d-like-to-get-store-credit-for-that-amount like-’em?
    c. Is he French or FRENCH–French?
    d. I’m up, I’m just not UP–up.
    e. That’s not AUCKLAND–Auckland, is it?
    f. My car isn’t MINE–mine; it’s my parents’.
    g. Oh, we’re not LIVING-TOGETHER–living-together.

Kevin Russell has a corpus of 203 real-world examples on his web site, which attributes (1)d to a real-world interaction rather than to a canonical text such as The Simpsons or As the World Turns:

[A phones B early in the morning.]
A: Sorry. Did I get you up?
B: I'm up, I'm just not UP-up.

So Zits owes a footnote to Jila Ghomeshi, Ray Jackendoff, Nicole Rosen, and Kevin Russell. It's not the same, I know, but I'll take it as fair trade for all the Zits strips that we've used over the years.

More important, I'm shocked to discover that (at least as far as the Google News Archive knows) the Ghomeshi et al. paper was never picked up by the popular press. As I've often observed, the LSA should fire its public relations consultants (if only it had any) and hire a crew that knows what to do with a great story like this one. Maybe they could get the folks who handle PR for Roland Kapferer and West Country Farmhouse Cheesemakers.

[For those readers who don't have access to the NLLT archives, and don't want to shell out $32 for a peek at Springer's copy, there's a preprint of the Salad-Salad paper here.]

Posted by Mark Liberman at 07:11 AM

A Tip of the Hat to Senator McCain

In the New Hampshire Presidential debates the announcer asked the Republican candidates if any of them opposed making English the official language of the United States. The only one to step forward was Senator McCain, who not only pointed out that this is not a real issue, in that anyone who wants to advance economically is strongly motivated to learn English, but that the United States has treaty obligations to Indian tribes, such as the Navajos, who have their own languages and unproblematically conduct their affairs in them. Good for him!

Posted by Bill Poser at 02:25 AM

June 10, 2007

Join the sect

That's La Secte Phonétik:

Pour jouer les apprentis sorciers,
Rapper, beat boxer toute la journée,
Déjouer la fatalité,
Devenez adeptes.

Si votre vie manque de couleur,
Si quotidien rime avec douleur,
Approchez tous, n'ayez pas peur,
Rentrez dans la Secte.

[From 3quarksdaily]

A literal translation of the refrain:

To play sorcerer's apprentices,
to rap, to beat-box all day long,
Avoid doom,
Become a member.

If your life lacks color,
If daily rhymes with sorrow,
Approach, everyone, have no fear,
Join the Sect.

Send me the rest of the lyrics and I'll post them.

[Charles Brasart writes:

After watching the video you posted on Language Log yesterday, I decided to type out the lyrics to the song - they make for a nice little corpus for anyone interested in recent French slang. There are a couple of things I haven't been able to understand, but the rest sounds fairly accurate.

Pour jouer les apprentis sorciers,
Rapper, beat-boxer toute la journée,
Déjouer la fatalité,
Devenez adeptes,

Si votre vie manque de couleur,
Si quotidien rime avec douleur,
Approchez tous, n'ayez pas peur,
Entrez dans la Secte.

Que tu sois un gars, que tu sois une meuf,
Que tu sois peace, que tu sois rough,
Que tu sois jeune, que tu sois vieux,
Que tu sois athée, que tu sois pieux,

Que tu sois manchot, que tu sois gaucher,
Que tu sois blindé ou toujours fauché,
Que tu sois deuspi ou jamais pressé,
Que tu sois un flic ou qu'tu t'sois fait serrer,

Que tu sois blanc, que tu sois noir,
Qu'tu sois dans les temps, ou toujours en retard,
Que tu sois keuss, que tu sois roux,
Que tu sois chauve, que tu sois blond,

Que tu sois une caille ou un bobo
Que tu sois homo ou encore puceau
Que tu sois maniaque ou bordélique,
Que tu sois toujours sobre ou alcoolique,

Que tu sois un nain ou bien une perche,
Que tu sois deux d'tens' ou qu't'aies toujours la pêche,
Que tu sois timide, ou un tchatcheur,
Que tu sois patron, ou qu'tu sois chômeur,

Que tu sois sportif ou qu'tu fumes le bédo,
Que t'aies fait d'longues études, que tu sois sans diplôme,
Que t'aimes être peinard ou que t'aimes faire la fête,
j'te souhaite comme on dit la bienvenue dans la secte...

Pour jouer les apprentis sorciers,
Rapper, beat-boxer toute la journée,
Déjouer la fatalité,
Devenez adeptes,

Si votre vie manque de couleur,
Si quotidien rime avec douleur,
Approchez tous, n'ayez pas peur,
Entrez dans la Secte.

Euh, dans ma secte, pas d'inceste, pas d'insectes ni de couleuvres à avaler,
Notre aspect intrinsèque vous inquiète mais les couleurs vont cavaler
Sur les murs de vos tympans donc oyez oyez braves gens,
Voyez l'étrange contingent de ceux qui hurlent avec les dents,
C'est la Secte Phonétik, hystérique comme un pari,
Elle nous dit dans le silence les sirènes de la République.

Pas de trip apocalyptique,
On ne se plie qu'aux tortures poétiques,
<First word missing> juste amoureux de la langue
Houleuse par grand vent,
Langoureuse et goulue
Quand le temps est changeant.

En guerre contre la bêtise, contre l'analphabétisme,
Qui parasitent et répriment les énergies créatives,
<I have no idea what this line is>
Mais d'un disque disponible accessible à tout esprit libre.

Pour jouer les apprentis sorciers,
Rapper, beat-boxer toute la journée,
Déjouer la fatalité,
Devenez adeptes,
Si votre vie manque de couleur,
Si quotidien rime avec douleur,
Approchez tous, n'ayez pas peur,
Entrez dans la Secte.

<gibberish>Hippo, hip, hip, ho
Irico, no aime, pipeau li</gibberish>
ami du tempo aussi habile au beat ho-ho,
Lis ça, folie j'erre solitaire solide, fier,

Nouvel adepte de la secte,
Afin de l'être il a dû faire
Preuve de sang froid, dur, entêté,
Preuve, sans effroi,
Plonger dans ce fleuve pour draguer des piranhas,
Veuves plus flippantes que Fantomas,
Il a dû boire toute l'eau d'un lac ;
Chercher des oeufs sur l'île de Pâques ;
Fermer les yeux face aux cieux
Dans l'espoir facétieux de recevoir une bise ou une claque ;
Attraper la jaunisse, la lèpre, la chaude-pisse, la peste
et mater des téléfilms suisses à Budapest ;
Ainsi a libéré des instincts primaires emmurés dans sa chair
sans suer de sang sincèrement sans se censurer,
Très compétent, hippo-compétent,
Encore vivant, est enivrant, te délivrant vite,
Il fait désormais partie de la Secte Phonétik,
Dites merci pour ce récit de son parcours initiatique.

]

Posted by Mark Liberman at 07:25 AM

Talking about talking

Fry and Laurie:

[Hat tip: the Leggott Language Blog]

Posted by Mark Liberman at 07:05 AM

June 09, 2007

Finches again

One of the many topics that I've been meaning to get back to is the nature and role of birdsong dialects in the Flemish finch-tweeting competitions known as "vinkensport" ("Watch out for those Wallonian Finches", 5/22/2007; "Dialect variation in the terminal flourishes of Flemish chaffinches", 5/25/2007). Yesterday, Jon Bernard reminded me of this by writing:

Bruno Dumont's "La vie de Jesus", set in the town of Bailleul in northern France, has a lovely scene of a finching contest.

I was happy to get this note, not only because it gives me a reason to get back to vinkensport, and raises a fascinating problem for future field research (just what are the boundaries of the competitive finching culture in northern Europe?), but also because it helps me rise to a challenge posed the other day by Bill Poser in the Senior Common Room at Language Log Plaza:

We may be the only blog out there that has yet to comment on the Paris Hilton situation. ... You'd think there'd be a linguistic angle, though I can't say I've thought of it myself.

After following the link to the IMDB description of La Vie de Jesus, and reading the list of IMDB keywords for the movie ("Bird, Male Frontal Nudity, Female Frontal Nudity, Bar, ..."), I feel that there might be a role for Ms. Hilton in an English-language remake. I see one of those girl-and-her-horse movies like National Velvet or Dreamer, only with a chaffinch instead of a horse, and a tweeting competition instead of a race. Plus of course the nudity and bars. As for the linguistic angle, keep reading!

Earlier in the week, Ivan Lietaert sent this note:

I've been digging deeper into the 'vinkensport' and stumbled upon a detailed set of rules for the finch competition in Flanders, which I copied and pasted below. This set of rules describes the Flemish finch dialect, and rules out other (Walloon) dialects. The rules are intimidating, if you ask me, and it must take years to master them. In fact, there is a striking similarity with linguistics here... Someone must have painstakingly jotted down these phrases, counting syllables, registering phonemes, assimilation and contraction, and by doing this inventing 'finch-phonetics' (and grammar!). I'm only sending you chapter 7 and 8, but the previous chapters may also be of interest, but unfortunately all of it is in Dutch. Still, in it, they use the terms 'Walloon' ("Waal" or "Wale") and 'foreign' ("uitheems") alternatively, which implies they are synonymous to them. In doing so, they prove the point I made in my previous email.

Ivan is referring to the "Statuten van de V.Z.W. 'Vinkeniers Midden-Belgie' afgekort VI.MI.BEL", an extraordinary document comprising more than 20,000 words. Thanks to Google, I had previously found this myself, and had even painfully translated a few paragraphs, but put off posting them because I felt that I should understand them in the context of the whole mass of netherlandic legalese. Which I meant to get back to some day, really I did.

But prompted by Ivan's note, I'll share with you some pieces of the passage he sent -- the whole thing is a bit on the long side, so you'll have to go to the vimibel web site to read the rest, which I certainly recommend that you do. (As a Language Log reader, you can doubtless deal with the Dutch for yourself, though I've translated a few of the explanations of valid and invalid songs to get you started. Not that I know any Dutch, I hasten to add in case I've totally misunderstood something... )

07. ENKELE VOORBEELDEN VAN GELDIGE VINKENZANGEN (SOME EXAMPLES OF VALID SONGS)

-Rin tin tin tin – blubblubblur – sis ke wie

Dat is een geldig lied omdat het bestaat uit minstens een tweelettergrepige voorzang, een middenzang en minstens een tweelettergrepige inheemse slotzang eindigend op /WIE
(That is a valid song because it has at least a two-syllable initial phrase, a middle phrase, and at least a two-syllable native final phrase ending with /WIE)

-Tjetjetje – sis kwie

Dit is een geldig lied omdat het bestaat uit voldoende lettergrepen voorzang en een tweelettergrepige inheemse slotzang die eindigt op /WIE
(This is a valid song because it has an initial phrase with enough syllables, and a two-syllable native final phrase that ends with /WIE)

-Tjin tjin – se wie

Is een geldig lied omdat het bestaat uit een tweelettergrepige voorzang en een tweelettergrepige inheemse slotzang met als laatste lettergreep WIE

-Tje tje tje – trurrrrr – schwie

Is een geldig lied omdat het bestaat uit een voorzang, een roulade en een éénlettergrepige samengetrokken slotzang, eindigend op WIE

[...]

08. ENKELE VOORBEELDEN VAN ONGELDIGE VINKENZANGEN (SOME EXAMPLES OF INVALID SONGS)

-Tjeet tjeet tjeet – tjing tjing tjing – sis kie

Is een ongeldig lied omdat de laatste lettergreep niet eindigt op /WIE
(Is an invalid song because the last syllable does not end with /WIE)

-Tje tje tje – krerrrr – skie

is een ongeldig lied omdat de laatste lettergreep niet eindigt op /WIE

-Tin tin tin – klokloklo – beeuw wie

Is een ongeldig lied omdat de laatste klankgroep WIE voorafgegaan is van een bestanddeel van en uitheems of Walelied
(Is an invalid song because the last sound-group WIE is preceded by a component of an exotic or Walloon song)

[...]

(I'll appeal, on your behalf as well as mine, for help on one point from someone who knows Dutch. When the rules say that:

-Tjin tjin tjin – tje tje tje – siske wie kit

Is een ongeldig lied omdat er na de WIE nog iets bijgevoegd wordt dat geen pink, tjok, grol of steek is
(Is an invalid song because after WIE something is attached that is not a "pink", "tjok", "grol" or "steek")

are pink, tjok, grol and steek just onomatopoeic names for kinds of finch syllables?)
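For readers who like their prescriptivism executable, here is a minimal sketch of the three disqualification rules quoted above, written in Python. It is only an illustration built from the translated excerpts, not the VI.MI.BEL procedure itself: the function name `judge_song`, the convention that syllables are separated by spaces and phrases by dashes (as in the transcriptions above), and the one-item list of "foreign" components (just the "beeuw" from the example) are all my assumptions. The syllable-count requirements for the opening phrase are left out, since the transcriptions don't mark syllable boundaries reliably.

```python
# Syllables that the rules allow after the final WIE.
ALLOWED_AFTER_WIE = {"pink", "tjok", "grol", "steek"}

# Assumption: the excerpt names only one foreign (Walloon) component.
FOREIGN_COMPONENTS = {"beeuw"}

PHRASE_SEPARATOR = "\u2013"  # the en dash used in the transcriptions


def judge_song(song):
    """Return (is_valid, reason) for a transcribed chaffinch song."""
    text = song.strip().lstrip("-").lower()
    phrases = [p.split() for p in text.split(PHRASE_SEPARATOR)]
    phrases = [p for p in phrases if p]
    if len(phrases) < 2:
        return False, "needs an opening phrase and a final phrase"

    final = phrases[-1]
    # Find the last syllable of the final phrase that ends in WIE.
    wie_positions = [i for i, syl in enumerate(final) if syl.endswith("wie")]
    if not wie_positions:
        return False, "final phrase does not end in WIE"

    wie_index = wie_positions[-1]
    extras = final[wie_index + 1:]
    if any(syl not in ALLOWED_AFTER_WIE for syl in extras):
        return False, "something other than pink, tjok, grol or steek follows WIE"

    if any(syl in FOREIGN_COMPONENTS for syl in final[:wie_index]):
        return False, "WIE is preceded by a foreign component"

    return True, "valid"


if __name__ == "__main__":
    for example in [
        "-Tjin tjin – se wie",
        "-Tje tje tje – krerrrr – skie",
        "-Tin tin tin – klokloklo – beeuw wie",
        "-Tjin tjin tjin – tje tje tje – siske wie kit",
    ]:
        print(example, "->", judge_song(example))
```

A real judge would presumably need a much longer list of foreign components, and ears rather than a string matcher.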

Another development since my last vinkensport post was that interlibrary loan brought me a scan of P.J.B. Slater, F.A. Clements and D.J. Goodfellow, "Local and regional variation in chaffinch song and the question of dialects", Behaviour, 88(1-2) 76-97, 1984. They consider a large number of chaffinch songs recorded in Sussex, Orkney and Cheshire, compared against one another and against a corpus from the same species in New Zealand. For a sense of their take on the variations involved, here's Figure 2, dealing with just one location in Sussex:

Fig. 2: Dendrogram illustrating results of cluster analysis on the three phrase trill songs sung in Stanmer Park, Sussex, in 1981. Sonograms are of representative examples of the songs clustering together at the end of a particular branch. Song types, as classified by eye, are represented by letters or letters and primes. Branches are not included if they only connect songs classified as the same type by human observers: the numbers show the number of such songs occurring together at the end of a particular branch.

They conclude that

As a result of these analyses what can be said about variation in chaffinch song within the British Isles? Section I showed that variation within an area was quite substantial in many of the features of song that could be measured, but the differences between two areas 900 km apart were slight, at least for the three phrase songs that were analysed. ...

...[A] point emphasised repeatedly earlier is that song types are not fully independent of each other, one often being derived from another during cultural evolution.

[...]

The word "dialect" seems inappropriate when applied to the variations in chaffinch songs within the British Isles ... it is not so easy to pinpoint universal differences between areas, such as the word dialect would imply. That songs fall into types that are usually learnt accurately but sometimes less so leads to the variety of song, and the occurrence of several different lineages of song types within an area. The analogy with human dialects breaks down because all people in a particular isolated population have features in common with each and different from those elsewhere. ...

OK, so back to Paris Hilton.

In her remake of La Vie de Jesus, I see her overcome with grief when her champion chaffinch is denied his victory due to the presence of allegedly foreign syllables in his song. While drowning her sorrows in the bars of Picardy, Paris meets two eager young researchers, a computational linguist and a biologist specializing in animal communication. During a heated exchange of ideas and bodily fluids, they recognize that techniques developed for tracing the history of genomic variants in mixed populations can be applied to determining the cultural phylogeny of chaffinch song variants. Working together, they establish that the cultural stereotypes of traditional finching prescriptivists throughout northern Europe are historically as well as sociolinguistically invalid -- in particular, many themes conventionally viewed as Flemish actually derive from songs that originated in French-speaking areas, and vice versa.

They publish their results in Nature, and the European Parliament, by acclamation, establishes a study group to decide on the methodology for creating a committee to develop a pan-European stochastic grammar of chaffinch song. Meanwhile, the Eurovision Song Contest opens a new category for finches; Paris reclaims her champion bird, left in the care of her grandfather while she was lost in alcohol and computational linguistics; and they win!

[This whole "remake" idea only works if you totally ignore the actual plot of the movie, which appears to involve anti-immigrant prejudice and skinhead violence, with the finching contest only there for local color. There's a resonance with the whole local-vs.-foreign finch song regulations, I guess, but it would be hard to fit in all the themes and scenes. So this would be one of your less faithful remakes, or perhaps I should say, one of our more gratuitous attempts to find the linguistic relevance of current events.]

[Update -- Andrew Clegg writes:

Can I just say that, while being stuck at home on a sunny Sunday afternoon examining dependency graphs for my long overdue thesis, the image of Paris Hilton 'lost in alcohol and computational linguistics' has brightened up my day immensely.

Mine too! ]

[Update #2 -- With respect to pink, tjok, grol, and steek, Alain van Hout explains:

All four words have some sort of meaning in Dutch/Flemish, which in all cases however do not seem to have any bearing on this topic:
- pink: same as 'pinky' (little finger) in English
- tjok: 'tjok vol' is sometimes used to infer that something is completely full or even overflowingly full
- grol: same as 'a growl' in English
- steek: the 1st person singular or imperative of a verb meaning either 'stabbing' or 'putting/placing/inserting'

Alain therefore endorses the idea that these are echoic names for classes of chaffinch syllables. But I'm most interested in the cross-channel variant of chock/choke full.]

Posted by Mark Liberman at 12:27 PM

June 08, 2007

A kinder, gentler speech error

Speaking of speech errors, Wednesday's News Hour featured a particularly interesting one (to me). It happened during Jeffrey Brown's interview with freelance journalist Brian Mockenhaupt, a "[f]ormer soldier [who] wrote in the Atlantic Monthly about the Army's struggle to fill its ranks with a generation less willing and able to serve than in years past." Beginning around minute 3:48 of the .mp3 of the interview, Mockenhaupt says the following:

And that's where you see the shift to what some people call the kinder and gentler basic training. (audio)

The "kinder and gentler" bit is of course a (very indirect) reference to Bush Sr.'s call for "a kinder and gentler nation" (also cynically referenced in a military context by Neil Young in Rockin' in the Free World: "We got a kinder, gentler, machine gun hand").

That isn't the speech error (though Mockenhaupt's production of the "tl" cluster in "gentler", such that it sounds something like "genchler", is also somewhat interesting). The speech error I'm thinking of occurred when Jeffrey Brown repeated the "kinder and gentler" bit -- twice -- as "kindler, gentler":

And you describe in your article the two styles here of the shock that we're all kind of familiar with, either from experience or from movies, and this kinder, gentler approach that you're describing. (audio of italicized portion, beginning around minute 4:45 of the interview)

Well, in terms of results, what did the military people that you talked to, what did they say about the type of soldier that comes out of this -- call it the kinder, gentler training? (audio of italicized portion, beginning around minute 5:31 of the interview)

Brown is clearly anticipating the "tl" cluster in "gentler" as he's producing "kinder", and this anticipation is facilitated by the overall similarity between the ends of the two words: the "nder" at the end of "kinder" differs from the "ntler" at the end of "gentler" mostly in the "l" found in the latter but not in the former; "t" and "d" are both alveolar stops differing only in voicing (and, in this case, in the fact that the "d" is released into the "er" while the "t" is released into the "l").

Anticipatory speech errors of this kind are common, and I imagine that they are especially common when there is already a certain amount of similarity between the word in which the error occurs and the word that "triggers" the error. I have nothing but intuition to guide my imagination here, though, so I should probably spend some time searching through the excellent Fromkins Speech Error Database at the Max Planck Institute for Psycholinguistics. The database is named for the late, great Vicki Fromkin, a pioneer in speech error research and the original developer of the database -- read more about it here.


Posted by Eric Bakovic at 04:56 PM

Don't Ask, Don't Translate

Today's New York Times has an op-ed piece by Stephen Benjamin, a former US Navy Arabic translator, what the military calls a "linguist". He volunteered for the Navy, went to the Defense Language Institute to learn Arabic, and was ready and willing to go to Iraq. He never got there. Why? Because military snooping on IM exchanges between Benjamin and his roommate revealed that they are gay, as a result of which he was discharged from the Navy.

As we have discussed before on a number of occasions, the US military badly needs people with Arabic language skills. Discharging those they have merely because they are gay is stupid. Both the success of other armed forces in incorporating gay soldiers and polls of US forces indicate that it will not lead to the alleged problems of unit cohesion that constitute the only halfway credible argument against allowing openly gay soldiers.

Benjamin's account also reveals that the US is not following its announced policy of "Don't ask, don't tell." Mr. Benjamin was not open about his gayness. He was exposed by government snooping. Apparently the administration believes that pandering to the distaste of social conservatives for homosexuality is worth the lives of American soldiers. That's a funny way of supporting the troops.

Posted by Bill Poser at 04:52 PM

Anything but pedagogic

With reference to the piece in Southwest Airlines' Spirit magazine that named Language Log one of "8 Diversions" for its readers, Chris Cieri pointed out to me that several phrases might in principle be either praise or criticism. For example,

These profs know their allomorphs from their morphemes, but their posts are anything but pedagogic.

Being someone with a positive attitude towards school, Chris observes that this might mean "These people know their stuff, but they can't explain it properly in what they write."

Unlike Chris, however, most English-speaking people over the past few centuries have had generally negative attitudes towards pedagogues and pedagogy, at least when they use words derived from Latin paedăgōgus, meaning "a slave who took the children to school and had the charge of them at home, a governor, preceptor, pedagogue". The OED's earliest citation for pedagogic is

1693 J. BARNES in B. Hawkshaw Poems upon Several Occasions p. v, Forward Sense, to lofty Flights enclin'd, Prevents the tedious Discipline of Schools, The Loyt'ring Art of Pædagogick Rules.

There's a slightly earlier one for the variant form pedagogical -- in Act I, Scene 1 of Mr. Anthony (a 17th-century play by Roger Boyle, Earl of Orrery, who died in 1679), Anthony expresses some hostility to his schoolmaster, Mr. Pedagog, in terms that are by no means respectful of the characteristic tools and methods of pedagogy:

Prethee let me have my full swinge at him (for he has had his many a dismal time at me:) I say, if thou dost not conform to all the Maxims of Jack Plot, Tom Art, and my own dear self, I will peach thee at such a rate to my Sire, as shall provoke him to uncase thee out of thy Pedagogical Cassock, Condemn to the Flame, Martyrlike all thy Ferula's, Grammars, Dictionaries, Classick Authors, and Common-Place Books; nay, take thy Green Glasses out of thy Spectacles, and leave thee only thy Horn-cases to look through; by which, thou wilt be as able to read Prayers with thy Nose as with thy Eyes.

Although the pedagogical cassock is now worn only for graduation rituals, the negative associations with teachers have remained in force over the centuries. Thus Caroline Kirkland, "The Schoolmaster's Progress", 1845:

Master William Horner came to our village to keep school when he was about eighteen years old: tall, lank, straight-sided, and straight-haired, with a mouth of the most puckered and solemn kind. His figure and movements were those of a puppet cut out of shingle and jerked by a string; and his address corresponded very well with his appearance. Never did that prim mouth give way before a laugh. A faint and misty smile was the widest departure from its propriety, and this unaccustomed disturbance made wrinkles in the flat skinny cheeks like those in the surface of a lake, after the intrusion of a stone. Master Horner knew well what belonged to the pedagogical character, and that facial solemnity stood high on the list of indispensable qualifications.

And a whole dissertation could be written about attitudes towards teachers in recent popular music, starting with

No more pencils
No more books
No more teachers' dirty looks

The AHD entry for pedagogic elevates this connotation to the level of a second sense:

1. Of, relating to, or characteristic of pedagogy. 2. Characterized by pedantic formality.

It's true that the turn of phrase "anything but X" can be either positive or negative, depending on the properties of X as evaluated in the context. But I think that "anything but pedagogic" is likely to be meant as a compliment -- and a bit of web search confirms this impression:

From James Hilton's "Was It Murder?" (emphasis added):

The study presented another striking change; under the régime of the Reverend Dr. Jury, who had been Head of Oakington in Revell's time, it had been a gloomy, littered apartment, full of dusty folios and sagging bookshelves. Now, however, it looked more like the board-room of a long-established limited company. A thick pile carpet, a large mahogany pedestal-desk, nests of bookshelves in the two alcoves by the side of the fire-place, a very few good etchings on the walls, and several huge arm-chairs drawn up in front of an open fire, gave an impression that was anything but pedagogic. And Dr. Roseveare himself confirmed the impression. He was tall (well over six feet), upright, and of commanding physique. Bushy, silver-grey hair surmounted a strong, smooth-complexioned face into which, however, as he gave Revell a firm hand-grip, there came a smile both cordial and charming.

Katie Savchuk, "Tom Robbins find life in the magically mundane":

Robbins forces you to ponder the philosophical conundrums of the universe, from the meaning of time to the value of civilization in an unadulterated flow of thoughts that is anything but pedagogical.

"The only truly magical and poetic exchanges that occur in this life occur between two people. Sometimes it doesn't get that far. Often, the true glory of existence is confined to individual consciousness. That's okay. Let us live in the beauty of our own reality," Robbins writes.

As clever as it is profound, Even Cowgirls Get the Blues is overflowing with expressions just outrageous enough to describe reality.

Posted by Mark Liberman at 09:54 AM

June 07, 2007

Keine Seife; Radio

It's going to be quite a puzzle for future students of our culture, this lolcats phenomenon, isn't it? To say nothing of the extension to philolsophers and the like. Anthropologists are going to ask, why did people way back in the early 21st century find these mangled and misspelled captions so funny? U can haz undRstand lolcats?

I remember back many years ago I was playing piano in a rock 'n' roll band in clubs in Rhineland Germany, and there was a baffling craze for incomprehensible shaggy-dog jokes, sometimes involving gorillas, in which the punch line was always "Keine Seife; Radio" ("No soap; radio"). What did it mean? Why did the bar girls who told us these jokes fall about laughing at them? They admitted it made no sense. The line about soap and radio was just some sort of internal private lolcat for them. Until today I never knew why, but now I do, because after I posted the first version of this it was only an hour before a Language Log reader pointed me to the Wikipedia article on the vein of (anti-)humor involved (in English, it turns out — there was nothing German about it), which anthropologists of humor do seem to have deciphered. Thank you, Dan, for the first pointer, and many others after that.

Posted by Geoffrey K. Pullum at 05:01 PM

English declared "national language" (again)

Last night the Senate voted to approve Sen. James Inhofe's amendment to the immigration reform bill declaring English the "national language." As I mentioned in an earlier post, this is much the same amendment that Inhofe introduced last year, though it now sports the brand-new title, "The S.I. Hayakawa National Language Amendment Act of 2007." Last year the Inhofe amendment passed by a vote of 62-35, and this time around the vote was 64-33 — a slightly better margin despite the fact that the Democrats now hold a slim majority in the Senate. Last time the amendment was supported by 10 Democrats, whereas this year 17 voted "yea." The swing was due to five Democratic newcomers voting for the amendment (Benjamin Cardin [MD], Amy Klobuchar [MN], Claire McCaskill [MO], Jon Tester [MT], and Jim Webb [VA]), plus three old-timers (Mary Landrieu [LA], Barbara Mikulski [MD], and Ron Wyden [OR]) who apparently decided to switch their position on the issue. (Tim Johnson [SD] voted in favor of last year's amendment but did not vote this year.)
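The Democratic tally can be checked with a few lines of arithmetic. The sketch below is in Python, and the variable names are purely illustrative. It assumes something the figures above imply but do not state outright: that the other nine Democrats who voted "yea" last year did so again this year.

```python
# Roll-call totals as reported above: (yeas, nays).
vote_2006 = (62, 35)
vote_2007 = (64, 33)

dem_yeas_2006 = 10
newcomers = ["Cardin", "Klobuchar", "McCaskill", "Tester", "Webb"]
switchers = ["Landrieu", "Mikulski", "Wyden"]
did_not_vote = ["Johnson"]

# Assumption: every other 2006 Democratic "yea" was repeated in 2007.
dem_yeas_2007 = (
    dem_yeas_2006 + len(newcomers) + len(switchers) - len(did_not_vote)
)

margin_2006 = vote_2006[0] - vote_2006[1]
margin_2007 = vote_2007[0] - vote_2007[1]

print(dem_yeas_2007)             # 17
print(margin_2006, margin_2007)  # 27 31
```

So the five newcomers and three switchers, less Johnson's absence, account exactly for the move from 10 to 17, and the overall margin widens from 27 votes to 31.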

As was the case last year, this is all likely a moot point, since according to the latest reports it now seems unlikely that the immigration bill to which the amendment is attached (S. 1348) will even make it to a full Senate vote. Still, it gives English-only groups something to crow about once again. The passage of last year's amendment got a lot more media attention because it was the first time the Senate had voted to make English the "national" language. This year the amendment's approval was noted as just one of many setbacks for the proponents of the immigration reform bill, culminating in the failure to win a key test vote earlier today. If the immigration bill does manage to get passed with the "national language" amendment intact, or if Inhofe finds success with his more ambitious bill to make English the "official language," then we will see a real shift in this country's language policy. Until then it's little more than political posturing to assuage voters who see American multilingualism as some sort of threat to the national fabric.

Posted by Benjamin Zimmer at 01:02 PM

Learning to think like a ...

Many years ago, in the early days of sociolinguistics, I was pleasantly surprised by the rapid development of some of my grad students. At first puzzled by why they progressed so quickly and did so well, I began to suspect that they were beginning to think like linguists. As it happened, my rosy view of their future careers proved correct, because many of them went on to complete their PhDs and became productive linguists at important universities. But it occurred to me that most professionals seldom stop to analyze how newcomers to their fields learn to think the way they do.

There are some signs that we are now beginning to study, or at least think about, the way students learn to think like their professions. In the field of medicine, for example, Jerome Groopman has recently published a book, How Doctors Think (Houghton Mifflin, 2007). Maybe it's not surprising that he places a very strong emphasis on the importance of the way doctors and their patients use language during diagnosis and treatment. Groopman believes that language is a large window to the thought processes. In the introduction to his book, Groopman says:

My generation was never explicitly taught how to think as a clinician. We learned Medicine catch-as-catch-can ... Rarely did an attending physician actually explain the mental steps that led him to his decisions.

Groopman's book is woven out of his own experiences. He's a marvelous writer and has a wealth of fascinating case histories to illustrate his ideas. His major point seems to be that med students are poorly taught about how to think like doctors. Med school gives them huge amounts of information about bodies, technology, illnesses, and treatment strategies, then sends them forth to try to match these with the patients' apparent symptoms. But med school training, he believes, doesn't give them much about the reasoning processes that make this happen effectively. Groopman believes they diagnose too quickly, often without hearing the important signals that their patients could tell them, mostly because the current medical delivery system often allows doctors less than 20 minutes to make the important decisions in their patients' lives.

Elizabeth Mertz's recent book, The Language of Law School: Learning to Think Like a Lawyer (Oxford University Press, 2007), tackles this issue in the field of law school training. Mertz is an anthropologist with a law degree and now holds the position of Professor of Law at the University of Wisconsin Law School. She uses her research findings from eight different law schools to support her conclusions about how law students learn to think, talk, and read like lawyers. Law education tends to begin with reports of judicial decisions, focusing on the principles of law that were used to decide the human stories that created the problem in the first place. Mertz calls this "finding the layers of legal authority." Thus, learning to read like a lawyer plays a very important role in learning to think like a lawyer, especially learning to separate the relevant facts from the irrelevant and to distinguish legal doctrine from legal policy (alternative rules that might apply).

Unlike Groopman, who gets the data for his conclusions from his own wide experience and perceptive mind, Mertz gathers her data more ethnographically. She tape-recorded eight law school courses on contracts in different law schools, transcribed her data, coded it, and analyzed it as an impartial outsider. She focused on the teaching methods (Socratic, modified Socratic, short exchange, and lecture) and looked at social differences such as the race and gender of both faculty and students.

So now there are some early stirrings about how these two professions, medicine and law, are starting to examine how their practitioners think, talk, listen, and read the way they do. I find it interesting that these treatments of medicine and law both begin with the clients' stories about their health and legal problems. Groopman would have doctors match the patients' stories with their own medical training in anatomy, technology, medications, and illness in the process of being advocates of health. Mertz describes the way lawyers match the clients' stories with their own legal training in statutes, judicial decisions, and case law precedents in the process of being advocates of justice.

So now I'm back to wondering about how linguistics students learn to think like linguists. Unlike medicine and law, our teachers don't usually construct their learning moments with specific complaints presented by patients or with problems presented by clients. Like both medicine and law, we start with a body of knowledge about our field, but that body of knowledge is not inclined to be as authoritative or precedent-laden as it seems to be in law. Our teaching methods may be indirect but they usually are not, as Groopman puts it, "catch-as-catch-can." Nor do we tend to lean heavily, the way law professors do, on teaching via the Socratic method. But I still believe that good linguistics students manage, somehow, to learn to think like linguists.

In my own approach to graduate courses in linguistics I tried to encourage students to identify a language problem by themselves, gather data about it, apply the appropriate linguistic tools and approaches to it, and reach one (or possibly more than one) conclusion. But I can't say that this represents how my colleagues teach or how their students learn to think like linguists, and I should probably disqualify myself anyway, since I retired from teaching eleven years ago.

Maybe someone like Elizabeth Mertz should take on the question of how linguistics students learn to think like linguists... unless someone has already done this. If so, please let me know.

Posted by Roger Shuy at 11:53 AM

Lolxicographers

Jeff Prucher answers the call.

You'll want to follow up with some of the other fine posts at his weblog, "Home of the Oxford Dictionary of Science Fiction":

Brave New Words: The Oxford Dictionary of Science Fiction is the first historical dictionary devoted to the language of science fiction and science fiction fandom. It shows exactly how science-fictional words and their associated concepts have developed over time, with full citations and bibliographic information. It's a window on a whole genre of literature through the words invented and passed along by the genre's most talented writers. In addition, it shows how many words we consider everyday vocabulary—words like spacesuit, blast off, and robot—had their roots in imaginative literature, and not in hard science.
Citations are included for each definition, starting with the earliest usage that can be found. These citations are drawn not only from science fiction books and magazines, but also from mainstream publications, fanzines, screenplays, newspapers, comics, filk songs, and the Internet. In addition to illustrating the different ways each word has been used, citations also show when and where words have moved out of the science fiction lexicon and into that of other subcultures or mainstream English.

Posted by Mark Liberman at 09:12 AM

"Republicans and Democratics": Hypercorrection or speech error?

Or could it be both? Don Porges writes:

We may have achieved some kind of back-correction (if that's a word) in the issue of "Democrat" vs "Democratic"... If my ears do not deceive me, you can hear Tom Ashbrook of WBUR's On Point refer to the people themselves as "Republicans and Democratics", 30:52 into the June 6, 2007 show "Republicans Debate: Round 3".

Or it may be a weird stutter, "Democrat-ats". Hard to tell. Bring out the waveforms!

Don's first idea, I think, is that Ashbrook might be using "Democratic" in place of "Democrat" as the noun form, as an over-reaction to all the complaints about Republicans who use "Democrat Party" and similar phrases as a taunting insult to their political opponents (see "'Democrat majority': offensive but not ungrammatical", 1/31/2007).

This particular sort of error is generally called a "hypercorrection". At least, that term almost fits. Usually the source of a hypercorrection is someone who doesn't know any better. In one common sort of hypercorrection, for example, a speaker whose dialect has lost a distinction that is maintained in higher-prestige forms of the language may sprinkle the missing forms around indiscriminately, in an attempt to seem posh. James Thurber satirized this process in his Ladies' and Gentlemen's Guide to Modern English Usage:

The number of people who use "whom" and "who" wrongly is appalling. The problem is a difficult one and it is complicated by the importance of tone, or taste. Take the common expression, "Whom are you, anyways?" That is of course, strictly speaking, correct - and yet how formal, how stilted! The usage to be preferred in ordinary speech and writing is "Who are you, anyways?" "Whom" should be used in the nominative case only when a note of dignity or austerity is desired. For example, if a writer is dealing with a meeting of, say, the British Cabinet, it would be better to have the Premier greet a new arrival, such as an under-secretary, with a "Whom are you, anyways?" rather than a "Who are you, anyways?"

Another example of this kind is the sometimes-satirized behavior of Cockneys who add [h] in front of vowel-initial words, in an ironically self-subverting attempt to avoid being stigmatized for h-dropping:

NOCKY tells me that the Westry means a-clearin' hout our place
For to make a bit o' garding, wot they calls a Hopen Space,

An example that became conspicuous a few years ago is the pronunciation of Jaguar as if it were written "jagwire". I speculate that this originated in a (south midlands?) dialect that merges the pronunciation of words rhyming with fire with that of words rhyming with far. Some speakers of this dialect noticed that city slickers often pronounce -ar words with -ire, and recognized that the failure to do so is stigmatized, a marker of "redneck" speech patterns. Since "Jaguar" is the name of a prestige automobile, you naturally don't want to use a stigmatized pronunciation. So you pick the prestige variant that your dialect lacks -- and paradoxically, you thereby create a stigmatized hypercorrection, just as the Cockney speakers do by adding a random initial [h].

(This analysis is purely a guess on my part -- if you know any research into the distribution and history of jagwire, please let me know. And how did this wind up as Steve Jobs' pronunciation for the code name of OSX 10.2? Okie influence on northern California speech patterns?)

OK, back to Ashbrook and his "Republicans and Democratics". It would be neat if he had been so concerned to monitor for using Democratic instead of Democrat as a modifier that he mistakenly used Democratics as a substantive.

The trouble is, as Don acutely observed, Ashbrook didn't say "Democratics", but rather something that you might spell "Democratits". For determining the place of articulation of consonants, ears are still the best instruments we have. So give a listen:

It's clear that the D-word has four syllables, not three; but it also does sound like it ends in "-its" rather than "-ics". (That would be IPA [ˌdɛ.məˈkræ.ɾɪts] instead of [ˌdɛ.məˈkræ.ɾɪks].)

If you want the added comfort of instrumental science, here's the requested waveform, and more to the point, a wide-band spectrogram of Ashbrook's pronunciation of the D-word:

The fact that there's no [k] articulation is signaled in the spectrogram by the lack of a "velar pinch" in the closure of the final syllable, highlighted in yellow, and the corresponding lack of any mid-frequency burst in the release of the stop closure into the [s], highlighted in pink.

So why did Ashbrook say "Democratits"? Beats me -- maybe he started to say "Democratics" and tried to take it back, but it was too late. On this analysis, his initial decision to use "Democratics" would itself be a kind of speech error, caused by the fact that his heightened awareness of the Democrat/Democratic distinction resulted in each form priming the other more strongly than usual; and then he noticed the mistake in midstream and tried to suppress the last syllable, creating a phonetic error on top of the lexical substitution.

That might be the biggest load of speculative association piled on a one-word error since Freud's Psychopathology of Everyday Life. But just imagine what we'd all be thinking if W had said it. (Cue George Lakoff on stern vs. nurturing...)

[Clarifying the "jaguar" pronunciation options mentioned above: The OED gives the IPA pronunciations [ˈdʒæg.wɑː(r), ˈdʒæg.juː.ɑ(r)], with the parentheses indicating r-ful vs. r-less optionality. I believe that (some of) the American commercials for Jaguar-the-car feature a British voice using the r-less version of the second of these. The "jagwire" pronunciation, in the same style of surface-phonemic IPA, would be something like [ˈdʒæg.wa^ɪr].]

Posted by Mark Liberman at 05:57 AM

June 06, 2007

Distracted by the brain

About a year ago, I referred readers to Paul Bloom's discussion of Deena Skolnick's study of how mixing in a bit of irrelevant talk about neuroscience "turned bad [psychological] explanations into satisfactory ones" ("Blinded by neuroscience", 6/28/2006). Now a paper documenting that research is in press: Deena Skolnick Weisberg, Frank C. Keil, Joshua Goodstein, Elizabeth Rawson, & Jeremy R. Gray, "The seductive allure of neuroscience explanation", Journal of Cognitive Neuroscience. Here's the abstract:

Explanations of psychological phenomena seem to generate more public interest when they contain neuroscientific information. Even irrelevant neuroscience information in an explanation of a psychological phenomenon may interfere with people’s abilities to critically consider the underlying logic of this explanation. We tested this hypothesis by giving naïve adults, students in a neuroscience course, and neuroscience experts brief descriptions of psychological phenomena followed by one of four types of explanation, according to a 2 (good explanation vs. bad explanation) x 2 (without neuroscience vs. with neuroscience) design. Crucially, the neuroscience information was irrelevant to the logic of the explanation, as confirmed by the expert subjects. Subjects in all three groups judged good explanations as more satisfying than bad ones. But subjects in the two non-expert groups additionally judged that explanations with logically irrelevant neuroscience information were more satisfying than explanations without. The neuroscience information had a particularly striking effect on non-experts’ judgments of bad explanations, masking otherwise salient problems in these explanations.
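
For readers who don't think in factorial designs, the "2 x 2" in the abstract just means that two two-way choices are crossed to give four kinds of explanation. Here is a minimal sketch of that crossing; the variable names are mine, not the authors', and nothing here reproduces their materials or results.

```python
from itertools import product

# Factor levels as named in the abstract.
quality = ["good explanation", "bad explanation"]
neuroscience = ["without neuroscience", "with neuroscience"]

# Crossing the two factors yields the four explanation types.
conditions = list(product(quality, neuroscience))

for q, n in conditions:
    print(f"{q}, {n}")
```

Each subject saw one of these four explanation types for a given phenomenon, so the effect of the irrelevant neuroscience can be compared separately for good and bad explanations.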

Here's a sample of the material used in their experiment.

Here are the results from the novices and from the students in the neuroscience course. Note that for the neuroscience students as well as for the novices, the addition of irrelevant information about brain localization turns bad explanations into satisfactory ones, on average.

It's also worth noting -- and hardly surprising -- that the neuroscience students also were more impressed by the good explanation when a bit of irrelevant neuroscience was mixed in. If you thought of the novices as the general public (at least its more intellectual strata), and the neuroscience students as science writers (of the better-educated sort), you probably wouldn't be too far wrong.

Here are the results from the grown-up cognitive neuroscientists. Adding irrelevant neuroscience didn't impress them, I'm happy to say -- their (average) opinion of the bad explanations was not significantly improved, and their opinion of the good explanations actually declined:

The authors suggest that the same sort of experiment would also work in other fields, as I'm sure you can imagine. Are bad moral arguments "improved" by adding scriptural references? Do foolish system designs look better with references to the latest software-engineering methodology? What about bad teaching methods and fashionable educational buzzwords?

I'd guess that the answer to such questions is usually "yes": if you mix in irrelevant material from a fashionable subdiscipline in an authoritative-sounding way, you'll impress novices and apprentices, and make it harder for them to judge whether the substance of what you're saying makes sense.

There's a potentially embarrassing question for each group: are the experts better able to see through such displays of authority symbols? In some cases, I (unkindly) suspect that the "experts" might be worse than the novices -- sometimes expertise is little more than the learned ability to be distracted by artfully-deployed symbolic smokescreens. (But not in my field, nor in yours, of course...)

Skolnick et al. observe that neuroscience has a number of properties that make it especially effective as a rhetorical distractor, beyond the previously documented (and more general) "seductive details effect" -- it points to reductionist and materialist explanations, it provides an almost unlimited source of jargon, it sits at the intersection of several high-status occupations, and (though not in this experiment) it offers pretty pictures.

Posted by Mark Liberman at 06:27 AM

Calligraphy at MIT

A couple of months ago Scott Carney had a fascinating post entitled The Last Calligraphers about an Urdu newspaper in Chennai, India whose master is still handwritten. I'm not aware of any comparable English language newspaper, but I do own a beautiful textbook of classical number theory published by the MIT Press in 1977 that, like The Musulman, is in the author's calligraphic hand. It is Joe Roberts' Elementary Number Theory: A Problem-Oriented Approach. Here is a sample page.

It is a pleasant read on number theory, too. Regrettably, it is out of print.

[Addendum: Reader Chloe Lewis has informed me that Joe Roberts teaches at Reed College, where there was a great deal of calligraphic activity, described here.]

[Addendum: Reader Chris Lance points out that the books of Alfred Wainwright on the Lakeland Fells are in the author's calligraphy and also contain the author's fine illustrations. An example can be found here. Amazon.com also has excerpts.]

Posted by Bill Poser at 02:20 AM

June 05, 2007

Revelry in verbiage

This month's Southwest Airlines Spirit magazine features "8 Diversions":

a hardback (The Obvious, containing "All the business advice you’ve ever heard wrapped in one easy-to-read package");
a paperback (Learning to Kill, a collection of "bite-size whodunits, perfect for the beach");
a TV show (Passport to Latin America, in which "Host Samantha Brown travels to Central and South America, guiding viewers to the best cafés, parks, museums, shopping, and nightlife spots");
a movie (Ratatouille, Brad Bird's "first feature-length film since 2004's Oscar-winning The Incredibles");
a DVD (the Lucille Ball Film Collection, because "Who doesn't love Lucy?");
a CD (The Best of Both Worlds, "in which Disney star Miley Cyrus shares her pop-rock debut with her alter ego, Hannah Montana ... a mini-Hilary Duff");
a blog; and
a website (How Stuff Works, which is "Wikipedia by way of David Macaulay ... Quick answers to dozens of common head-scratchers").

The blog? Language Log, described as "revelry in verbiage".

Here's the full passage (as it appears on the web):

THE CONCEPT Revelry in verbiage, especially when it comes to media and pop culture
BACKSTORY In 2003, University of Pennsylvania phonetics professor Mark Liberman and University of California, Santa Cruz, linguistic professor Geoff Pullum joined forces to create the Language Log blog. These profs know their allomorphs from their morphemes, but their posts are anything but pedagogic.
A SAMPLER “The Language of Stargate,” “Freedom of Speech: More Famous Than Bart Simpson,” “Tighty-Whities: The Semantics,” “The Coming Death of Whom: Photo Evidence,” “Irritating Clichés? Get a Life,” “Ray Charles, America, and the Subjunctive,” “You Say Nevada, I say Nevahda”
FIRST-TIMER’S GLOSSARY Eggcorn: a word coined by Pullum to describe the misuse of a homophone. It originated from an anecdote Pullum heard about a woman writing “egg corns” where she meant “acorns.” Snowclone: term used to describe fill-in-the-blank clichés often used by the media. Examples: “X is the new Y,” “have X, will travel,” and “what happens in X stays in X.”
FIGHTING WORDS “Strunk and White were a pair of hypocritical old grousers whose inaccurate grammar and usage edicts dated not from the last century but the one before that. Yet people not only treat them as if their words came from God and had been chiseled into granite slabs during an encounter up a mountain, they also fail to read those words to see if the old fools practice what they preach. Of course they don’t.”

According to Wikipedia, Southwest Airlines "is the largest airline in the United States by number of passengers carried domestically for any one year and the third largest airline in the world by number of passengers carried". So obviously they know their blogs.

We are now free to move about the country, in Spirit anyhow. (Did we get any free tickets? Alas, no.)

But Language Log, on a shortlist of diversions for Southwest Airlines' passengers, along with Lucille Ball, Brad Bird and Hannah Montana? Priceless.

[Hat tip: Kenny Easwaran.]

Posted by Mark Liberman at 08:35 PM

ESP Across Cultures

A call for papers for a journal entitled ESP Across Cultures just arrived via Linguist List. I immediately checked out the web site, figuring that it would be an interesting crank journal devoted to parapsychology. It turns out that "ESP" in this context stands for "English for Special Purposes", not "Extrasensory Perception". Darn.

Posted by Bill Poser at 05:20 PM

Taking no shit from judges

Last year, the New York Times famously printed President Bush's recipe for peace in the Middle East, "What they need to do is get Syria to get Hezbollah to stop doing this shit, and it's over." 32 years earlier, the Gray Lady printed the same four-letter word in a quotation from President Nixon that appeared in the House Judiciary Committee's transcript of the Watergate tapes. And in 1976, William F. Buckley was allowed to quote, in a book review, John Ehrlichman's fictional attribution of the same word to Lyndon Johnson. (See "Taking shit from the president", 7/19/2006).

However, it seems that circuit-court judges don't get the same privileges as House committees and famous right-wing publishers, not even when the judges are quoting the president in the highly relevant context of an opinion about FCC obscenity rules. At least, in a story this morning, Stephen Labaton uses some rather ornate circumlocutions to avoid printing what the judges wrote that our leaders said ("Court Rebuffs F.C.C. on Fines for Indecency"):

Reversing decades of a more lenient policy, the commission had found that the mere utterance of certain words implied that sexual or excretory acts were carried out and therefore violated the indecency rules.

But the judges said vulgar words are just as often used out of frustration or excitement, and not to convey any broader obscene meaning. “In recent times even the top leaders of our government have used variants of these expletives in a manner that no reasonable person would believe referenced sexual or excretory organs or activities.”

Adopting an argument made by lawyers for NBC, the judges then cited examples in which Mr. Bush and Mr. Cheney had used the same language that would be penalized under the policy. Mr. Bush was caught on videotape last July using a common vulgarity that the commission finds objectionable in a conversation with Prime Minister Tony Blair of Britain. Three years ago, Mr. Cheney was widely reported to have muttered an angry obscene version of “get lost” to Senator Patrick Leahy on the floor of the United States Senate. [emphasis added]

You can find a link on the New York Times web site to the Decision by the U.S. Court of Appeals for the Second Circuit -- with an "Editor's Note" reading "This text contains potentially offensive language." If you summon up the courage to follow the link anyway, you'll learn what that "common vulgarity" and "angry obscene version of 'get lost'" actually were, at least in the judges' version (on pages 26-27):

The Remand Order makes passing reference to other reasons that purportedly support its change in policy, none of which we find sufficient. For instance, the Commission states that even non-literal uses of expletives fall within its indecency definition because it is “difficult (if not impossible) to distinguish whether a word is being used as an expletive or as a literal description of sexual or excretory functions.” Remand Order, at ¶ 23. This defies any commonsense understanding of these words, which, as the general public well knows, are often used in everyday conversation without any “sexual or excretory” meaning. Bono’s exclamation that his victory at the Golden Globe Awards was “really, really fucking brilliant” is a prime example of a non-literal use of the “F-Word” that has no sexual connotation. See Golden Globes (Bureau Decision), 18 F.C.C.R. 19859, at ¶ 5 (“As a threshold matter, the material aired during the ‘Golden Globe Awards’ program does not describe or depict sexual and excretory activities and organs . . . . Rather, the performer used the word ‘fucking’ as an adjective or expletive to emphasize an exclamation.”), rev’d by Golden Globes, 19 F.C.C.R. 4975 (2004). Similarly, as NBC illustrates in its brief, in recent times even the top leaders of our government have used variants of these expletives in a manner that no reasonable person would believe referenced “sexual or excretory organs or activities.” See Br. of Intervenor NBC at 31-32 & n.3 (citing President Bush’s remark to British Prime Minister Tony Blair that the United Nations needed to “get Syria to get Hezbollah to stop doing this shit” and Vice President Cheney’s widely-reported “Fuck yourself” comment to Senator Patrick Leahy on the floor of the U.S. Senate).

The reference to Bono's Golden Globe remarks is a bonus -- among our many posts on taboo language, this episode and its follow-up inspired a series that I'm especially fond of:

"On second thought, make that 'fuckingly brilliant'", 11/3/2003
"Maybe better make that 'freaking brilliant'", 1/25/2004
"Some people should get a life", 1/25/2004
"The FCC and the S word", 1/25/2004
"The Ngadjonji and the PTC", 1/25/2004
"Imprecational categories", 3/21/2004
"The FCC and the S-word (again)", 3/21/2004
"The S-word and the F-word", 6/12/2004

[Update -- Ben Zimmer points out that in Ehrlichman's novel, the use of the word "shit" is attributed to a fictionalized version of LBJ, who is given the name "Esker Scott Anderson".]

Posted by Mark Liberman at 08:54 AM

June 04, 2007

Military capitalization

The adage that "military justice is to justice as military music is to music" reflects the fact that the military often has its own, odd way of doing things. Hitherto, that has not included English spelling, which is bad enough without military intervention. Unfortunately, the Army has not seen fit to leave bad enough alone. According to an article in the May 31st Stars and Stripes, Army Chief of Staff General Peter J. Schoomaker has ordered that all "command information products" capitalize the word "soldier" in all contexts, as if it were a proper noun. He has, furthermore, requested that the Associated Press stylebook and Webster's dictionary adopt his proposal.

General Schoomaker is evidently a smart guy - according to the 9/11 report, he argued unsuccessfully for attacking Al-Qaeda in Afghanistan prior to 9/11, which might have saved a lot of lives - but I think he's off the wall on this one. The reason that he wants "soldier" to be capitalized is that he thinks that it is respectful.

The change gives soldiers the respect and importance they've always deserved, especially now in their fight against global terrorism.

I don't know where he gets that idea. The standard basic rules for capitalization are that proper nouns are capitalized everywhere and other nouns at the beginning of the sentence. This does not show respect for the referents of proper nouns over common nouns or sentence-initial common nouns over common nouns in other positions. Similarly, the additional rules, such as the one that capitalizes titles, do not necessarily confer respect. ("Private" is capitalized just as much as "General".)

Some people point to the capitalization of terms like He and the Holy One in reference to god, and Bible, as instances of capitalization showing respect. Actually, I think that these are examples of the capitalization of proper nouns. When He is capitalized, it is no longer an ordinary pronoun, it is effectively a noun referring to one particular being. Similarly, Bible is the title of a particular text and so is capitalized for the same reason as Great Expectations. It is true that in many cases such capitalization is used for figures and items considered sacred, which may be the source of General Schoomaker's idea that capitalization shows respect, but it isn't hard to find counterexamples, such as the Evil One in reference to Satan. (Satan itself is capitalized because it is a proper noun.)

On the other hand, like other deviations from the norms of capitalization, this proposal would reduce a little bit the utility of capitalization in parsing English text and add yet another arbitrary fact to what the learner must learn. English spelling is bad enough as it is. Moreover, it invites variation in usage depending on the political attitude of the writer, and conflicts between the political attitudes of individual writers and the editorial policies of the publications their work appears in. People who capitalize soldier will be accused of being warmongers; those who don't, will be accused of not supporting the troops. Let's not go there.
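
To make the parsing point concrete, here is a rough sketch of the kind of heuristic that the standard rules make possible: treat any capitalized word that is not sentence-initial as a candidate proper noun. The function name and the two sample sentences are invented for illustration, and the sentence splitter is deliberately naive.

```python
import re

def mid_sentence_capitals(text):
    """Return capitalized words that do not start a sentence.

    Under the standard rules, such words are candidate proper nouns.
    """
    candidates = []
    # Naive sentence split: ., !, or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = re.findall(r"[A-Za-z']+", sentence)
        # Skip the first word: it is capitalized whatever its class.
        for word in words[1:]:
            if word[0].isupper():
                candidates.append(word)
    return candidates

# Hypothetical sample sentences, identical except for "soldier".
standard = "The general thanked every soldier. Later, Schoomaker wrote to Webster's."
proposed = "The general thanked every Soldier. Later, Schoomaker wrote to Webster's."

print(mid_sentence_capitals(standard))
print(mid_sentence_capitals(proposed))
```

With the standard spelling the heuristic picks out only the two names; with the proposed spelling it also picks out a common noun. The heuristic is already imperfect (it would flag the pronoun "I" and capitalized titles, for instance), which is the point: each new exception makes the signal a little less useful.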

Posted by Bill Poser at 08:58 PM

Squash the experts before they can multiply

The January issue of the American Psychological Association's journal, Monitor on Psychology, has a column called "Judicial Notebook," which describes Judge Reggie B. Walton's decision in the perjury trial of "Scooter" Libby to not allow the proposed expert witness testimony of psychologists specializing in memory and cognition. The judge offered four grounds for his decision: (1) such testimony would not assist the jury; (2) the testimony would usurp the role of the jury in deciding the issue of credibility; (3) the prejudicial effect of the testimony would outweigh its probative value; and (4) the validity of the underlying studies were (sic) in question. Linguists who testify as expert witnesses have heard these grounds before.

Judges have the right to make such judgment calls. That's one reason why they're called judges. But it's often not very clear why they rule the way they do. For example, why did the judge think that the testimony would not assist the jury in this case? I don't know what the experts said in their offers of proof but, based on my own experiences in such hearings, there can be lots of problems. Sometimes the offer of proof is written by attorneys who may or may not fully understand what the expert is going to say. And sometimes they don't make their case very effectively even when they do understand. In some cases the offer is made orally at a hearing, either by the lawyer or the expert. Again, from my own experience in such hearings, judges may not hear the experts out, cutting them off when they are only part of the way through what they are trying to say, and never letting them get to the gist of their proposed testimony. When this happens, the experts may not have followed Grice's maxims very well and, therefore, they may have failed to convince the judge that their testimony would assist the jury. Many experts aren't familiar with the language of the courtroom or, for that matter, with the jargon-free language that might communicate to non-expert jurors. If the experts make their case well, however, they should be able to make a dent in Judge Walton's first objection: that such testimony would not assist the jurors. Whether judges agree to this, or even listen to it, is another matter, since they're the ones in the catbird seat.

Judge Walton's second objection, that such testimony would usurp the role of the jury, is another familiar judgment call. If the experts have had any experience in the role of an expert witness at trial, they will know, or will soon learn, where the boundaries are for this. The role of the jurors is to listen to the facts of the case and then to make an informed decision in their verdict. The facts of the Libby case came from the testimony of witnesses and the documents in evidence. Typically, experts can assist the jurors to do their job by providing them with the essential insights of their field that are relevant to the evidence. But they should never tell the jurors what to do with these insights about the evidence. That would be a judge's greatest fear. But getting beyond that fear can be difficult for judges because judges, being guardians at the gate, sometimes infer that it will happen anyway.

When experts have been careful to proffer their testimony in ways that put them in good shape to be admitted, a judge's ruling against them is sometimes based on an unfortunate inference. There are several reasons why judges infer that the expert will cross the line, even if nothing was said to suggest that they might do so. As noted above, one reason is that the expert's potential testimony may not have been made clearly and effectively. Another reason is that judges may fear that skillful attorneys might lead their experts out of their prescribed and proper roles. By then it would be too late and the damage couldn't be undone. Experienced and competent experts know how to avoid this, but the judge has to make an early judgment call about whether it might happen. Other reasons are known only to the Court, but it is suspected that such things as the possibility of lengthening the trial come into play at least once in a while. The judge's fear about the potential of usurping the role of the jury is often based on the judge's fear of a potential courtroom calamity.

Things get even more complicated with Judge Walton's third ground for exclusion: that the testimony's prejudicial effect on the jury would outweigh its probative value. Bryan Garner's Dictionary of Modern Legal Usage (Oxford 1995) defines "prejudicial" as "tending to injure; harmful." (WNCD's definition is equivalent.) Garner adds that in the legal context, "prejudicial" applies to things and events, in contrast with "prejudiced," which applies to people. So Judge Walton was not saying that the psychology experts were prejudiced, but rather that allowing them to testify in this case would have a harmful or injurious effect. Again following Garner, "probative value" means "tending or serving to prove." So, translating the judge's objection number three leads to: "the harmful effect of the testimony would be greater than what it might prove."

I have no idea what Judge Walton was thinking about here. Perhaps he was right that the psychologists' testimony would have been harmful. But how? If they were indeed bona fide experts in their field of memory and cognition, and if what they had to say would stick to the issues of the evidence, and if this would help the jurors better understand what to do with the evidence when they deliberated, then it's hard to see how this could be harmful. If they were proper expert witnesses, they wouldn't even try to prove anything; they would enlighten the jury about aspects of memory and cognition that laypersons don't often know, things that would help the jurors decide whether the evidence put before them was proved.

On the surface at least, it looks like Judge Walton just didn't want to hear any more expert testimony. Unless he knows more about memory and cognition than the experts know, it would be hard to imagine how he could decide whether or not this would assist the jury. His fear that the experts might wander off into territory of the ultimate issue of guilt or innocence is just that--a fear. But all this is moot if the experts' proffers gave clear signals that they were likely to go that route.

Judge Walton's fourth objection, the questionable validity of the studies underlying what the experts proposed to say, may or may not have been accurate. I suspect that the lawyers and witnesses must have gone around and around about this in a hearing, assuming that there was one.

The judge's objection here was that the memory research cited was carried out in university research settings. He ruled that the psychologists did not approximate the legal context that contained a vigorous cross-examination, voir dire, closing arguments, and jury instructions. To him, these are very different things. He's right about that. But the larger question is how we can better find out important things about memory and cognition without studying them in contexts that are controlled. I'm sure that the psychologists understood that they were doing experimental, laboratory-style research. That's what they usually do and it's hard to imagine how an experimental research study on memory could take place in a courtroom context. The psychologists must have believed that their research has some useful relationship to the activities of the rest of the world, even to perjury cases like this one. But nothing seems to be quite like the world of law.

Update: Mike Maltz writes to inform me that Judge Walton probably based his decision on the skewering that Prosecutor Patrick Fitzgerald delivered to memory expert Professor Elizabeth Loftus at a pretrial hearing.

Posted by Roger Shuy at 05:35 PM

Hablador de la Casa

There's a long piece by David Montgomery in Sunday's Washington Post about the pragmatic choice made by presidential candidates and other politicians to communicate with voters in Spanish, even among those who strongly support the primacy of English. The article touches on many of the issues discussed on Language Log in the past — from last year's "Nuestro Himno" controversy, to the vote on the Inhofe amendment declaring English the "national language," to the use of Spanish on the Senate floor (including by Sen. Inhofe), to Newt Gingrich's unfortunate "ghetto language" remark and his subsequent apology in what Montgomery calls "grammatically correct Spanish, albeit with a terminally Anglo accent."

Such an article would not be complete without some ribbing of earnest politicians falling desperately short of the mark in their Spanish usage:

No amount of studying can prevent the occasional gaffe. Gingrich was a pioneer of bilingual communication as speaker of the House, but a news release his office issued for Cinco de Mayo in 1998 is still recalled with chuckles in the bilingual halls of power.
The release referred to Gingrich as "Hablador de la Casa" — but "hablador" doesn't mean "speaker." It means someone who talks too much, a big mouth.
Then there's [Mitt] Romney's fiery "¡Patria o muerte — venceremos!" ["Fatherland or death — we shall overcome!"] in Miami. It happens to be a trademark line of Fidel Castro's.
Quoting Castro to Cuban Americans? ¡Caramba!
[Al] Cardenas, Romney's Cuban-born adviser, still winces. "It's one of those you wish you could take back," he says, adding that the speech was not properly vetted.

The news hook for Montgomery's piece is the pending vote on the immigration bill currently before the Senate (S. 1348), which once again includes an amendment from Sen. Inhofe declaring English the "national language." (Last year's amendment passed the Senate but was never enacted into law.) The legislation bears the title "the S.I. Hayakawa National Language Amendment Act of 2007," commemorating the English-only advocacy of the onetime senator (and general semanticist) Samuel Ichiye Hayakawa. Inhofe has also offered a more strongly worded bill (similarly named after Hayakawa) that would make English the "official" rather than "national" language of the US, a more satisfying turn of phrase to activist groups like ProEnglish. With the Democrats in charge of the Senate, Inhofe's proposed legislation would seem to stand less of a chance than last year, but given the charged atmosphere surrounding immigration reform it's hard to predict. Regardless of the result, expect our leading politicians to use more and more Spanish in the coming campaign season, based on the realpolitik of American bilingualism.

[Update, June 5: Some good discussion of the article is going on over at Languagehat.]

[Update, June 7: This year's version of the Inhofe amendment has passed the Senate, though it's likely as meaningless a vote as last year's. Details here.]

Posted by Benjamin Zimmer at 03:06 PM

Annals of automated avoidance

My files on plain speaking and modesty (and various approaches to taboo vocabulary in between) continue to expand, but recent days have brought two especially striking examples of automated avoidance: "hen and ****" on the Royal Society for the Protection of Birds site (asterisking courtesy of Microsoft), and automatic rejection of "XX" (as in the Roman numeral for '20') in Yahoo groups.

First, from Larry Urdang (on the ADS-L), this report from the Daily Telegraph of 1 June:

RSPB website bans use of the word 'cock'
By Stewart Payne

The Royal Society for the Protection of Birds has banned the use of the word "cock" when applied to the male of the species, in case it causes offence.

[Addendum: Adam Kightley points out that cock isn't banned -- that is, asterisked -- on the RSPB site in general, only on its on-line forum, and that the word does appear elsewhere on the site. This is not at all clear from the sentence above, although the succeeding discussion, below, mostly restricts itself to the forum. But note the quote from the moderator below, which refers to "the RSPB website". I suppose I should just stop trusting anything I read in the papers.]

In a move condemned for "taking political correctness too far", a correspondent on an RSPB online forum was surprised to find that his use of the word "cock", when referring to a male blackbird, was replaced with four asterisks.

He challenged the forum moderator over the sensitivity to the word, only to find that once again the asterisks appeared. He wrote: "When is it not in order to refer to a male bird as a **** and a female as a hen? I've heard of PC but that is taking things too far."

The contributor, named as JohnD from Holmfirth, Yorkshire, adds: "It's censorship that is just silly. What should I have said then...the daddy bird...the father bird...the male."

The moderator replied: "It is not political correctness. The issue is words that can be used in an offensive context and we should not forget that the RSPB website has a massive viewing from children.

"Pretty much all internet forums use the same or similar filters. It is far from an ideal situation but it is better to be safe than sorry."

In a second posting, the moderator adds: "Some words have been hijacked for a different and more offensive meaning and it is important to examine the context in which they are delivered because forums have the potential to be read by people of all ages.

"It is not easy to override the system but I have seen this being abused on other forums by careful wording so it is better to be cautious."

An RSPB spokesman confirmed that it did not use the word "cock" on its website, preferring instead to describe birds as either male or female.

"The filter that removes the word 'cock' and replaces it with asterisks is built in to the Microsoft software package we use. This is standard procedure. It is not something that we have added ourselves," she said.

"These filters are designed to remove a range of words the software designers believe some people may find offensive. When someone uses the word cock it automatically replaces it with asterisks. Our moderator is not sitting there making these changes."

John, in Holmfirth, had the final word, writing in another posting: "I was thrilled to see on the bird table a pair of... Parus major."

"As bird lovers will know, a Parus major is a great tit, and while ***** do not get past the forum censor, 'tits' do not cause offence."

Outraged letters followed the next day, including one from the owner of the 300-year-old country pub The Cock Inn, in Luddesdowne, Kent.

Notice that, once again, it's a spelling, in this case c-o-c-k, that's being avoided, not the actual taboo lexical item, in this case cock 'penis'.
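To make the point concrete, here is a minimal sketch in Python of how a spelling-based filter of this kind might work. It is an illustration only: the word list and the function name asterisk_filter are invented for the example, and the report says nothing about how the Microsoft filter is actually implemented.

```python
import re

# Hypothetical blocklist of spellings; the real filter's word list is not public.
BLOCKED_SPELLINGS = {"cock", "cocks"}

def asterisk_filter(text, blocked=BLOCKED_SPELLINGS):
    """Replace each blocked spelling with a run of asterisks of equal length.

    The test is on the letter sequence alone (whole word, any case), so the
    filter has no way to tell the 'male bird' sense from the taboo one.
    """
    def mask(match):
        word = match.group(0)
        return "*" * len(word) if word.lower() in blocked else word

    # Visit every run of letters and mask the ones on the blocklist.
    return re.sub(r"[A-Za-z]+", mask, text)

print(asterisk_filter("a male bird is a cock and a female is a hen"))
print(asterisk_filter("Who killed Cock Robin?"))
```

Because the sketch only ever looks at strings of letters, any decision about which sense of the word was meant is simply out of its reach, which is the behavior the forum contributor ran into.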

To summarize: Who killed Cock Robin? Microsoft. In the software, with asterisks.

On to Yahoo. This report came in from Ken Rudolph this morning. Ken has been overseeing the Yahoo groups sites for motsscons (annual gatherings of people from the Usenet newsgroup soc.motss). Two years ago, the motsscon, the 18th, was in Vancouver, and the site was motsscon_XVIII. Last year, the motsscon was in Minneapolis, and the site was motsscon_XIX. This year's motsscon is in, oh my, Palo Alto, so that the site should have been motsscon_XX. But, Ken reports, "Yahoo groups doesn't allow the word XX to be used in any group name or description", presumably because that's too racy.

For the moment, Ken has added postings for Palo Alto to the Minneapolis site. Meanwhile, another site will be created, but not on Yahoo groups.

As with the automated (and decidedly inept) avoidance asterisking on iTunes I've reported on, it's hard to know whether to laugh or cry about these cases.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 12:42 PM

Philolsophers

You'll want to check out the philolsophers at flickr. Francis Heaney likes these two best. I agree, but I also enjoyed this one, perhaps because I saw the original at an impressionable age. And this one has promise, though I don't think it's quite there yet.

Again, linguists seem to be lagging a bit, though I have to confess that there are still a bunch of unpublished examples in my inbox.

Posted by Mark Liberman at 08:04 AM

Flowers of babble

Mark Peters has a new column at babble ("a magazine and community for the new urban parent") that runs under the title "Jabberwocky: An Urban Parenting Dictionary". So far he's covered "The peculiar language of the childfree", "Toddlers' words for breastfeeding", and "Nine months' worth of creative ways to say 'pregnant'". Those who are fans of Mark's work at Wordlustitude will not be surprised to find him focusing on quirky coinages ("pregcellent", "bratzilla", "boobamaphone") and memorable juxtapositions ("crib lizard", "baby rabies", "crotchfruit"), all of course impeccably sourced.

One of the striking things about this neologistic exuberance is that it's mostly not motivated by the need to communicate new denotational distinctions. No doubt a future column will feature "200 words for poop", but I predict that it won't be because new parents, urban or otherwise, find the need to distinguish this culturally-important substance along dimensions of color, texture, viscosity, timing, odor and social context. Rather, the motivation for terminological multiplication -- at least the kind featured in Mark's column -- is playful and attitudinizing. These are words that people invent and use for fun, or take from toddlers' speech errors out of simple delight in the cuteness of babytalk.

But some new concepts call out to be coined into words, as another sort of pop lexicography often reminds us. Perhaps the folks at babble have invented a shorter and less solemn term for the "new urban parents" that they're targeting, but I don't see it featured on their site. "Nuppies" is too obvious, aside from being already taken several times over. I look forward to Mark's review of this question.

Posted by Mark Liberman at 06:35 AM

June 03, 2007

Punishing speakers of Australian aboriginal languages

In Australia the Indigenous Affairs minister, Mal Brough, declared on May 24 that "he was considering a plan to restrict welfare payments to aboriginal parents in order to force their children to attend school and learn English." As if the linguistically fascinating but severely endangered Australian languages were not under enough threat already. Brough is concerned that there are some aborigines in isolated areas who "can only speak their own language, which perhaps is only known to 200, 300 or 400 other people." Quite: these languages are at the lower threshold of size with respect to having a sustainable population of speakers. So his idea is to cut their welfare for not learning the language of the dominant majority. Will Australia never change?

The Economist (June 2, 2007, p. 43) incorrectly attributes Brough's remarks to prime minister John Howard, but correctly notes that for many years Howard has steadfastly refused, despite huge public pressure, to offer any kind of official apology to the aborigines for their appalling treatment by white Australians. The story cites Laklak Burarrwanga of Yirrkala as reporting that she was made to wash out her mouth with soap if she was caught speaking her aboriginal language at school. Worse used to go on: aboriginal children were literally kidnapped by the state and taken away against their parents' protests to be educated far away in English-speaking schools. Brough continues that English-by-force tradition, urging that aborigines be required to learn English so that they can be absorbed into the mainstream of Australian culture — in other words, so that aboriginal languages and cultures can die and aborigines can become just a dark-skinned under-privileged substratum of English-speaking Australian society.

Plenty could be done to improve the lot of aborigines in Australia without doing anything to insist on their learning English (which is probably going to happen anyway, along with the extinction of the aboriginal languages). Australia has a lot to atone for. Such atonement will probably not occur. The racist politician Pauline Hanson of Queensland is now out of jail (she was convicted for wrongly claiming electoral funding through the fraudulent registration of party members; she was later acquitted on appeal) and is forming a new racist party for the aborigine-haters to vote for. She's not likely to amount to much politically, but the sad fact is that part of John Howard's enormous political success over the years has depended on making sure that he panders to the Pauline Hanson end of the spectrum — the aggrieved white voters who are hostile to both immigrants and aborigines — and scoops up plenty of their votes.

Posted by Geoffrey K. Pullum at 08:55 PM

Omit needless needless

George H.W. Bush, quoted in the NYT (story by Stephen Engelberg, 5/5/89):

NORTH GUILTY ON 3 OF 12 COUNTS;
VOWS TO FIGHT TILL 'VINDICATED';
BUSH DENIES A CONTRA AID DEAL
'No Quid Pro Quo,' President Insists

... The President, clearly hoping to put the Iran-contra issue to rest, derided what he termed "needless, mindless, needless speculation about my word of honor."

Tom Ace, who saved this gem all these years, puts it under the rubric "omit needless needless".

I'll have more to say in a little while on omitting needless stuff, but I thought this one (with Bush protesting rather a lot) stands nicely on its own.

[Note: the first version of this posting attributed the quote to Ronald Reagan, but it seems to have been his successor George H.W. Bush.]

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 02:26 PM

June 02, 2007

A potatoe moment

Being an indifferent speller myself, I sympathize:

The occasion was the "Silicon Valley Leadership Group's annual business climate summit", and according to the SF Chronicle's Politics blog,

Duffy Jennings, spokesman for the Silicon Valley Leadership Group, said the banner came from the Clinton campaign.

I was going to ask, "don't big-time politicians have handlers on the look-out for stuff like this?" But I guess the problem is that Silicon Valley hasn't yet invented a spell-checker for banners. The technology is there, of course -- you could start with the cell-phone-camera translator, and just swap in a spell-checker for your ambient environment.

I'd rather not, myself; but then, I can rely on Geoff Pullum to point out my (frequent) typos and (occasional) spelling errors, and I've never been splashed across the media of several continents in front of a multiply misspelled banner.

[For more on the psycholinguistics of doubled letters in English spelling, see:

"Liberal gemination", 6/8/2004
" Orthographic metathesis?", 4/1/2004
" Jeniffer afficionados", 3/30/2004
" The perils of degemination", 3/29/2004
" Conservation of (orthographic) gemination", 3/29/2004
]

Posted by Mark Liberman at 01:35 PM

Dediu and Ladd again: correlations and mechanisms

This is a guest post by D. Robert ("Bob") Ladd, co-author with Dan Dediu of "Linguistic tone is related to the population frequency of the adaptive haplogroups of two brain size genes, ASPM and Microcephalin", PNAS, Published online before print May 30, 2007. As Bob explains, "To avoid confusion with the referents of I and we, I've put this over my own signature only, but it comes from both of us", i.e. both of them.

Mark's discussion of the statistical reasoning in Dan Dediu's and my PNAS paper on the possible link between population genetics and language typology is interesting and useful. For those readers who want to know more about the statistical techniques we used, Dan has put up an addition to our "further information" site, specifically about the stats.

The main point we'd like to emphasize here, since this has been a matter of some misunderstanding (though not on Mark's part), is that we did NOT go trawling through a giant database of gene/language correlations and look for a good one. We were looking specifically to see if the hypothesized correlations between tone and the two genes under discussion stood out from the great mass of gene/language correlations, and they did.

That said, Mark's commentary raises three points that we'd like to amplify a bit.

(1) It's not clear what our statistical null hypothesis should be or how to control for coincidence; as he says, "even in a distribution created by completely random effects ... SOMETHING has to be way out in the tail." We did what we could to rule out coincidence as an explanation for our findings, but in the end it could still just be coincidence.

(2) Consequently, we've gone about as far as we can go with statistics; the only real confirmation that we are onto something will now come from experimental work demonstrating the existence of the hypothesized genetically-induced "cognitive bias" in individuals, followed by studies clarifying the neurological basis of the bias. As Daniel Nettle says in his Commentary on the print version of our paper (appearing soon), our work is really hypothesis-generating rather than hypothesis-testing.
We are now generating precise hypotheses about the nature of the bias, and hope to start testing them soon.

(3) Up to a point, Mark is right that our original hypothesis was not much more than a hunch based on human pattern recognition abilities. (Several referees said similar things on our way to publication.) Specifically, this project began in earnest when I pattern-recognized a connection between the Lahn group's gene maps and my mental map of the distribution of tone languages. But I have been thinking for some time about the cognitive status of tone, paralanguage and other non-sequential linguistic features (go to http://www.ling.ed.ac.uk/~bob/lhulme.html for more detail); Dan's PhD research, starting from entirely different premises based on evolutionary genetics and the study of human prehistory, was looking for evidence of gene-language correlations of exactly the sort we've documented; and of course, we knew that ASPM and Microcephalin are involved in brain development. So if it was a hunch, it was a reasonably well-grounded hunch. Now, it's certainly true, as Mark says, that our geographical correlations would mean more if they had proceeded from some experimental demonstration of some sort of genetically linked, language-related, cognitive/behavioral/perceptual difference. But given the widespread assumption (rooted in the Boasian tradition, but with a significant contemporary boost from Chomsky) that the human language faculty is absolutely uniform across the species, it's very unlikely that we would have been able to get funding to look for such a difference first. So we started by doing something we could do on our own without such support, namely testing the apparent correlation. Having done that, we hope we are now in a better position to apply for funding for the expensive part of the research. 
This might seem backwards, but it's a pretty common way of doing genetic mapping studies: start from your phenotype, use correlational studies to identify plausibly associated genetic markers, and then try to understand experimentally what the genetic markers actually do.

If we manage to make progress on that front, we will certainly let everyone know.

Bob Ladd

__________________________________________________________________________________

[Guest post by Bob Ladd]
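Point (1) above, that even in a distribution created by completely random effects something has to be way out in the tail, can be illustrated with a small simulation. The sketch below is not the procedure used in the paper. It simply correlates many pairs of unrelated random variables and reports the largest correlation found. The sample size, the number of pairs, and the function names are arbitrary choices made for the illustration.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def null_correlations(n_samples=50, n_pairs=2000, seed=0):
    """Correlate n_pairs pairs of independent random variables.

    Both variables in each pair are drawn independently, so any correlation
    that turns up is coincidence by construction.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_pairs):
        a = [rng.gauss(0, 1) for _ in range(n_samples)]
        b = [rng.gauss(0, 1) for _ in range(n_samples)]
        results.append(pearson(a, b))
    return results

rs = null_correlations()
largest = max(abs(r) for r in rs)
print(f"largest |r| among {len(rs)} unrelated pairs: {largest:.2f}")
```

With enough pairs, the largest value is far from zero even though nothing connects the variables. So a strong correlation found by searching would mean little, whereas one hypothesized in advance and then compared against the mass of other correlations means more.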

Posted by Mark Liberman at 11:45 AM

Amplifying "faint signals" from the alpha geeks who are creating the future

That's how O'Reilly Media describe their mission. O'Rly Media, on the other hand, have a different goal:

You might say that this is amplifying the noise, but I prefer to think of it as found music.

[Hat tip to Mickey Blake, "budding linguist at the University of Aarhus, Denmark, who also found Danish pronunciation egregiously bizarre to learn and hates the Danish number system". But Mickey, when you're a linguist, "humani nihil" and all that.]

[A comment at I Can Has Cheezburger sketches the core of lolOS:

LOLCODE KERNEL OUTLINE

The Gimmeh and Canhas functions are parsed by the "has flavor" runtime. These are then passed to the DoWant/DoNotWant interpreter. Once this routine is completed all memory is flushed and the routine begins again.

Another commenter observes that the author must be "livin teh Life O Rly".]

Posted by Mark Liberman at 08:05 AM

June 01, 2007

Post on MySpace and get fired!

Bridget Copley writes in with the sad story of David Noordewier, who was fired by Wal-mart for posting the following in his MySpace:

Drop a bomb on all the Walmarts, trailer parks, ghettos, monster truck shows, and retarded fake "pro wrestling" events, and the average I.Q. score would probably double.

Why's this a sad story? After all, Wal-mart can't have its employees publicly insulting its customers. It's pretty harsh to fire him outright but you can see their point; it's a kind of nasty thing to say.

Thing is, Wal-mart didn't fire him for the implied insult. Rather, they fired him (and ensured that he was denied unemployment benefits) because they say he threatened them. He provides the following quote from their 'notice of determination':

"You were discharged from Walmart associates inc. on 2/27/07 for integrity issues. You had a posting on your personal website stating to "Bomb all the Walmarts" to increase the average IQ scores."

David needs to add a semanticist to his legal team, stat! Someone who can do the following:

(a) Explain, in words of one syllable, the concept of a 'conditional conjunction'. Actually for this David could just provide a link to an English as a Second Language website, e.g., here.¹

(b) Explain, with argumentation, the exact conditions that must be met for an utterance to be a 'threat' speech act. Conditionals (conjoined or regular) can be threats -- "Take another step and I'll shoot!" -- but not, I suspect, unless the consequent has a negative effect on the supposed threatenee. That's not the case in David's sentence (unless Wal-mart would be negatively affected if the average IQ score doubled...)

(c) Prove that the conditional conjunction of his MySpace sentence does not have an imperative antecedent, as the Wal-Mart legal department seems to think. An imperative conjoined conditional like "Bomb all the Wal-marts and double the average IQ!" wouldn't be a threat per se, but it would be incitement to violence and perhaps justify the treatment he's getting. But the antecedent here is not an imperative; rather, it's a declarative with a concealed impersonal 'you' subject. ('You bomb all the Wal-marts and...')

The author of this paper could probably do it. Or any of the people he cites.

Comments?

¹This is not an endorsement of the content here; it's just a high-up Google hit with examples of the right kind. I haven't really looked at it properly.

Posted by Heidi Harley at 06:15 PM

Something is ___ in Denmark

But apparently no one is sure what to call it.

Rumor has it the Royal Family has hired a committee of bonobos and orangutans to help out.

[Hat tip: Chandan Narayan]

[Update -- Lane Greene writes:

I *love* that Danish video, because my girlfriend's Danish, and I'm trying to learn it. Grammar is straightforward, even easy, and since I know German I can guess lots of words. But the pronunciation really is impossible. It sounds to me like they are having a big joke on us; the made-up Danish in the video is exactly what real Danish sounds like to me. I think that, based on what I know at the moment, I can understand Swedish more easily. Also, fun fact (in case you didn't get this one): the complexity of the Danish counting system is mentioned (the guy with the bike just hands over a large wad of money, since he doesn't understand how much he owes), but not explained. Here's a bit more.
http://www.olestig.dk/dansk/numbers.html

I don't know of any Western counting system so resolutely non-decimal.

]

Posted by Mark Liberman at 01:05 PM

	every(thing/body/one)	no(thing/body/one)	pos/neg ratio
just about	3.318M	97K	34.2
nearly	2.55M	272K	9.3
virtually	2.288M	1.51M	1.5
almost	3.98M	2.78M	1.38
literally	280K	320K	0.8

	good	evil	good/evil ratio
nearly	144K	2.19K	65.7
almost	178K	39.9K	4.46
almost/nearly ratio	1.24	18.2