July 31, 2005

Well, 10 out of 17 isn't bad...

Only a 41% word error rate! Today's FoxTrot offers a lesson in the value of leaving a few words out of a quote:

Posted by Mark Liberman at 02:42 PM

Bright line missing

Here's another common expression that's not in the standard dictionaries: bright line, in the sense of "clear criterion of demarcation". You won't find it in the American Heritage Dictionary (fourth edition), Merriam-Webster's Unabridged (third edition), Encarta, or the Oxford English Dictionary, but it's widely used these days. Google has 193,000 hits for "bright line", a substantial fraction of which seem to be instances of this expression. Searching today's Google News brings up 104 examples, and Technorati finds 20 blog examples in the past three days.

You could easily figure out what the expression means from these examples, if you didn't already know, and there's also a Wikipedia stub for bright-line rule:

(link) A bright-line rule is a clear-cut, easy to make decision.
In policy debate, it is a topicality standard which argues that the definition is black and white, that one can easily tell whether or not a specific circumstance meets the definition.

and you can find other suggested definitions on the web:

(link) The “Bright Line” is any legal principle that is so certain that every day citizens and businessmen can plan their activities confident of the legal standards that apply.

I wonder what the history of this expression is, and why the dictionaries have missed it.

A sample of today's Google News catch:

The Democrat is eager to make the 2004 tax and budget packages a bright-line issue between him and Kilgore.
That ability seems to mark a bright line between those species and monkeys, who, scientists have long assumed, look into mirrors and see only strangers.
It provides a bright-line definition for people coming in so they know what they can expect.
... there was one thing the MPAA drew a bright line against: "We were told in no uncertain terms that showing the kids smoking or drinking ... was a guaranteed R."
Later in the news conference, he said he fired the columnist because of the paper's "bright line" on illegal activity and "clear understanding" that the law was violated.
The McCreary opinion, however, remains very narrowly crafted and failed to draw a bright line test for displays of religious symbols on government property.

and of the last few days of blog usage from Technorati:

...the “Fair Use Guidelines For Educational Multimedia”, a document published in 1986 that attempted to establish “bright line rules” for determining copyright compliance and educational fair use legality.
Since mathematical operations and algorithms are equivalent, any attempt to draw a bright line between them will be totally arbitrary and likely to lead to absurdities, as indeed it already has.
If our ethical convictions matter enough that we feel compelled to respond to them with bright line rules, why are we undermining our ethical views with deference to utility?
Is one of those a media company and one of thos a communications company, with a big bright line between the two?
I never really said much about Grokster but what the Court did to the Sony bright-line test bothers me for exactly the reasons outlined in the Berkman brief...
It demolishes the bright line demarcating agricultural domestication in human prehistory that sustains this entire portion of Diamond's argument.

The OED does have

bright-line, (a) Physics, applied to a discontinuous spectrum consisting of bright lines resulting from radiation from an incandescent vapour or gas; (b) Photogr., applied to a view-finder in which the area of the picture appears framed by a white line;

and gives this citation for the photographic meaning:

1954 R. H. BOMBACK Basic Leica Technique iii. 39 The most compelling new feature of the Leica M.3 is..the Measuring Bright Line Viewfinder. Ibid. 40 Superimposed on the field of view is the ‘Bright Line’ contour which shows clearly the extent of the image covered by any one of the three standard lenses.

Another point of reference is from the traditional vocabulary of mechanical drawing:

1703 MOXON Mech. Exerc. 24 Drawing, or racing with a Point of hardned Steel, a bright Line by the side of the Ruler.

But I don't see any evidence that the "bright line" used today to mean "a clear criterion of demarcation" comes from any of these sources. The first place I can recall ever seeing it was in legal talk, and as you can see from the examples above, this kind of use is still common.

The earliest example of "bright line test" in the New York Times historical archive seems to be from 9/16/1987, in the transcript of Robert Bork's confirmation hearings. In response to a question by Strom Thurmond about whether his views on the First Amendment have changed, Bork responds (in part)

My views have changed for the simple reason, I was looking for a bright line test by which judges could decide which speech was protected and which was not.

The earliest NYT example of "bright line rule" is in a 3/21/1985 story about the Supreme Court ruling 7-2 that "a 20-minute roadside detention does not violate a narcotics suspect's Fourth Amendment right against unreasonable seizure":

As it did in the bullet removal case, the Court in this case stopped short of drawing a bright line between a permissible detention and an illegal arrest.

"Much as a bright line rule would be desirable," the Chief Justice said, "in evaluating whether an investigative detention is unreasonable, common sense and ordinary human experience must govern over rigid criteria."

And in a 9/16/1986 Op-Ed piece, Bruce Babbitt wrote that

The truth is that there is no bright line between training and operations and military affairs.

All of the earlier NYT examples of "bright line" that I've been able to find are either about spectroscopy, or a "bright line" of repartee in a play, or a feature of someone's clothing, or "the low bright line of dawn", or some other use having nothing to do with the notion of a clear line of demarcation. (Though there are more than 500 examples, available on line only in hard-to-read page images, and I haven't checked every one...)

I believe that I recall encountering this phrase in things written long before Judge Bork's ordeal, and it wouldn't surprise me to find that the legal use is quite old. Does anyone know when the expression "bright line" entered legal or philosophical talk, and what the original metaphor was? Why is the line "bright" rather than "clear" or "sharp" or "plain" or some other adjective?

There's an interesting take on this metaphor in a quote attributed to Tom Fiedler, the Executive Editor of the Miami Herald, in this 7/29/2005 AP story:

"The line of behavior for us has to be brightly lit. There can be no ambiguity," Herald Executive Editor Tom Fiedler told reporters Thursday.

I've always assumed that the brightness of "bright line" had to do with the reflectance of the line (in contrast to its surroundings) rather than with the overall illumination of the area.

[Update: searching on FindLaw, I found in Justice Frankfurter's concurring opinion in Wilkerson v. McCarthy, from 1949:

If there were a bright line dividing negligence from non-negligence, there would be no problem. Only an incompetent or a wilful judge would take a case from the jury when the issue should be left to the jury. But since questions of negligence are questions of degree, often very nice differences of degree, judges of competence and conscience have in the past, and will in the future, disagree whether proof in a case is sufficient to demand submission to the jury.

This strikes me as the routine deployment of a commonplace expression, not the invention of a new one, and so I'll bet there are older examples to be found. The phrase seems to me to have an 18th-century flavor, though I guess that would be consistent with a later legal origin as well.]

[I note that the online OED has a draft addition from 2003 for "(to draw) a line in the sand", with citations from 1950 onwards. And, inevitably, a Google search for { "bright line in the sand"} is not an empty one.]

[Update: Margaret Marks at Transblawg posts some further information, including several legal dictionary entries. One of them is Merriam-Webster's Dictionary of Law, where I had previously failed to find it (due to the fact that the interface to this dictionary at FindLaw was confused by the quotes with which I too-trustingly surrounded the word sequence in my query). The term is indeed in the 1996 edition of Merriam-Webster's Dictionary of Law, and the gloss given there is "a clear distinction that resolves a question or matter in dispute". Dr. Marks gives an entry from Garner's Dictionary of Modern Legal Usage, which includes a 1941 citation:

bright-line rule = a judicial rule of decision that is simple and straightforward and that avoids or ignores the ambiguities or difficulties of the problems at hand. The phrase dates from the mid-20th century. The metaphor of a bright line is somewhat older than the phrase bright-line rule - e.g.: "The difficult part of this case comes with regard to ... the activity of the Board of Temperance .... A bright line between that which brings conviction to one person and its influence on the body politic cannot be drawn." Girard Trust Co. v. I.R.C., 122 F.2d 108, 110 (3d Cir. 1941)./"[T]he McCambridge majority opinion ... agrees that the Kirby bright-line-rule is but a mere formalism ...." J.G.Trichter, Bright-Lining Away the Right to Counsel, Tex. Law., 6 Nov. 1989, at 26. Cf. hard and fast rule.

Noting the cross reference at the end of the entry, I can't resist pointing out that a Google search for { "hard and fast line in the sand"} is not empty, nor is { "hard and fast bright line"}, though "hard and fast bright line in the sand" is still unclaimed.]

Posted by Mark Liberman at 10:18 AM

July 30, 2005

"Quotations" with a word error rate of 40-60% and more

In my last post, I cited an extreme example of editing Shakespeare for performance, and I mentioned in passing that journalistic quotation is also so selective -- and often so inaccurate -- as to become a form of creation. Since the cited examples were distributed over a number of posts covering other topics as well, I'm reproducing some of the material here for the convenience of interested readers.

The key step is choosing what to include and what to leave out. If you quote people verbatim but change the context, omit qualifications and so on, you can change their emphasis or even make them seem to say something that they never meant at all. However, I'm not dealing with that issue here. Instead, my point is that even after the crucial choice of material and its context of presentation has been made, journalists are remarkably careless about the accuracy of the words that they put in direct quotes in their stories.

Within the past month or so, I've examined the details of this practice in the case of remarks by Rasheed Wallace (here and here), George W. Bush (here, here, here), and Tim Duncan (here). My motivation was not to beat up on journalists, but to follow up on a remark attributed to Rasheed that caught my eye, and one attributed to W that similarly attracted Eric Bakovic's interest. In each case, a quick survey via Google News showed that there were essentially as many different versions of the quotes as there were journalists quoting them. Comparison with a careful transcription of recordings available on the web revealed that none of the journalists' versions were accurate. For convenient comparison, I then took a look at the reporting of Tim Duncan's remarks in the same post-game press conference in which I transcribed Rasheed Wallace's portion, and found that Tim was quoted even less accurately.

Looking at the New York Times' versions, and putting the journalistic approximation (marked [NYT]) under each line of the original, we have this for Rasheed's quote:

      Uh, just- just went at it as- as another good game,
      uh, even though I did a bonehead play the other night,
[NYT] I just made a bonehead play the other night,
      had to put it behind me, it was over with,
[NYT] I had to put it behind me
      and we just came out here and had to play tonight
[NYT] and I had to come to play tonight

Leaving out the first clause and the "uh", there are 6 insertions and 13 deletions in 31 words, which is a Word Error Rate of (6+13)/31 = 61%, using (a version of) the metric employed in the speech recognition biz.

The same comparison for the quote from W gives

      you 've got        people here who are  working to alleviate poverty
[NYT] you         have   people               working to alleviate poverty
      and help rid the world of the pandemic of AIDS
[NYT] and      rid the world of the pandemic of AIDS
      and they 're working on ways to have a clean environment
[NYT] and                     ways to have a clean environment

If we split off 've and 're as separate words, we get a score of 1 insertion and 10 deletions in 32 words, which is a WER of (1+10)/32 = 34%. If we don't, we get 2 insertions and 10 deletions in 30 words, for a WER of 12/30 = 40%. The Chicago Tribune's performance on Tim Duncan's quotes is worse, with hundreds of words omitted between fragments that are presented as if spoken continuously, but I won't repeat the details here.
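The counts above can be reproduced mechanically. The sketch below is one possible reading of the metric, not necessarily the procedure actually used for these posts: it aligns the two word sequences by their longest common subsequence, counts every unmatched word of the original as a deletion and every unmatched word of the printed quote as an insertion, and divides the total by the length of the original. The function names and the lower-casing and punctuation-free input strings are assumptions made for illustration. Note that the WER used in speech recognition usually counts substitutions as a single error, whereas here a substituted word shows up as one deletion plus one insertion.

```python
def lcs_length(ref, hyp):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(hyp) + 1)
    for r in ref:
        curr = [0]
        for j, h in enumerate(hyp, start=1):
            if r == h:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def quote_errors(said, quoted):
    """Return (insertions, deletions, error rate) for a printed quote.

    Hypothetical helper: words are split on whitespace and lower-cased;
    punctuation is assumed to have been removed by the caller.
    """
    ref = said.lower().split()
    hyp = quoted.lower().split()
    kept = lcs_length(ref, hyp)          # words the journalist preserved
    deletions = len(ref) - kept          # spoken but not printed
    insertions = len(hyp) - kept         # printed but not spoken
    return insertions, deletions, (insertions + deletions) / len(ref)


# Rasheed Wallace, leaving out the first clause and the "uh"
rasheed_said = ("even though I did a bonehead play the other night "
                "had to put it behind me it was over with "
                "and we just came out here and had to play tonight")
rasheed_nyt = ("I just made a bonehead play the other night "
               "I had to put it behind me and I had to come to play tonight")

# George W. Bush, with 've and 're split off as separate words
bush_said = ("you 've got people here who are working to alleviate poverty "
             "and help rid the world of the pandemic of AIDS "
             "and they 're working on ways to have a clean environment")
bush_nyt = ("you have people working to alleviate poverty "
            "and rid the world of the pandemic of AIDS "
            "and ways to have a clean environment")

for said, quoted in [(rasheed_said, rasheed_nyt), (bush_said, bush_nyt)]:
    ins, dele, rate = quote_errors(said, quoted)
    print(ins, dele, f"{rate:.0%}")
# prints "6 13 61%" and then "1 10 34%"
```

Because only the number of preserved words matters, the result does not depend on which of several equally good alignments is chosen.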

As I observed in earlier commentary, the poor quality of the quotations is mostly due to the practice of using handwritten notes that are not checked against recordings. This would have been a plausible excuse 50 years ago, but it's pretty pathetic now. And (as I also noted), the sense of the speakers is generally preserved in the cases cited above -- although the NYT represented President Bush as committing a grammatical solecism of which he was in fact innocent, while standardizing Rasheed Wallace's lexical choices -- but the same level of approximation could easily result in serious misrepresentation. This can happen because of simple carelessness, or because of prejudicial misperception or memory errors, or because of malice. It's hard to tell and in the end it doesn't matter.

The fact is, the standards for direct quotation in print media are scandalously low, and should be reformed. If a student took similar liberties with print quotations in a term paper, (s)he would be given a serious lecture on the responsibilities of scholarship. If an academic scholar did this, it would be grounds for rejection of publications submitted for review, and charges of culpable carelessness if not outright fraud. I'll bet that lawyers or judges who quote law, precedent or testimony this loosely don't retain the respect of their peers, if indeed they ever make it through law school. If a religious authority started "quoting" from scripture using these methods... well, you can go on in this vein yourself.

This doesn't mean that journalistic quotes have to reproduce every stutter and stammer, every self-correction, every um and uh. But there should be an explicit policy about what kind of editing is permitted (or even encouraged), and what is not. And in fact such policies do exist. For example, The New York Times Code of Ethics says that

Readers should be able to assume that every word between quotation marks is what the speaker or writer said. The Times does not "clean up" quotations. If a subject’s grammar or taste is unsuitable, quotation marks should be removed and the awkward passage paraphrased. Unless the writer has detailed notes or a recording, it is usually wise to paraphrase long comments, since they may turn up worded differently on television or in other publications. "Approximate" quotations can undermine readers’ trust in The Times.

The writer should, of course, omit extraneous syllables like "um" and may judiciously delete false starts. If any further omission is necessary, close the quotation, insert new attribution and begin another quotation. (The Times does adjust spelling, punctuation, capitalization and abbreviations within a quotation for consistent style.) Detailed guidance is in the stylebook entry headed "quotations." In every case, writer and editor must both be satisfied that the intent of the subject has been preserved.

This strikes me as just about right, except that it puts far too much trust in "detailed notes", which are hardly ever a reliable guide to speakers' actual words. And I think it should be clear from the examples cited above, which I believe to be typical of current practice at the NYT, that the spirit of this policy is not being followed in practice.

I don't believe that other papers are any better in this respect. Just for comparison, here's the Houston Chronicle's version of Rasheed's quote, which combines fragments from his answers to two different questions, and adds a phrase that (as far as I can tell) he never said at all during the post-game interview being reported on. Again, Rasheed's actual statements come first, and the Chronicle's quoted version is marked [Chronicle] under the corresponding lines. As the quote appeared in the original article, it was

"I did a bonehead play the other night," he said. "I had to put it behind me. It was over with. It was no pressure. I don't feel pressure. I had to do the things I needed to do."

Rachel Nichols:
[...]
How emotionally did you approach the game, and how do you feel you played?
Rasheed Wallace:
Uh, just- just went at it as- as another good game,
uh, even though I did a bonehead play the other night,
[Chronicle] I did a bonehead play the other night.
had to put it behind me, it was over with,
[Chronicle] I had to put it behind me, it was over with.
and we just came out here and had to play tonight
Rachel Nichols:
As a group, the Pistons all talk about how you guys are best when your backs are up against the wall.
How do you feel that you personally react when you're under pressure?
Rasheed Wallace:
Uh, I mean it's no pressure, I don't- I don't feel pressure.
[Chronicle] it was no pressure. I don't feel pressure.
Uh, no matter if it's the game winning shot, or I got the ball, you know, last possession,
I don't feel no pressure. 'Cause you still got to go out there and play.
[Chronicle] I had to do the things I needed to do.

And if you want to see something approximating what Colin Hurley did to Shakespeare, take a look at what the Chicago Tribune did to Tim Duncan.

Posted by Mark Liberman at 11:49 AM

Selective quotation at the Globe?

Being a naive and trusting sort of person, I grew up assuming that direct quotations in serious newspapers are accurate, if selective, and that good actors in serious dramatic productions speak the words that the playwright wrote, adding their own inflections, expressions and gestures. Over time, I've learned that both of these assumptions are often false. Both journalists and actors (or their editors) present word sequences that may be remarkably far away from their sources.

A few days ago, I linked to a clip originally presented on a BBC Radio 4 show about an "original accent" production of Troilus and Cressida at the Globe Theatre, giving Colin Hurley's rendition of a few lines from his role as Thersites, which I also transcribed for the benefit of those who might have found them hard to understand (the added color-coding will be explained shortly):

The plague of Greece upon thee, thou mongrel beef-witted lord!
Thou hast no more brain than I have in mine elbows. Scurvy lord!
I would thou didst itch from head to foot and I had the scratching of thee;
I would make thee the loathsomest scab in Greece.

(By the way, though I'm no expert in the history of English, I think that in this case much of the difficulty might be unnecessary. I was taught that the raising of e: to i: in early modern English was essentially complete by 1450, long before Shakespeare's time, and I should have thought that Greece and beef were subject to this process, as opposed to the raising of ε: to i:, which didn't happen until 1700 or so, affecting words like beat and tea; but I digress.)

In order to make sure that I had the quote right, I checked the play, and found to my surprise that the 49 words of quoted material are in fact selected from a sequence of about 240 words from Thersites' part, skipping over 40-odd intervening words spoken by Ajax. Not only that, but the bits are put in a different order from the original, which had the "no more brain than I have in mine elbows" phrase more than nine sentences past the "I would thou didst itch" phrase. (See the end of this post for the details).

When I noticed this, my first thought was that Hurley was just riffing on his part, spinning out a collage of fragments to illustrate the original-accent effect. But now I'm not so sure.

I recently read an interesting article by Joe Falocco on editing Hamlet for production by undergraduates. He describes his experience this way:

I have been working on this project for six months and, if I did not force myself to stop, I could easily work on it for another six. I found that, the more editorial choices I made, the more I began to feel as if I was writing an original script rather than adapting an existing one. While this notion was clearly self-aggrandizing, I flattered myself even further by telling myself that Shakespeare must have experienced the same sensation when working over material from Belleforest and Saxo Grammaticus.

Why is it so hard? He explains

The length of the unedited Second Quarto text of Hamlet is 3732 lines. Even if the actors were to speak at the brisk pace of 1000 lines an hour (and few undergraduate performers can speak this quickly and maintain intelligibility) an unexpurgated Second Quarto Hamlet would run nearly four hours without considering intermissions or time for scene changes. As Polonius so distinctly states, “This is too long” ...

But once you're committed to major cuts, it's hard to maintain dramatic continuity, retain famous lines and your favorite jokes, and so on, without a considerable amount of "creativity", i.e. re-writing. Falocco is writing about a production with special issues, one to be acted by undergraduates at Arkansas State for other students at the same institution -- but an original-accent production has at least as many special issues for actors and audience. So it seems possible that the collage of fragments performed by Hurley was in fact a passage from his part as it will be spoken in the production, the result of a similarly laborious and creative process of editing Shakespeare's Troilus and Cressida for this original-accent presentation. Hurley is not the one responsible for the modifications, of course, but it's still not exactly Shakespeare.

Although I don't know much about the history of theatrical performance practice, I guess it's plausible that this is a continuation of long-accepted practices, and nothing new to insiders. I certainly knew that plays are often presented with extensive cuts; but I never realized that such extensive phrase-by-phrase cut-and-paste script-doctoring may be regarded as normal.

Here are the promised details. Colin Hurley's quotation again:

The plague of Greece upon thee, thou mongrel beef-witted lord!
Thou hast no more brain than I have in mine elbows. Scurvy lord!
I would thou didst itch from head to foot and I had the scratching of thee;
I would make thee the loathsomest scab in Greece.

A color-keyed version of the original scene, from an on-line e-text in modernized spelling:

Enter Ajax and THERSITES
AJAX. Thersites!
THERSITES. Agamemnon-how if he had boils full, all over, generally?
AJAX. Thersites!
THERSITES. And those boils did run-say so. Did not the general run then? Were not that a botchy core?
AJAX. Dog!
THERSITES. Then there would come some matter from him; I see none now.
AJAX. Thou bitch-wolf's son, canst thou not hear? Feel, then.
[Strikes him]
THERSITES. The plague of Greece upon thee, thou mongrel beef-witted lord!
AJAX. Speak, then, thou whinid'st leaven, speak. I will beat thee into handsomeness.
THERSITES. I shall sooner rail thee into wit and holiness; but I think thy horse will sooner con an oration than thou learn a prayer without book. Thou canst strike, canst thou? A red murrain o' thy jade's tricks!
AJAX. Toadstool, learn me the proclamation.
THERSITES. Dost thou think I have no sense, thou strikest me thus?
AJAX. The proclamation!
THERSITES. Thou art proclaim'd, a fool, I think.
AJAX. Do not, porpentine, do not; my fingers itch.
THERSITES. I would thou didst itch from head to foot and I had the scratching of thee; I would make thee the loathsomest scab in Greece. When thou art forth in the incursions, thou strikest as slow as another.
AJAX. I say, the proclamation.
THERSITES. Thou grumblest and railest every hour on Achilles; and thou art as full of envy at his greatness as Cerberus is at Proserpina's beauty-ay, that thou bark'st at him.
AJAX. Mistress Thersites!
THERSITES. Thou shouldst strike him.
AJAX. Cobloaf!
THERSITES. He would pun thee into shivers with his fist, as a sailor breaks a biscuit.
AJAX. You whoreson cur! [Strikes him]
THERSITES. Do, do.
AJAX. Thou stool for a witch!
THERSITES. Ay, do, do; thou sodden-witted lord! Thou hast no more brain than I have in mine elbows; an assinico may tutor thee. You scurvy valiant ass! Thou art here but to thrash Troyans, and thou art bought and sold among those of any wit like a barbarian slave. If thou use to beat me, I will begin at thy heel and tell what thou art by inches, thou thing of no bowels, thou!
AJAX. You dog!
THERSITES. You scurvy lord!
AJAX. You cur! [Strikes him]
THERSITES. Mars his idiot! Do, rudeness; do, camel; do, do.

The Internet Shakespeare's edition is here, giving the 1609 quarto text:

Enter Aiax and Thersites.
Aiax. Thersites.
Ther. Agamemnon, how if he had biles, full, all ouer, generally.
Aiax. Thersites.
Ther. And those byles did run (say so), did not the generall run then, were not that a botchy core.
Aiax. Dogge.
Ther. Then would come some matter from him, I see none now.
Aia. Thou bitchwolfs son canst thou not heare, feele then.
Ther. The plague of Greece vpon thee thou mongrell beefe witted Lord.
Aiax. Speake then thou vnsalted leauen, speake, I will beate thee into hansomnesse.
Ther. I shall sooner raile thee into wit and holinesse, but I thinke thy horse will sooner cunne an oration without
booke, then thou learne praier without booke, thou canst strike canst thou? a red murrion ath thy Iades trickes.
Aiax. Tode-stoole? learne me the proclamation.
Ther. Doost thou thinke I haue no sence thou strikest mee thus?
Aiax. The proclamation.
Ther. Thou art proclaim'd foole I thinke.
Aiax. Do not Porpentin, do not, my fingers itch:
Ther. I would thou didst itch from head to foote, and I had the scratching of the, I would make thee the lothsomest scab in Greece, when thou art forth in the incursions thou strikest as slow as another.
Aiax. I say the proclamation.
Ther. Thou gromblest and raylest euery houre on Achilles, and thou art as full of enuy at his greatnesse, as Cerberus
is at Proserpinas beauty, I that thou barkst at him.
Aiax. Mistres Thersites.
Ther. Thou shouldst strike him.
Aiax. Coblofe,
Ther. Hee would punne thee into shiuers with his fist, as a sayler breakes a bisket, you horson curre. Do? do?
Aiax: Thou stoole for a witch.
Ther. I, Do? do? thou sodden witted Lord, thou hast no more braine then I haue in mine elbowes, an Asinico may tutor thee, you scuruy valiant asse, thou art heere but to thrash Troyans, and thou art bought and sould among those of any wit, like a Barbarian slaue. If thou vse to beate mee I will beginne at thy heele, and tell what thou art by ynches, thou thing of no bowells thou.
Aiax. You dog.
Ther. You scuruy Lord.
Aiax. You curre.
Ther. Mars his Idiot, do rudenesse, do Camel, do, do.

Here's the 1623 folio text from LION (you can also find a version on the Internet Shakespeare site above)

THE TRAGEDIE OF Troylus and Cressida
Author Name: Shakespeare, William, 1564-1616
Volume Title: Troylus and Cressida (1623)
Publisher: Printed by Isaac Iaggard, and Ed. Blount
Year: 1623
From a copy in the library of Trinity College, Cambridge by permission

Enter Aiax, and Thersites.
Aia. Thersites? /
Ther. Agamemnon, how if he had Biles (ful) all ouer / generally. /
Aia. Thersites? /
Ther. And those Byles did runne, say so; did not the / General run, were not that a botchy core? /
Aia. Dogge. /
Ther. Then there would come some matter from him: / I see none now. /
Aia. Thou Bitch-Wolfes-Sonne, canst y / u not heare? Feele then. /
Strikes him.
Ther. The plague of Greece vpon thee thou Mungrel / beefe-witted Lord. /
Aia. Speake then you whinid'st leauen speake, I will / beate thee into handsomnesse. /
Ther. I shal sooner rayle thee into wit and holinesse: / but I thinke thy Horse wil sooner con an Oration, then y / u learn a prayer without booke: Thou canst strike, canst / thou? A red Murren o'th thy lades trickes. /
Aia. Toads stoole, learne me the Proclamation. /
Ther. Doest thou thinke I haue no sence thou strik'st me thus? /
Aia. The Proclamation. /
Ther. Thou art proclaim'd a foole, I thinke. /
Aia. Do not Porpentine, do not; my fingers itch. /
Ther. I would thou didst itch from head to foot, and / I had the scratching of thee, I would make thee the lothsom'st / scab in Greece. /
Aia. I say the Proclamation. /
Ther. Thou grumblest & railest euery houre on Achilles, / and thou art as ful of enuy at his greatnes, as Cerberus / is at Proserpina's beauty. I, that thou barkst at him. /
Aia. Mistresse Thersites. /
Ther. Thou shouldst strike him. /
Aia. Coblofe. /
Ther. He would pun thee into shiuers with his fist, as / a Sailor breakes a bisket. /
Aia. You horson Curre. /
Ther. Do, do. /
Aia. Thou stoole for a Witch. /
Ther. I, do, do, thou sodden-witted Lord: thou hast / no more braine then I haue in mine elbows: An Asinico / may tutor thee. Thou scuruy valiant Asse, thou art heere / but to thresh Troyans, and thou art bought and solde among / those of any wit, like a Barbarian slaue. If thou vse / to beat me, I wil begin at thy heele, and tel what thou art / by inches, thou thing of no bowels thou. /
Aia. You dogge. /
Ther. You scuruy Lord. /
Aia. You Curre. /
Ther. Mars his Ideot: do rudenes, do Camell, do, do. /

Posted by Mark Liberman at 09:41 AM

July 29, 2005

Bertie and Jeeves as Babus

This is amazing. It seems that P.G. Wodehouse based his parodies of upper-class British speech, in part, on late 19th-century humorous stereotypes of babu English ("the ornate and somewhat unidiomatic English of an Indian who has learnt the language principally from books", according to the OED). Apparently, F. Anstey's humorous sketches about Baboo Jabberjee, BA, influenced Wodehouse to the point that he recycled specific malapropisms, mangled quotations and other bits for use by Bertie Wooster, Jeeves and others. R. Devrai at Dick & Garlick points to an available extract from Richard Usborne's Plum Sauce, which quotes (some) chapters and (a few) verses.

A sample of Usborne's sample:

Wodehouse read all Anstey's stuff as a boy, including, as is obvious from his school stories, Vice Versa. But Baboo Jabberjee (which is quoted by name in Love Among the Chickens) was powerfully seminal to Psmith and the quintessential Wodehouse style of false concords. ...

Jabberjee writes:
'As poet Burns remarks with great truthfulness, "Rank is but a penny stamp, and a Man is a man and all that."'
This is a pleasant skid on the banana skin of education. Bertie and Jeeves, you remember, get tangled up in this same quotation at a moment of great crisis.
Rem acu tetigisti, non possumus, surgit amari aliquid, ultra vires, mens sana in corpore sano, amende honorable - these are gobbets of education that Jabberjee uses and Jeeves takes over. And (this is sad) we find that it was Jabberjee, and not Bertie, who first made that excellent Shakespeare emendation, only conceivable through the ears, only translatable through the eyes. Jabberjee writes: 'Jessamina inherits, in Hamlet's immortal phraseology, "an eye like Ma's to threaten and command".'

That last (wonderfully ironic) joke would not have worked in relation to an original-accent production, by the way, since Shakespeare's English was r-ful, and the original has Hamlet saying to the Queen his mother:

Looke heere vpon this Picture, and on this,
The counterfet presentment of two Brothers:
See what a grace was seated on his Brow,
Hyperions curles, the front of Ioue himselfe,
An eye like Mars, to threaten or command
A Station, like the Herald Mercurie
New lighted on a heauen-kissing hill:
A Combination, and a forme indeed,
Where euery God did seeme to set his Seale,
To giue the world assurance of a man.
This was your Husband. Looke you now what followes.

R. Devrai concludes: "Journalists like David Gardner have claimed to find echoes of P. G. Wodehouse in Indian English, but it seems more likely that the reverse is true: Wodehouse's style owes a debt to Babu bombast".

[via Shaswati's Blog]

Posted by Mark Liberman at 08:24 PM

You read it here first

With respect to the textual attestation of dike/dyke in the sense of "to remove by using diagonal cutting pliers", Mark Jason Dominus submitted this fragment of C from a mail library test program by Mark Crispin:

       if (pwd) { 
         strcpy (tmp,pwd->pw_gecos); 
                                   /* dyke out the office and phone poop */ 
         if (suffix = strchr (tmp,',')) suffix[0] = '\0'; 
         strcpy (personalname,tmp);/* make a permanent copy of it */ 
       } 

According to MJD, the fragment dates from 1988, well after the 1971 date of the nominal attestation of dikes in Rate Training Manual NAVPERS 10085-B and the 1978 date of the to dike verbal entry in the TMRC dictionary. I presume that the verbal entry for dike in the jargon file also predates 1988, though I haven't been able to locate historical versions in order to verify this. (There's a dike entry in this "Original" version, which however is undated; the links to the 1981 and 1982 versions on Eric Raymond's site are non-functional.) This is the earliest citation that I've seen so far to the "dyke out" combination that's from a real communicative use, rather than a meta-description.

Anyhow, I think it would be neat to have a historical citation in the OED that comes from a comment in an open-source (or otherwise published) computer program. I'm assuming that there are no source-code citations in that work now.

[Update: Joachim Ziebs asks

I, too, would like to see an open source quotation in the OED. However, I always believed the GNU Public Licence to be "infectious" to other software. Would this apply to books as well?

In this sort of case, the question doesn't arise, because the use of short quoted citations in dictionaries has always been covered by the "fair use" exception to copyright; this should apply to copyleft as well. At least so it seems to me.

And Ben Zimmer points out that there is a non-meta use of "dike out" in a story from the Jargon File:

This time we ran for Richard Greenblatt, a long-time MIT hacker, who was close at hand. He had never noticed the switch before, either. He inspected it, concluded it was useless, got some diagonal cutters and diked it out. We then revived the computer and it has run fine ever since.

If this was in the version of 1983 or earlier, it would trump the source code comment as the earliest published citation for that form. It's too bad that the historical links on elsewhere.org are broken -- I'd almost think that hackers didn't care about history, except that I know this to be false.]

[Update 8/22/2005: Back on the day this entry appeared, Rich Alderson emailed

There is a 1981 copy of the Jargon File available in the MIT TECO EMACS distribution, in the INFO: directory. This is available to all and sundry at Tim Shoppa's PDP software site:

http://pdp-10.trailing-edge.com/mit_emacs_170_teco_1220/01/info/jargon.txt.html

From the header:

If you'd rather not mung the file yourself, send your definitions to DON @ SAIL, GLS @ MIT-AI, and/or MRC @ SAIL.

The last edit (of this line, anyway) was by Don Woods on 81-07-22.

And the full entry reads

DIKE [from "diagonal cutters"] v. To remove a module or disable it. "When in doubt, dike it out."

No stories about RG, unfortunately.

As the current maintainer of MIT TECO and EMACS, I can testify to the accuracy of the file dates on INFO:JARGON.TXT on the Tops-20 filesystem from which they were put on tape, to go to Tim's archives.

Unfortunately I let this email drift up off my current email queue before I acted on it.]

Posted by Mark Liberman at 03:14 PM

Tudor linguistic homogeneity?

I'm used to the BBC's credulity towards outlandish scientific claims, but I thought their staff retained a certain amount of common sense in the humanities. However, there's an extraordinary historical howler in Joe Boyle's 7/19/2005 BBC News story on the Globe's "original accents" production of Troilus and Cressida. And it's not Boyle's suggestion that Shakespeare's dialect is "bizarrely, completely intelligible if you happen to come from North Carolina", which I mentioned in an earlier post.

No, it's the unchallenged assertion by one of the actors that Tudor society was linguistically egalitarian, or at least uniform:

Philip Bird, who plays the Trojan king Hector (pronounced 'Ecter)... says the "earthy, gutsy, grounded" accent forces the actors to find different ways of portraying power and seniority.

"When you're asked to play someone who is powerful or of high status, you act class, you act posh -- but with this production it is not available because everyone spoke the same way 400 years ago."

Why does Nevalainen & Raumolin-Brunberg's Historical Sociolinguistics: Language Change in Tudor and Stuart England have chapters on “Gender”, “Social Stratification” and “Regional Variation”? Well, I don't have a copy of the book and haven't read it yet, but I expect that they include these chapters because they don't believe that in 1600 "everyone spoke the same way."

This belief is not an unexamined prejudice. The main empirical basis for their work is the Corpus of Early English Correspondence (CEEC), a digital corpus of personal letters written in English between 1410 and 1680 -- "more than 6000 letters written by nearly 800 individuals (2.7 million running words)". In addition to the corpus evidence, there's some contemporary meta-discussion, such as this in Puttenham's 1589 Arte of English Poesie:

I say not this but that in every shire of England there be gentlemen and others that speak, but specially write as good Southerne as we of Middlesex or Surrey do, but not the common people of every shire.

And even in London, is it plausible that the courtiers, the scholars, the merchants and the scullions all spoke the same way? In fact, Shakespeare himself mocks the kind of linguistic pretension that can only exist where there is social stratification and social anxiety. This is from Love's Labour's Lost, V.1:

Armado: Monsieur, do you not educate youth at the charge house on the top of the mountain?
Holofernes: I do, sans question.
Armado: Sir, it is the king's most sweet pleasure and affection, to congratulate the princess at her pavillion, in the posteriors of this day; which the rude multitude call the afternoon.
Holofernes: The posterior of the day, most generous sir, is liable, congruent, and measurable for the afternoon: the word is well cull'd, chose; sweet and apt, I do assure you, sir, I do assure.

And even a small amount of poking around in drama of the period, Shakespeare's and others, gives evidence of social stratification of pronunciation, morphology, word choice and syntax. Here's a small sample, from an anonymous 1639 masque The King and Queenes entertainement at Richmond. The speakers are Tom (a peasant) and Edward Sackville:

Tom hauing discouer'd M. Edward Sackvile standing neere the Queene, as looking on, calls to him.
Tom. O Mr Yedward: M. Yedward.  
M. Sa. How now Tom, whats the matter?
Tom. Good M. Yedward. Helpe mee to spoke with the Queene?
M. Sa. With the Queene Tom. why with the Queene.
Tom. Chaue a Presence for Her.
M. Sa. Thou doest not meane thine owne Tom. she can hardly see a worse.
Tom. Chaue a Million for her.
M. Sa. A Million Tom. that were a present for a Queene indeed. Let him come in, but who hast thou there to helpe thee to bring it?
Tom. Chad not thought you had bin zicke a voole M. Yedward, as if I were not soffocient to bring a Million my zell. Yes, though it were as big as a Pompeon.

This is a few years past Shakespeare's period, but the linguistic differentiation appears to be just the sort of thing that Puttenham wrote about in 1589.

Trying to guess where this strange idea of Tudor linguistic uniformity came from, I can only imagine that someone has become unhinged by learning that many of the shibboleths of today's "received pronunciation" are innovations since Shakespeare's time. As a result, the speech of a Shakespearean noble lacked some features considered essential to "posh" speech today, and included some features that would today be considered regional, lower class or otherwise substandard. So, since Shakespearean nobles did not uphold the standards of today's BBC English, they must have had no standards at all, right? and therefore they were like all those other people with no standards, and therefore everyone must have spoken alike, in a sort of state of nature, all linguistic rabble together.

Anyhow, David Crystal (the linguistic advisor to the Globe "original accent" production) knows more about all of this than I do, and I'm sure that the idea that "everyone spoke the same way" in Shakespeare's time didn't come from him. This leaves me uncertain about whether the quoted actor, Philip Bird, misunderstood something Crystal said, or went on his own through something like the line of reasoning I sketched above, or whether the journalist Joe Boyle made it up, either out of whole cloth or by misleading quote selection, or whether an editor intervened.

It's clear that the information in the quote is wrong. But despite the fact that it's in a quote attributed to a specific individual, in an article with a by-line, it's still not really clear who is responsible for the mistake: that old problem of attributional abduction again.

[Update: Richard Hershberger emailed to observe

It occurs to me that from an actor's perspective there is a kernel of truth. A production using modern pronunciation can also use stratification of modern accents. The audience knows viscerally what is upper class and what is lower class. This is unavailable in an original-pronunciation production. Even if they could affect the proper accents this wouldn't produce the same reaction in the audience. So perhaps there was some discussion of this, and something like "we can't use accents to distinguish between social classes" morphed in the mind of the actor into "they didn't have different accents back then". One of life's little mysteries is why actors are considered authorities on this sort of thing. Similarly, the Earl of Oxford-wrote-Shakespeare crowd uses actors agreeing with them as a talking point, as if this meant anything.

That makes at least as much sense as any other hypothesis I've come up with.

My beef, if I can use that word, is with the BBC editorial process. It seems to me that for anyone with a minimal knowledge of European history, Elizabethan drama and human nature, the statement that "everyone spoke the same way 400 years ago" ought to set the BS detectors chiming. And it's easy to check -- for example, Boyle interviewed David Crystal for the article, and could have called him again on this point -- or any one of dozens of other authorities who would have been happy to return a phone call from the BBC. Nor would this article have lost its timeliness if delayed by a few hours, since the Globe's production of T & C doesn't even open until August. Based on the amount of preposterous crap that gets into BBC News stories, I've got to conclude that the culture there is simply not to care whether what they write is true or not, as long as it's interesting, strikes the right tone, and aligns with their editorial prejudices. ]

Posted by Mark Liberman at 09:16 AM

July 28, 2005

To hear the blatter of grackles and say invisible priest

In response to all the the-ing-and-a-ing going on around here, Jason Streed at Finches' Wings has posted Wallace Stevens' poem "The Man on the Dump", which ends:

Where was it one first heard of the truth? The the.

Jason comments that

I think Stevens might be having some fun with the ways various pronunciations of "the" can be combined. The meter of that last foot also seems open to more than one reading--is it a pyrrhic that sounds like two beats on "an old tin can," or an iamb that sounds like a heartbeat, or something else?

Or -- thinking in terms of meaning instead of sound -- is "the the" about the operator for creating a definite description, or about the pragmatic distinction between old and new information, or about the performance practice of rapidly repeating phrase-initial syllables?

In antiphonal response to Jason's citation of "the the" in the writing of Wallace Stevens, I'll give an example of "a a a" in the speech of Associate Justice David H. Souter. This is from the oral arguments in Eldred v. Ashcroft:

but that's I mean that's the
that's the issue al- in the alternative reading
and and why is it
a a a limit case
uh rather than a discretion within a general scheme
kind of
uh clause?

Here's an audio clip -- the person saying "that's right" in the background of the recording is Larry Lessig. I'll have more to say about this passage later -- it's interesting linguistically, as well as a central point (in my inexpert opinion) in the case.

One of the interesting things is that listening to the spoken form in context, you really wouldn't notice Souter's flurry of disfluencies. At least I didn't, until I transcribed it. But the most interesting thing to me is his choice of pronunciations for the and a, in this passage and elsewhere in the argument.

Posted by Mark Liberman at 12:34 PM

A chronicle of signs and sounds

It's often the subeditors and headline writers, rather than the journalists, who make science journalism seem so crappy and trivial on linguistic topics. Take the article in this week's Science Times (in The New York Times on Tuesday, July 26, page D3) about the release of a new edition of the SIL Ethnologue. The Ethnologue is an encyclopedic listing of the languages of the world. The article is serious and sensible. It discusses the difficulties of doing a census of languages, the reasons for the slow increase in the number of languages listed in the book over the years, and the political issue of whether the missionary work and bible translation projects of the Summer Institute of Linguistics contribute to language death rather than language preservation. Interesting — and illustrated with a very interesting map of the world in which countries are given sizes corresponding to the number of languages they host (Papua New Guinea becomes the size of a continent). But what did the morons at the headline desk decide to stick in as the subhead? "Feeling misunderstood? A chronicle of signs and sounds explains why." The Ethnologue is not a chronicle; it is not about signs or sounds; and it does not aim or attempt to explain anything, still less why you might feel misunderstood. But when the topic is language, newspaper editorial staff, even in the Science Times department of the New York Times, believe they do not need to know anything at all about the subject matter or even the text of the article. They know enough already. Not for them any hint of dialect individuation or morphological systems or comparative reconstruction or cognate identification or mutual intelligibility tests or syntactic typology. Language is just funny sounds and signs and words for naming things and it's all about making yourself understood and we can write the subheads without even looking at the article. Michael Erard's nice article deserved better.

Posted by Geoffrey K. Pullum at 11:20 AM

The The, the The The, and The Who


In my last post, I mentioned the music group The The, which I sometimes call the The The. I thought nobody else did, but I was wrong to think I was alone. 

Note that the triple definite description isn't totally mad. For instance, The The aficionados wouldn't bat an eyelid at me saying my The The album, whereas my The album is completely wrong. In some sense, the first The of The The is part of the group's name.

It doesn't necessarily follow that you should refer to the group in question as the The The, any more than you should refer to this blog as the Language Log. It's just Language Log. So plain old The The also seems logical. But the The The should come up sometimes. First you could legitimately refer to my The The LP as the The The LP that David owns. And second, sometimes people do put an extra definite determiner before a band name starting with The. It's hard to be sure, but I suppose that sometimes this is intentional. Here are some Google examples (my emphasis added):

The opening exhibit will include music by the The Dunes, a Bay Area-based band that plays a variety of North African Rai, Chaabi, and Berber music combined ...

With music by the The Revolution Crew.

Video music by the The Notre Dame des Bananes Choir from their 1997 CD "Ripe For Revolution".

The event will also feature hors d oeuvres, a no-host bar and 50s music by the The Bendover Sisters of the Silver Valley.

with callers Roger Alexander and Marcia Minear and live music by the The Good Old Way

At a press conference in New York City today, the The Who announced the details of its new album, The Blues To The Bush


If adding an extra the is a standard (though dispreferred) usage, it should be no surprise if we sometimes find the The The. And indeed the MSN music site obligingly tries to sell us:

The Singles Of The The by The The The

OK, so that one might have been automatically generated. But I also found many others, including (my emphasis, again):

1961 Matt Johnson of the The The born

One of the The The highlights is "Armageddon Days".

I went to two of The The The concerts, one in Boston at Great Woods

In my opinion the best of the The The incarnations.

May all the rest of the The The's lunatic fringe fill your in box with rude words

Jumping onto the band name wagon with another post, Mark took issue not with the number of definites, but with the capitalization of the prefinal one: he prescribes the The or the the The, and cites as evidence the band's logo. Logos are created with artistic license, and not necessarily intended as indicators of preferred usage, but still, I'll admit this as potential counterevidence to my double capitalized usage.

But what, you may ask, does our friend Norma Loquendi have to say about the issue? Capitalization of definite articles in rock group names is quite variable, but Norma is pretty clear on this one. For many classic bands with definites, lower case is mildly preferred, but for The The, writers couldn't care less about the logo, and uniformly prefer capitalization of the prefinal the. Here's the results of taking the first 50 Google occurrences of "Matt Johnson of * The" where * = "the" or "The", and similarly for other groups.

Search string                      % "the" out of total "the" or "The"
Matt Johnson of * The              0
Roger Daltry of * Who              36
Eric Burdon of * Animals           60
Paul McCartney of * Beatles        68
Brian Wilson of * Beach Boys       74
Jerry Garcia of * Grateful Dead    88

For whatever reason, Norma dislikes The Grateful Dead, though she is quite fond of the Grateful Dead. On the other hand, she greatly prefers The Who to the Who. And, like me, she can't stand the The. We are both die-hard fans of The The. Or the The The. Or even, as in a couple of tokens above, The The The.
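For anyone who wants to repeat the tally on their own sample, here is a rough sketch of the counting step in Python. It assumes you have already collected the matching snippets by hand; the function name lowercase_share and the sample snippets are invented for illustration, and nothing here queries Google.

```python
import re

def lowercase_share(snippets, prefix, suffix):
    """Percentage of matches whose article before `suffix` is lowercase "the".

    Returns None when no snippet contains the pattern.
    """
    # Matches e.g. "Roger Daltry of the Who" or "Roger Daltry of The Who".
    pattern = re.compile(
        re.escape(prefix) + r"\s+(the|The)\s+" + re.escape(suffix) + r"\b"
    )
    hits = [m.group(1) for s in snippets for m in pattern.finditer(s)]
    if not hits:
        return None
    lower = sum(1 for article in hits if article == "the")
    return round(100 * lower / len(hits))

# Hypothetical snippets, made up for this example (not real search results).
sample = [
    "Roger Daltry of The Who sang",
    "Roger Daltry of the Who toured",
    "Roger Daltry of The Who said",
    "Roger Daltry of The Who appeared",
]
print(lowercase_share(sample, "Roger Daltry of", "Who"))  # prints 25
```

With only 50 tokens per search string, differences of a few percentage points between bands should not be taken too seriously.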

Posted by David Beaver at 12:12 AM

July 27, 2005

Aux-initial clause with complex subject heard in spontaneous speech!

It was on 26 July 2005, at about 6:24 a.m. Pacific time on the BBC World Service as relayed by KAZU of Pacific Grove in California, that I definitely heard one. I wasn't really listening, but I suddenly heard this sentence, as clear as a bell, and leaped out of bed to write it down. It was a spontaneous, unrehearsed, utterance of a closed interrogative clause with a complex subject containing an auxiliary. For such a clause, the initial auxiliary would not be the first auxiliary in the corresponding subject-initial clause, but rather the second. Real linguistics aficionados will already know why this is important and how it relates to certain theoretical issues in the study of language acquisition; others will perhaps have learned about the topic by reading my previous post on such sentences. For the rest, trust me, it's really important, and I got a real thrill from hearing such a clear case in ordinary conversation. It has been asserted by Noam Chomsky (reference given here) that you could easily go through your whole life without ever hearing one (though he gave no evidence for his statistical claim of rarity). But I heard one. Let me explain.

The sentence I heard was part of a description given by a man who had been a member of an Islamic extremist group. He was talking about what the group had taught him you should ask yourself each day:

(1)   "Is what you're doing enough, or not?"

The relevant part here is very short — just five phonological words: Is what you're doing enough? The corresponding subject-initial declarative would be What you're doing is enough. It is the second auxiliary, is, that has to be placed in clause-initial position to make the closed interrogative. The first is are (here pronounced in a reduced form with the spelling 're). Choosing that one would yield a hopelessly ungrammatical result:

(2)   *Are what you doing is enough, or not?

The question has been raised among linguists of how any child could possibly learn the regularity involved here. From all simple cases, it looks as if the closed interrogative corresponding to a declarative clause differs from it solely in having the first auxiliary, wherever it might be, repositioned at the beginning of the clause (before the subject rather than after it). That works for all of these:

(3) a. You are an idiot.
    b. Are you an idiot?

(4) a. It is easy to do.
    b. Is it easy to do?

(5) a. She can go out later.
    b. Can she go out later?

The sentence I heard, in (1) above, is crucial evidence in that it shows the "first auxiliary" generalization to be misleading and incorrect. The correct generalization is that the closed interrogative corresponding to a declarative clause differs from it solely in having the auxiliary of the main clause placed at the beginning of the clause (regardless of what the subject might contain).
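For readers who like to see such things spelled out mechanically, the two generalizations can be contrasted in a toy Python sketch. Everything in it is an assumption made for illustration: the tiny AUXILIARIES list, the function names, and above all the fact that the split into subject and predicate is supplied by hand rather than found by a parser. It is not a model of how anyone actually acquires or computes the rule.

```python
# A deliberately tiny, hypothetical list of auxiliaries.
AUXILIARIES = {"is", "are", "can", "will", "has", "have", "was", "were"}

def front_first_auxiliary(words):
    """Linear guess: front whichever auxiliary comes first in the string."""
    for i, word in enumerate(words):
        if word in AUXILIARIES:
            return [word] + words[:i] + words[i + 1:]
    raise ValueError("no auxiliary found")

def front_main_auxiliary(subject, predicate):
    """Structural rule: front the main-clause auxiliary, leaving the subject intact."""
    aux, rest = predicate[0], predicate[1:]
    if aux not in AUXILIARIES:
        raise ValueError("predicate does not begin with an auxiliary")
    return [aux] + subject + rest

def as_question(words):
    """Join lowercase words into a capitalized question string."""
    return " ".join(words).capitalize() + "?"

# Hand-supplied analysis of "What you are doing is enough."
subject = ["what", "you", "are", "doing"]
predicate = ["is", "enough"]

print(as_question(front_first_auxiliary(subject + predicate)))
# prints: Are what you doing is enough?   (the ungrammatical (2))
print(as_question(front_main_auxiliary(subject, predicate)))
# prints: Is what you are doing enough?   (the attested (1), unreduced)

# With a simple subject, as in (3), the two rules cannot be told apart.
print(as_question(front_first_auxiliary(["you", "are", "an", "idiot"])))
print(as_question(front_main_auxiliary(["you"], ["are", "an", "idiot"])))
# both print: Are you an idiot?
```

The two functions disagree only when the subject itself contains an auxiliary, which is why sentences like (1) are the crucial evidence.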

Do children learn that from the evidence of what they hear in the speech they are exposed to, or do they just sort of know it instinctively from birth? Barbara Scholz and I have written a lengthy journal article on the question of what sort of investigations might settle that question empirically. It turns to at least some extent on how much crucial evidence that separates the wrong guess from the right one does occur in speech. Well, at least some such evidence does occur. My previous post cited two examples that provide crucial evidence of the right sort, but those were open interrogatives — how-questions, in fact (sentences like How radical are the changes you're having to make?). Those are perfectly relevant evidence too, but what I hadn't been able to catch in my reading and listening until now was an example of the simpler closed interrogative type: no question word like how or who, just an auxiliary before the subject, with another included in the subject.

So now I know: I did not live the whole of my life without hearing one. I heard it yesterday, and I'm still alive. So that settles the question of whether they occur in spontaneous speech. They do. (Geoffrey Sampson actually suspected that Chomsky might be right about their non-occurrence: see his "Exploring the richness of the stimulus", The Linguistic Review 19 [2002], 73-104.)

How many other such examples have I heard in my past life without noticing? And how many did I hear while I was learning English? And could that have been relevant to how I learned to form interrogative sentences correctly? I have no idea. Nobody has any idea.

Posted by Geoffrey K. Pullum at 06:59 PM

Thee bands

John Brewer wrote in today with information about the musical culture of unreduced articles:

You might be interested to know that since at least the middle of the 1980's a usage has obtained among comparatively obscure bands usually playing in a style that revives the so-called "garage rock" of the mid '60's to use "Thee" rather than "The" in the name of the band. Thee Headcoats may have been the first prominent (within a small cult audience) example. A few minutes' Googling turns up Thee Hypnotics, Thee Minks, Thee Exciters (billed as "Southampton, England's answer to the White Stripes"), Thee Flying Dutchmen, Thee Lordly Serpents, and a number of others, along with a bar in San Francisco called Thee Parkside (www.theeparkside.com) where bands of that genre sometimes play live. In my limited experience, "Thee" is usually pronounced unreduced, like the second-person singular pronoun of the same spelling.

John also explains the origin of this meme: Thee Midniters wanted to avoid brand confusion with The Midnighters:

My best guess is that the original inspiration for this usage was Thee Midniters, who came out of East L.A. circa 1964. (They may have recorded the first version of "Land of 1000 Dances," but you are more likely to have heard one of the subsequent versions by Cannibal and the Headhunters or Wilson Pickett.) I'm not aware of any other uses before the mid '80's revival, but I can't claim my knowledge is exhaustive. An interview with Little Willie G. of Thee Midniters (reproduced here: http://www.victoryoutreach.org/News/latimes.htm if you scroll down to the first article), says that they picked the distinctive spelling to avoid confusion with Hank Ballard and the Midnighters.

Finally, it seems that there's at least one other subculture of "thee":

A similar usage is often employed by Psychic TV and affiliated persons or entities in the titles of albums and songs (e.g. "Beyond Thee Infinite Beat"); I'm not sure why -- they really don't fit in with the garage rock scene, and also tend to use "ov" instead of "of." It appears to have something to do with an occult belief in the "magickal power of language." See http://www.topy.net/faq.html.

David Beaver tells us about a band that elevates the ordinary definite article orthography to full nounhood:

...somewhere I have an LP by The The, or the The The as I like to call them when I meet someone as minutiae minded as me and need to start an argument.

But their newsletter-style website ("This is the The day"), as well as the logo sprinkled around their pages (at right), suggests that it's really "the The" or conceivably "the the The". Right?

[Update: Cameron Majidi emailed

I've always thought that the main reason for this (aside from garage-rock tradition) was to ensure that the band's records would be alphabetized under T.

Also worth bringing up in this context is the most outrageous such band name: Thee Michelle Gun Elephant. TMGE were the best Japanese rock band of the 90s. (I understand they broke up within the past couple of years.) They also had the silliest name of any Japanese rock band, and standards of silliness run very high in the J-Rock scene.

A friend who lives in Tokyo was able to give me the scoop on the name. Evidently they'd intended to name themselves Machine Gun Etiquette, after the classic 1979 album by The Damned. But due to a classic bit of bilingual mondegreenism, they ended up as Michelle Gun Elephant. This story doesn't explain how the "Thee" got tacked on to the beginning, but they probably admired bands such as Thee Headcoats, and figured that's how the cool kids do it.

OK, now I need to know: how many other band names are Mondegreens?]

[Update: Joe Gordon points out that Nick Leggatt's Origins of Band Names page says that "Prefab Sprout" may be a mondegreen for "pepper sprout" (but also maybe not), and that Aerosmith "may or may not have taken their name from the 1925 novel by Sinclair Lewis 'Arrowsmith'" (though that would be more of an eggcorn than a mondegreen, even if true). Joe also points to this other band name page, which seems to be larger but is not so easy to search. However, on the first page I read that CKY is not (as I might have thought) from Cocke-Kasami-Younger (as the parsing algorithm is), nor from "Charlotte Katia York" (as an earlier item had claimed) but rather from Camp Kill Yourself... ]

[Update 10/10/2006 (yes, really!) -- J. Alexander writes:

Re: your July 27 (my birthday, coincidentally. And yes, you did need to know that) 2005 post on Language Log, I've found a band from some time between 1963 and 1966 called "Thee". Unfortunately my only source is the blog post here (third track). If you're still interested in this it might be worth checking around for in your secret linguist societies or whatever it is you have. :)

And thanks for that post, that phenomenon had been bugging me for a while.

Wow. Scholarship, man. And many happy returns!]

Posted by Mark Liberman at 02:52 PM

A film that speaks for itself -- in Mayan?

According to a 7/26/2005 CNN article, Mel Gibson's next movie will be in "a Mayan dialect".

Gibson is due to begin shooting the film, titled "Apocalypto," on location in Mexico in October and is aiming for a summer 2006 release, spokesman Alan Nierob said on Monday.
As with "Passion," Gibson will direct and produce the Mayan-language film from his own script through his own company, Icon Productions, and he will not appear in the movie.
The film's cast will consist of unknown performers native to the region of Mexico where the film is being shot... Few others details about Gibson's project were revealed.
"He lets his work speak for itself," Nierob said.

But some other details about the project are revealed in the next few paragraphs of the story:

The story, which Gibson began writing nine months ago, is described as a "unique adventure" set 500 years in the past. Nierob said the title, "Apocalypto," was taken from the Greek word for an unveiling or new beginning.
A note on the first page of the script says: "The dialogue you are about to read will not be spoken in English." Gibson presumably will have the script translated into Mayan by a scholar of the language and release the film with English subtitles, as he did for "Passion."

I wonder what "Mayan" means in this context: in the Ethnologue, it covers some 69 current languages. I guess the answer will depend on the action's place and time. Although CNN says that the story is set 500 years in the past, a 7/22/2005 article at TimeOut says that it is "[s]et in an ancient civilisation some 3,000 years ago". (TimeOut also says that "the film will apparently be full of action and violence, but will have no religious theme".)

3,000 years ago sounds unlikely for an "ancient civilization" speaking a Mayan language. As I understand it, the classical Mayan civilization dates to about 300-900 A.D. The Spanish conquered the post-classical Mayans in the Yucatan in the middle of the 16th century, which would be closer to 500 years ago -- but then some of the film would be in Spanish. According to MovieWeb, (citing Variety as a source)

Production chiefs went to Gibson's office in Santa Monica this past week to read the script under his watchful eye. Helmer-thesp set secrecy rules because he was distressed that copies of his script for The Passion of the Christ leaked to the media, fueling early controversial reports about the project.

But, um, weren't there some, uh, reasons why the script for the Passion of the Christ was controversial, reasons that wouldn't exactly apply to a non-religious violent epic set in ancient Mexico? Or is there something here I'm not seeing?

By the way, you can tell how un-wired I am, Variety-wise, by the fact that I boggled at the word helmer-thesp. At first I thought it was some kind of strange cut-and-paste error, but a quick Googling set me straight.

Anyhow, I'm waiting for Mel's Gilgamesh -- with dialog in Sumerian!

Posted by Mark Liberman at 08:25 AM

July 26, 2005

The the the and the thee the


Turns out people reduce the and a a lot. In case we doubted it, Mark showed us here, here, here, here and here. But when do speakers produce the fully articulated form, and when do they reduce?

For the, Jean Fox Tree and Herb Clark found the answer back in 1997: speakers mostly use the full form when they can't figure how to say whatever the hell they want to say next, and otherwise they say thuh. An extended theeeee (...uhhh...) may be both a way of biding time while you figure out that troublesome word or phrase, and a way of telling your audience "bear with me a moment: I'm about to say something so amazingly surprising even I can't figure out what it is. No really, it's gonna be absolutely fascinating. Nearly there now, I've totally focused all available neural circuitry on producing something special just for you, so please give me a few centiseconds more of processing time... you're starting to look distracted, but this is *definitely* going to be worth the wait... ok, here it comes...."

So the is normally reduced, but tends to occur with a full vowel, or even a greatly extended vowel, when the speaker is having difficulty planning or producing a following constituent. Furthermore, there's evidence that hearers use this information in real time.

The Fox Tree and Clark abstract gives a good idea of what's in the paper:

Jean Fox Tree and Herb Clark, Pronouncing "the" as "thee" to signal problems in speaking, Cognition 62 (1997) 151 - 167.

Abstract:
In spontaneous speaking, the is normally pronounced as thuh, with the reduced vowel schwa (rhyming with the first syllable of about). But it is sometimes pronounced as thiy, with a nonreduced vowel (rhyming with see). In a large corpus of spontaneous English conversation, speakers were found to use thiy to signal an immediate suspension of speech to deal with a problem in production. Fully 81% of the instances of thiy in the corpus were followed by a suspension of speech, whereas only 7% of a matched sample of thuhs were followed by such suspensions. The problems people dealt with after thiy were at many levels of production, including articulation, word retrieval, and choice of message, but most were in the following nominal.

Clark and Fox Tree think use of full or extended the is a signal (albeit not one we are normally consciously aware of) given by the speaker as part of the coordination game played by conversational participants. Elsewhere on Language Log we discussed a similar argument from Clark and Fox Tree that speakers use uh and um as signals. But can hearers use such subtle indications? Jennifer Arnold, Maria Fagnano, and Michael Tanenhaus later provided evidence that hearers are quite sensitive to this sort of signalling (Disfluencies Signal Theee, Um, New Information, Journal of Psycholinguistic Research, Vol. 32, No. 1, January 2003). Here's a picture they used in their experiment:

[Experimental setup]

When hearers looking at the picture above (and wearing eye-tracking devices) are asked e.g. to put theee uhh camel below the salt shaker, the theee uhh (as opposed to simple thuh) leads them to move their eyes to previously unmentioned objects. Presumably, this strategy is based on an assumption that the speaker is more likely to have processing difficulty when saying something new than when talking about something previously mentioned. So we can use a signal of processing difficulty to help us predict the meaning of an as yet unsaid word. Who'da thought?

I don't know of any equivalent results for full vs. reduced a, but I'll stick my neck out and guess that occurrences of full a correlate to some extent with following disfluencies.

All of this supports Mark's attempt to turn on its head the idea that reduced the/a is a sign of sloppiness. On the contrary, in fluent speech the and a are reduced: it's full renditions of the and a which, at least sometimes, indicate that the speaker is in trouble. That's not to say that full renditions are always bad. Perhaps the very nice full a Mark discusses here in George Vecsey of the NYT's phrase he goes out as a great champion with a clean record is an indication of processing difficulty, but perhaps it isn't. Maybe Vecsey just felt like putting clean record into its own intonational phrase for emphasis or contrast, meaning that an a got stuck as the ending to a previous intonational phrase.

Note that the word clean was probably very carefully chosen, given the swirl of unproven doping rumours that surround the seemingly superhuman Armstrong. So a pause before clean may have been needed either to allow Vecsey to conjure the word up in just the right way, or in order to give it a perceptually distinctive frame in which to sit. Either way, the a would then need to be fully articulated in order to carry what intonational phonologists call a boundary tone, the marker of the end of an intonational phrase. (Technical note: phonologists distinguish between full intonational phrases and smaller units termed intermediate phrases. But I'll ignore that difference here.) Mind you, putting a at the end of an intonational phrase would itself be linguistically interesting, since it implies that Vecsey treats he goes out as a great champion with a as an intonational unit. That's notable because some theories of intonation would forbid such a division of the sentence: he goes out as a great champion with a is neither a syntactic constituent nor (more importantly) a semantically natural unit. Then again, and quite informally, I've noticed that radio broadcasters tend to do intonationally weird stuff right at the end of a story, riding slipshod over semantics for the sake of a final rhetorical flourish. Maybe Vecsey was doing his best radio voice.

In the interests of full disclosure of anything that might make me seem vaguely important or knowledgeable about this subject, Jean Fox Tree, who's now at UC Santa Cruz, and I were students together way back when in Edinburgh (yeah, ok, so she hadn't started working on the yet, but you know, the vibes were there man), and Herb Clark has an office a mere stone's throw from mine, though the angles would be tough. And Jennifer Arnold is an old friend who was still completing her PhD at Stanford when I arrived as a baby professor. I figure that by mental osmosis I must now be an expert on pronunciations of the. Plus, somewhere I have an LP by The The, or the The The as I like to call them when I meet someone as minutiae-minded as me and need to start an argument. And since Mark has now set a tough standard (tough for a semanticist like me anyway) whereby each LL post has to have a new audio segment, here is an innocent little number from my The The album, and here is a complete jukebox of their recordings.
Posted by David Beaver at 08:09 PM

July 25, 2005

Could there possibly be a less enticing premise for a blog entry?

That's what Ed Felten asks at Freedom to Tinker, responding to a post by Chris Waigl at Serendipity. As Ed puts it,

"Could there possibly be a less enticing premise for a blog entry than how the blog’s author pronounces the word 'the'? Well, I think the details turn out to be interesting. And it’s my blog."

Well, the details are indeed interesting, but I'm guessing that Prof. Felten wouldn't have gotten to the point of realizing this if someone else's pronunciation had been under the microscope. His reaction suggests a new way to help inform the denizens of the blogosphere about the intrinsic fascination of phonetics -- analyze the pronunciation of Big Bloggers publicly.

This all started with a prescriptivist question on MetaFilter. Bob S. asserted that he always pronounces the and a fully ("thee" and "ay", in an orthographic approximation) rather than in a reduced form ("thuh" and "uh"). I answered with a tongue-in-cheek offer of a wager, because I don't believe that any normal adult native speaker of English talks that way.

Several people then emailed to claim that George W. Bush often fails to reduce these words, in a way that bothered them; I asked for examples, and finally looked for my own. Examining W's nomination speech for John Roberts, I found that he failed to reduce 1 out of 37 phonetically pre-consonantal the's, and 6 out of 26 phonetically pre-consonantal a's. In Roberts' brief response, he failed to reduce 1 of 11 phonetically pre-consonantal the's and 1 of 4 phonetically pre-consonantal a's. I also looked at FDR's "Infamy" speech, and found that he failed to reduce 1 of 24 phonetically pre-consonantal the's, and 5 of 5 phonetically pre-consonantal a's. (Henceforth "phonetically pre-consonantal" → "ph.-pr.").

Then Chris Waigl remembered having noticed something like this in Ed Felten's speech; examining an mp3 discovered on the web, she found the unreduction effect (by my count) in 4 of 54 ph.-pr. the's and 1 of 31 ph.-pr. a's. Ed Felten noticed her post, and commented on it:

It’s not often that you learn something about yourself from a stranger’s blog. But that’s what happened to me on Friday. I was sifting through a list of new links to this blog (thanks to Technorati), and I found an entry on a blog called Serendipity, about the way I pronounce the word “the”. It turns out that my pronunciation of “the” is inconsistent, in an interesting way. In fact, in a single eight-minute public talk, I pronounce “the” in four different ways.

Meanwhile, I noticed one interesting example of unreduced a in a voice-over by George Vecsey. In a four-minute audio clip, he reduces every one of his 24 ph.-pr. the's, and 24 of 25 ph.-pr. a's -- but the single unreduced a was rhetorically interesting.
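
For readers who want to compare speakers directly, the counts reported so far can be turned into non-reduction rates. The following is only a minimal Python sketch: the counts are the ones given in the text above, while the speaker labels and the function names (`unreduced_rate`, `summarize`) are assumptions made up for illustration.

```python
# Counts of (unreduced, total) phonetically pre-consonantal articles,
# copied from the figures reported in the text. Labels are illustrative.
counts = {
    "Bush (Roberts nomination)": {"the": (1, 37), "a": (6, 26)},
    "Roberts (response)": {"the": (1, 11), "a": (1, 4)},
    "FDR (Infamy speech)": {"the": (1, 24), "a": (5, 5)},
    "Felten (public talk)": {"the": (4, 54), "a": (1, 31)},
    "Vecsey (voice-over)": {"the": (0, 24), "a": (1, 25)},
}


def unreduced_rate(unreduced, total):
    """Proportion of tokens that were NOT reduced."""
    return unreduced / total


def summarize(table):
    """Flatten the table into (speaker, article, unreduced, total, rate) rows."""
    rows = []
    for speaker, by_article in table.items():
        for article, (unreduced, total) in by_article.items():
            rate = unreduced_rate(unreduced, total)
            rows.append((speaker, article, unreduced, total, rate))
    return rows


for speaker, article, unreduced, total, rate in summarize(counts):
    print(f"{speaker:<26} {article:>3}: {unreduced:2d}/{total:2d} = {rate:6.1%}")
```

With samples this small (four or five tokens in some cells), the percentages are only a rough guide and should not be read as stable estimates of anyone's habits.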

Summing up the results so far, it looks like non-reduction of the and a is something that everyone does sometimes. But there's a lot to learn about individual and dialect differences, the effects of formality, the rhetorical uses, the effects of phonological and syntactic context, and so on. Chris Waigl points out that "unreduced vowels take a little more time, and command more attention, than reduced ones". She focused on the idea that the extra time might be useful as a "denkpause" (German for "thinking-pause"). I focused on the idea that the extra attention might have rhetorical value, and also on the idea that a stronger juncture may inhibit reduction. Another possibility is that some people think of unreduced articles as more formal, correct and serious, along the lines indicated by the original question on MetaFilter, and therefore use them sporadically as a form of fancy talking.

Chris and I have agreed to work on this together for a while, so be warned -- you haven't heard the last of this fascinating topic. And if you're a blogger with audio on the web, we might be listening to you.

[Update: more from Chris Waigl here.]

Posted by Mark Liberman at 07:15 PM

The phonetic poetics of "a"

In this morning's online New York Times, George Vecsey narrates an "audio slide show" headlined Armstrong: the Legend. In the middle of Vecsey's narration, the following phrase occurs: "...he goes out as a great champion with a clean record." This is Language Log, not Cycling Log, so listen to Vecsey say this phrase, and see if you can hear what the linguistic issue is. Below, I'll explain the point visually as well.

Vecsey's context, of course, is a discussion of doping allegations. But our context here is the contrasting treatment of the two occurrences of the indefinite article a: "a great champion with a clean record". And I warn you that we're going to get deep down into it. If you don't enjoy mucking around in the tangled phonetic roots of communication, just move right along to another of our many fine posts.

As I'll explain, it's pretty clear what Vecsey is doing, in a physical sense, but there's room for interpretation in deciding what his phonetic performance means.

Here are waveform and spectrogram plots of the first subphrase "goes out as a great champion". Time runs from left to right: the waveform display below the transcription shows air pressure as a function of time, while in the spectrogram display above the transcription, the vertical axis shows frequency, and the degree of blackness shows the intensity of sound at a given frequency and time. (Click on the picture to hear the corresponding audio.)

As usual, the pronunciation of this a is quite short: the open portion of the vowel, from the start of voicing after the (devoiced) final [s] of "as" to the closure of the [g] in "great", is just 56 milliseconds. The vowel has a non-descript mid-central quality, mostly determined by assimilation to its context.

The situation is different for the a of "a clean record":

Here the open portion of the vowel, from the start of voicing after the [th] of "with" to the closure of the [k] in "clean", is 267 msec., or almost five times as long. And the vowel quality is the full "long a" sound, roughly IPA [ej].

We can place this trajectory on a two-dimensional plot showing the frequency of the first two resonances of Mr. Vecsey's vocal tract, sampling (the main-stressed vowels of) some of the words that he uses elsewhere in his voice-over:

I've used a formant-tracking program to measure the resonances, and tried to choose a characteristic point in the nucleus of each vowel. The results are plotted in the traditional way, with the second formant on the horizontal axis and the first formant on the vertical axis, both plotted so that the origin is (in effect) in the upper right-hand corner. The result is a space analogous to the IPA vowel chart, with fronter vowels to the left and backer vowels to the right, and lower (i.e. more open) vowels lower on the page than higher vowels.
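
The axis convention just described can be made concrete with a small sketch. This is not the program used for the plot; it is a hedged Python illustration in which the function name `chart_position`, the axis ranges, and the two formant pairs are all hypothetical values chosen only to show the direction of the mapping.

```python
def chart_position(f1, f2, f1_range, f2_range):
    """Map formants in Hz to page coordinates in [0, 1].

    x is measured from the left edge and y from the top edge, so the
    low-F1, low-F2 corner (the "origin") ends up at the upper right.
    """
    f1_min, f1_max = f1_range
    f2_min, f2_max = f2_range
    x = (f2_max - f2) / (f2_max - f2_min)  # higher F2 (fronter) -> further left
    y = (f1 - f1_min) / (f1_max - f1_min)  # higher F1 (more open) -> lower on page
    return x, y


# Hypothetical axis ranges and formant values, NOT measurements from the clip.
F1_RANGE = (200.0, 900.0)
F2_RANGE = (800.0, 2500.0)
examples = {
    "high front vowel": (300.0, 2300.0),
    "low back vowel": (700.0, 1100.0),
}

for name, (f1, f2) in examples.items():
    x, y = chart_position(f1, f2, F1_RANGE, F2_RANGE)
    print(f"{name}: x={x:.2f} from left, y={y:.2f} from top")
```

Under these made-up values the high front vowel lands near the upper left and the low back vowel near the lower right, which is the layout of the IPA vowel chart.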

Vecsey's first a is plotted as "a1", indeed a nondescript upper-mid central vowel. The trajectory of his second a is plotted as "a2", moving from a start just a bit higher and fronter than "test" (as it should be for IPA [e]), to an end just a bit lower and backer than "he's" and "believe", appropriate for an IPA [j] as an off-glide.

If you listen to Vecsey's phrase, it may sound to you (as it does to me) as if he pauses very briefly between "a" and "clean". In fact, there isn't really any physical silent pause -- the closure of the [k] in "clean" is only about 94 msec. long, which is not unusually long for the stop gap of a [k] before a stressed vowel. (Thus earlier in the voice-over, Vecsey says "...until I heard that he had cancer", with a stop gap of about 104 msec for the initial [k] of "cancer".) Instead, what's between "a" and "clean" is a "pseudo-pause" -- an extra bit of time into which the previous sound extends itself, the extra 200-odd milliseconds of "a".

But that isn't the only thing that's going on phonetically. There's also the fact that the vowel of the "a" is fully realized rather than reduced. And if you look at the spectrogram shown above, you'll also see some darker and more widely spaced vertical striations at the very start of the "a". These vertical striations show the individual pulses of glottal vibration. For an instant, Vecsey's pitch plummets. He ended "with" at about 102 Hz, corresponding to a period of about 10 msec.; he starts "a" at about 42 Hz, with a period of about 24 msec., though he's back up to 91 Hz within a tenth of a second. This brief disruption of the pitch is a sign of some kind of glottal stricture, short of a full glottal stop. You hear it as a slightly rough onset to the vowel. In American English, this kind of glottalization is often a sign of juncture.
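
The spacing of the striations follows from the fact that the period of vibration is the reciprocal of the frequency. A minimal Python sketch of the conversion, using the three pitch values mentioned above (the function name `period_ms` is just an illustrative choice):

```python
def period_ms(frequency_hz):
    """Duration of one glottal cycle in milliseconds: T = 1000 / f."""
    return 1000.0 / frequency_hz


# Pitch values taken from the paragraph above.
for label, f0 in [
    ('end of "with"', 102),
    ('start of "a"', 42),
    ("a tenth of a second later", 91),
]:
    print(f"{label}: {f0} Hz -> {period_ms(f0):.1f} msec per cycle")
```

The lower the pitch, the longer each cycle, which is why the glottal pulses at the start of the "a" appear more widely spaced in the spectrogram.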

So what's going on here? I might add that Vecsey's 4-minute voice-over contains 25 instances of pre-consonantal a, and 24 of them work just like the example in the first phrase ("a great champion").

As discussed in previous posts, the usual pattern in English is for the indefinite article, when unstressed and preceding a word starting with a consonant, to be pronounced as a short reduced mid-central vowel. Of course the isolation, citation-form pronunciation is something like IPA [ej] (details depending on dialect), just like the pronunciation of the first letter of the alphabet. And under contrastive stress, the same full, unreduced pronunciation emerges, as in a phrase such as "Not the reason, but a reason".

However, from time to time, non-contrastive instances of unreduced a are also found. In an earlier post, I observed that in President Bush's speech nominating John Roberts, 6 out of 26 instances of fluent pre-consonantal "a" were unreduced. Chris Waigl, looking at a speech by Ed Felten, found 1 out of 31 fluent pre-consonantal a's to be unreduced. We might suppose that Vecsey's usage is basically the same as these other two cases, frequency of occurrence aside.

I'm not sure, though. Bush's six unreduced preconsonantal a's don't sound to me as if they precede pauses, pseudo- or otherwise -- though of course this sort of subjective judgment is often wrong. Listen for yourself and see what you think. I have the same impression of Ed Felten's unreduced pre-consonantal a, which occurs in the phrase "My third example comes from a question that Barbara Sarmonds asked yesterday". (Though Felten does seem to pause between "from" and "a"...)

I think something is going on in the Vecsey example for which the (pseudo-)pause following a is crucial.

Phrase-final instances of a are often given the unreduced pronunciation, even when they're not emphatic. This often happens in the case of disfluency or self-correction: when someone pauses because they're not sure what to say next, or gives up on a phrase after saying "a" and restarts from an earlier point. For example, in the Bush nomination speech previously discussed, at one point the president says:

Before he was a-
before he was a respected judge, he was known as one of the most distinguished
and talented attorneys in America.

The prepausal a is unreduced, even though its repetition in the phrase "a respected judge" is reduced as expected.

And as I said, I hear a hesitation in Vecsey's phrase, realized phonetically as a pseudopausal extension of the a. But why is it there? Was he uncertain for an instant of what to say next? Did he almost make a speech error?

I doubt it. The line "he goes out as a great champion with a clean record" sounds like one of the things that Vecsey worked out in advance, and even rehearsed. Instead, I think that he's using this little pauselet to emphasize "clean record", not by making it louder or higher in pitch (which it isn't), but by setting it off with a tiny little bit of extra space. The elevation of a to the status of a full word then follows from the decision to create this juncture.

Here's a full transcript of Vecsey's remarks on doping, so that you can see the rhetorical context:

The other side of Armstrong's career is ((that)) he raced for several years in Europe, in France,
with French officials and journalists trying to find specific proof that he might have used illegal drugs:
he denied it,
uh he's never been penalized, never had a positive test in legal terms,
so he can say he's the most tested athlete in the world, and he is, the fact that the French just don't believe
that an American, that somebody could have cancer and come back and win the Tour de France,
and he does have his critics,
but
all I can say is that there's no proof --
whatever he's doing, he's still beating all these other people by five minutes
and
as far as I'm concerned, he goes out as a great champion
with a clean record

This emphasis-by-juncture isn't a brilliant rhetorical innovation on Vecsey's part. It's in the standard repertoire of rhetorical performance in English. Vecsey uses it well -- if I'm right that he's using it -- but in any case he didn't invent it.

Now we need to ask whether the unreduced a's without noticeable juncture, previously noted in speeches by W, by Ed Felten and by FDR, are examples of the same phenomenon, or are something different. My guess is that these are a different sort of thing; but I could be wrong. Perhaps they all involve a covert juncture of some sort, whose interpretation might variously be emphasis, care in speaking, or compositional difficulties. In other words, either extra certainty or extra uncertainty about the message.

This requires further research, in Vladimir Nabokov's words, combining "the precision of poetry and the excitement of pure science".

Posted by Mark Liberman at 04:12 PM

July 24, 2005

Shakespeare as a Tarheel, Ajax as a privy

According to a BBC News article by Joe Boyle (7/19/2005),

In August the [Globe] theatre will stage an "original production" of Troilus and Cressida -- with the actors performing the lines as close to the 16th century pronunciations as possible.

By opening night, they will have rehearsed using phonetic scripts for two months and, hopefully, will render the play just as its author intended.

Boyle quotes the actors as saying that

their accents are somewhere between Australian, Cornish, Irish and Scottish, with a dash of Yorkshire -- yet bizarrely, completely intelligible if you happen to come from North Carolina.

You can listen to Colin Hurley, who plays Thersites in this production, reading a few lines in a BBC Radio 4 interview. I've created a URL for just the relevant bit of the BBC RealAudio stream here, and a local clip here in .wav format. The lines that Hurley reads are a collage of fragments from Act 2, Scene 1, where Thersites is cursing Ajax (if you want to see Thersites' lines in context, the play's etext is here):

The plague of Greece upon thee, thou mongrel beef-witted lord!

... Thou hast no more brain than I have in mine elbows ...

... scurvy lord!

I would thou didst itch from head to foot and I had the scratching of thee; I would make thee the loathsomest scab in Greece.

I can't say that I've ever heard anyone from North Carolina who sounds like this. I'm skeptical of the view that this passage will be "completely intelligible" to North Carolinians, or indeed any more intelligible to them than to any of the rest of us. I suspect that this is a variant of the "In the Appalachians they speak like Shakespeare" myth (see myth #9 in this collection, for example), filtered through Boyle's somewhat fuzzy concept of American geography (there's a bit of Appalachia in the west of North Carolina, but most of the state's population lives in the lowland regions).

If you'd like some more, here is Philip Bird reciting an abridged version of Hector's speech from Act IV, Scene 5, first in a modern pronunciation and then in the reconstructed Tudor pronunciation.

You can hear the whole BBC Radio 4 interview here, including a nice exchange with David Crystal, who is the linguistic advisor to the production. Here's a (slightly edited) transcript of what Prof. Crystal has to say in that interview about reconstructing 16th-century pronunciations:

BBC: David Crystal, how accurate can we be in trying to recreate the sound of Shakespeare's English, do you think?

David Crystal: Well, I think we can be about 80% accurate, on the whole. I mean, there are three important sources of evidence for this. One is, as Colin says, the sounds of the puns and the jokes that are in there.

BBC: So you work it out backwards, by saying "here's the joke", therefore this is what it must have sounded like.

David Crystal: That's right. And then the second piece of evidence is the spellings that are in the quarto and folio texts, which actually tell you sometimes how it's pronounced. But the third and the most important piece of evidence is that at the time, there were a group of guys there, phoneticians, "orthoepists" they were called, who actually wrote, in great detail, about how the sounds of English were pronounced. So how do we know that there was that "err" sound after the vowels? Because people like Ben Jonson tell us. They tell us there's a "doggy sound" -- think "grrr", you see -- after the vowels of Elizabethan English.

One of the recovered jokes is Thersites' implication to Achilles that Ajax (= "a jakes" in Shakespeare's pronunciation) is beshitting himself in fear of Hector:

THERSITES. A wonder!
ACHILLES. What?
THERSITES. Ajax goes up and down the field asking for himself.

I wonder how David Crystal's "phonetic scripts", which the actors have been rehearsing from, were expressed. Are actors at the Globe, like opera singers, expected to know IPA? If so, I'll add this to the press kit for the Language Log IPA PR campaign.

[Update: Ben Sadock writes

I think I can shed some light on the slightly cryptic comment about "original pronunciation" Shakespeare being intelligible to North Carolinians. It's not an incarnation of the 'Elizabethan English in the Appalachians' myth; I suspect it actually has to do with the backed/raised /ay/ vowel the actors are using, which is thought to be typical both of the Lumbee ethnolect of Robeson County NC and of the dialect of Ocracoke Island NC.

Ben is talking about a pronunciation that is sometimes caricatured in orthographic approximation as "toid" for tide. This would explain the geography, though the linguistics and the demography are still silly. For one thing, this is one small (and not very confusing) vowel feature among many -- folks from Ocracoke don't say "grace" for Greece, and so on. For another thing, Ocracoke and Lumbee speakers can't be as much as a tenth of a percent of the population of the state of North Carolina. ]

Posted by Mark Liberman at 08:08 AM

July 23, 2005

Analysis and authenticity

A few days ago, I heard Marty Moss-Coane interview Terry McMillan on Radio Times. Near the end of the hour, Moss-Coane asked "I read also somewhere that Ring Lardner was an influence on you as a writer ... why an influence for you?" McMillan answered, in part, that "[Lardner] let me know that you can write the way you talk, and you don't have to apologize for it ... he freed me up."

In the middle of McMillan's long and eloquent answer, one of her points took me aback.

... his work was very conversational, it was easy, but it was very powerful. And it didn't feel like you were being *told* a story. And he didn't apologize, it wasn't beautiful language, it wasn't all metaphors and similes and onomatopoeia, and it wasn't, you know, packed with symbolism that you had to analyze. He just told it like he saw it. And I said "thank you, Jesus!"

McMillan's central point is that reading Lardner allowed her to find and use her own narrative voice. But rather than contrasting an authentic and believable voice with a forced and imitative one, she seems to be drawing a line between genuine writing and writing that uses certain rhetorical devices. Or is she rejecting prose that needs explicit analysis to be understood? She wouldn't be the first one to oppose analysis and authenticity. But whatever she means, it can't be exactly what she says, because Ring Lardner's stories are chock full of metaphors and similes and onomatopoeia, and so are hers.

I'll give just one example of each from Lardner-- you can easily find more for yourself, if you want. A metaphor:

It was a Saturday and the shop was full and Jim got up out of that chair and says, "Gentlemen, I got an important announcement to make. I been fired from my job."

Well, they asked him if he was in earnest and he said he was and nobody could think of nothin' to say till Jim finally broke the ice himself. He says, "I been sellin' canned goods and now I'm canned goods myself."

A simile (with a pinch of hyperbole as well):

After supper Gleason went out on the porch with me. He says Boy you have got a little stuff but you have got a lot to learn. He says You field your position like a wash woman and you don't hold the runners up. He says When Chase was on second base to-day he got such a lead on you that the little catcher couldn't of shot him out at third with a rifle.

Onomatopoeia:

If I was running the South Bend Boosters' club, I'd make everybody spend a year on the Gay White Way. They'd be so tickled when they got to South Bend that you'd never hear them razz the old burg again.

I'll add a case of (reported) metonymy just for fun:

I had a run in with Kelly last night and it looked like I would have to take a wallop at him but the other boys seperated us. He is a bush outfielder from the New England League. We was playing poker. You know the boys plays poker a good deal but this was the first time I got in. I was having pretty good luck and was about four bucks to the good and I was thinking of quitting because I was tired and sleepy. Then Kelly opened the pot for fifty cents and I stayed. I had three sevens. No one else stayed. Kelly stood pat and I drawed two cards. And I catched my fourth seven. He bet fifty cents but I felt pretty safe even if he did have a pat hand. So I called him. I took the money and told them I was through.

Lord and some of the boys laughed but Kelly got nasty and begun to pan me for quitting and for the way I played. I says Well I won the pot didn't I? He says Yes and he called me something. I says I got a notion to take a punch at you.

He says Oh you have have you? And I come back at him. I says Yes I have have I? I would of busted his jaw if they hadn't stopped me. You know me Al. ...

Some of the boys have begun to call me Four Sevens but it don't bother me none.

And here are a few similes, metaphors and such from two pages chosen at random from McMillan's own 1997 novel How Stella Got Her Groove Back:

p. 10 He bored me to death. Living with him was like living in a museum. It was drafty, full of vast open spaces and slippery floors.
p. 10 We never seemed to come to any neutral turf where both of our feelings and positions were acceptable or at least tolerable.
p. 10 We sort of kept this demerit scoreboard for the last eight years, until we ran out of space.
p. 10 We were both running on high octane and barely had time for sex anymore...
p. 10 At times I felt like his prostitute and I'm sure on occasion he probably felt that way too.
p. 26 ... this is getting too thick for me and I'm like sinking somewhere low and my heart weighs a ton here lately ...
p. 26 All I know is that I was sort of already using my reserve tank when he left and afterwards being alone took some getting used to.
p. 26 It was like this secret longing I felt to replace the void he left with something or someone else.
p. 26 Haven't walked past him in an airport and felt any current radiate from his body to mine.

It's no more surprising to find similes and metaphors and onomatopoeia in the works of Lardner and McMillan than it is to find nouns and verbs and prepositions there. Good story-telling is full of the traditional rhetorical devices. When the ancients isolated these techniques and named them and taught them explicitly, they believed that such analytic instruction would help students learn to communicate more effectively. This belief might be right or wrong, but it's a belief about teaching and learning, not about speaking and writing.

Terry McMillan's remarks suggest that our culture has moved beyond the elimination of linguistic analysis from general education, and even beyond the view that such analysis is intrinsically harmful for ordinary people, towards the peculiar view that the objects of such analysis are themselves undesirable pollutants of the pure stream of authentic communication. (Listen to the distaste in McMillan's tone in this audio clip -- you'd think she was talking about yellow fever and dengue and cholera.) This has something in common with the view that the secret of health is not to allow any chemicals to enter your body.

For the record, here's my transcript of Terry McMillan's answer in its entirety:

Well
you know J.D. Salinger
    in Catcher in the Rye
    Holden Caulfield is reading Ring Lardner.

And uh- when I read it
    and I read Catcher in the Rye late
and I said "who in the world is Ring Lardner?"
    and I loved J.D. Salinger
    um and I found him
I went and looked.
And this is before the internet.

And I found out ((he was a)) sportswriter
    from- in the thirties at ((the)) Chicago Tribune
and I found out he wrote short stories
    and I read this story called "Haircut".

Well, I bought the collection.
It's called "Haircut and other stories".

And he starts out talking, this guy comes in, sits in his barber-
    in the barber seat
    and he says
    you know
the guy sits in his chair, and then he starts talking, he said I heard
    so-and-so and so-and-so and so

and the next thing- you're- there- it's a story!

And by the time he finishes, he says "cut it wet or dry?"

And -- I got chill bumps now -- he-
    he
    gave- he let me know
    that you can write
    the way you talk

and you don't have to apologize for it.

And he was from Chicago, and I think this was in the 30's or 40's
    and his work was very conversational
    it was easy but it was very powerful.

And it didn't feel like you were being *told* a story.

And he didn't apologize, it wasn't
    beautiful
    language it wasn't all
    metaphors and similes and onomatopoeia

and it wasn't
    you know
    s- packed with symbolism that you had to analyze

he just told it like he saw it.
And I said "thank you, Jesus!"

And he freed me up.

You can find Ring Lardner's 1926 story Haircut in various places on the internet, if you could use some narrative freedom yourself.

Posted by Mark Liberman at 11:06 AM

July 22, 2005

Poteaux roses

Chris Waigl at /ser.ənˈdɪp.ɪ.ti/ presents a "long overdue post on French eggcorns, with an introduction and (in the second installment) a collection of about 40 of poteaux roses". The first example, of course, is the name: pot aux roses (= "pot of roses") is part of a French idiom découvrir le pot aux roses, literally "discover (or uncover) the rose pot", used to mean "to find out what's going on" or "to accidentally uncover a scandal". Since the role of the rose pot in this idiom is fairly opaque, it's not surprising that people often render it (whether jocularly or in error) as (the homophonous expression) poteau rose (= "rose-pink post"), an item whose discovery might in some contexts be felt to be the essence of the matter.

Back in February, Language Hat discussed this idiom, with a variety of scholarly links as well as a reference to the fact that Chris had chosen it as the most appropriate French term for the sporadic folk etymologies that we've come to call eggcorns in English. In a comment on Hat's post, Chris explains:

TLF and the Robert Historique de la Langue Française agree that the original meaning was récipient contenant de l'essence de roses, i.e. a recipient for rose essence or perfume. Découvrir is then understood literally, as dis-cover, take off the cover, which sets the scent free. What I find so attractive about poteau(x) rose(s) (lit. "pink pole") as an equivalent for "eggcorn" is not it is a common misspelling for pot aux roses (it isn't; the substitutions are overwhelmingly jocular, including in film and literature), but that it has undergone the double eggcornification process, just like æcern->acorn->eggcorn. In both cases, the first step has been mostly forgotten, and only shows up as a pinch of folk etymology in the history of the word/expression.

Posted by Mark Liberman at 08:42 AM

July 21, 2005

Of four parts, one

A few days ago, I posted about the latest fuss from Tokyo's buffoon of a mayor, Shintaro Ishihara, who is being sued for responding to some insubordinate language teachers by asserting that French "is disqualified as an international language because French is a language which cannot count numbers", and that "guys desperately clinging to such kind of language are lodging opposition for the sake of opposition". I haven't heard that Mayor Ishihara has gotten any letters of support from Sergey Brin or George W. Bush -- though they might well be sympathetic to his conclusion in this case -- so I thought I would pitch in and produce one myself.

Well, not really. I might have defended William Safire against Steven Pinker and even against David Beaver, but Mayor Ishihara is going to have to deal with the French teachers without my help. However, his remarks about number names and counting, while foolish, are related to some real psycholinguistic results, which he may perhaps have heard about.

Apparently Ishihara's concern is with the irregular patterns of certain French number names, such as quatre vingt ("four twenty") for 80, soixante quinze ("sixty fifteen") for 75, and so on.

The theory that transparent number names are cognitively helpful was proposed in 1987, in a paper by K.F. Miller & J.W. Stigler ("Computing in Chinese: Cultural variation in a basic cognitive skill", Cognitive Development, 2, 279-305). This study found that four- to six-year-old Chinese children could count higher, with fewer errors, than U.S. children. Because the Chinese number system is more regular (e.g. eleven is "ten-one", twelve is "ten-two", and decade names are similarly compositional), the conclusion was that "systematically organized number names facilitate Chinese children's understanding of counting."
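To make the notion of "transparent" number names concrete, here is a minimal sketch that produces English word-for-word glosses in the style quoted above ("ten-one", "ten-two"). The function name, the gloss format for decades ("two-ten"), and the restriction to 1..99 are my own assumptions for illustration; nothing here comes from the Miller & Stigler paper.

```python
# Hypothetical illustration: English glosses of a Chinese-style,
# fully compositional naming scheme for the numbers 1..99.
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]

def transparent_gloss(n: int) -> str:
    """Return a compositional gloss such as 'ten-one' for 11."""
    if not 1 <= n <= 99:
        raise ValueError("this sketch only covers 1..99")
    tens, units = divmod(n, 10)
    parts = []
    if tens > 1:
        parts.append(DIGITS[tens])  # the multiplier, e.g. "two" in "two-ten"
    if tens >= 1:
        parts.append("ten")
    if units:
        parts.append(DIGITS[units])
    return "-".join(parts)
```

Under these assumptions, every name from 11 to 99 is built from the same ten digit words plus "ten", with nothing like "eleven" or "twelve" to memorize separately.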

In 1999, I.T. Miura and others (I.T. Miura, Y. Okamoto, V. Vlahovic-Stetic & C.C. Kim, "Language supports for children's understanding of numerical fractions: Cross-national comparisons", Journal of Experimental Child Psychology, 74, 356-365) found that similar linguistic transparency appears to help with the understanding of fractions. Korean, Croatian and U.S. children were tested, all first- and second-graders who had not gotten any prior classroom instruction in fractions. Each trial involved showing a picture like this one:

A fraction name (like 2/3) was then read aloud in the child's native language, and the child was asked to circle the picture corresponding to the name. The main result was that the Korean children out-performed their Croatian and American counterparts. The suggested explanation is that in Korean (as in Chinese and Japanese), the names of fractions explicitly mention the idea of fractional parts, so that the Korean name for "one fourth" is (in word-for-word glosses) "of four parts, one".

Jae Paik and Kelly Mix ("U.S. and Korean Children's Comprehension of Fraction Names: A Reexamination of Cross-National Differences", Child Development 74(1) 144-154, 2003) started out by replicating the findings of Miura et al. In the new study, 40% of Korean first graders got >5 out of 8 fraction problems correct, while among American first graders, only 31% did. This was a smaller difference than Miura et al. found -- Paik and Mix speculate that the relatively poorer performance of their Korean students, compared to those in the Miura et al. study, might be due to socio-economic differences, and might also be due to the fact that their experiments were not performed by the children's classroom teachers.

Now comes the fascinating part. Paik and Mix also did an experiment with American students in which artificial English fraction names were used. Specifically, they tested five fraction-wording conditions, shown in the table below. Below each wording is the percentage of first-grade students getting at least 5 of 8 answers correct.

                                     Denominator then numerator    Numerator then denominator
Part-whole relations explicit        "of four parts, one"          "one of four parts"
                                     67% >5 of 8                   57% >5 of 8
Part-whole relations not explicit    "four-one"                    "one-four"
                                     15% >5 of 8                   25% >5 of 8
Standard English fraction name       "one fourth"
                                     31% >5 of 8

But the Korean first-graders' score, on the very same test, was only 40% >5 of 8 correct! When the idea of fractional parts was made explicit, the American students did substantially better than the Korean students -- even though the best-performing fraction name "of four parts, one" has an Elizabethan flavor that must seem quite strange to American six-year-olds!
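For readers who want the comparison spelled out, here is a small sketch that takes the percentages reported above and lists the American wording conditions that beat the Korean first-graders' 40%. The variable and function names are mine; the numbers are simply the ones quoted in this post.

```python
# Percent of first-graders scoring ">5 of 8", as quoted in this post.
KOREAN_BASELINE = 40

US_CONDITIONS = {
    "of four parts, one": 67,  # part-whole explicit, denominator first
    "one of four parts": 57,   # part-whole explicit, numerator first
    "four-one": 15,            # not explicit, denominator first
    "one-four": 25,            # not explicit, numerator first
    "one fourth": 31,          # standard English fraction name
}

def conditions_above(baseline, scores):
    """Names of conditions scoring above the baseline, best first."""
    winners = [name for name, pct in scores.items() if pct > baseline]
    return sorted(winners, key=lambda name: scores[name], reverse=True)
```

Calling conditions_above(KOREAN_BASELINE, US_CONDITIONS) picks out exactly the two wordings that make the part-whole relation explicit.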

Paik & Mix offer the following explanation:

In English, the word parts is in children's everyday vocabulary. By the time children enter grade school, they usually understand the word parts, even if they have not mastered the idea of equal parts. In contrast, the Korean word for parts that is used in fraction names (i.e. boon) is not in children's everyday vocabulary before formal schooling. This word is actually borrowed from Chinese and is not introduced to children until they are taught fractions in school. Until then, children use informal words to refer to parts in their daily lives (e.g. jo-gak). ...

An analogy might be if we called 1/4 "of four morceaux one" in English.

(There's that numerical inadequacy of French again!) Anyhow, these results suggest that the problems with French as a mathematical language, asserted by Mayor Ishihara, are real but entirely superficial. The lack of transparency in number names will surely make things harder for learners, and conceivably even for adults -- but it's easy to change things by using positional naming (e.g. "sept cinq" rather than "soixante quinze") or other expedients (see this earlier post for some evidence of entirely non-linguistic methods).
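As a rough illustration of the positional naming idea, the following sketch reads a number out digit by digit in French, so that 75 comes out as "sept cinq". The function name is hypothetical, and the scheme itself is only the expedient suggested above, not an established French usage.

```python
# Hypothetical sketch of positional (digit-by-digit) naming in French.
FRENCH_DIGITS = ["zéro", "un", "deux", "trois", "quatre",
                 "cinq", "six", "sept", "huit", "neuf"]

def positional_name(n: int) -> str:
    """Name a non-negative integer by reading its decimal digits in order."""
    if n < 0:
        raise ValueError("this sketch only covers non-negative integers")
    return " ".join(FRENCH_DIGITS[int(digit)] for digit in str(n))
```

With this scheme, positional_name(75) gives "sept cinq" in place of "soixante quinze", and 80 becomes "huit zéro" in place of "quatre vingt".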

And of course the bad educational consequences of the absurd Japanese orthographic system trump any possible disadvantages of French number names...

Posted by Mark Liberman at 09:15 PM

Grammar school for scandal: Rove suspected of parsing

Barbara Partee wrote in to draw attention to a quote in Anne Kornblut's 7/17/2005 NYT article "'Indispensable': Does It Have a Shelf Life?":

One former Republican official who retains close ties to the White House said there could be a political cost for keeping Mr. Rove on board even if he is found to have done nothing illegal. "If Karl survives, he does so at the president's political expense," said the former official, who spoke on condition of anonymity because he did not want to be seen as disloyal to Mr. Rove.

"George W. Bush came into office promising two tenets that are in competition now: straight talk, non-parsing - and loyalty," the former official said. "He's either got to choose loyalty or straight talk. He can't do both." [emphasis added]

Barbara's comment: "Unless 'parsing' has come to mean 'playing sophistical linguistic games' or something, I don't understand why not parsing should be a virtue!"

My own first reaction was that it's nice to see the interests of lexicographers given such weight by the media. The NYT's policy on Confidential News Sources reads in part:

The use of unidentified sources is reserved for situations in which the newspaper could not otherwise print information it considers reliable and newsworthy.

Since the content of the quote from the "former Republican official" is banal conventional wisdom, except for this striking word usage, the justification for using it must be lexicographic, right? But in fact, the basic information about this usage turns out to be widely available.

To start with, the OED cites an established use of parse to mean "examine minutely", as in

1788 F. GROSE Art Caricature 14 When you wish to draw a face from recollection you must well commit it to memory, by parsing it in your mind (as schoolboys term it) by naming the contour and different species of features of which it is constructed.
2001 Newsweek 17 Dec. 60/1 Science has parsed nearly every move of every Olympic event and figured out what athletes must do to bring back the gold.

Searching Google News this morning for journalistic uses of parsing, I find some examples of this sort, like:

Armchair quarterbacks across Stark County are parsing the possible fate of North Canton’s Hoover plant.

Love those parsing quarterbacks! But there's a new sense of parsing in the air, which seems to involve very careful consideration of the meaning of words and phrases, either to design them to mislead in the first place, or to explain or excuse them in retrospect:

Is Tom Tancredo at all sorry for what he said? Oh, please. He is parsing what he said down to the very nub. (Rocky Mountain News, 7/20/2005 - link)

There is also a collocation "legal parsing" or "legally parsing", which associates this kind of behavior specifically with lawyers or lawyer-like behavior, and can be applied to deeds as well as words:

Whatever else Mr. Daley has accomplished for Chicago — and the list is immense — sorrow, legal parsing and overdue reforms won’t be enough.

An editorial in the Macon Telegraph makes the Rove/parsing connection, under the headline "Legal parsing of words does disservice to truth":

We now know when Clinton said, "I did not have sex with that woman, Monica Lewinsky," he was legally parsing his words. When Rove said last year, regarding Plame, "I didn't know her name and didn't leak her name," he was legally parsing his words, too.

Since almost no one learns about literal parsing any more, it's not surprising that the metaphorical sense of parse as "examine very carefully, especially with respect to possible alternative interpretations" has come to dominate. More recent examples in re Rove:

We obfuscate when we parse the meaning of ‘is’ or ‘name’ to cover our actions rather than illuminate them.
A president uncomfortable with moral ambiguities cannot be happy having to parse what the meaning of the word "name" is.
The way Washington parses these pronouncements, Rove should have been packing his suitcases for a move back to Texas.

Of course now, the press is busy finding ways to apply parsing to John Roberts:

As we parse the record, it seems clear that Roberts qualifies as someone who takes the law seriously and won't try to legislate from the bench.
My predictions on how Roberts will vote when he (almost certainly) gets to the court are not based on a close reading of his scholarly writings - there are almost none - or a parsing of his decisions as an appellate judge.
The blogs were quick out of the box this morning in parsing President Bush's choice of John Roberts to fill the Supreme Court seat left vacant by Sandra Day O'Connor's retirement.
By allowing the opposition candidate to get by with parsed words and nuanced policies, we failed to give the people a reasonable alternative.
...already there are Democrats parsing every word that he has written on civil rights, on abortion, on environmental protection, on criminal law.
Washington's business representatives are also doing their share of judge-parsing...

Posted by Mark Liberman at 08:52 AM

July 20, 2005

Curses!


Mark Liberman stubs his toe and considers abusive language, asking some questions:

I'm sure that people in every culture insult one another. But is there a counterpart in every culture to "abusive language"? Is the "abusive language" of insults always analogous to (and sharing expressions with) the "curse words" that express pain or frustration? What about ways to indicate emphasis?

and providing some bibliography on swearing.  I don't have the answers, but I have a few observations and more bibliography, including a recent book that takes linguists to task (unfairly, I claim) for having largely disregarded the taboo vocabulary of English.


The recent book:

Wajnryb, Ruth.  2005.  Expletive deleted: A good look at bad language.  NY: Free Press.

This is meant for a general audience and relies heavily on a few earlier publications: Hughes's Swearing and Jay's Cursing in America, which Mark cites, plus:

Allan, Keith & Kate Burridge.  1991.  Euphemism and dysphemism: Language used as shield and weapon.  NY: Oxford University Press.

Andersson, Lars-Gunnar & Peter Trudgill.  1990.  Bad language.  London: Penguin.

Dooling, Richard.  1996.  Blue streak: Swearing, free speech and sexual harassment.  NY: Random House.

Jay, Timothy.  1992.  Why we curse: A neuro-psycho-social theory of speech.  Amsterdam: John Benjamins.

Kidman, Angus.  1993.  How to do things with four letter words: A study of the semantics of swearing in Australia.  BA Honours thesis, Linguistics, Univ. of New England, Armidale NSW.  Available online.

Montagu, Ashley.  2001.  The anatomy of swearing.  Philadelphia: Univ. of Pennsylvania Press.

For the record, I note that Allan, Andersson, Burridge, and Trudgill certainly count as linguists; in fact, Wajnryb labels Allan and Burridge as "linguists" or "academic linguists" when she mentions them.  Among the other linguists she cites are David Crystal and Connie Eble; meanwhile, Erving Goffman, Steve Pinker, and Jesse Sheidlower surely have at least fellow-traveler status.  There are, however, a fair number of "academic linguists" she doesn't mention, but should have.  I'll get to that in a second.

The case against the scholars begins on page 5, where Wajnryb notes the reluctance of (some) lexicographers to treat "bad language".  On page 6 she floats an explanation:

The fact that serious word people have been so hesitant to take the plunge [and discuss taboo vocabulary] perhaps carries a message for the would-be researcher.  Part of the problem is that it's hard to write about SHIT, FUCK, CUNT, and their fellows without using the words themselves.  Although it has been done.  In 1948 one Burges Johnson succeeded in writing a book about swearing, rather romantically titled The Lost Art of Profanity, without once mentioning any of the naughty words.  And Jesse Sheidlower wrote a famous book called The F-Word, but such an endeavor can't have been easy.

Now, either this is incredibly sloppy writing, or else Wajnryb has never actually looked at The F-Word, in either of its editions.  The title avoids the word FUCK, for obvious reasons, but the book itself shrinks from nothing.  Wajnryb should check it out.  (She might also want to look at the work of Allen Walker Read, which goes back more than seventy years.  And the journal Maledicta, now up to volume 14.)

The indictment extends to linguists in general on page 7:

... I find it strange that linguists have allowed themselves to be affected by the taboo to the point that its exploration has been underresearched...  Of course, in referring to the lack of interest in my topic, I mean the absence of academic investigative interest.

What the fuck am I, chopped liver?  Studies Out in Left Field (edited by Zwicky et al., originally published in 1971 and reprinted by Benjamins in 1992) contains a number of papers by "academic linguists" (most notably, Jim McCawley) which are light-hearted in tone but entirely serious as pieces of linguistic analysis.  Wajnryb mentions neither SOLF nor McCawley (Kidman mentions both, though McCawley is cited under his pseudonyms).  On pp. 162-3 she notes that there are interesting facts for linguists to look at:

Swearing is culturally and linguistically shaped in other ways.  For example, it has its own grammar, dependent on the language in which the swearing takes place.  Take, for instance, the English sentence "Who the hell has been here?," which is probably derived [historically? synchronically?] from "Who in the hell has been here?," just as "What the fuck are you doing?" may be derived from "What in the fuck are you doing?"  Here the ordinary rules of English grammar combine with swearing-specific grammatical constraints, such as the use of "the" before "hell" and before FUCK, to give us a grammatically well-formed utterance.

Annoyingly, Wajnryb has chosen to illustrate the grammar of swearing with a construction -- postmodifying on earth, in the world, in (the) hell, the shit, the fuck, etc. -- that has received the attention of English syntacticians for over thirty years, but without alluding to this literature and without mentioning what most of us think are its most interesting features: that this postmodification is possible only with WH expressions, in fact only with interrogative (not relative) WH expressions; that these postmodifiers are strictly ordered with respect to postmodifying else; and that the postmodification is possible only for single-word WH phrases.  Someone with time on their hands could easily spend a good bit of it putting together a bibliography of places where this construction has been mentioned by syntacticians.

Jumping ahead to more recent literature, there's the analysis of NPs like (doodly) squat, (jack)shit, and fuck(-all) in sentences like You (don't) know jackshit about linguistics -- by, among others, Larry Horn ("Flaubert triggers, squatitive negation, and other quirks of grammar", in the 2001 volume Perspectives on Negation and Polarity Items, edited by Hoeksema et al.) and Paul Postal ("The structure of one type of American English vulgar minimizer", chapter 5 in his 2004 collection Skeptical Linguistic Essays).

That's a tiny taste of syntax.  In the world of phonology, Wajnryb mentions in passing (on p. 35) the insertion of expletives in emphatic forms like infuckingcredible, which she labels "the integrated adjective", following Geoffrey Hughes.  She seems not to know that there are hundreds of mentions of this phenomenon in the technical literature on phonology, the central work being John McCarthy's 1982 Language paper, "Prosodic structure and expletive infixation".  (The phenomenon is also known in the linguistics literature as expletive insertion, fuckin' insertion, and probably other things as well.)  Again, there's a job here for an earnest bibliographer -- and McCawley would once again be in the bibliography.

Sociolinguists haven't neglected taboo vocabulary, either, but maybe it's time to wrap up this topic and move to another.  To summarize so far:  It's very very far from being the case that "academic linguists" have ignored the taboo vocabulary of English.  We can fairly be said to have reveled in it, in fact.  (Linguists tend to be playful people.)  Wajnryb wasn't aware of this, probably because her own interests are in the psychological and social aspects of swearing, and this is where the interests of her readers undoubtedly lie as well, so she's tended to depend on people like Jay (a psychologist) and Hughes (a historian of the English language), dipping into linguistics mostly in easily accessible works intended for a general audience.  But this means she has no right to get all pissy about "academic linguists".

On to matters where linguists might not have a whole lot to offer, like the psychological and social functions of swearing.   Wajnryb distinguishes three kinds of swearing: cathartic swearing (called "annoyance swearing" by Burridge and Montagu), abusive swearing (insulting), and social swearing (what I think of as "social glue", marking solidarity).  Respectively: "Oh fuck, my computer just crashed" and "Fuck you, asshole!" and "How the fuck are ya doin', you old bastard?"  To which we might add Mark Liberman's emphatic swearing, as in "This posting is fuckin' brilliant!".  I'm not sure where this leaves more-or-less literal uses of taboo vocabulary, as in "I want you to fuck me, hard, and then suck my cock"; given an occasion where I actually want to communicate these desires to someone, perhaps urgently, it's hard to imagine doing so without dipping into the taboo vocabulary (or sounding ridiculous).  Sometimes these are just the right words for the job.

Wajnryb tends to view cathartic swearing in terms of the hydraulic metaphor: pressure builds up, and swearing relieves the pressure.  And she tends to view abusive swearing as displaced aggression: instead of hitting someone, you swear at them.  I have problems with both of these (extremely popular) ideas.

My objection to the hydraulic metaphor is, in fact, that it's extremely popular: it's just a bit of folk psychology, a cultural schema, retailed as an explanatory account of behavior.  No doubt a lot of people experience cathartic language as the blowing off of steam, but that's surely because that's the way we've learned to configure the experience.  There are, after all, plenty of cultures where people interpret bad events as the result of witchcraft -- because they learned that there are witches and learned what witches do.  From within the culture, such experiences and understandings are real enough, but they aren't scientific explanations.  (Note that cathartic language covers a lot more than cathartic swearing.  Ow and ouch are cathartic, but not swear words.)

My objection to the displaced aggression idea is that it covers so little of the territory of abusive language.  Some abusive language expresses retributive or pre-emptive aggression, I'm sure, but there are plenty of other motives: contempt, disgust, dissociation from The Other, assertion of superiority, at least.  (Note that abusive language covers a lot more than abusive swearing.  Idiot and moron are abusive, but not swear words.  Compare them with cocksucker and dickhead.)

The question of universality is vexed.  Literal swearing is usually said to depend on the taboos of particular cultures, and there's clearly a connection, but it's also clear that the connection isn't very tight.  Everybody knows that plenty of words in taboo areas aren't swear words, and it's also possible for some taboo areas to have little or no taboo vocabulary associated with them: money, in particular income, is a very sensitive area in American culture, but there are no clear financial swear words in English.  So literal swearing is significantly conventionalized, dependent not only on cultural taboos but also on conventional restrictions on how certain lexical items are to be used.  The question is then whether all languages have vocabulary that is conventionalized in this way.  But what counts as "in this way"?  Where's the line, if any, between swearing and merely abusive language?  Are retard and dago swear words in English?  You begin to worry that the metalanguage we're using just isn't up to the job, and to think that maybe it's time to call in the philosophers, as Mark Liberman suggests.

Wajnryb doesn't answer these questions, though in chapter 12 ("Cross-culturally foul") -- where she maintains that the Japanese insult one another by using the system of politeness and respect marking creatively -- she suggests that abusive language might be universal.  But abusive swearing?  Literal swearing?  Cathartic vocalizations that are conventionalized?  Cathartic vocabulary?  Cathartic swearing?

Wajnryb does observe that the same taboo vocabulary tends to be re-used for many different functions (as in my examples with fuck in them, above), an effect she attributes to there being such a "small reservoir of swear words" (p. 25) for so many functions.  In the case of social swearing and emphatic swearing, such re-use is essentially guaranteed.  Social swearing involves the use of literal, cathartic, or abusive swearing to demonstrate closeness and trust: we're such asshole buddies that I can use this language with you.  And emphatic swearing is just literal, cathartic, or abusive swearing bleached of its denotative content, leaving only the connotation of "strength".

Dammit!  This posting is way the fuck too long.  (And what's going on with way the fuck?)

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 08:48 PM

Of Thee (and Ay) I sing

In response to my post about the pronunciation of the and a, several readers wrote in to draw my attention to (what they perceive as) the unexpectedly frequent use of unreduced forms of these words by President George W. Bush. I've taken my title from one of these messages. In general, my correspondents clearly find this aspect of the president's speech to be annoying: they use words like "childish", "stilted, staccato sound", "over-articulated", and so on.

I was puzzled by this, because in preparing my post I looked at a Bush audio clip and transcription that was lying around -- his speech at Gleneagles after the London bombing -- and found that all the (phonetically pre-consonantal) the's and a's were reduced as expected. I didn't include that observation in my post on the and a, because my goal there was to counter the strange view that all reduction of the and a is a symptom of the Decline of the West, and I figured that some people would just assimilate evidence about W's pronunciation to their general low opinion of W's linguistic abilities.

So I asked these correspondents to cite me some citations, with pointers to audio clips -- "at such-and-such a point in such-and-such a speech, W said thus-and-such". In reply, I've gotten three or four plaintive fish stories ("I heard it on the radio in the car, and there were several unreduced articles, except I don't remember exactly what he said or what the context was") but no actual fish.

Well, last night I decided to record President Bush's speech nominating John Roberts to the Supreme Court vacancy, seine out all the the's and a's, and sort them after their kinds. I was fascinated by what I found. (Of course, my fascinations are arguably eccentric...)

The president spoke for about six minutes and seven seconds, and used (by my count) 46 the's and 28 a's.

Bush THE: 2 of the 46 the's preceded vowels; 2 others preceded a pause; 5 preceded the y-sound (IPA [j]) at the start of words like "united" (where assimilation to an [i] is common though not invariant). This leaves 37 phonetically pre-consonantal instances of the. Of these, 36 have a clear schwa as their vowel, and 1 has an unreduced [i]-like vowel. The context was this:

I look forward to the Senate voting to confirm Judge John Roberts {pause} as the one hundred ninth Justice of the Supreme Court of the United States. (audio link)

In my opinion, this is an uncertain case. The sequence seems to involve a very slight pseudo-pause, as if the president wanted to check mentally to be sure that he got the number right (a pseudo-pause is a delay or hesitation that is filled by an elongation of preceding sounds rather than by silence). Also, the following word "one" starts with an orthographic vowel, which may have affected the choice.

Bush A: This is a different case altogether. Of the 28 instances of a, 2 were pre-pausal; but of the remaining 26 phonetically pre-consonantal instances of a, fully 6 (or 23%) have an unreduced vowel similar to IPA [ej].

In his career, he has served as a law clerk to Justice William Rehnquist ... (audio link)
In public service and in private practice, he has argued 39 cases before the Supreme Court, and earned a reputation as one of the best legal minds of his generation. (audio link)
My decision to nominate Judge Roberts to the Supreme Court came after a thorough and deliberative process. (audio link)
He has the qualities Americans expect in a judge ... (audio link)
These senators share my goal of a dignified confirmation process ... (audio link)
The appointments of the two most recent Justices to the Supreme Court prove that this confirmation can be done in a timely manner. (audio link)

Compare the two schwa-quality performances of the occurrences of a in the President's first sentence:

One of the most consequential decisions a President makes is his appointment of a Justice to the Supreme Court. (audio link)

This raises several questions. Is this sporadic use of unreduced a (and less often, the) an individual characteristic of President Bush (as my correspondents have claimed), or something more general? If it's a more general phenomenon, does it have particular regional or social connections? Is it associated with emphasis, with formality, with particular syntactic or phonetic contexts? Does it happen in extemporaneous speech as well as in reading?

I can only answer the first of these questions: it's not unique to George W. Bush. In fact, a small amount of investigation suggests that it's not even especially characteristic of him.

Following the President's nominating speech last night, John Roberts spoke for about 77 seconds. By my count, he used 13 the's and 4 a's.

Roberts THE: 11 of Roberts' the's were the sort of pre-consonantal cases that are normally reduced; of these, one was not reduced:

It is both an honor and very humbling to be nominated to serve on the Supreme Court. (audio link)

Roberts A: He used pre-consonantal a 4 times, and of these, one was unreduced:

That experience left me a profound appreciation for the role of the court in our constitutional democracy. (audio link)

Summing up: Bush, 1 unreduced the out of 37; Roberts, 1 out of 11. Bush, 6 unreduced a's out of 26; Roberts, 1 out of 4. The numbers are too small to draw any serious conclusions about overall rates, but it would clearly be unwise to characterize Bush's determiner-reduction pattern as being very different from Roberts'.
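To put a rough number on "too small to draw any serious conclusions", here is a minimal sketch that runs a two-sided Fisher exact test on the counts just given. The test itself is my choice for illustration, not something used in the tallies above, and the helper names are hypothetical; only the Python standard library is assumed.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x "unreduced" tokens in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# (unreduced, total pre-consonantal) counts from the summary above.
counts = {
    "the": {"Bush": (1, 37), "Roberts": (1, 11)},
    "a": {"Bush": (6, 26), "Roberts": (1, 4)},
}

def compare(word):
    """Return the p-value for Bush vs. Roberts on one determiner."""
    (u1, n1), (u2, n2) = counts[word]["Bush"], counts[word]["Roberts"]
    return fisher_exact_two_sided(u1, n1 - u1, u2, n2 - u2)

if __name__ == "__main__":
    for word in counts:
        print(word, round(compare(word), 3))
```

On these counts both p-values come out far above any conventional threshold (about 0.41 for the and 1.0 for a), which is just another way of saying that samples this small cannot distinguish the two speakers.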

As one more point of comparison, I took a look at FDR's "Infamy Speech" to the U.S. Congress after Pearl Harbor, asking for a declaration of war against Japan. (Note that the linked transcription is often wrong... I did my own.)

Roosevelt THE: Roosevelt spoke for about 7 minutes 7 seconds, and by my count he used 24 of the sorts of phonetically pre-consonantal the that we would expect to be reduced. (There were also 12 pre-vocalic the's and 5 preceding IPA [j].) Of the 24 "reduction-prone" cases, 23 were in fact reduced. This 23/24 is not strikingly different from Bush's 36/37. The unreduced case is

Japan has, therefore, undertaken a surprise offensive extending throughout the Pacific area (audio link)

Roosevelt A: Here's the surprise. Roosevelt uses the indefinite article a 5 times in this speech, and every single one of them is unreduced. Here they are in context:

a date which will live in infamy (audio link)
the Japanese Ambassador to the United States and his colleague delivered to our Secretary of State a formal reply to a recent American message (audio link)
Japan has, therefore, undertaken a surprise offensive extending throughout the Pacific area. (audio link)
I ask that the Congress declare that since the unprovoked and dastardly attack by Japan on Sunday, December seventh, nineteen forty one, a state of war has existed between the United States and the Japanese Empire (audio link)

I haven't had time to look at any other Roosevelt speeches to see if this was a consistent property of his formal style. I'm sure that it's not a general characteristic of formal speech style in the 20th century, since I've looked at many other examples (including several other presidents' speeches) without finding it. More on this as I learn it.

Posted by Mark Liberman at 10:52 AM

July 19, 2005

Don't look at their eyes!

Ben Zimmer has pointed something out to me concerning Dan Brown's Digital Fortress, and since this is the book of Dan's that I plan to read next, I am not entirely sure I am grateful. What he has revealed may be somewhat distracting. I was expecting a novel about cryptanalysis, probably one in which on the first page a renowned male expert at something dies a hideous death and straight away a renowned expert at something quite different gets a surprise call and has to take an unexpected plane flight and then face some 36 hours of astoundingly dangerous and exhausting adventures involving a good-looking (and of course expert) member of the opposite sex and when the two of them finally get access to a double bed she disrobes and tells him mischievously (almost minatorily) to prepare himself for strenuous sex. But what Ben has pointed out is that Digital Fortress is first and foremost a book about eyebrow movements.

Really. Dan Brown is a specialist in eye descriptions, but in this novel it's eyebrows. Don't take my word for it or Ben's; take a look at the textual evidence that Ben sent me, courtesy of Amazon.com's rather frightening search-within-the-book capability:

Page 13: "Susan arched her eyebrows coyly";
Page 28: "Strathmore arched his eyebrows";
Page 54: "Strathmore raised his eyebrows";
Page 55: "Strathmore raised an eyebrow";
Page 76: "Strathmore raised his eyebrows expectantly";
Page 104: "Hale arched a surprised eyebrow";
Page 140: "raising her eyebrows in mock anticipation";
Page 141: "Rocío raised her eyebrows";
Page 148: "Rocío arched her eyebrows";
Page 186: "Brinkerhoff arched his eyebrows";
Page 253: "Numataka arched his eyebrows";
Page 369: "The enormous man arched his eyebrows";
Page 396: "Smith arched his eyebrows, impressed";
Page 408: "He arched his eyebrows, obviously impressed"...

Whether they're coy, expectant, or impressed, it's always the same with these people: up go those eyebrows. The cover of the paperback edition has a depiction of a sheet of paper with encrypted stuff on it. And at one point, just below EKNERLM `QLFRHNDF BN GBKIA0UFO, the sheet is split and peeled back, and looking through it is a pair of eyes. Can't quite see the eyebrows, but my bet would be that they are... arched.

I'm planning to read this novel during a plane flight on September 1. I do hope Ben hasn't started to spoil it for me.

Posted by Geoffrey K. Pullum at 09:00 PM

Dykes at the TMRC

I've been looking into the history of the common American term for "diagonal cutting pliers", which is dykes or dikes. None of the standard English dictionaries have it: the Oxford English Dictionary, Merriam-Webster's 3rd Unabridged, Merriam-Webster's Collegiate, the American Heritage Dictionary, Encarta. In a previous post, I've cited text references going back to 1971 (a U.S. Navy book on hand tools), and (via Ben Zimmer) some hardware-store advertisements from 1977-78, along with a 1955 ad for "diags".

And then there's the entry for the verbal form dike "to attack with dikes" in the Tech Model Railroad Club dictionary. The on-line version of the dictionary says:

This dictionary is derived from one originally written in 1959 by Pete Samson. It was put on the net by Mark Stiles who added some entries. The online version was improved by contributions of several others, including Larry Allen, Richard Polis, Joe Onorato, Mike Patton, and others.

So could this be a citation for "dikes" dating back to 1959?

To find out, I wrote to the TMRC WWW team, and got this answer from Mark Styles ("aka WAS, aka Darkhorse"):

I created the definition for DIKE when I updated Pete's original entries and put the dictionary online on the (then) ARPANET, now the Internet. That was circa 1978. The use of the word at TMRC in that capacity certainly predates that by at least 3 years, since I learned it the first night I was at TMRC as a freshman in 1975, and diked out an old terminal block to earn hours towards my key.

Regardless of the claims in the current document header, I actually made up most of the post-Sampson definitions and continued to update the document for awhile. In the mid to late eighties it went into hibernation for quite some time.

Larry Allen, Richard Polis and I pulled an all-nighter in the AI lab once to do a big update. Joe Onorato, Mike Patton, Steve Russell and others suggested various definitions around that time. Slug (Steve) came up with GUNCH, I embellished it a little.

A couple of my own personal favorites (especially considering the popularity of the card games at the time) are BRIDGE (multiple puns) and HEARTS, and my all time favorite multiple pun is for EPOXY JOURNAL (sorry, JEM!). Polis almost choked on his cigar when I came up with that one...

Alan Kotok added:

I could be making this up, but it seems to me that "dikes" (however it was spelled) was in common usage when I first showed up at TMRC in 1958.

And Dick Lord agreed:

I've been an electronic hobbyist for close to 50 years (since 1955) and a working hardware design engineer for more than 35 years. My dad was a long time ham radio operator who had been active in radio from the 1920's. In all the years of my awareness, I have always heard small diagonal cutters referred to as "dikes" throughout the electronics industry. Though TMRC can lay claim to a great many words, I'm inclined to believe that the use of the term "dikes" for diagonal cutters preceded the existence of TMRC or the hacker dictionary.

However so far, in my research over the last hour through a half dozen 1955-1956 issues of Popular Electronics and a 1955 Allied Radio parts catalog stashed away in my basement, everything I've run across has referred to them by the name "diagonal cutters." I haven't run across the slang term in print, but who knows what might turn up. I have far too many old books and magazines piled up in places where there ought to be trains running!

This agrees with my memory that it was in common usage among the teenage car mechanics that I admired as a ten-year-old in rural eastern Connecticut in 1957.

The Tech Model Railroad Club, or TMRC, is the topic of the first chapter of Steven Levy's 1984 epic "Hackers: Heroes of the Computer Revolution", a book that I recommend to anyone who hasn't read it. Project Gutenberg offers the first two chapters of the 1986 edition here.

[Update 7/24/2005: Michael Patton emails:

There's been much discussion of this among the TMRC alumni with the basic result that the term was not in the 1959 version of the dictionary because it was such a common term and not specific to TMRC.

I asked my father, who got his electrical engineering degree from the University of Kentucky in the early 40's, about the term and he said:

I believe they were dykes when I went to work for RCA in 1943 although I never saw the slang term in writing.

So, the slang term pretty definitely predates the TMRC dictionary by many years. Unfortunately, finding it in writing is going to be hard since the slang would usually not have been used there.

]

Posted by Mark Liberman at 04:09 PM

Roll over Bourbaki, and tell Cholesky the news

According to a July 14 story in Japan Today:

A French language teacher and 20 other plaintiffs filed a damages suit Wednesday against Tokyo Gov Shintaro Ishihara for his insulting remarks against the French language last year.

Malik Berkane, a 46-year-old principal of a French language school in Tokyo, filed the suit at the Tokyo District Court, together with 20 other French and Japanese people, demanding an apology over the remarks and 500,000 yen in compensation for each plaintiff.

Before becoming a politician, Ishihara was a novelist. His 1955 Season of the Sun describes wealthy students who "express their defiance of postwar respectability by gambling and brawling and indulging in promiscuous sex." He made a political name for himself with the million-selling 1989 nationalist screed "The Japan that Can Say No: Why Japan Will be First Among Equals".

So why did Ishihara insult the French language, and how? Because some teachers of French criticized a reorganization of Tokyo Metropolitan University, and because (he says) the French language can't be used to count.

According to the petition, Ishihara said Oct 19, "I have to say that it should be no surprise that French is disqualified as an international language because French is a language which cannot count numbers."

He made the remarks at a meeting of a support organization for Tokyo Metropolitan University, which opened in April after integrating five universities and colleges run by the metropolitan government, when he criticized university employees who opposed the integration, including those teaching French and other languages.

"After all, those guys desperately clinging to such kind of language are lodging opposition for the sake of opposition," he said.

The newspaper accounts are not very exact about Ishihara's criticism, but it seems to have something to do with the fact that 80 in French is quatre-vingts ("four twenties").
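For readers who don't know the system, here is a minimal sketch of the arithmetic behind the standard (France) French number names from 60 to 99, where 60 and 4 x 20 serve as bases. The function name and output format are hypothetical, and Belgian and Swiss varieties, which use different words for 70 and 90, are deliberately ignored.

```python
def french_decomposition(n):
    """Return the arithmetic pattern behind the standard French name for 60 <= n <= 99."""
    if not 60 <= n <= 99:
        raise ValueError("this sketch only covers 60 through 99")
    if n < 80:
        # 60-79 are built on soixante (60) plus 0 through 19.
        rest = n - 60
        return f"60 + {rest}" if rest else "60"
    # 80-99 are built on quatre-vingt(s) (4 x 20) plus 0 through 19.
    rest = n - 80
    return f"4 x 20 + {rest}" if rest else "4 x 20"

if __name__ == "__main__":
    for n in (75, 80, 97):
        print(n, "=", french_decomposition(n))
```

Whatever one thinks of the pattern, it is entirely regular, so it is hard to see how it would prevent anyone from counting.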

Back in the days of the Japanese economic boom, one of the many cultural characteristics cited as a reason for Japan's then-superior economic performance was the rarity of lawsuits, whether serious or frivolous. Has Japan now adopted American "sue the bastard(s)" attitudes along with baseball and coffee, or is this action atypical? Or is there some more traditional concern with linguistic "face" in the mix here:

The plaintiffs said the governor's remarks give a false impression that French is a poor language, which is not acceptable by international standards, and brought disgrace to the plaintiffs, many of whom are involved in running language schools.

Berkane said, "I was shocked when I heard his remarks. We decided to file the suit as the governor has not responded to our letter demanding his apology."

Other stories at the Mainichi Daily News (7/12/2005), the Asahi Shimbun (7/13/2005), the BBC News (7/13/2005), the National Post (7/16/2005), the Taipei Times (7/19/2005), and a relevant comment from Korea, suggesting that Ishihara should be taken seriously as a demagogue if not as a linguist. (And more here on Ishihara's rhetorical stance...)

I do need to point out that here at Language Log, Bill Poser went there first, though no one sued him.

[via Overlawyered via Ed(itor) at blawgreview]

Posted by Mark Liberman at 07:27 AM

July 18, 2005

Documenting dykes

I was unsure how to spell dykes (or is it "dikes"?), and surprised to find that this everyday word is missing from dictionaries, or at least from the half-dozen dictionaries that I tried. (I'm talking about the common term for diagonal-cutting pliers, of course -- I know how to spell the words for "embankment of earth and rock", or "long mass of igneous rock that cuts across the structure of adjacent rock", or "disparaging [?] term for a lesbian").

Language Hat picked this up, and noted that "It is indeed strange; I don't recall previously seeing a normal word of long standing, even one of limited circulation, that was not in any dictionary; that snub is usually reserved for recent slang terms."

Attempting to find some print citations for the term, I struck out in the various ProQuest historical newspaper indices, though this may be because of my limited skill in searching them. But for those of you who are not already completely bored with the topic, I'm happy to report that I succeeded easily with Google Print, by searching for {dykes|dikes pliers}. (And even those who are totally dyked out may be interested in a small initial example of the lexicographic value of this service, even in its current beta form.)

Here are a few of the more relevant results for the dykes form:

2000 David E Shapiro "Your Old Wiring" p. 133. There is even less a need for diagonal cutters, also referred to as "side cutters", "side cutting pliers", or colloquially, "dykes," (from DIagonal CutterS). Unlike lineman's pliers, they are designed only for cutting, not grabbing. In some parts of the country, dykes are called "side cutters", confusing though that can be to people who use that term for lineman's pliers.
2001 Newton C Braga "Robotics, Mechatronics and Artificial Intelligence" p. 15. Cutting pliers or diagonals (often called dykes) in sizes from 4 to 6 in.
2004 Art Glass Originals "Stained Glass for the First Time" p. 28. Using lead dykes, cut off each end of the came.

and for dikes:

1980 [Update: actually 1971] Bureau of Naval Personnel "Tools and Their Uses" p. 49. The diagonal cutting pliers, commonly called "diagonals" or "dikes", are designed for cutting wire and cotter pins close to a flat surface and are especially useful in the electronic and electrical fields.
1996 Bruce Caldwell "Auto Upholstery & Interiors" p. 15. Diagonal cutters, which are sometimes known as side cutters or dikes, are the basic tool for removing old upholstery.
1999 Jack G Ganssle "The Art of Designing Embedded Systems" p. 170. You'll live with those dikes and needle-nose pliers for weeks on end.
2002 John Holloway "Illustrated Theatre Production Guide" p. 123. Diagonal pliers or “dikes” are actually intended to cut pieces of wire or small metal hardware like pins or nails.
2003 Rick Peters "Electrical Basics" p. 50. I often use insulated-grip diagonal cutters (commonly called "dikes" in the trade) to cut individual wires in the 10- to 22-gauge range.

I'm from the "side cutters" = "linesman's pliers" culture, by the way.

In a comment on Hat's post, Rupert Goodwins points out that the verb "to dike" is in the Jargon File:

To remove or disable a portion of something, as a wire from a computer or a subroutine from a program. A standard slogan is “When in doubt, dike it out”. (The implication is that it is usually more effective to attack software problems by reducing complexity than by increasing it.) The word ‘dikes’ is widely used to mean ‘diagonal cutters’, a kind of wire cutter. To ‘dike something out’ means to use such cutters to remove something. Indeed, the TMRC Dictionary defined dike as “to attack with dikes”. Among hackers this term has been metaphorically extended to informational objects such as sections of code.

The word surely goes back much further than 1980 -- I learned it in the mid 1950s, and the TMRC (Tech Model Railroad Club) dictionary dates from 1959 (though I can't tell whether dike was in the original version), and Mike Albaugh suggested by email that his father had learned it in his squadron in WWII. I surmise that Google Print is working its way backwards from the present (at least until the submissions from the associated libraries start coming in), so we should learn more in time from that source, if nowhere else. And perhaps someone out there knows when the tool in question was invented.

[Update: Ben Zimmer found two tool shop ads from 1977-78:

Valley News (Van Nuys, CA), Oct 6, 1977, p. IV5
ad for: Tool Shack of America, Inc.
8" LINESMAN
NEEDLE NOSE PLIERS
DYKE
$2.00 EACH

Los Angeles Times, Jun 25, 1978, p. Y14
ad for: The Supertool Shops in Santa Monica
8" linesman plier...$2.95
6" needle nose..$2.85
6" dike..$2.85

Ben observes that both of these are from southern California, and both have the unexpected singular dyke or dike. Ben also found ads going as far back as 1955 for "diags":

The Independent (Pasadena, CA), Dec. 8, 1955, p. 28
ad for: Colorado Hardware
ASS'T PLIERS
Needlenose, Lineman, Diag's
$2.85 Value
SALE PRICE 77¢

and surmises that diags might have been a transitional form, or at least a transitional spelling.

These were found on newspaperarchive.com.]

[Update: I checked the details on the book "Tools and their Uses" and found that the copyright page says:

This Dover edition, first published in 1973, is an unabridged and unaltered reproduction of the work originally published by the United States Government Printing Office in 1971 as Rate Training Manual NAVPERS 10085-B.

Note that Google Print's algorithms still have a few rough edges: this work is cited in Google Print's index as "Tools and Their Uses", by "S. Navy U."]

Posted by Mark Liberman at 06:07 PM

Proper Ebonics of all things

Yesterday, The New York Times had a story about a young black woman from tough streets in Jersey City who is coaching an actress on playing her in a movie. She mentioned teaching the actress how to "speak Ebonics the correct way."

Have academic linguists finally had a public impact beyond the education school circuit?

Remember, before the Oakland controversy over the use of Black English in the classroom ignited in October 1996, no one beyond a few Black English specialists even knew the term EBONICS, which had been only one of many terms proposed for the dialect over the years, and had never really caught on.

It had just happened that a person or two favoring the term had caught the ear of the Oakland school board. When the media got hold of their infamous resolution to use "Ebonics" in the classroom, suddenly the whole country was referring to black America's home dialect with a term that I, for example, only knew from aging and semi-professional literature until then. It was as if suddenly the public was referring to alcoholism with an antique term like dipsomania. Linguists' hopes that terms like "African American Vernacular English" would catch on were dashed.

At the time, I regretted the term's clumsiness. It has always sounded like some kind of linguistic proctology to me. But over the years, it has become so well-established among journalists, educators, and the lay public that resistance is futile.

But meanwhile, since 1996 linguists involved in the controversy have also shaken their heads over the fact that once the dust was settled, the general public still thought of Ebonics as a big joke — bad grammar, just "slang" or a combination of the two, propped up by a school board angling for bilingual education funds to line their own pockets with. The string of books that linguists wrote for the general public on Black English have been minor sellers at best, largely appealing to the already converted (this includes my own THE WORD ON THE STREET).

And yet -- here is an 18-year-old black girl who casually refers to speaking Ebonics "the correct way." Importantly, she was about nine during the controversy, which means that she has grown up in an America where the "Ebonics" term has been common coin. She would not have, and could not have, spoken of "Ebonics the correct way" before 1996 because the term had had no public exposure. But — she would certainly not have said "proper jive" or the like, except in jest. Nor would terms like African-American Vernacular English or Black English Vernacular, which only a linguist could love, ever have made their way into her heart.

But "Ebonics" is one word, and perhaps the EBONY connection helps in terms of memory-friendliness and identification. In fact, that the term at first sounds like a bit of a joke might have enhanced its staying power in the long run — one does not forget the word once one hears it.

I wonder if this young woman might represent a new generation of black kids who spontaneously think of Black English as a legitimate way of speaking, as something that can be spoken "right" or "wrong"? Perhaps the sheer existence of an entrenched and resonant name for the dialect has had an effect that we could not have predicted.

To be sure, the example the girl gives is a matter of having the freshest slang. But then few people besides linguists have more than a vague sense of what grammar is, and in any case the typical white youngster would be unlikely to refer to their in-group slang as a matter of "correct" versus "wrong." In this girl's comment, we might hear an easy linguistic confidence that is exactly what "Ebonics" advocates see as lacking in speakers of the dialect.

Posted by John McWhorter at 10:14 AM

Bakhtin in the West Wing

Howard Kurtz writing in the WaPo 7/17/2005, referring to Matt Cooper's revelations in the latest Time Magazine (which requires a subscription to access):

Cooper cleared up one lingering mystery: his description in a memo of his talk with Rove as being on "double super secret background." He said this was "a play on a reference to the film 'Animal House,' in which John Belushi's wild Delta House fraternity is placed on 'double secret probation.'"

The "double super secret background" phrase was first published by Michael Isikoff in Newsweek, in a column that (I think) appeared on July 10. Many people (including me) picked up on this as an allusion to Animal House, for example in this July 10th comment on Kos:

Is Rove a fan of "Animal House"?
And does "double super secret background" have anything to do with this:

[Dean Wormer's plotting to get rid of Delta House]
Greg Marmalard: But Delta's already on probation.
Dean Vernon Wormer: They are? Well, as of this moment, they're on DOUBLE SECRET PROBATION!

This allusion consists of two (fairly common) words embedded in a four-word phrase:

Original: double       secret probation
Allusion: double super secret background
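The alignment above can be recovered mechanically. What follows is a minimal sketch using Python's standard difflib module to find the words two phrases share in order; the function name is hypothetical, and this is only an illustration of the word overlap, not a model of how listeners actually recognize an echo.

```python
from difflib import SequenceMatcher

def shared_words(original, allusion):
    """Return the words the two phrases have in common, in order."""
    a, b = original.split(), allusion.split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    shared = []
    # Each matching block is a run of identical words; the final block is empty.
    for block in matcher.get_matching_blocks():
        shared.extend(a[block.a:block.a + block.size])
    return shared

if __name__ == "__main__":
    print(shared_words("double secret probation",
                       "double super secret background"))
```

A real study would also have to weigh how common the shared words are and how distinctive the frame is, which is exactly what a pure word-overlap count like this one leaves out.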

I've never seen a systematic empirical study of how echoes like this work. Once it was Homer, Horace, the Bible or Shakespeare that were the most likely sources, while these days it's more likely to be a movie ("I'm shocked! shocked!") or a TV show ("Mmm... exams!").

Posted by Mark Liberman at 06:37 AM

Dangling modifier arrested in Beeston

From the Guardian, "Three cities, four killers", 7/17/2005

Yet last summer Khan changed. It was following his final trip to Pakistan. Those who knew him had detected a mood change. Two months after he visited the Commons last October, Khan resigned as a popular teaching assistant at the Hillside Primary School in Leeds. In the same period, Tanweer too was undergoing a profound personal transformation. Last December, he met militant groups linked to al-Qaeda north of Lahore. Days after returning to Beeston, a man he met was arrested for an attack in 2002 on an Islamabad church near a US embassy. [emphasis added]

I had to read this twice, because at first I thought I was learning that one of the Islamabad church bombers had been arrested in Yorkshire. That's why, as Geoff Pullum has pointed out, dangling modifiers are bad manners.

Posted by Mark Liberman at 05:45 AM

July 17, 2005

Adverbial license

As Mark observes, there is no end of ironies in the Patent and Trademark Office's refusal to register the mark DYKES ON BIKES on the grounds that dyke is vulgar, offensive and scandalous. As the San Francisco Chronicle explained the decision:

The USPTO referred to an entry in Webster's dictionary, which says "dyke" is "often used disparagingly."

"The examining attorney found it to be offensive to a significant portion of the lesbian community," Jessie Roberts, a trademark administrator with the U.S. Patent and Trademark Office, was quoted as saying. "And we're also looking out for the sensitivities of the general public more than that of a specific applicant."

What the examining attorneys failed to understand, of course, is that that "usually" in Merriam-Webster's usage label is there precisely to exempt ironic or defiant uses of the term, particularly by the people it applies to. Since their founding, Dykes on Bikes have made it their mission to épater the straightoisie, and everybody knows it.

But there's a further irony here, as well, when you look at the way a similar usage label was interpreted by the D.C. District Court Judge Colleen Kollar-Kotelly last year in overturning the Trademark Board's decision that the mark REDSKINS disparages American Indians.

As Kollar-Kotelly wrote in her decision:

[T]he dictionary evidence only states that the term 'redskin(s)' is 'often offensive,' which, as Pro-Football observes, means that in certain contexts the term 'redskin(s)' was not considered offensive. In fact, the TTAB concluded that the term 'redskin(s)' means both a Native American and the Washington-area professional football team. The fact that it is usually offensive may mean the term is only offensive in one of these contexts.

For sheer obtuseness, that could go mano-a-mano with the decision on dyke. I mean, when Merriam-Webster's labels nigger as "usually offensive," we don't take that as meaning that a candy company would have lexicographical license to market Grinnin' Nigger brand chocolate bars.

Speaking as a sometime usage editor for the American Heritage, I suppose this all suggests that dictionaries should try to make it clear in their front-matter what "usually" and the rest actually mean in usage labels. But I doubt if there's anything we could do that would keep courts and attorneys from reading the entries to suit their purposes.

This said, I can see some legal complications to suspending the "disparagement" condition for reclaimed epithets like dyke. For one thing, marks are transferable, and the intentions of the original registrants don't always go along with them. What if the DYKES ON BIKES mark were acquired by the Family Research Council for use on a series of homophobic t-shirts -- would that give a lesbian group grounds for a petition to cancel the mark? And if the registrant's intentions alone were decisive, you might be disposed to credit the Redskins' claim that they intended the mark to "honor" Native Americans (file under "butter wouldn't melt"). But most Indians wouldn't buy that. In the end, the determination that a mark is disparaging depends not just on the user's intention but on the perception of that intention by the group the term refers to.

Posted by Geoff Nunberg at 03:22 PM

Giving first aid the already disheveled hair projection

No, it's not spam text. Matthew in Beirut reprints the subtitles of a pirated Chinese DVD of Revenge of the Sith, available in English on the DVD via remarkably poor quality Machine Translation. The title becomes "The Backstroke of the West", and it gets better from there.

Other favorites:

"He big in nothing / important in good elephant"
"I hope that these dreamses really can't become"
"Send these troopseses only"
"I was just made by the Presbyterian Church" (= Jedi Council)
"Ratio tile, the wish power are together with you" (= "Obi Wan, may the Force be with you")

Posted by Mark Liberman at 01:09 PM

You taught me language, and my profit on't / Is, I know how to curse

Today's Foxtrot questions the role of linguistic content in cussing:

I sympathize. Shortly after reading the Sunday funnies this morning, I totally smashed my toe on a suitcase left in an inappropriate place. But why does saying certain words, with or without lightning bolts and daggers, seem to help in this situation? As far as I can tell, though there are many theories about this, the answer is still not clear. [See e.g. Timothy Jay's "Why We Curse. A Neuro-Psycho-Social Theory of Speech" (1999), and its review by Howard Kushner, Journal of Nervous & Mental Disease, 189(4):269-270, April 2001].

As I nursed my sore toe, a different question occurred to me. Is cursing a cultural universal? Perhaps there's a good review of this question somewhere, though a bit of searching in Google Scholar and other places didn't turn up an answer. As I thought about it, though, I realized that this is a very difficult question to frame in an empirically useful way. In simpler language, how would you tell?

There are many circumstances that we think of as "cursing and swearing", and they seem to be connected at best by a sort of family resemblance: expressing pain, anger, frustration, annoyance; insulting someone, directly or descriptively; invoking or wishing for supernatural assistance in harming someone; adding emphasis to statements whether positive or negative in content; using certain taboo words, or referring to certain concepts in any way at all; issuing ritual guarantees of truthfulness, perhaps by exposing oneself to supernatural punishment for falsehood; and so on.

In any one of these areas, there are expressions that are not "curse words" (e.g. "ouch" or "oh oh oh") as well as those that are (e.g. the phrases that Bill Amend represented in the Foxtrot strip as "Asterisk! Dollar Sign! Ampersand!"). We think we know the difference, but sometimes kids get it wrong (or at least learn a different categorization), and expressions famously change category over time.

The linguistic details also differ markedly across cultures, even among Europeans. According to Boele De Raad et al., "Personality terms of abuse in three cultures: type nouns between description and insult", European Journal of Personality 19(2), 153-165 (2004), who asked "one hundred and ninety-two male subjects from Spain, Germany and the Netherlands ... to write down terms of abuse that they would use given a certain stimulus situation",

[i]n Spanish abusive language is typically about family and relations, in Germany it is typically about anal aspects, and in the Netherlands it is mainly about genitals.

I'm sure that people in every culture insult one another. But is there a counterpart in every culture to "abusive language"? Is the "abusive language" of insults always analogous to (and sharing expressions with) the "curse words" that express pain or frustration? What about ways to indicate emphasis?

This seems to me to be analogous to a set of problems in philosophy of mind:

Do octopuses wince and groan when in pain? Scream and yell? How similar is octopuses' escape behavior, from the purely physical point of view, to the escape behavior of, say, middle-aged, middle-class American males? Is there an abstract enough nonmental description of pain behavior that is appropriate for humans and octopuses and all other pain-capable organisms and systems? [Jaegwon Kim, "Philosophy of Mind", Westview Press, 1996, p. 96]

However, I'm not aware of any similar philosophical investigation of the cross-cultural interpretation of cursing.

[Here are some other interesting links, which unfortunately don't address the question of universality:

Geoffrey Hughes, "Swearing: A Social History of Foul Language, Oaths and Profanity in English", Blackwell, 1991.

Timothy Jay, "Cursing in America: A Psycholinguistic Study of Dirty Language in the Courts, in the Movies, in the Schoolyards and on the Streets". John Benjamins, 1992.

Howard I. Kushner. "A Cursing Brain? The Histories of Tourette Syndrome". Harvard University Press, 1999.

Kate E. Brown and Howard I. Kushner, "Eruptive Voices: Coprolalia, Malediction, and the Poetics of Cursing", New Literary History 32.3 (2001) 537-562.

Kushner's "Cursing Brain" book raises a medico-anthropological question for me. Presumably every culture has sufferers from the condition sometimes called "Tourette's syndrome", apparently caused by hypersensitivity of dopaminergic receptors in the basal ganglia, in (some variants of) which there are compulsive taboo verbal outbursts of various sorts. The details of these verbal tics are obviously language-specific -- the famous Marquise de Dampierre repeatedly burst out with "merde et foutu cochon", not "shit and fucking pig", because she was a 19th-century French aristocrat, not an English one. The details are also apparently (often?) context-specific: the description by Jean Marc Gaspard Itard of Madame de Dampierre's case (Archives Générales de Médecine, 1825) explained that

In the midst of a conversation that interests her extremely, all of a sudden, without being able to prevent it, she interrupts what she is saying or what she is listening to with bizarre shouts and with words that are even more extraordinary and which make a deplorable contrast with her intellect and her distinguished manners. These words are for the most part gross swear words and obscene epithets and, something that is no less embarrassing for her than for the listeners, an extremely crude expression of a judgement or of an unfavorable opinion of someone in the group.

So examining the behavior of such cases across cultures should give some indication of what counts locally as "gross swear words and obscene epithets". However, some of these Tourette's outbursts are apparently just insulting or embarrassing, without involving what we would call "curse words" or violating other specifically linguistic taboos.

]

Posted by Mark Liberman at 12:49 PM

The NYT updates the framing wars

Nine months after the blogospheric buzz, the NYT magazine comes to term with Matt Bai's article "The Framing Wars". Well, that's not fair. The article does recycle much of the usual material, including its title, the question of whether success in political discourse is about words or about ideas, the obligatory passage about Frank Luntz, and so on; but it's mostly about the current state of framing uptake on the Democratic-party side of American politics.

Bai leads with a nice image:

After last November's defeat, Democrats were like aviation investigators sifting through twisted metal in a cornfield, struggling to posit theories about the disaster all around them.

There's an update on George's current lifestyle:

When I first met Lakoff in April, at a U.C.L.A. forum where he was appearing with Arianna Huffington and the populist author Thomas Frank, he told me that he had been receiving an average of eight speaking invitations a day and that his e-mail account and his voice mailbox had been full for months. ''I have a lot of trouble with this life,'' Lakoff confided wearily as we boarded a rental-car shuttle in Oakland the following morning. He is a short and portly man with a professorial beard, and his rumpled suits are a size too big. ''People say, 'Why do you go speak to all these little groups?' It's because I love them. I wish I could do them all.'' Not that most of Lakoff's engagements are small. Recently, in what has become a fairly typical week for him, Lakoff sold out auditoriums in Denver and Seattle.

There's an interesting description of one of framing's successes -- the SS debate:

Bush had tried to recast his proposed ''private accounts'' as ''personal accounts'' after it became clear to both sides that privatization, as a concept, frightened voters. But as they did on the filibuster, Democrats had managed to trap the president in his own linguistic box. ''We branded them with privatization, and they can't sell that brand anywhere,'' Pelosi bragged when I spoke with her in May. ''It's down to, like, 29 percent or something. At the beginning of this debate, voters were saying that the president was a president who had new ideas. Now he's a guy who wants to cut my benefits.''

and an even more interesting suggestion that these successes are more about party discipline than about public reactions to language:

In the end, the success of the Social Security effort ... may have had something to do with language or metaphor, but it probably had more to do with the elusive virtue of party discipline. Pelosi explained it to me this way: for years, the party's leaders had tried to get restless Democrats to stay ''on message,'' to stop freelancing their own rogue proposals and to continue reading from the designated talking points even after it got excruciatingly boring to do so. Consultants like Garin and Margolis had been saying the same thing, but Democratic congressmen, skeptical of the in-crowd of D.C. strategists, had begun to tune them out. ''Listening to people inside Washington did not produce any victories,'' Pelosi said.

But now there were people from outside Washington -- experts from the worlds of academia and Silicon Valley -- who were making the same case. What the framing experts had been telling Democrats on the Hill, aside from all this arcane stuff about narratives and neural science, was that they needed to stay unified and repeat the same few words and phrases over and over again. And these ''outsiders'' had what Reid and Pelosi and their legion of highly paid consultants did not: the patina of scientific credibility. Culturally, this made perfect sense. If you wanted Republican lawmakers to buy into a program, you brought in a guy like Frank Luntz, an unapologetically partisan pollster who dressed like the head of the College Republicans. If you wanted Democrats to pay attention, who better to do the job than an egghead from Berkeley with an armful of impenetrable journal studies on the workings of the brain?

You might say that Lakoff and the others managed to give the old concept of message discipline a new, more persuasive frame -- and that frame was called ''framing.'' ''The framing validates what we're trying to say to them,'' Pelosi said. ''You have a Berkeley professor saying, 'This is how the mind works; this is how people perceive language; this is how you have to be organized in your presentation.' It gives me much more leverage with my members.''

There are some factual errors, such as Bai's description of Lakoff's history with Chomsky:

It began nearly 40 years ago, when, as a graduate student, Lakoff rebelled against his mentor, Noam Chomsky, the most celebrated linguist of the century.

When I first met George, in 1965, he was already a faculty member at Harvard, and still considered himself on the same linguistic team as Chomsky, so to speak. I don't think that his level of enthusiasm for the enterprise of Chomskian linguistics had diminished since his time as a graduate student at Indiana (a period which technically ended with his PhD in 1966, after he began work at Harvard). Their disagreements -- part of the so-called "linguistics wars" -- developed into acrimonious argument several years later, in the late 1960s and early 1970s. As far as I know, Chomsky had never been Lakoff's "mentor" in any meaningful sense of that term, unlike the relationship that Chomsky had with other "generative semanticists" such as Haj Ross and James McCawley, who had been Chomsky's students at MIT and who joined Lakoff in "rebelling", as Bai puts it.

There are a few very strange choices in Bai's article. For example, the Designated Detractor role is assigned to Kenneth Baer, and backed up with a spectacularly inappropriate ritual quote:

Lakoff's detractors say that it is he who resembles the traveling elixir salesman, peddling comforting answers at a time when desperate Democrats should be admitting some hard truths about their failure to generate new ideas. ''Every election defeat has a charlatan, some guy who shows up and says, 'Hey, I marketed the lava lamp, and I can market Democratic politics,''' says Kenneth Baer, a former White House speechwriter who wrote an early article attacking Lakoff's ideas in The Washington Monthly. ''At its most basic, it represents the Democratic desire to find a messiah.''

George "marketed the lava lamp"? What is that all about? My client is innocent, your honor: when the lava lamp was marketed, he was busy mimeographing handouts about ordering constraints among syntactic transformations.

Finally, I'm surprised that there's so little sensitivity to a relevant piece of word-sense ambiguity here. The founding fathers framed the Constitution, and are conventionally called "The Framers". That metaphor is based on the idea of framing a building, setting its essential structure in place. Why does (almost) everyone now seem to take it for granted that "framing" an issue is how you tell people about it, rather than how you decide to think about it? Is everyone caught instead in the idea of framing as wrapping a (purely decorative and presentational) frame around a picture? This was my first reaction to the MSM discussion of framing, and nothing much seems to have changed, in this respect, since July of 2004. Thinking about The Framers might help re-frame framing.

Though Bai's stuff on current uptake is interesting, the article's treatment of the framing ideas themselves is weak or worse. If you're interested, you can find better stuff in earlier magazine articles, and still better stuff in the (admittedly sprawling) blogospheric discussion:

Magazine links:

Michael Erard, "Frame Wars", in The Texas Observer (11/5/2004)
Marc Cooper, "Thinking of Jackasses", in The Atlantic Monthly (4/2005)
Joshua Green, "It Isn't the Message, Stupid", in The Atlantic Monthly (5/2005)

Blog links:

Language Log:
9/3/2003 "Linguistic punditocracy: the Rockridge Institute",
6/13/2004 "W the debater",
7/23/2004 "It's about ideas, not words",
9/4/2004 "Frames and messages",
9/9/2004 "More on Lakoff on framing",
9/22/2004 "Lakoff hits the big time, blogwise",
10/5/2004 "Good theory, bad practice -- or contrariwise?",
4/14/2005 "Linguistic sorcerers".

Chris at Mixing Memory:
9/09/2004 "Framing the convention",
9/16/2004 "Karl Rove the Feminist Bankteller",
9/21/2004 "Lakoff in the Blogosphere",
9/22/2004 "Understanding Frames with an Eye Toward Using Them Better",
9/27/2004 "Lakoff's View of Metaphors",
10/02/2004 "Lakoff is Everywhere!"

Coturnix at Science and Politics:
9/17/2004 "'Framing' is spreading through the Blogosphere"
9/20/2004 "Kos discovers Lakoff"
9/21/2004 "Nurturant is not Coddly!"

Justin at Semantic Compositions:
9/30/2004 "Maybe try thinking of a donkey",
9/30/2004 "What George Lakoff knows about the mind",
10/1/2004 "How not to test a hypothesis",
10/5/2004 "Excellent, excellent",
10/14/2004 "Relax? I can't relax!",
10/18/2004 "Elephants in George's pajamas",
11/15/2004 "A reply to my critics".

[Update: Arnold Zwicky has corrected my false belief that Paul Postal was a grad student with Chomsky at MIT:

you're right about george lakoff, but not about paul postal. neither george nor paul was a student at MIT; paul's ph.d., on mohawk morphophonology, is from yale (with floyd lounsbury as his adviser, if i remember correctly). paul did teach at MIT. but he was a "student" of chomsky's only in the way that, say, bob stockwell and chuck fillmore and emmon bach were chomsky's students.

That'll teach me to put down "facts" without checking. It's embarrassing to do something like this in the context of correcting someone else's error. As always, the Language Log customer service representatives stand ready to refund your subscription fees in case of less than full satisfaction.]

Posted by Mark Liberman at 08:50 AM

July 16, 2005

Are we conversational, or not?

While collecting examples of "initial material deletion" ("Must have a word with my publisher", Ruth Wajnryb, Expletive Deleted, p. 236), I came across one close by a possessive with gerund (using traditional terminology, as in Merriam-Webster's Dictionary of English Usage) which clashed stylistically with the conversational style of "Turns out, that all-American nightmare is mostly myth." Right there in a New York Times editorial ("Using Farmers as Bait", 7/16/05, p. A26). Probably a case of the writer's aiming for a conversational, but serious, tone (with the subject omitted in "turns out") running up against editorial policy (proscribing accusative with gerund, even where the possessive alternative seems fussy and overformal).

Here's the passage:

Anti-tax forces like to conjure up a field of dreams' turning into condos -- a young family inherits its birthright and then has to sell it to pay the taxes.
Turns out, that all-American nightmare is mostly myth.

In my collection of VPs with omitted subjects (50 so far; I'm not doing a particularly systematic search), there are three instances of "turns out", and two of these are from newspapers. Like the article omission in the following examples, "turns out" is probably used this way to lend a somewhat conversational, though still serious, tone to newspaper writing.

Except that's not true. Truth is, there's no way to tell which program is better. (Michael Winerip, On Education column "When Data Don't Mean That One Way Is Better", NYT 7/16/03, p. A15)
One of the visits included a fundraiser that bought Mr. Thune's campaign more than $300,000 in contributions. Word is, the president will be back at least once more before the election. (John Nichols, "Get That Pollster Off My Lawn", NYT Week in Review, 10/27/02, p. 13)
Most Americans -- two-thirds, according to a Pew Research poll this month -- believe that Saddam Hussein had a hand in the Sept. 11 terrorist attacks. Trouble is, no hard evidence of such a link has been made public. (NYT editorial "The Illusory Prague Connection", 10/23/02, p. A26)

The possessive with gerund that comes a bit before "turns out" is, however, distinctly formal, at least to my eye. In this case, you can tell it's possessive only by the punctuation, with an apostrophe, but presumably the Times would have insisted on an audibly possessive singular, say:

Anti-tax forces like to conjure up a dream's turning into condos -- a young family inherits its birthright and then has to sell it to pay the taxes.

Despite (or perhaps because of) frequent advice that nominal gerunds must have their subjects expressed by a possessive (not accusative, or uninflected) NP, a non-pronominal (things are different for pronouns) possessive in an object (things are different for subjects) strikes many modern speakers as very formal, even unnatural. Recall the fictional judge objecting to splitting in court, in one of the Rumpole stories; he used an accusative in a gerund object, even for a pronoun,

Do we have to add to the disagreeable nature of the proceedings the sound of you tormenting the English language?

over a possessive,

Do we have to add to the disagreeable nature of the proceedings the sound of your tormenting the English language?

presumably because John Mortimer thought the accusative sounded more natural.

So where did the possessive come from in the Times editorial? Probably from a copy editors' policy requiring the possessive with gerunds. But I'll bet they're not consistent in applying this policy; it's very hard to consistently enforce rules that sometimes go against the feelings of native speakers.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 02:50 PM

Disparaging trademarks and the lexicography of tools

Back in October of 2003, Geoff Nunberg wrote about how the DC District Court reinstated the trademark of the Washington Redskins, overruling the earlier decision of the Trademark Trial and Appeal Board, "which ruled in 1999 that the mark was improperly registered back in 1965, since the Lanham Act forbids the registration of marks that are 'disparaging'". Now (according to Abnu at Wordlab),

The San Francisco Women's Motorcycle Contingent, the nonprofit lesbian motorcycle group that has become internationally known for riding in the lead position at San Francisco's pride parade every year for nearly three decades, has been refused a trademark for its moniker, Dykes on Bikes.

This was an examining attorney's preliminary decision, and the case now goes to the Trademark Trial and Appeal Board. For the details, Abnu refers us to an entry at Martin Schwimmer's Trademark Blog, which quotes the examining attorney as observing that

A disparagement claim must show that the proposed mark (1) would reasonably be understood as referring to the disparaged party; and (2) is in fact disparaging, that is, would be considered offensive or objectionable by a reasonable person of ordinary sensibilities.

and that

The fact that some of the disparaged party have embraced or appropriated the term DYKE, does not diminish the offensiveness of the term that has historically been considered offensive and derogatory.

Furthermore, the examining attorney gave a great deal of weight to lexical reference works:

Dictionary definitions alone may be sufficient to establish that a proposed mark comprises scandalous matter, where multiple dictionaries, including at least one standard dictionary, uniformly indicate that a word is vulgar, and the applicant’s use of the word is limited to the vulgar meaning of the word.

Presumably the Trademark Trial and Appeal Board will uphold the examining attorney's rejection, which seems to reflect the wording of the Lanham Act, whatever common-sense justice may say in this case. (Though the group's submissions to the USPTO, linked below, make a pretty persuasive case with respect to the non-disparaging history and current usage of the word dyke within its subculture of origin.)

Too bad that Dykes on Bikes can't get their case into the DC District Court, where the judge ignored what Geoff describes as

a pretty strong portfolio of evidence to support the claim that redskin was a disparaging term when the mark was originally registered and remained so afterward. We had print citations for the word going back to the nineteenth century, like a passage from the 1910 edition of the Encyclopedia Britannica that described the word as not being "in good repute."

The contemporary evidence would include the AHD's characterization of redskin as "offensive slang", Merriam-Webster's 3rd Unabridged which says that it's "usually taken to be offensive", and so on.

Maybe Dykes on Bikes could re-gloss themselves as the Diagonal Cutting Pliers on Bikes? Though this page on a Human Resources site suggests that it might not help their case.

And curiously, there wouldn't be much dictionary evidence to bring forward. Dykes (I guess that's the spelling) in the meaning of "diagonal cutting pliers", isn't in the AHD, M-W Unabridged, the OED, or Encarta.

I think this is really strange. As far as I know, dykes is the standard American term for this ubiquitous and useful tool. In my experience as a child working on bicycles and later cars with my friends, as a mechanic in the army, and hanging around electronics technicians at Bell Labs, "dykes" was as common a term as hacksaw or chisel. I mean, what else would you call them?

I say "them" because as a tool term, dykes is always plural, like scissors. And similarly, the derived verb loses the -s. For example, if you remove a component from a circuit by cutting the connections, you "dyke it out" (though I've just discovered that Google has quite a different notion of what this phrase means).

Of course, the derived verb is also missing from all the dictionaries I've consulted. Either I've slipped in from some parallel linguistic universe, or the profession of lexicography is falling short in the domain of tools.

[SF Chronicle story; USPTO case file. In the case file (under "Paper Correspondence Incoming, 28 April 2005") you'll find declarations by Jesse Sheidlower and Ron Butters, among others. For your reading convenience (and they're worth reading), I've snipped out those two declarations from the 814-page .pdf of the submissions that I downloaded from the USPTO.]

[Update: more on the lexicography of dykes/dikes at Language Hat.]

[Update #2: more on this by Geoff Nunberg here.]

Posted by Mark Liberman at 08:10 AM

Beaten

Ceej complains:

AP headline on Salon right now: Ill. Student Beat to Death With Bike Lock. Was beaten. Beaten beaten beaten. Jeebus, I can't stand looking at that headline. It's like fingernails on a chalkboard. [...]

What is wrong with that headline writer?

Just an old-fashioned guy or gal, according to the OED:

The pa. pple. beat, still occasional for beaten in all senses, ... comes naturally enough from ME. bet, shortened from bete, beten, found already in 13th c., and having the open e of the present.

Cited examples include:

1541 R. COPLAND Guydon's Formul. Uiij, The whytes of egges, and oyle of roses bet togyther.
1602 W. FULBECKE Pandects 28 Democracie hath beene bette doune, and Monarchie established.
1607 SHAKES. Timon III. vi. 123 He gaue me a Iewell th' other day, and now hee has beate it out of my hat.
1611 SHAKES. Wint. T. I. ii. 33 He's beat from his best ward.
1712 ARBUTHNOT John Bull (1755) 47 They were beat..and turned out of doors.
1738 WESLEY Wks. (1872) I. 91, I was beat out of this retreat too.
1793 SMEATON Edystone L. §238 The stone..was then lowered..and beat down with a heavy wooden maul.
COOK Voy. (1790) V. 1772 The bark of the pine-tree, beat into a mass resembling hemp.
1798 NELSON in Nicolas Disp. III. 2 The man who may have his Ship beat to pieces.

The AHD gives the participle as "beat·en or beat", and Merriam-Webster's Unabridged likewise says "beaten; also beat".

I don't want to make (too much) fun of ceej, to whom we owe a debt for precision in expressing the feeling of hearing or seeing (certain) perceived rules violated: "it's like fingernails on a chalkboard". In a case like this, the facts of linguistic history and current usage don't really matter. Ceej is in pain, and needs to tell someone.

I remember vividly the first time I noticed that some people use the word real as an adverb, as in "it's real hot in here". I was ten years old, and the offender was a teenage boy from Ohio. He used real that way a real lot, essentially every time he produced an adjective. Every time he did it, it set my teeth on edge. After a couple of hours, I felt like hitting him.

I was already aware of many other non-standard usages that didn't bother me a bit. Ain't wasn't a problem; double negatives didn't trouble me; "we gotta go" and the like were fine. Of course, the fact that my friends and I used such expressions all the time probably had something to do with it. But somehow I had never run across adverbial real, or never registered it.

By the age of 10, I think that I'd already internalized the libertarian ideology about usage that's natural to most Americans. People should talk as they please, I believed. Others are free to evaluate the results according to their lights, but I thought of this as a rational categorization: person A has a French-Canadian accent; person B talks like a Yankee farmer; the kids in the trailer park drop their g's and use a lot of slang. This was like noticing that someone had red hair or bad posture or wore flannel shirts all the time. I had never before experienced anything like this visceral reaction to someone's speech. My feelings offended my inner Spock: "irrational", I thought to myself. But I still couldn't stand that boy.

I got over it. You can hurl adverbial reals at me all day, now, and I won't flinch. I don't use them, though, myself.

Posted by Mark Liberman at 12:26 AM

July 15, 2005

"Variant of, error for, or punning on"

As Chris Waigl pointed out, William Grimes' NYT review of Joan DeJean's "The Essence of Style" takes DeJean to task linguistically:

It's a little dismaying ... when a literary scholar writes "throws of passion" and "to the manor born." (That's "throes" and "manner.")

Was it DeJean or her copy editor? But in any case, it's interesting to see Grimes beating the zombie horse of manor vs. manner.

The OED says that "to the manor born" is

[variant of, error for, or (in sense 4(b)) punning on to the manner born s.v. MANNER n. 3b]

and gives 11 citations, including these:

1847 Princeton Rev. July 320 He intended..to return to Scotland and reside on his estate there as ‘though a native --and to the manor born’.
1916 Philos. Rev. 25 307 The modern man..is..one who looks upon Christianity not as an outsider, but as one to the manor born.

One would think that if the Princeton Review printed a phrase in 1847, and the Philosophical Review in 1916, and the phrase makes reasonable compositional sense in contemporary English, it would be OK in 2005. However, this 150-year-old history of (reputed) linguistic bastardy means that a NYT reviewer still feels called on to twit a writer for using it.

With respect to the phrase Hobbesian choice, which is a "variant of, error for, or punning on" Hobson's choice, I suggested that

The key linguistic point is that Hobson's blocks Hobbesian here. Even if there is a valid and coherent reason for Anderson to see his choice as a "Hobbesian choice", he can't use that phrase without taking literate readers aback, and leading some of them to make fun of him. Unless, of course, he can convince them that the whole thing was a clever pun all along.

Apparently, the rule is "once an eggcorn, always an eggcorn". The only way to be redeemed is to take over. No one would complain now about a writer using the term "Jerusalem artichoke", which before 1620 or so was Girasóle Articiocco, or "sunflower artichoke" in Italian, because "Jerusalem artichoke" has completely displaced its rival. It's a doggy dog world, out there in the land of words.

[Note: the "Hobson's Choice" sign is a real photo of a real sign for a real shop (selling dried flowers in Hoosick, NY), according to Jim Hanas who took the picture.]

Posted by Mark Liberman at 03:36 PM

Wrong and wronger

In response to this morning's post about the LA Times' article on French food slang, Chris Waigl wrote from Paris that

I loved your post about that French food slang article. The amount of stereotyping of France and the French that goes on in the English-speaking press (worst of all in the US, I'm afraid) is something I find quite disconcerting.

Well, I guess there's enough stereotyping back in the other direction to balance the books... but the key thing in this case, as Chris goes on to explain, is that the "list of so-called French slang is une vraie catastrophe."

Chris' critique in detail (the list she's talking about is here):

Some of Leslie Brenner's idioms are either archaic or so rare I've never heard of them. This doesn't mean she must have invented them, but I imagine a number might be regional south-western dialectal (since she writes her in-laws live there), and others come from highly literary reference books.

If I had to pull an opinion out of thin air, I'd think that English has a greater number of widely used food-related idioms. (And then there's the difference between "idiom" and "slang" -- many of her examples aren't slang by any stretch.)

Some of the more egregious misrepresentations:

- "Je pourrais manger un curé frotté d'ail" -- very cute, but the only GHit I get apart from Brenner's article is from a discussion forum where this is presented as a _rare_ expression. And no one would be surprised that a slang term expressing hunger would mention eating or food. The French people on the street would say they could eat "un bœuf", "un cochon entier" and, like in English, "un cheval". The only interesting part I can see is the vague reference to the real slang expression "bouffeur de curé", referring to a somewhat outspoken proponent of radical anti-clericalism. (I haven't looked it up, but it must date from 1905 or thereabouts.)

- "Oh purée!" -- this only seems to be about mashed potatoes. The exclamation is a prudish reshaping of "Putain!". Just like saying "frigging" instead of "fucking".

- "Tomber dans les pommes" -- I admit most speakers do think that this refers to apples, but it's a relatively well-known folk etymology, your basic ex-eggcorn. The original expression employs "pâmes", from "se pâmer", i.e. to pass out. "Les pâmes" is now archaic and signified the state of unconsciousness.

- "Il n'est pas dans son assiette" -- yes, "assiette" means plate, but that's not the sense of it here. The word derives from the verb "s'asseoir", to sit down, and refers to the way someone or something sits, or is placed (see also "this doesn't sit well with me"). Someone who "n'est pas dans son assiette" is temporarily floundering, not feeling quite stable, a bit off-track and out of it. "Assiette" as a technical term denotes a number of different things, from the way a train "sits" on the rails or the top-soil on whatever is below, to the figure for your annual income that serves as a base to assess taxes.

- "Tu me fais tourner le sang en boudin" -- I am more familiar with blood turning to water (eau) or bleach (eau de Javel).

Verlan, by the way, is not recent, but it is becoming more widespread. Even Le Monde, at least in opinion pieces or big "society" panorama articles, uses Renoi (verlan of "noir") and Rebeu (verlan of "beur", which is verlan of "arabe"). Words like "meuf" (femme) or "keuf" (flic) are used nearly everywhere in the informal register. The real inhabitants of the banlieues tend to move away from the mainstreamed verlan terms and invent new ones.

Oh well.

As another example of national (and gender) stereotypes in the American press, Chris offers this:

Only yesterday, in a NYT review by William Grimes of Joan DeJean's "The Essence of Style" on Louis XIV's world (he claims she is an eggcorn user, thus my interest) I found this little example of casual stereotyping:

It all seems so contemporary. Parisian women submitted to the cruel attentions of a hairdresser known as Monsieur Champagne, who rewarded his faithful clients by insulting them to their faces or simply walking out in a huff, leaving his work half done.

Contemporary? I can testify to the fact that the 21st century Parisian woman's relationship with her hairdresser is nothing like that.

In Grimes' defense, it's possible that he was stereotyping New York City women (and their hairdressers) rather than Parisian women.

Posted by Mark Liberman at 03:26 PM

Zoning in

NPR's Morning Edition today interviewed ABC correspondent Ann Compton on the history of relations between the White House and its press corps. At one point, Compton says (with line divisions at breaths and pauses):

So the focus is switched from
what exactly Karl said
to what did Karl Rove do
after this came under investigation. That is where
Washington investigations often
end up. It is the cover-up, it is the handling
of a scandal, not the original act
that now the White House press corps is zoning right in on.

I've previously written about hone in as a substitution for home in (here and here). Zone in is an interesting further development, with plenty of web support:

It means that we should be smart about issues of the candidates running for office and not pick one issue to zone in on because it is easier.
...in a frenzied plunge that will rescue us all from a rapidly deteriorating situation, he, tiger-like, zones in on the horses that everyone has forgotten.
Zoning in on an explanation for rashes, blisters or lesions might seem a fairly straightforward task.
"I don't know if I zone in on him. I think you just kind of know what's going on in the tournament. If his matches are on, I definitely watch. For sure."
He is zoned in on the Patriots and, like many of his teammates, supremely confident in his and their ability to finish what they started last July.
Zone in on those oversized sandwiches built with hot corned beef or pastrami served on soft, sturdy, seedless rye.
Finally, dispatchers were able to zone in on the Almas’ phone signal, Wright said

None of the dictionaries that I checked had "zone in (on)", though most of them had "zone out" in the sense of "become inattentive". Also, none of the strings "zone in on", "zoned in on", "zones in on", "zoning in on" occur in the example sentences in the OED. So I suspect that zone in is a more recent development from home/hone in (and of course from zoom in as well). Though perhaps Ben Zimmer will write in with a citation from a sports story published in 1893 or so....

As is also the case with hone in, zone in makes a reasonable amount of sense on its own terms -- to narrow the zone of consideration, and thus to focus on something. But it seems to have arisen by eggcornish analogy to earlier phrases, themselves derived from even earlier ones, and therefore (perhaps unfairly) may stigmatize a user.

[Update: Ben Zimmer writes

Sorry, I can only take it back to 1977, in a feature by Tony Kornheiser about tennis player Jeff Borowiak:

(New York Times, Jan 29, 1977, p. 13) Some people might call him a Space Cowboy; he's a musician, and maybe a remnant of the Haight-Ashbury hippie days. He talks about "zoning in" on things, and he elongates his words in the California prose of Joni Mitchell. And now, this week, he's in a zone of his own. Jeff Borowiak has made the semifinal round of the United States pro indoor tennis tournament, and people are beginning to take notice.

I'm not quite sure that's the same zone in. Ben continues:

In sports usage, the expression appears to be influenced not only by "home/hone in (on)" but also "in the zone". (The OED draft entry defines this sense of "zone" as "A state of perfect concentration leading to optimum mental or physical performance.") So "zoned in" (without the "on") appears in contexts where one might also expect "dialed in", "locked in", or other expressions of supreme concentration.

(Chronicle Telegram (Elyria, Ohio), Aug. 14, 2001, p. D5) "I never realized how much people were rooting for me," May told his mother. "I was so zoned in."

(Tribune (Ames, Iowa), Jan. 18, 2003, p. B4) It didn't appear to affect him during a 17-point second half, but he said earlier this week it was still bothering him a little. Even so, with Price on his mind, Sullivan will be zoned in.

(Fox Sports (AP), July 8, 2005) "The first one, I was zoned in," Marchese said while talking to friends on his cell phone. "I had it locked. But the speed was faster than I thought."

(Staunton News Leader (VA), July 15, 2005) "I just felt it today," Wright said of his hitting. "I was zoned in and couldn't miss."

I agree with Ben about the sports sense, but this doesn't seem exactly to be the same "zone in" that Ann Compton used. ]

[And by the way, the earliest OED citation for "zoom in on" is

1962 Daily Tel. 8 June 23/7 The lens is capable of ‘zooming-in’ on a set target up to a mile distant.

which does seem like more or less the same sense, though not the same word. ]

[Update #2: Arlo Faria sends in an interesting compilation of Google counts:

"home in on the issue" (81 hits)
"hone in on the issue" (79 hits)
"zoom in on the issue" (30 hits)
"zone in on the issue" (3 hits)

and suggests that

Based on the relative infrequency and recentness of "zone in", I'd guess that "zone" is derived from "zoom" and not vice-versa (or speaking of eggcorns: vice-a-versa).

Interestingly, just as "home in" seemed to gain popularity in the domain of WWII air warfare, "zoom" seems to originate in aeronautics slang from WWI:

OED: 1917 Daily Mail 19 July 4/5 'Zoom'..describes the action of an aeroplane which, while flying level, is hauled up abruptly and made to climb for a few moments at a dangerously sharp angle.

This aerial zoom extends metaphorically to the photographic zoom by 1948, from which it probably more recently acquired the "home/hone in" usage.

]

[Update #3: Jesse Clark wrote in with an interestingly different (contextual) meaning of "zone in"--

In online games where the world is divided into zones, the act of moving from one zone to another is called "zoning" and typically leaves the user unable to do anything else.

Exiting a zone is called "zoning out" and carries the familiar meaning of "becoming inattentive" but also "departing". Entering a zone is called "zoning in" and means "becoming inattentive" but also "approaching".

]

Posted by Mark Liberman at 10:44 AM

She's working from her coffeepot

While Pascale Riché was explaining at length about Tocqueville and the importance of community in the U.S. and the bond created by children, he surmised that the American reporter who interviewed him about his block party was thinking "I wish he'd stop blathering on with his pop sociology... When is he going to tell me that he's French and that he loves food?"

Well, here's another piece of evidence that he was right. Leslie Brenner's Bastille Day story in the LA Times is all about how French people's "love for food is equal only to their love for slang, and French slang, to an amazing degree, is food related".

To an "amazing degree"? Well, amazement is in the amygdala ("almond") of the beholder, I guess. Brenner may be a big cheese in the LA Times food section, but I'm tempted to say that as a matter of quantitative fact, her claim about the role of food in French slang is nuts. What's my beef? I've got two problems with this pea-brained farrago of cultural stereotypes and Whorfian clichés: first, much of every language's slang comes from words for familiar things, and Brenner gives no evidence that food is higher on the list for the French than it is for anyone else. Second, one of the most striking things about recent French slang is the role of verlan, which has nothing whatever to do with food.

Anyhow, Brenner gives 40 examples of French food slang [or really, in most cases, food-related idioms, as Chris Waigl pointed out in response to this post]. They're cute -- though my favorite is "Je pourrais manger un curé frotté d'ail" (="I could eat a parish priest rubbed with garlic", meaning "I'm very hungry"), which is an idiomatic phrase about hunger, but precisely not food-related slang, it seems to me.

But really, anyone whose command of their native language is worth a hill of beans should be able to think of a similar number of similar food-slang examples in about 3 minutes. So starting the timer:

small potatoes, a hot potato, cheesecake, a bean counter, have a bun in the oven, freeze your buns off, a piece of cake, easy as pie, that's the icing on the cake, a fine kettle of fish, toffee-nosed, she's a peach, in a pickle, in cherry condition, the car's a lemon, a smart cookie, a tough cookie, a tomato can, hotdogging, like white on rice, put some mustard on it, the rotten apple that spoils the barrel, like a turd in the punchbowl, in apple-pie order, a plum assignment, that old chestnut, (the endearments pumpkin, cupcake, sugar, honey, lambchop), he's baked, stewed to the gills, out of his gourd, a rhubarb, grilling a suspect, to waffle, ham-fisted, ham-handed, mutton-chop whiskers, he's a marshmallow, upper crust,

Ding...

[LA Times link via email from Mark Seidenberg]

[Update: perlentaucher.de points out that the Guardian's survey has determined that 14 of the world's 50 best restaurants are in England, and 10 more in the U.S., vs. only 10 in France (might the distribution of their 600 experts surveyed have had something to do with the results?). Anyhow, Wolfram Siebeck's description of his meal in the Guardian's #1 restaurant, the Fat Duck, near London, makes it clear that no fair-minded person can accuse the French of an unusual level of concern with food. Snail porridge? Sardine-on-toast sorbet? Lime tea mousse in liquid nitrogen? Mustard ice? Now I know what Monty Python have been up to recently...

In my limited experience, the English high-concept vocabulary for food and wine is much more elaborate (and more pretentious) than its French counterpart. I'm reminded again of an experience

at dinner in an upscale Los Angeles restaurant with Jean-Roger Vergnaud. The waiter delivered a long, poetic description of a wine that Jean-Roger had chosen, including the phrase "with a hint of earth in the nose." Jean-Roger paused for a carefully calculated moment, and then pointed to another choice. "And what about this one? Does it also have dirt on its nose?"

In fairness to Leslie Brenner, she's listing ordinary-language slang expressions dealing with food. But I'm not convinced that there is any linguistic register in which the French are in fact unusually preoccupied with food-related terminology.]

Posted by Mark Liberman at 07:37 AM

Fingertip search

Reading the news stories about the London bombings has taught me several new English compound words. A couple of days ago it was cleanskin, and today it's fingertip search. From a 7/16/2005 Guardian story by Hugh Muir and Ian Cobain headlined "Loving father, bad neighbour, Piccadilly line bomber":

Germail, believed to be of Jamaican origin, had lived with his partner, Samantha Lewthwaite, and their baby in the small red bricked house in Aylesbury, Buckinghamshire, for seven weeks.

Yesterday it was obscured by blue plastic sheeting as anti-terrorist squad officers conducted a fingertip search. [emphasis added]

I knew about a white-glove inspection, but a fingertip search was new to me.

Web examples suggest that it's an expression used by the British police, e.g. in this BBC story from August 19, 2002, "Fingertip search for clues", which describes such a search of a scene where bodies were found:

In investigations of this kind, scenes of crime officers cordon off the area and decide what type of experts are needed to help in the operation.

The atmosphere in the copse would be both quiet and efficient, according to Professor Anthony Busuttil, of Edinburgh University's department of pathology.

With 33 years experience in this highly specialised area, he knows better than most the difficulties involved in such cases.

"It's very hard work, emotionally, physically and mentally draining but someone must do it," he said.
Forensic investigations are strictly timetabled and co-ordinated.

"There is a lot of activity at the scene, but it's very quiet because everyone needs to concentrate extremely hard." said Professor Busuttil.

Wearing goggles, gloves and disposable sterile paper or plastic suits, officers crawl towards the bodies from a radius of several metres.

They pick up everything found with the naked eye while a police photographer and cameraman take pictures and video to help psychologists build a character profile of the killer.

An exhibits officer separately collects and places articles in sealed and labelled polythene bags.
Once the fingertip search is complete, raised aluminium or wooden platforms are erected to reach the bodies without disturbing the earth beneath.

So the fingertips are the least of it, it seems.

In the first few pages of returns from Google (where my favorite is the job opening in the West Midlands where "you will oversee the installation of amphibian fencing, undertake a fingertip search for great crested newts, check traps, oraganise bat surveys and ensure that there is no negative effect to the badger population"), I don't find any examples of the phrase "fingertip search" being used by American sources. I'm sure that American law enforcement teams carry out searches of the same kind -- I wonder what they call them.

Posted by Mark Liberman at 07:10 AM

July 14, 2005

No splitting in court

The British tend to be severe about split infinitives. Case in point: a passage from John Mortimer's story "Rumpole and the Sporting Life", in the collection Rumpole and the Golden Thread (originally published 1983; from the reprint The Second Rumpole Omnibus, p. 379):

'That is all I have to say in opening this sad case, members of the jury,' [prosecutor Mervyn] Harmsway finished. 'And now, with the assistance of my learned friend Mr Gavin Pinker, I hope to fairly put the evidence before you.'
'You are causing me a great deal of pain, Mr Harmsway.' A dry voice came from the Bench.
'I'm sorry, my Lord?' Harmsway looked puzzled.
'Please. Don't split them.' The Judge was looking extremely pained.
'Don't split what, my Lord?'
'Your infinitives!' his Lordship cracked back. 'This is a distressing case, in all conscience. Do we have to add to the disagreeable nature of the proceedings the sound of you tormenting the English language?...

Hmmm... accusative plus gerund. Dubious on your own standards, my Lord.

[This was originally posted under the title "Splitting infinitives in court", but then I had a better idea.]

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 09:13 PM

Learning the ropes in the trenches with Dan Brown

A renowned male expert at something dies a hideous death and straight away a renowned expert at something quite different gets a surprise call and has to take an unexpected plane flight and then face some 36 hours of astoundingly dangerous and exhausting adventures involving a good-looking (and of course expert) member of the opposite sex and when the two of them finally get access to a double bed she disrobes and tells him mischievously (almost minatorily) to prepare himself for strenuous sex. Where are we?

We're in a Dan Brown novel. If you could view film clips of the last minutes of Deception Point and Angels and Demons you would not be able to tell the difference. The adventures Brown relates are formulaic to the point of being robotic. But... what can I tell you? It's summer again, I took another long plane flight with Barbara again, and... amazingly, I read another one.

I guess by now a watching anthropologist or psychologist would claim to have overwhelming behavioral evidence that Dan Brown is my favorite novelist for summer reading. And indeed, my latest Dan Brown adventure, Deception Point, really rips along (spying, robots, sex, scandal, exobiology, secrets, science, politics, rocket planes, sharks, volcanos, news conferences interrupted by dramatic helicopter arrivals, brilliant deductions telegraphed two chapters earlier — if this plot doesn't grip you, check with your doctor, you may be dead). But as a Language Log staffer, I had a duty to take a few notes, so I spent my time not only reeling and rocking with the surprises of the plot but also observing the grossly incompetent use of language.

Dan Brown's writing is so clumsy and inept that I am definitely (God help me) beginning to enjoy the experience of poring over it.

The thing is, it's all the same unmistakable features. Utterly mysterious attempts to describe people's eyes, for example (my article on Angels and Demons in the book Secrets of Angels and Demons had a whole bunch of these). A soldier on an Arctic glacier has "eyes as desolate as the topography on which he was stationed" (would that mean totally white?); an intelligence director has eyes "which despite having gazed upon the country's deepest secrets, appeared as two shallow pools" (do some people's eyes change depth after viewing a few classified memos?); and the president has eyes which "mirrored sincerity and dignity at all times." Dan doesn't mean mirrored in that last one, not in either of its two senses — look it up. He might have meant displayed, or perhaps even reflected, but he didn't mean "mirrored"; once again he has picked a word out of his thesaurus that he doesn't know how to use.

There's also another ghastly misuse of one of Dan's favorite words, precarious, and it's combined with another eye description. Dan has no idea what precarious means (in Angels and Demons he used it to describe a tone of voice). On page 72 of Deception Point we get this (thanks to Ben Zimmer for helping me to find it again — I really must remember that Amazon's search-inside feature can be used for things like this):

Her gaunt six-foot frame resembled an Erector Set construction of joints and limbs. Overhanging her precarious body was a jaundiced face whose skin resembled a sheet of parchment paper punctured by two emotionless eyes.

The only two meanings of precarious that are at all common today are the senses that Webster calls 4a, "dependent on chance circumstances, unknown conditions, uncertain developments", and 4b, "characterized by a lack of security or stability that threatens with danger". The other senses are rare or archaic ("dependent on the will of another" is certainly archaic; "dependent on uncertain premises" is not far from it; and according to Webster, the word can also mean "importunate"). I simply have no idea what Dan is trying to say in the quoted passage about Marjorie Tench's body.

Fans of Dan Brown syntax would be disappointed without any instances of the familiar disciplinary descriptors used without articles as if they were titles, and we do get those: "Geologist Charles Brophy", p. xi; "Prize-winning astrophysicist Corky Marlinson", p. 93.

We also get stock phrases repeated almost verbatim. I got tired of attempts to manipulate me into vicarious shocks through cataphoric pronouns referring to Rachel's sensory experiences that were about to be described in the next paragraph: "It was then that Rachel saw it" (p. 69); "Then Rachel heard it" (p. 71); "Then Rachel saw it" (p. 127)...

Sometimes he uses phrases so creakingly strange that you remember them vividly from a single occurrence five hundred pages earlier. Notice "His voice had a lucid rawness to it" (p. 15). Describing the same person on p. 507 Dan writes again: "The man's voice had a lucid rawness to it." The connection between being lucid (suffused with light) and being raw (not cooked) does not make for an effective description. One searches for metaphorical uses that can be matched with each other and with voice timbre (lucid can mean "readily intelligible"; raw can mean "unpleasantly cold and damp"; neither applies very well to vocal quality).

As Lady Bracknell might have said, to use such an odd phrase once may be regarded as a misfortune; to use it twice looks like carelessness.

But the acme of inexpertly crunched metaphors in Deception Point is on page 27 (and I swear I'm not making this up): he uses the expression "learning the ropes in the trenches". Think about that for a while. Learning the ropes is a naval metaphor; it's about rigging and sails and mooring. Being in the trenches is an army metaphor. You can hardly be in both services simultaneously — hauling up sails on a naval frigate while dug in with the infantry on the western front. Dan has to make his military metaphor mind up.

I'm sorry, but this man is simply not competent to write prose for public consumption. He should be dictating his wild, action-filled plots to a literate ghostwriter who knows how to string description and dialog together. Deception Point, by Dan Brown as told to Henning Mankell.

I have another long plane flight at the end of August, and I'll be reading Digital Fortress. Watch this space.

Posted by Geoffrey K. Pullum at 12:29 PM

A few players short of a side

On Chris Waigl's Eggcorn Database, Language Hat has contributed short-sided: "another one of the American English /t/-flapping eggcorns, like deep-seeded, centripetal » centripedal etc."

The phrase "American English /t/-flapping" refers to the fact that in nearly all varieties of American English, instances of the phoneme /t/ that don't precede a stressed vowel (within the same word) become "voiced" (i.e. pronounced as [d]) and "flapped" (i.e. produced with a quick tap of the tongue rather than a full "stop" articulation). As a result, sighted and sided become homophones -- except for the minority of Americans like me, who have different sounds for the vowel, as discussed here with respect to the pronunciation of tighty/tidy whities.

It's easy to find other examples of short-sided on the web:

Does a short-sided, commercially motivated reform policy perpetuate regional arms races?
Short sided on my part, I am sure.
That digital editions of newspapers will fail because they're not interactive newspaper websites is a short sided argument.
To use firemen about a drug that can cause problems with the liver is very short sided, of the drug company and the advertisment.
"This would be a very short-sided action," Miller said. "As trustees, we are required to protect the coast."
It is my opinion that taking short-term savings is short sided.

There's one mistake in the DB entry: it says that "in soccer, short-sided refers to a game that is played on a field of different dimensions". But according to this American Youth Soccer Organization page, "AYSO recommends that all children under the age of 12 play short-sided (less than 11 players per team) soccer".

This also helps explain the origin of the eggcorn, which as usual has a semantic as well as a phonetic dimension. A "short-sided" team is one that is playing without its full complement of players, and this feeds into what may well be the commonest single Snowclone of Foolishness: <small quantity of essential items> short|shy of a <desirable collection>. With short, a brief web search turns up:

A few cards short of a full deck
A few fries short of a Happy Meal
A few beers short of a six-pack
A few ants short of a picnic
A few straws short of a picnic basket
A few noodles short of a Happy Bowl
A few threads short of a sweater
A few clowns short of a circus
A few guns short of a posse

With (the more British?) shy:

A few bricks shy of a load
A few oats shy of a haggis
A few sandwiches shy of a picnic
A few birds shy of a flock
A few straws shy of a bale
A few links shy of a chain
A few rounds shy of a full clip

There are also plenty of examples with different quantifiers:

One green maraschino cherry short of a fruitcake
Three pickles short of a barrel
Five quarks short of a sub-atomic particle
One can short of a 6 pack
One can short of a candelabra

New examples are invented all the time. Googling "a few * short|shy of a picnic" will turn up sandwiches (or "sangers", "sammies", "sarnies", "sammiches", "sambos" etc.), ants, buns, strawberries, apples, corndogs, butties, baguettes, rolls, loafs, plates, panini, hampers, baskets, snacks, forks, ribosomes, cans, and so on.
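For anyone who wants to sift a pile of web hits for this snowclone, the template above can be roughly approximated with a regular expression. What follows is only a sketch under stated assumptions: the pattern, the function name `parse_snowclone`, and the short list of quantifiers are all my own illustrative choices, and a real collection would need a much longer quantifier list.

```python
import re

# Hypothetical pattern for the template
#   <small quantity of essential items> short|shy of a <desirable collection>
# The quantifier list is an illustrative assumption, not an exhaustive one.
SNOWCLONE = re.compile(
    r"^(?P<quantity>a few|one|two|three|four|five)\s+"
    r"(?P<items>.+?)\s+"
    r"(?P<gap>short|shy) of an?\s+"
    r"(?P<collection>.+)$",
    re.IGNORECASE,
)


def parse_snowclone(phrase):
    """Return the template slots as a dict, or None if the phrase does not fit."""
    match = SNOWCLONE.match(phrase.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    for example in [
        "A few bricks shy of a load",
        "One green maraschino cherry short of a fruitcake",
        "Five quarks short of a sub-atomic particle",
    ]:
        print(parse_snowclone(example))
```

A pattern like this cannot tell the foolishness examples from any other use of the same frame, so the output would still need reading by hand.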

[Update: Ben Zimmer points us to the "The Canonical List of Fulldeckisms", available here and elsewhere.]

There's another whole class of examples where it's implied that the lack or failure prevents disaster rather than inducing foolishness, like "One wave short of a shipwreck".

Finally, I've pointed out previously that it's possible to have a predictive theory of eggcorns, and here again we can predict that short-sighted » short-sided ought to be matched by near-sighted » near-sided and far-sighted » far-sided. The frequency might be lower, though, since the semantics does not seem to fit as well. (Though the metaphorical sense of being concerned with things on the near side or on the far side should help out -- and the corresponding meaning may also be involved in the genesis of "short sided".)

Google turns up the evidence, right on schedule:

I have astigmatism and I am also far sided.
There he was looked at and I was told that he not only had lazy eye but was also suffering from being far sided. This is when one can not see things very well from a close range!!
May we not be near sided and focus on our flesh to keep the law and may we not be far sided and focus on some future judgment of an angry God.
im near sided in one eye and far sided in the other
Fidel's baby brother is a spry 70, still considered the heir-apparent and, despite the consistent trashing that he gets among Cuban-Americans, he proved quite intelligent and, in the words of MSNBC, "far-sided" in the restructuring of the army after the economic crisis which followed the collapse of the Soviet Union.
If it is clear I wanna try naked eye, but I'm near sided so I will have my scope and binos near.

 

Posted by Mark Liberman at 09:35 AM

July 13, 2005

Fidditch forever

The most recent Lulu Eightball strip by Emily Flake:

Looking on the bright side, evidence of intense interest in language is all around us. A couple of days ago, spock posted a MetaFilter link on the spread of y'all, and as grouse pointed out,

Today on MeFi's front page we have Karl Rove, London terror attacks, and MSG. And the post with the most comments is this one. Amazing.

Apparently Miss Fidditch will never die, even though she would rather parse than eat. The MeFi discussion is interesting, by the way, with Language Hat performing his usual yeoman service.

[Lulu Eightball link via Lauren Squires at Polyglot Conspiracy]

Posted by Mark Liberman at 02:59 PM

I been there before

After the adventure is over, Huck Finn tells us that

... I reckon I got to light out for the Territory ahead of the rest, because Aunt Sally she's going to adopt me and sivilize me, and I can't stand it. I been there before.

One of the things that I like best about the Enlightenment is the sense of intellectual exploration without strong disciplinary boundaries. Although 18th- and 19th-century educational and social structures were harsher and more rigid than ours, individual thinkers found it relatively easy to light out for the intellectual territories, whose forests and prairies of thought were not yet divided and subdivided into the townships and neighborhoods of today's academic disciplines.

This work was not so much interdisciplinary as antedisciplinary, and in the inaugural issue of PLoS Computational Biology, Sean Eddy argues that ante- is still better than inter-.

He starts by quoting an NIH Roadmap on the need for interdisciplinary science:

"The scale and complexity of today's biomedical research problems demand that scientists move beyond the confines of their individual disciplines and explore new organizational models for team science. Advances in molecular imaging, for example, require collaborations among diverse groups — radiologists, cell biologists, physicists, and computer programmers."

and comments that

Reading this made me a little depressed. For starters, the phrase “organizational models for team science” makes me imagine a factory floor of scientists toiling away on their next 100-author paper under the watchful gaze of their National Institutes of Health program officers, like some scene from Terry Gilliam's movie Brazil. It's also depressing to read that the National Institutes of Health thinks that science has become too hard for individual humans to cope with, and that it will take the hive mind of an interdisciplinary “research team of the future” to make progress. But what's most depressing comes from purely selfish reasons: if groundbreaking science really requires assembling teams of people with proper credentials from different disciplines, then I have made some very bad career moves.

I've been a computational biologist for about 15 years now. We're still not quite sure what “computational biology” means, but we seem to agree that it's an interdisciplinary field, requiring skills in computer science, molecular biology, statistics, mathematics, and more. I'm not qualified in any of these fields. I'm certainly not a card-carrying software developer, computer scientist, or mathematician, though I spend most of my time writing software, developing algorithms, and deriving equations. I do have formal training in molecular biology, but that was 15 years ago, and I'm sure my union card has expired. For one thing, they all seem to be using these clever, expensive kits now in my wet lab, whereas I made most of my own buffers (after walking to the lab six miles in the snow, barefoot).

Uphill. Both ways. Right.

Sean makes the frontier analogy explicit:

Perhaps the whole idea of interdisciplinary science is the wrong way to look at what we want to encourage. What we really mean is “antedisciplinary” science—the science that precedes the organization of new disciplines, the Wild West frontier stage that comes before the law arrives. It's apropos that antedisciplinary sounds like “anti-disciplinary.” People who gravitate to the unexplored frontiers tend to be self-selected as people who don't like disciplines—or discipline, for that matter.

Thomas Kuhn wrote that "Normal science, the activity in which most scientists inevitably spend almost all their time, is predicated on the assumption that the scientific community knows what the world is like." And also that we know how to divide problems up and assign the pieces to different departments and subdepartmental specializations.

This kind of normal, disciplinary science should not be scorned. Industrial production is much more efficient than handicrafts, and (stereotypes aside) it also usually produces better-quality goods. And some people prefer stabler, safer and more predictable social contexts. The Territory held no charms at all for Aunt Sally, and probably not for Tom Sawyer either, once he grew up a little. But for others, the antedisciplinary frontier is a lot more fun. And I think Sean is right that the most effective exploration of new areas is usually done by individuals who learn what they need to know in order to find their way.

[Link to Sean Eddy's essay via Ernie's 3D Pancakes]

Posted by Mark Liberman at 10:28 AM

Cleanskins

As explained in a story by Duncan Campbell and Sandra Laville in the Guardian:

Four men, between 18 and 30, three of them with West Yorkshire addresses and all of them British, met up at Luton station before boarding a Thameslink train to King's Cross last Thursday morning.

It appears that the four, described by security sources as "cleanskins" - with no convictions or known terrorist involvement - reached their rendezvous via two or three hired cars, one of which had been located yesterday at Luton station. Explosives were found in the car, police revealed last night.

Cleanskin is a new word for me, but the OED has "clean-skin (Austral.), an unbranded cow; (slang) a person with a clean police record (see also quot. 1941)":

1881 A. C. GRANT Bush Life in Queensl. I. xv. 209 All hands are anxious to try their luck with the *clean-skins.
1931 F. D. DAVISON Man-Shy (1934) ix. 130 She was not a cleanskin; the Mirramilla brand was on her rump.
1934 Bulletin (Sydney) 1 Aug. 46/3 Lifted them cleanskin micks while Morney was in town.
1936 M. FRANKLIN All that Swagger ix. 83 Delacy began the trapping and branding of cleanskin cattle.
1941 BAKER Dict. Austral. Slang 18 Cleanskin, a person of integrity, esp. in a political sense.
1945 -- Austral. Lang. vii. 141 A man who has had no convictions recorded against him is a cleanskin.
1950 ‘N. SHUTE’ Town like Alice 263 A poddy's a cleanskin, a calf born since the last muster that hasn't been branded.
1967 C. DRUMMOND Death at Furlong Post x. 126, I just dictated a report that they seem clean-skins.
1969 Daily Mirror (Sydney) 12 Mar. 11/4 Had he been a clean-skin..Mr Byrne might have..not recorded a conviction.

These days in Australia, cleanskin is apparently used for "unbranded" in a different sense, especially with respect to wine. Thus "Cleanskin Kings Australia is a business dedicated to offering premium cleanskin wines, from Australia's finest wine regions", explaining that "Cleanskin wine (unlabelled wine) take [sic] the pretension out of 'label drinking' & emphasise the quality of the wine rather than valuing the wine on it's [sic] price". And from a recent feature in Australia's The Age:

In Milawa, the Brown Brothers Winter Festival spills over three days with a feast of (yes, you guessed right) wine, food and jazz. On Sunday, visitors to Gapsted Wines can taste from the barrels of more than 40 wines, including old museum stock, current releases and previously unreleased cleanskins.

The announcement of the bombers' identities came too late to save Sarah Boxer from embarrassing herself in a July 12 NYT article on the We're not afraid website:

But more and more, there's a brutish flaunting of wealth and leisure. Yesterday there were lots of pictures posted of smiling families at the beach and of people showing off their cars and vans. A picture from Italy shows a white sports car and comes with the caption: "Afraid? Why should we be afraid?"

A few days ago, We're Not Afraid might have been a comfort. Today, there's a hint of "What, me worry?" from Mad magazine days, but without the humor or the sarcasm. We're Not Afraid, set up to show solidarity with London, seems to be turning into a place where the haves of the world can show that they're not afraid of the have-nots.

But as Sandra Laville and Ian Cobain explained in the Guardian a few hours later:

Ten days ago Shahzad Tanweer, a 22-year-old British Asian, was playing cricket in the local park with his friends. It was something he loved to do. He was a sporty young man who loved martial arts, drove his dad's Mercedes and had many friends in the Beeston area of Leeds.

"He is sound as a pound," said Azi Mohammed, a close friend. "The idea that he was involved in terrorism or extremism is ridiculous. The idea that he went down to London and exploded a bomb is unbelievable.

They don't mention what color his dad's Mercedes is.

[Update: apparently the wine-industry meaning has taken over entirely from the unbranded cattle and first-time criminals, at least among some Australians. Danielle McCredden writes from down under:

"Cleanskins" in the sense of unbranded wine is now in very widespread usage in Australia - perhaps due to a glut in the wine market, vintages and blends which do not quite measure up to the high standards of the vineyard are sold unlabelled (with only the required information about region, blend, alcohol content etc) so as to save on marketing and other expenses. In turn, cleanskins are universally seen as a source for wine which is better quality than its price.

The use of "cleanskins" in the sense of without criminal conviction is unheard of, in fact I had never heard this meaning of the word before you mentioning it in your post. The word does have a vague association with youth and innocence - if you were to describe someone as "clean-skinned" it would imply more than just good facial hygiene, also suggesting a notion of innocence or naiveté.

Well, the OED treats clean-skinned separately. And a roughneck isn't necessarily rough necked, while a roughhouse needn't have anything to do with a house at all. But Danielle's note suggests that the compound cleanskin is still somewhat compositional for Australians.]

[Update #2: David Nash, also from Oz, has different reactions--

In contrast to Danielle McCredden, I was only faintly aware of the wine sense of 'cleanskin'; whereas I disagree that "'cleanskins' in the sense of without criminal conviction is unheard of".

Well I suspect Danielle is in a younger generation -- I'm sure the "no convictions" sense is familiar to me, as borne out by the quotations you have from the OED. And I recall using it myself in a context of bringing into a court case a fresh expert witness with no prior involvement in the particular case. And, as you say, it is quite separate from 'clean-skinned'.

]

[Update #3: a 7/14/2005 story in the Washington Times by Paul Martin says that the Mercedes was red, and it was Tanweer's own car, not his dad's:

The previous evening, Tanweer, who had been studying sports at college, had greeted friends cheerfully as he drove his brand-new red Mercedes-Benz.

The car had just been given to him by his adoring father, Mohammed Tanweer, who immigrated to Britain from Pakistan 30 years ago.

Other stories citing the red Mercedes are here and here.]

Posted by Mark Liberman at 12:05 AM

July 12, 2005

Department of rarely-used cliches

According to an AP wire story by John Leicester that ran yesterday, Bobby Julich of Team CSC said that Lance Armstrong

rode to a second-place finish in the opening time trial, building big time gaps over Ullrich, Basso and others.

"That was scary," Julich said. Such a strong start shows that "he's ready to rock some cages in the mountains."

Today was the first mountain stage, and Armstrong indeed regained the lead in the race, increasing his margin over Julich by 5:18.

When Julich (a native speaker of American English) predicted that Armstrong was "ready to rock some cages", he seems to have meant something like "ready to shake some people up". This way of putting it is not quite unique, but there's only one Google hit, which was surely not his inspiration:

Shame about Wisenut, when it was forst released it looked like it could rock some cages, but unforunately its not doing to well on keeping current.

The expression "rock some cages" is an idiom blend. One of the sources is "rattle * cage(s)", which metaphorically evokes the taunting of captive animals:

I need to rattle some cages to get the permissions to do my job.
He's an innovator who is not afraid to rattle some cages just to get people thinking.
...the al Qaeda operatives who rattled the American cage via the murderous attacks of 9/11 were not, in fact, denizens of Iraq.
Sounds like the Lakers rattled some cages in Michigan again!

The other source is "rock <someone's> world". This started as a reference to romantic impact, as in the Michael Jackson song "You rocked my world, you know you did...", but has come to be a general phrase for psychological impacts of all sorts, as in "Joe Perry's Rock Your World Hot Sauce", or the headline "New TVs .. Will Rock Your World", or the slogan "... watch how you rock their world with this awe inspiring garden stone". The verb has been generalized from this source in expressions like "rock the vote", with a meaning something like "act so as to have a dramatic impact on <something>".

Because it makes a certain amount of sense on its own terms, the combination "rock some cages" is more subtle than other idiom blends featured in previous LL posts: "it isn't rocket surgery"; "page burner"; "the way the cookie bounces"; "he flipped his cork"; "that's a different cup of fish". You can find some related structures in Saul Gorn's Compendium of Rarely Used Cliches: "This subject is so important that I'd like to see it deserve considerable study."

[Update: Several readers wrote to suggest that another source might be the very common expression "rock the boat". This could well be in the mix -- and especially in the background of "rock my world", "rock the vote" (which rhymes with it), and so on. When I first read Julich's expression "rock some cages", I thought that the ideas of "psychological impact" and "taunting beasts" were dominant, but maybe the sense of disturbing an equilibrium is just as important. It's hard to allocate credit (or blame) in an individual case like this.]

Posted by Mark Liberman at 06:03 PM

Flaccid designators

Unfogged.com: "I was using a definite description! It's got different modal properties!" This is a "teachable moment" for philosophers of language: how about a tutorial?

[via Neurath's Boat]

Posted by Mark Liberman at 05:14 PM

Vocatives we doubt ever got vocked

I have a long-standing interest in vocatives and other free-standing uses of NPs -- I am the author of "Hey, whatsyourname!" (Chicago Linguistic Society 10.787-801 (1974)), after all -- so I was entertained to see the vocatives that Merck & Co. advises its sales reps to use in pitching its drugs to M.D.s at social events organized by the company. This from training materials put together by Merck, as quoted in the July 2005 Harper's Magazine, pp. 16-7.

Merck provides "scenarios" for "transitioning" from small talk to a "HEL [Health Education Learning] situation" during a "dinner program". Two of the scenarios come with vocatives:

Scenario 3
Physician says: "What a great football game yesterday. Did you see how effective Drew Bledsoe was in the fourth quarter? That guy is amazing."
Possible rep response: "Bledsoe is effective on so many levels. He's a leader, you feel safe with him carrying the ball, and he's a proven winner. You know who else sounds like that? Zocor, a market leader with an eight-year safe record, proven to save the lives of your patients. Physician, what concerns do you have about Zocor leading your team in the fight against congenital heart disease?"
Scenario 4
Physician says: "So, what plans do you have for the holidays?"
Possible rep response: "Well, my wife and I are going to visit my grandmother. It should be a lot of fun, though I feel so bad for her. She really has advanced osteoporosis and can't travel at all. She wasn't on any treatment plan for the longest time. Physician, what do you think the reasons are that some physicians don't do much about osteoporosis until it's in its advanced stages and nearly too late?"

No doubt the vocative "physician" is just a stand-in for a personal name -- "Dr. Krankheit" or "Otto" -- or the title that real people (rather than fictional sales reps) use when addressing M.D.s, namely "doctor". Anyone who used "physician" as a vocative (other than in the quotation, "Physician, heal thyself") would be looked at very oddly. But then anyone who uses the human question word who in reference to the drug Zocor is already not fully in the real world. Not to mention the lame twisting of the topic of conversation to suit Merck's interests.

The other two scenarios are equally entertaining, but lack vocatives.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 01:15 PM

Modified limited out hanging

Ed(itor) at the Blawg Review (motto: "Welcome to a world where inexperienced editors make articles about the wrong topics worse") emailed a link to this sentence in a law student's blog:

In miscellaneous other news, I just want to drop a shout out to the Dancer, out with whom I hung for an hour last night.

Ed's verdict: "an interesting blend of the casual, the colloquial and the formal".

A few others out there on the web have seen the same linguistic opportunity for hip irony:

tom thumb's blues - mr. dan boger, a cool guy from Des Moines, out with whom I hung at the Joliet show.

But it seems to me that there ought to be a way to get "modified limited" in there:

PRESIDENT: You think, you think we want to, want to go this route now? And the--let it hang out, so to speak?
DEAN: Well, it's, it isn't really that--
HALDEMAN: It's a limited hang out.
DEAN: It's a limited hang out.
EHRLICHMAN: It's a modified limited hang out.
PRESIDENT: Well, it's only the questions of the thing hanging out publicly or privately.

A witticism sometimes attributed to Winston Churchill uses hanging out in its literal sense. Given the scholarly debunking of the famous "up with which I will not put" rejoinder -- which is grammatically illogical and also not due to Churchill -- and the fact that the same witticism is sometimes attributed to Samuel Johnson, I suppose that this one may also be apocryphal; but I hope not.

Posted by Mark Liberman at 10:01 AM

July 11, 2005

(Hallucinatory) etymology as argument

Clare Girvan spotted another example of (false) etymology as argument, following up on earlier LL posts:

Essentials of Faith - a current UK TV programme about comparative religion - interviewed Leo Rutherford, a Shamanic Practitioner. He justified the use of hallucinogenic drugs in religious ritual on the grounds that "'hallucinogenic' comes from the word 'hallowed'".

Nice try, Leo. But the AHD sez the etymology of hallucinate is

Latin hallūcinārī, hallūcināt-, to dream, be deceived, variant of ālūcinārī.

whereas hallow is from

Middle English halwen, from Old English hālgian.

The trail diverges further as we track back past Old English and Classical Latin. The AHD suggests that hālgian comes from the Indo-European root kailo, meaning "whole, uninjured, of good omen", whose other current reflexes in English include hale, whole, wassail, holy, halibut, holiday and hollyhock. And according to Lewis & Short, ālūcĭnor means

to wander in mind, to talk idly, prate, dream

and is "[prob. from aluô, alussô; alê, alukê; cf. Gell. 16, 12, 3]", which are Greek words referring to uneasiness or restlessness or wandering, especially of people who are sick.

So Leo would have been etymologically (if not logically) correct in supporting the use of halibut in religious rituals, but for magic mushrooms he'll have to look to other arguments.

Curiously, my fingers insist on typing hallucinate as hallunicate. Probably due to -- oh, never mind.

Posted by Mark Liberman at 07:18 AM

The I states

From The Advocate of 7/5/05, p. 10, a correction:

Author Vestal McIntyre, whose book You Are Not the One was featured in our June 7 issue, is from Idaho, not Iowa. The Advocate regrets the errors.

Oh, those I states! Especially the dactylic ones. Especially in the middle of the country. So hard to keep apart. Sometimes Ohio gets in there. And Indiana and Illinois.

Then there are the middle M and W states (W is just M upside-down). Michigan, Minnesota, maybe Montana, certainly Wisconsin. Who can keep them straight? Badgers, wolverines, whatever.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 01:51 AM

Lexical reversal

Substitution of ancestor for descendant or vice versa is well known, and people occasionally say yesterday when they mean tomorrow or vice versa. Add to these and similar examples: subsequent for prior:

The days subsequent to Kurt Kobain's apparent suicide in April 1994 have been a constant source of mystery and speculation. In indie film provocateur Gus Van Sant's latest film, Last Days, the director fictionalizes those days...(Jared Abbott, "Seeking Nirvana", Genre of July 2005, p. 18)

Some prescriptivists deprecate both subsequent and prior -- see Merriam-Webster's Dictionary of English Usage -- on the grounds that they are highfalutin' latinisms deployed when honest Anglo-Saxon after and before would do as well. I don't necessarily subscribe to that, but I do think that if you're going to use latinisms, you should pick the right one.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 01:38 AM

July 10, 2005

Still more Declaration of Independence

On the occasion of America's national day, the Fourth of July, Geoff Pullum has pointed out a dangling modifier in the Declaration of Independence.  In follow-up e-mail, a correspondent notes a restrictive relative which in this document -- not a surprise, really, but still gratifying.  And now I supply a striking failure of parallelism, in one of the senses of parallelism that appears in the advice manuals.

Close to the end of the Declaration comes the bold

(1)  We must... hold them, as we hold the rest of mankind, Enemies in War, in Peace Friends.

Yes, yes, this was deliberate.  It's an instance of CHIASMUS, a rhetorical figure in which elements are inverted in the second of two matched phrases, so as to foreground, and emphasize, these elements.  (It's also an example of ASYNDETIC COORDINATION, lacking an explicit conjunction, but that's not my topic for today.)  Still, it falls foul of Strunk & White's 19th principle in their Elements of Style, "Express coordinate ideas in similar form"; in fact, it's a much bolder violation of this principle than the example that Strunk & White begin their discussion with:

(2)  Formerly, science was taught by the textbook method, while now the laboratory method is employed.

Strunk & White continue their exposition on maxim 19 with various types of questionable coordinations, mostly of the sort we've been calling "WTF coordination" here at Language Log Plaza (most recently, in Eric Bakovic's report of a possible Bushism and in my discussion of two specific cases, one involving recipe register features, the other coordinate questions).  That is, they lump together rhetorical parallelism and the requirements of syntax.  As it turns out, they also work with an implicit, unexamined theory of coordination that's seriously confused.  And they cast their advice in very general terms, without seeming to realize that their rules actually make predictions about what's acceptable English, many of which they would surely not welcome.

Strunk & White aren't alone in these respects.  As I'll illustrate briefly from two recent manuals, the advice literature on parallelism exhibits all three of these problematic features: a fuzzy notion of parallelism (more generally, a failure to distinguish grammar, usage, and rhetoric), a seat-of-the-pants syntactic theory, and wildly overgeneralized prescriptions.

But first, restrictive relative which.  Language Log reader Mike Jacovides reminded me, on July 5th, that the very first sentence of the Declaration of Independence has one:

When in the course of human events, it becomes necessary for one people to dissolve the political bands which have connected them with another,...

This is scarcely a surprise to scholars of English grammar.  Occurrences of restrictive relative which are all over the place.  Still, it's nice to see one right up front in the Declaration.

On to parallelism.  The Declaration's striking chiasmus, in (1), has contrasting phrase orders (with the syntactic functions of these phrases held constant).  Strunk & White's example, in (2), is subtler, involving contrasting syntactic functions for the phrase X method -- oblique object in the first clause, subject in the second (both clauses being passive, and so in some sense entirely parallel in form); the main verb is also varied (taught in the first clause, employed in the second), but that pretty much comes with the territory.

Let me remind you: (2) (from the 4th edition of The Elements of Style (1979), p. 26) is supposed to be BAD.  Those of you who thought this was an effective deployment of NPs within the matched passive clauses are just wrong, according to S&W.  (Well, I thought it was pretty good, and so did my sophomore seminar students who were assigned S&W in writing classes at Stanford.  But what do WE know?  I'm a linguist, and linguists are widely believed to be enemies of all that is good, true, and beautiful about language.  And they're just kids.)  S&W want you to rewrite (2) as (3), which strikes me, and my students, as clunky and flat-footed:

(3)  Formerly, science was taught by the textbook method; now it is taught by the laboratory method.

S&W explain their maxim 19, "Express coordinate ideas in similar form", as follows:

    This principle, that of parallel construction, requires that expressions similar in content and function be outwardly similar.  The likeness of form enables the reader to recognize more readily the likeness of content and function...

    The unskilled writer often violates this principle, mistakenly believing in the value of constantly varying the form of expression.  When repeating a statement to emphasize it, the writer may need to vary its form.  Otherwise, the writer should follow the principle of parallel construction.

At least they leave some room for chiasmus and for lexical variation, in cases of extreme exigency.  Like the Declaration of Independence, maybe.  But (2) won't do.

Note that S&W don't just deprecate (2).  They also attribute motives to the writer, who, they suggest, is guilty of "mistakenly believing in the value of constantly varying the form of expression".  Now, pointless variation is in fact a sin of unpracticed writers, but lots of variation has a point, so there's no justification for viewing all variation with suspicion.

The attribution of motives gets worse.  On pp. 26-7, S&W, quite extraordinarily, tell us:
    The [first] version [(2)] gives the impression that the writer is undecided or timid, apparently unable or afraid to choose one form of the expression and hold to it.  The [second] version [(3)] shows that the writer has at least made a choice and abided by it.
My student who wrote his course paper on faulty parallelism, Jerry Zee, was offended by this bit of psychoanalyzing.  So was I when he drew my attention to it.  I mean, S&W, where do you guys get off?

But let's get specific.  We're supposed to express coordinate ideas in parallel form.  In particular, no shifting phrases from one syntactic function to another.  What about the ordering of constituents?  Is the following bad writing?
They first decide by vote that -orum is to be the plural ending of the 'genitive' case ('of the cactuses'), and then they start arguing about the plural ending for the 'dative' case ('to the cactuses').  (Guy Deutscher, The Unfolding of Language (NY: Metropolitan Books, 2005), p. 4.  I'll have some quotes from Deutscher just because I happen to be reading the book right now.)
The non-parallel elements are they first decide... and then they start..., with varying orders of adverb and subject.  Is this bad?  I think not.

What about conjoining a passive with an active?
... in the latter days of the language, when the endings on nouns were worn away and disappeared.  (Deutscher, p. 6)
I wouldn't have noticed this "non-parallelism" if I hadn't been looking for such things.

Once you start taking S&W's bald notion of "parallel form" seriously, you realize just how silly it is.  Are transitives conjoined with intransitives bad?  Main verb VPs conjoined with auxiliary verb VPs?  Positive constituents with negative ones?  NPs with a determiner conjoined with ones without?  NPs with adjectives conjoined with NPs that have none?  As in the following examples, most of them invented:
They drank the poison and died.
They knew the answer and couldn't wait to show off.
They could see the creature but couldn't identify it.
I sing of arms and the man.
She craved boys and strong drink.

What about conjuncts with ellipses?  Are they parallel to conjuncts with explicit elements?  Why or why not?  Consider the following:
The language machine allows just about everybody... to tie these meaningless sounds together into an infinite variety of subtle senses, and all apparently without the slightest exertion.  (Deutscher, p. 2)
Two expressions connected by and.  They sure don't look parallel in form.  Is this bad?

I could go on like this forever.  Most kinds of variation in the internal form of matched phrases are entirely innocuous.  S&W haven't thought for a minute about what counts as relevant parallelism in syntactic structure and when it's relevant to rhetorical grace and effectiveness.  What they're doing is arguing from particular cases: identifying examples they find unacceptable in some way, assigning a label to the "error" in question, and then proscribing all instances of that sort.  In so doing, they overgeneralize drastically, barring all sorts of unexceptional examples; their "rules" make predictions, many of which are just wrong.

So far this is all about rhetoric.  No one would, I hope, claim that any of the examples I've given so far are UNGRAMMATICAL.  But at this point S&W pass on to a collection of examples where many people, including several Language Log posters, would entertain that possibility.  We have reached the borders of WTF-land.  S&W see no boundary here; for them it's all problems with "parallelism" in some sense or another, who cares which.  This is a disservice to their readers.

And they have no vocabulary at all for talking about the errors they exhibit, beyond saying that expressions are non-parallel because words in them are misplaced or missing or incorrectly chosen.  They have no vocabulary because they have no conceptual apparatus for syntactic structure, in particular the structure of coordinate expressions.  I'm going to have to take the first steps towards remedying these lacks, just so we can talk about the data.

What we're looking at, mostly, is "factorable coordination", in which conjuncts Y and Z (I'll restrict myself to two conjuncts for simplicity) are associated with a factor X, as in the following:

Mary and her mother shouted. 
    (conjuncts Mary, her mother; factor shouted)
Mary shouted and got her mother's attention.
    (conjuncts shouted, got her mother's attention; factor Mary)
Mary was tired but happy.
    (conjuncts tired, happy; factor was)
Mary bought a screwdriver and hammer.
    (conjuncts screwdriver, hammer; factor a)

There's a huge literature on factorable coordination and the conditions it must satisfy.  (Here I tell you that I'm not going to refer to this literature, or explain where my somewhat idiosyncratic terminology comes from.  This is a Language Log posting, not a journal article.)  Among these are a category likeness condition, a distributivity condition, and a factor constancy condition, all of which could be seen as requirements on "parallelism":

Category Likeness: the conjuncts must belong to the same syntactic category.

Distributivity: the factor must be well-formed in combination with each of the conjuncts.  (Crudely, X (Y + Z) = (XY) + (XZ).)

Factor Constancy: the factor must have the same semantics in combination with each of the conjuncts.
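
The crude equation under Distributivity can be made concrete with a toy sketch.  Everything in it is illustrative: the function name `distribute` and the `factor_first` flag are hypothetical, and the code only builds the strings XY and XZ; it makes no judgment about whether they are well-formed, which is the part the condition actually cares about.

```python
def distribute(factor, conjuncts, factor_first=True):
    """Pair a shared factor with each conjunct: X (Y + Z) -> [XY, XZ]."""
    if factor_first:
        return [f"{factor} {c}" for c in conjuncts]
    return [f"{c} {factor}" for c in conjuncts]


# The four examples from the post, as (factor, conjuncts, factor_first).
# factor_first is False when the factor follows the conjuncts.
examples = [
    ("shouted", ["Mary", "her mother"], False),
    ("Mary", ["shouted", "got her mother's attention"], True),
    ("was", ["tired", "happy"], True),
    ("a", ["screwdriver", "hammer"], True),
]

for factor, conjuncts, factor_first in examples:
    print(distribute(factor, conjuncts, factor_first))
```

Distributivity then asks that every string in each printed list be well-formed on its own; that check needs a speaker's judgment (or a grammar), not string concatenation.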

According to these conditions, the following coordinations are ill-formed (factors are in small caps, conjuncts in bold face):

*Mary BECAME enamored of the law and a judge at age 30.
    (Category Likeness violated: AP vs. NP)

*Analysts say that's evidence Microsoft should -- and likely is -- TAKING GOOGLE MUCH MORE SERIOUSLY.
    (Distributivity violated, because *should taking Google... is ill-formed)

*He PUT OUT his cigar and the cat.
    (Factor Constancy violated: put out 'extinguished' vs. put out 'expelled')
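
The first and third diagnoses above can be sketched mechanically, assuming each conjunct has been annotated by hand with a category label and with the sense the factor takes alongside it.  The classes `Conjunct` and `Coordination` and their fields are hypothetical, the annotations are supplied rather than computed, and Distributivity is left out because it needs a real well-formedness judgment.

```python
from dataclasses import dataclass


@dataclass
class Conjunct:
    text: str
    category: str      # hand-assigned label, e.g. "AP" or "NP"
    factor_sense: str  # hand-assigned sense of the factor with this conjunct


@dataclass
class Coordination:
    factor: str
    conjuncts: list

    def category_likeness(self):
        # Satisfied when all conjuncts carry the same category label.
        return len({c.category for c in self.conjuncts}) == 1

    def factor_constancy(self):
        # Satisfied when the factor has one sense across all conjuncts.
        return len({c.factor_sense for c in self.conjuncts}) == 1


became = Coordination("became", [
    Conjunct("enamored of the law", "AP", "become"),
    Conjunct("a judge", "NP", "become"),
])
put_out = Coordination("put out", [
    Conjunct("his cigar", "NP", "extinguished"),
    Conjunct("the cat", "NP", "expelled"),
])

for coord in (became, put_out):
    print(coord.factor, coord.category_likeness(), coord.factor_constancy())
```

All the linguistic work sits in the annotations: the code merely compares labels, so it inherits whatever is wrong with the category assignments it is given.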

It's long been known that these formulations appear to be too strong; Category Likeness, in particular, is apparently violable in certain contexts, as in an example I caught on NPR this morning:
... no proof that the man is dead, or even a Taliban prisoner.
(In such cases, it's reasonable to ask -- as a great many investigators have done -- whether the problem lies with the formulation of Category Likeness or with the notion of SYNTACTIC CATEGORY itself.)

The formulations might be too weak; Distributivity, for instance, might be strengthened by requiring Relational Constancy, that is, by requiring that the factor bear the same syntactic relation to each of the conjuncts.

And there may well be further conditions that don't have to do with "parallelism" -- a Constituency condition, for instance, requiring that each conjunct be a single syntactic constituent.

In addition, there are plenty of uses of coordinating conjunctions that are not, or at least are not obviously, instances of factorable coordination (as in tags: Mary is a lawyer, and so is Emma), and there are several types of factorable coordination, not all subject to the conditions in the same way.

Back to S&W.  What's supposed to do all the work for S&W is not the conditions above, but, so far as I can make out, a condition of Structural Likeness, requiring not identity of category for the conjuncts -- that is, identity with respect to their external distribution -- but identity of internal structure.  As we saw above, Structural Likeness (as a principle of syntax) massively makes the wrong predictions about acceptability (as well as rhetorical effectiveness).  There are a few cases, which I'll look at in a little while, where it seems to do some work that Category Likeness doesn't do, but on the whole it's a bad idea.  Once you bring out S&W's implicit assumptions, you can see that they're not very plausible, and that S&W's discussions are likely to confuse and mislead students.

S&W's successors have not improved on things.  Here I'll look briefly at the way two recent manuals treat parallelism: Diana Hacker's Rules for Writers (5th ed., Bedford/St. Martin's, 2004) and Andrea Lunsford & Robert Connors's The New St. Martin's Handbook (4th ed., Bedford/St. Martin's, 1999).  (S&W, Hacker, and Lunsford are not entirely random choices on my part.  These were among the authors my Stanford students consulted in their writing classes.)

Hacker's main instruction is justified on rhetorical grounds:
9  Balance parallel ideas.  If two or more ideas are parallel, they are easier to grasp when expressed in parallel grammatical form.  (p. 87)

This is familiar enough, and not very useful.  But then Hacker introduces a principle not in S&W:
Single words should be balanced with single words, phrases with phrases, clauses with clauses. (p. 87)

Oh dear, this is seriously bizarre advice.  Not joining phrases and clauses (*I asked a few questions and if I could be allowed to leave early) is generally good advice, which would follow from Category Likeness, or possibly Factor Constancy, as well as from Structural Likeness.  But not joining single words with multi-word phrases is terrible advice; it would, however, follow from Structural Likeness, though not from any of the more credible conditions above.

Hacker seems willing to stick to her (largely unarticulated) principles, though.  But at what cost?

Oddly, there are no bad examples of word-phrase mixing given.  There is, in fact, no example at this point in the text of single words joined with single words.  Instead we get (p. 87):
A kiss can be a comma, a question mark, or an exclamation point. (Mistinguett)
This is a fine sentence, but is she telling us that there would be something wrong, even just rhetorically, with a version in which the indefinite article is factored out?
(4) A kiss can be a comma, question mark, or exclamation point.

This is a subtle point, subtler than Hacker realizes, I suspect.  To start with, question mark and exclamation point are written as two-word sequences, but as far as English structure goes, they're just compound nouns.  That is, they're single words.  In which case, (4) involves a coordination of three single words -- comma, question mark, and exclamation point -- and should be fine by the word/phrase/clause rule. 

Well, fine only if the structure of the predicative in (4) has a in combination with a coordination of comma, question mark, and exclamation point.  But maybe Hacker is thinking of the conjuncts as the things that are separated by commas and/or conjunctions in writing, so that the three conjuncts are a comma and then question mark and then exclamation point.  If so, then (4) is not so good rhetorically.

But Hacker probably thinks that question mark and exclamation point are two-word phrases, not single words.  Then if (4) has a structure with a factored out, it's not so good rhetorically, because it would have the single word comma coordinated with the phrases question mark and exclamation point.  On the other hand, if a comma is a conjunct in (4), then (4) is rhetorically fine, since it would be a coordination of three two-word phrases.

So, according to what you think about the structure of (4) and what you think about the word status of compound nouns, (4) is either rhetorically fine (in two cases) or not so fine (in two cases).
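
The four cases can be enumerated in a short sketch.  It assumes one reading of Hacker's word/phrase rule, namely that all conjuncts must be of the same kind; the helper names (`unit_kind`, `balanced`, `conjuncts_of_4`) are hypothetical, and the two boolean flags stand for the two open questions about (4).

```python
from itertools import product

COMPOUNDS = {"question mark", "exclamation point"}


def unit_kind(conjunct, compounds_are_words):
    """Classify a conjunct as 'word' or 'phrase' under the given assumption."""
    if conjunct in COMPOUNDS and compounds_are_words:
        return "word"
    return "word" if len(conjunct.split()) == 1 else "phrase"


def balanced(conjuncts, compounds_are_words):
    """The word/phrase rule as read here: every conjunct is the same kind."""
    return len({unit_kind(c, compounds_are_words) for c in conjuncts}) == 1


def conjuncts_of_4(article_factored):
    """Conjuncts of (4), with 'a' either factored out or part of the first one."""
    first = "comma" if article_factored else "a comma"
    return [first, "question mark", "exclamation point"]


for article_factored, compounds_are_words in product([True, False], repeat=2):
    ok = balanced(conjuncts_of_4(article_factored), compounds_are_words)
    print(article_factored, compounds_are_words, "fine" if ok else "not so fine")
```

The verdict flips with each assumption taken singly, which is why the handbook's silence on both questions leaves the reader unable to apply the rule.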

I'm going on at such length about this one poor example because we -- and the students trying to use Hacker's handbook (let's not forget about them) -- don't know what her assumptions are on these matters, so that as soon as we try to go beyond the Mistinguett example, even to a closely related one, we don't know what to do.

Back in the real world, people do, of course, coordinate single words with multi-word phrases, all the time.  Here's an example from Deutscher, his second sentence in fact:
Other inventions -- the wheel, agriculture, sliced bread -- may have transformed our material existence, but... (p. 1)
There it is: a single word conjoined with a determiner-noun phrase and an adjective-noun phrase.  Sounds great to me.

As with S&W, Hacker starts out with purely rhetorical matters but then, still marching under the banner of parallelism, advances into the neighborhood of syntax.  By page 89 we're up to the sub-principle:
9a  Balance parallel ideas in a series.  Readers expect items in a series to appear in parallel grammatical form.  When one or more of the items violate readers' expectations, a sentence will be needlessly awkward.

It's stated as rhetorical advice, but the examples slide into some very dubious syntax, in particular a coordination of nominal gerunds with an infinitival VP:
Hooked on romance novels, I learned that there is nothing more important than being rich, looking good, and to have a good time.  [Hacker tells you to alter to have to having.]

The gerund-infinitive combination is way down on the acceptability scale for me, whether as object (above) or subject:
*Winning the race and to get the prize are my goals.
*To win the race and getting the prize are my goals.
The combination of an -ing-form VP and an infinitival VP in verb complements is equally bad:
*It started raining and then to snow.
*It started to rain and then snowing.

This is not some lack of rhetorical finesse.  This is rotten syntax.  The question is whether Category Likeness (or some other credible condition) would bar such combinations.  At first glance, nominal gerunds and infinitival VPs in subject and object positions look like they should just be NPs from the point of view of their external syntax, and -ing-form and infinitival complements look like they should just be VPs from the point of view of their external syntax, but these guys just don't like to coordinate.  Structural Likeness would predict this, but it's pretty much a dead loss otherwise, so the obvious tack to take is to examine the assignment of syntactic categories to these phrases.  Another technical question of syntax -- technical, but very important.

As it happens, (bad) coordinations of -ing-form and infinitival phrases turn up in discussions of parallelism in virtually every handbook, S&W being a rare exception.  Here's one from Lunsford & Connors (p. 266), an exercise to correct:
To need a new pair of shoes and not being able to afford them is sad.

I cannot believe that such coordinations are ubiquitous in the manuals because they occur with great frequency in student (or any) writing.  I've been editing other people's writing for about fifty years now, and I don't recall coming across any such examples.  The ones in the manuals all look invented; certainly, they're not attributed, though failures of Distributivity (a very frequent type of "faulty parallelism", and not only in student writing) sometimes are.  I can't help thinking that the coordinations of -ing-form and infinitival phrases found their way into the handbooks BECAUSE they were clear violations of Structural Likeness.

However, Structural Likeness is a blunt instrument, and Hacker wields it to bash some relatively innocent types of examples.  On page 90, under "parallel ideas linked with coordinating conjunctions", she disapproves of an -ing-form phrase coordinated with an ordinary NP:
At Lincoln High School, vandalism can result in suspension or even being expelled from school.  [Hacker recommends altering being expelled to expulsion.]
And from an exercise on p. 92:
Activities on Wednesday afternoons include fishing trips, computer training, and learning to dance.
Similarly, from Lunsford & Connors (p. 263):
The duties of the job include baby-sitting, house-cleaning, and preparation of meals.  [L&C recommend altering preparation of to preparing]

And on page 91, under "comparisons linked with than or as", she similarly disparages matching of nominal gerunds with infinitivals, in either order:
It is easier to speak in abstractions than grounding one's thoughts in reality.  [Hacker says grounding should be altered to to ground.]

Mother could not persuade me that giving is as much joy as to receive.  [Hacker says to receive should be altered to receiving.]

The first three of these sound just fine to me.  The other two are a bit more awkward, but not unacceptable, I think, and the problem with the Mother's persuasion sentence has as much to do with the enormous formality of infinitival VP subjects (To receive is a great joy) as with any failure of parallelism.

The larger point is that it's a mistake to treat these examples together with the really awful coordinations of -ing-form and infinitival phrases. 

Then Hacker turns to the repetition (or omission) of function words in parallel constructions, instructing the student to:
9c  Repeat function words to clarify parallels.
...Although they can sometimes be omitted, include them whenever they signal parallel structures that otherwise might be missed by readers.

Think of this as the Mistinguett Rule: Go for "a comma, a question mark, or an exclamation point", not for "a comma, question mark, or exclamation point".  The problem here is that the student is supposed to be able to judge when parallel structures might be missed by readers.  Well, if the students knew that, they wouldn't need Hacker's advice.  In the absence of such knowledge, they are in effect being urged by Hacker to always repeat function words, just to be safe.  This can make for some mighty clunky writing.

Granted, sometimes the repeated function word could be helpful, as in Hacker's example from page 91:
Many smokers try switching to a brand they find distasteful or a low tar and nicotine cigarette. [Hacker says to insert to after or.]
But on other occasions, it's not so clear that repetition is the right strategy, as in Hacker's exercise on page 92:
Jan wanted to drive to the wine country or at least [to] Sausalito.
(Surely a comma after wine country would be enough to fix this one.)

In any case, prepositions omitted in second conjuncts are routine:
There is no longer fierce debate, for instance, about whether the earth is round or flat, and [about] whether it revolves around the sun or vice versa.  (Deutscher, p. 16)

In one case, infinitivals, Hacker's advice just goes against the grain.  Omission of to in second conjuncts is incredibly common.  Two examples from Deutscher:
... we'll have to dig beneath the surface of language and [to] expose some of its familiar aspects in an unfamiliar light.  (p. 10)

... it will be possible to synthesize all these findings into one ambitious thought-experiment, and [to] project them on to the remote past.  (p. 11)
(I hope you'll have noticed that all of the Deutscher examples come from the first few pages of this 358-page book.  This stuff is gut-easy to find.)

Finally, there are Distributivity violations of several types, mostly of the sort discussed in the faulty parallelism article in Merriam-Webster's Dictionary of English Usage, where they are viewed as venial sins, if sins they are.  For instance, placement of matched coordinators like either... or, both... and, and (Hacker, p. 92) not only... but also:
During basic training, I was not only told what to do but also what to think.

The streets were not only too steep but also were too narrow for anything other than pedestrian traffic.

Some Distributivity violations arise from letting the second conjunct determine the form of the factor that follows, when the first conjunct would require a different form in the factor; that is, these violations are a species of "determination by the nearest", related to the very common phenomenon of agreement with the nearest (... my admiration for her versatility and artistry have continued to grow -- Bay Area Reporter, 5/26/05, p. 38).  A very common subtype (in real life, not just advice manuals) involves government of verb forms by auxiliaries:
Mayor Davis never has and never will accept a bribe.  (Hacker, p. 93)

I had never before and would never see such a sight.  (Lunsford & Connors, p. 263)
Another common subtype involves selection of prepositions by verbs or other head words:
Many South Pacific Islanders still believe and live by ancient rules.  (Hacker, p. 93)
I mostly judge instances of determination by the nearest to be straightforwardly ungrammatical, in contrast to misplaced matched coordinators, many examples of which I accept without question.  In any case, determination by the nearest is, in general, a more serious violation than misplaced matched coordinators, so that it's a mistake for advice manuals to just lump them together.

Both handbooks identify the problem in determination by the nearest as omission of words: "Add words needed to complete compound structures" (Hacker, p. 93); "including all necessary words" (Lunsford & Connors, p. 263).  Now, adding words is the fix for the problem, but the advice to add words as necessary is deeply unhelpful.  Students have to figure out when these extra words are required, what they are, and where they go, and for this they need much more specific advice.

In any case, Distributivity is a principle of grammar (concerning well-formedness), not of rhetoric (concerning effectiveness), while the Mistinguett Rule is a principle of rhetoric, not grammar.  In both cases, you can argue about when the principles apply and when they do not, but they're principles of very different sorts, and it's a mistake to think of them simply as species of "faulty parallelism".

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 01:18 PM

They have ears, but they hear not

Well, I haven't gotten any answer about the bet. If I can't make the point with respect to Bob Stacy's voice, I'll have to make do with the voices of Tennyson, Yeats, HD, Churchill, Plath and Frost. But first, let me set the stage.

The tension between tradition and innovation raises valid and important questions, in language as in other areas of human culture. In linguistic matters, however, the people who are most passionately concerned with these questions are often deeply and bizarrely confused about what the traditions and innovations really are.

I invite you to imagine a concerned father who posts something like the following, to one of the many web forums dealing with wedding arrangements:

I'm worried about the institution of marriage. I like to see a bride wearing the traditional hollowed-out pumpkin on her head, not one of those new-fangled gauze veils. The groom should be dressed in a cloak of bright yellow felt, as of old. The ring-bearer should carry the ring scotch-taped to his upper lip, not resting on a cushion.

This would be easier if I could provide diagrams.

I know it's not always necessary or appropriate, but I feel that some ceremonies are sacred. This really hit home when my daughter came home telling me that her wedding advisor said that if you wear a hollowed-out pumpkin on your head, it looks stupid.

Is it time for me to give up on what I've learned about how a wedding should be performed?

What do you think would happen?

For my part, I hope and expect that the forum moderator would quickly and quietly delete this painful display of hallucinatory crankiness. If not, I hope that most of the other forum participants would take it as a bad joke, and politely ignore it. Those who responded would say things like "um, have you checked the level of your medications recently?" or "a hollowed-out pumpkin? Geez, what planet are you from?"

It's inconceivable that such a post would generate hundreds of responses along the lines of "It's not a lost cause as long as we keep fighting"; "Most people don't know any better. You clearly do. Live by example."; "There is a balance between upholding ceremonial tradition and having such specific standards that you can't possibly hope to be anything but disappointed. The gripes expressed in the question are incredibly minor, and yeah, you should probably let go a bit. Fight for something interesting and substantial like limiting tuxedos to weddings held after 6:00 p.m."; "The wedding ceremony isn't like a ripe apple which will rot if left on the windowsill. It doesn't 'deteriorate.' It changes. And it has been changing for a long time and will continue to change. And no single point in that history will be the 'best' point."

These positions are all reasonable ones, in the abstract, but adopting them in response to a complaint about the decline in bridal head-pumpkins could only happen in a play by a slightly second-rate surrealist. Unfortunately, this level of discourse is the everyday norm in matters of language. I'm serious -- please go check it out.

The problem seems to be that everyone thinks that everyone can plainly see what the facts about the sound, form and meaning of language are, since after all, everyone can speak and understand. On the contrary, untrained perceptions about language seem to be roughly as accurate as untrained perceptions about chemistry and physics are, and perhaps even more subject to effects of suggestion. Everyone can see what brides wear on their heads; but almost no one, apparently, can hear what people say.

Let's take the recent MetaFilter thread on the pronunciation of the and a as an example. The premise, announced by snsranch, is that the correct pronunciation of these words is [ði] and [ei], rhyming with "me" and "bay", and that the reduced forms [ðə] and [ə] are a modern degeneracy that traditionalists should resist.

Without making any judgment whatsoever on the value of adherence to tradition, everyone should immediately see that this premise is completely false. As far as I know, all standard versions of English, in formal as well as informal registers, normally render the and a as [ðə] and [ə] whenever a word beginning with a consonant follows. When a vowel-initial word follows, the standard practice is to use a higher, more [i]-like vowel in the definite article (which is an instance of a more general phenomenon known as vowel-before-vowel tensing), and to use the form "an" for the indefinite article. Emphatic forms -- for example in case of contrastive stress, or a particularly strong emphasis on every word of a phrase -- are [ði] and [ei] before consonant-initial words, but these are rare.

This pattern is not some sort of new-fangled emblem of rebellious youth. I don't know the deep history of the pronunciation of the articles in English -- at some point they must merge with the common ancestor of German die and ein -- but I believe that in living memory, and in the memory of the parents and grandparents of those now living, all standard versions of both American and British English have followed the pattern described in the previous paragraph.

We have direct evidence of this, in the form of recordings going back more than a hundred years.

Here is Tennyson, reading the opening of The Charge of the Light Brigade:

Half a league, half a league,
Half a league onward,

Listen to it!

There are three instances of the indefinite article "a", and every one of them is the normal reduced form [ə]. If Tennyson had read this passage with the emphatic form [ei], his listeners would have thought him insane, or at least deeply eccentric.

Now here is Yeats, discussing why he reads his poems in a style that some people find artificial:

I remember the great English poet, William Morris, coming in a rage out of some lecture hall, where somebody had recited a passage out of his Sigurd the Volsung. "It gave me a devil of a lot of trouble", said Morris, "to get that thing into verse". It gave *me* a devil of a lot of trouble to get into verse the poems that I am going to read, and that is why I will not read them as if they were prose.

There are three the's and six a's in this passage. Please listen to them and ask yourself how they are pronounced.

Here is the American poet HD, reading a few lines from Helen in Egypt. Along with Ezra Pound and William Carlos Williams, she invented modern poetry in 1905 in Philadelphia.

I drew out a blackened stick,
but he snatched it,
he flung it back.

"what sort of enchantment is this?
what art will you wield with a fagot?
are you Hecate? are you a witch?

a vulture, a hieroglyph,
the sign or the names of god?"

Again, please listen to the two the's and five a's. HD reads in a very formal style, with more British influence than might be expected for someone born in Bethlehem, PA, in 1886 -- but she uses the normal reduced pronunciations of those seven words.

Here is a passage from Churchill's famous "fight on the beaches" speech.

We shall fight in France, we shall fight on the seas and oceans, we shall fight with growing confidence and growing strength in the air. We shall defend our island, whatever the cost may be. We shall fight on the beaches, we shall fight on the landing grounds, we shall fight in the fields and in the streets, we shall fight in the hills. We shall never surrender! And if, which I do not for a moment believe, this island or a large part of it were subjugated and starving, then our empire beyond the seas, armed and guarded by the British fleet, would carry on the struggle; until in God's good time, the new world, with all its power and might, steps forth to the rescue and the liberation of the old.

Listen to Churchill, and then listen again, committing the linguistic sacrilege of imagining that he had pronounced all his 15 the's as [ði] (rather than just the two pre-vocalic instances "the air" and "the old") and his two a's as [ei]. Can anyone believe that this would have been a rhetorical improvement, rather than an unexpected, unnatural and bizarre intrusion?

Here is Sylvia Plath reading three tender lines from her poem Daddy:

Every woman adores a Fascist,
The boot in the face, the brute
Brute heart of a brute like you.

Here is Robert Frost:

Two roads diverged in a wood, and I--
I took the one less traveled by,
And that has made all the difference.

Using the emphatic pronunciations of the and a in these poems would not only sound bizarre and unnatural, it would also spoil the rhythm.

It's a bit depressing that so few people ever pay careful attention to the language that they've heard all their lives. It's nothing short of outrageous that those who can't spare the time to listen to how people talk are nevertheless so happy to carry on at length in public about it.

This is not an appeal for deference to experts. On the contrary, I'm suggesting that the many people who are so passionately interested in speech and language should be offered a basic education in the methods of linguistic description and the habits of analytic observation, so that they can explore and express their passion in an informed way.

Posted by Mark Liberman at 09:53 AM

The lethal laugher

Lameen Souag at Jabal al-Lughat has posted a fascinating analysis of the language of the (Arabic) web forum post claiming responsibility for the London bombings, starting with this:

The first interesting thing about this statement is the bizarre phrasing of its opening: والصلاة والسلام على الضحوك القتال سيدنا محمد صلى الله عليه وسلم. The Guardian renders this as "may peace be upon the cheerful one and undaunted fighter, Prophet Muhammad, God's peace be upon him." The doubling of "peace be upon him" (a formula added to the prophet's name as a matter of course) is unusual [because of its redundancy] and stylistically flawed, suggesting an imperfect command of Arabic literary style. The phrase الضحوك القتال (ad-Ḍaḥûk al-Qattâl), rendered by the Guardian as "the cheerful one and undaunted fighter", is composed of two words in apposition which Hans Wehr's dictionary renders as "frequently, or constantly, laughing; laugher" and "murderous, deadly, lethal". This extremely unusual epithet is so weird that at first sight I assumed it must be some kind of prank; it may potentially provide some clues to the identity of the killers.

Other LL posts on analyses and translations of the same message here and here.

Update: and also see Tony Badran's analysis of the associations of the references to "heroes" and "arabism", which disagrees with the conclusions of Juan Cole and Shibli Zaman, to whom Lameen links.

Posted by Mark Liberman at 07:15 AM

July 09, 2005

Robin Cook tries his hand at linguistic analysis of Arabic

Robin Cook, former British foreign secretary, has recently published a bit of political linguistics, or perhaps linguistic politics, in the Guardian:

Bin Laden was, though, a product of a monumental miscalculation by western security agencies. Throughout the 80s he was armed by the CIA and funded by the Saudis to wage jihad against the Russian occupation of Afghanistan. Al-Qaida, literally "the database", was originally the computer file of the thousands of mujahideen who were recruited and trained with help from the CIA to defeat the Russians. Inexplicably, and with disastrous consequences, it never appears to have occurred to Washington that once Russia was out of the way, Bin Laden's organisation would turn its attention to the west.

But as Tim Buckwalter explained here yesterday,

The word "qa'idah" ("qa'idat" in construct cases) is a neutral word meaning "base," as in "military base" (qa'idah 'askariyah) or "database" (qa'idat bayanat), and it collocates naturally and frequently with "jihad." So, al-Qa'idah is really "the Base for Waging Jihad," or "the Base" for short.

The full, and highly frequent, Arabic phrase for "al-Qa'idah in Iraq" is "tanzim qa'idat al-jihad fi bilad al-rafidain," or "(the) organization (of the) base (of) the-jihad in (the) lands (of) the-two_tributaries" (i.e., the Tigris and Euphrates).

So al-qa'ida doesn't mean "the database" any more (or any less) than English base means "database". I should think that Cook would be in a position to get better linguistic advice, if he wanted it.

Posted by Mark Liberman at 10:35 PM

What a difference a comma makes

On July 2, The Economist said that insecticide-impregnated mosquito bednets "cut the risk of infants dying by 14%, to 63%." The truth (as explained in a correction on p. 19 of the July 9 edition) is that the risk is cut by 14% to 63%. Quite a difference. Don't underestimate commas.

Posted by Geoffrey K. Pullum at 06:21 PM

Internal inconsistency

I should have remembered the lessons from Mark's discussion late last month and earlier this month about journalists and quotations -- just as Mark figured, an actual transcript of what Bush really said at the Gleneagles G8 knocks some of the air out of my accusation that Bush may have Freudian-slipped.

And reading a bit more of Friday's NYT today, I see that not even the same section of the same newspaper can get the same quote right.

First: according to the transcript Mark made, here's what Bush actually said immediately after the quote I originally cited:

the contrast couldn't be clearer between the intentions uh and the uh hearts of those of us who care deeply about human rights ((and)) human liberty and those who kill those who got such evil in their heart that they uh they will uh take the lives of innocent folks uh this is uh the war on terror goes on

In the same Alan Cowell story that I cited, this passage is rendered like this:

"The contrast couldn't be clearer between the intentions and the hearts of those who care deeply about human rights and human liberty, and those who kill, those who've got such evil in their hearts that they will take the lives of innocent folks." Mr. Bush said. "The war on terror goes on."

In this Richard W. Stevenson story, the same passage is rendered like this:

"The contrast couldn't be clearer between the intentions and the hearts of those of us who care deeply about human rights and human liberty, and those who kill, those who've got such evil in their hearts that they will take the lives of innocent folks," Mr. Bush said. "The war on terror goes on."

The difference is just in the absence vs. presence of two words, of us, in the phrase those of us in the second quote (and in the period vs. comma before Mr. Bush said, but I'll assume that's just a typo in the first case). Presumably the same editor is responsible for these two stories, both in the International section, separated from each other by two pages in the paper edition (A11 and A13). In both cases, the passage was a single-column-width lone paragraph, so there was no need to remove of us in the first story in order to make things fit better or anything like that. And this removal makes a subtle difference: with of us included -- which is what Bush actually said -- Bush is including himself in the relevant group, especially by contrast with the bare those referring to the London bombers (and, ultimately, "the terrorists"). With of us excluded, on the other hand, Bush is made to appear to be excluding himself from both groups.


Posted by Eric Bakovic at 04:56 PM

A small wager

To: Bob Stacy
Subject: Pronunciation of English articles

In a post on "Ask MetaFilter" today, you wrote:

I'm worried about speach. I like to say "A car." instead of "Uh car." "The car." instead of "Thuh car." I say "because" and not "becuz".

This would be easier if I could illustrate the long and short vowels.

I know it's not always necessary or appropriate, but when I'm reading to a group of kids I like to sound like I know what I'm doing. It really hit home when my 1st grader came home telling me that his teacher said that if you say "the" and it sounds like "thee", that's stupid.

Is it time for me to give up on what I've learned about speaking and writing words meant to be spoken?

To start with, I can offer a bit of assistance: the International Phonetic Alphabet allows you to "illustrate the long and short vowels", and other aspects of pronunciation that you might want to discuss. There are several IPA tutorials available on line. After offering this modest token of sincere concern, however, I can't avoid being somewhat rude. The problem is, I don't believe you.

Your MetaFilter profile says that you are a "disgruntled civil servant" who lives in San Diego and "used to have a small ranch ... in the middle of a suburban neighborhood". Based on this, and your name, I'm going to assume that you're a native speaker of American English. If this is true, and if you really always pronounce the definite article the as [ði], and the indefinite article a as [ei], your version of our native language is so strangely distorted that the case deserves clinical documentation.

I very much doubt that this is true, however. It's much more likely that your belief about how you talk is entirely at variance with the way that you actually talk. This is a very common phenomenon, and nothing to be ashamed of.

In either case, there's an easy way to settle the matter. Let's have a telephone conversation, about some topic of mutual interest. For example, my late (and much beloved) maternal aunt lived in San Diego, and I first visited her family there about 50 years ago, so we could discuss how much the San Diego area has changed over the past half-century. Or, if you prefer, we could discuss language standards, pronunciation norms and so on. We'll record the call, and I'll transcribe a few random bits of it. This will enable us to determine whether your pronunciation really matches your beliefs about your pronunciation. The fact that you're aware of this agenda should give you the best possible chance to support your beliefs. If you do, I'll undertake to buy you dinner, at a restaurant of your choice, the next time I'm in San Diego or you're in Philadelphia.

If you want to take up the challenge, you can reach me here.

[Update: more here.]

Posted by Mark Liberman at 12:26 PM

Linguists beware

In a previous post, I noted that in the reports of President Bush's remarks at Gleneagles on the events of 7/7, there are roughly as many different versions of the quotes as there are sources. I focused on a particular phrase that Eric Bakovic found politically revealing. My sample of journalistic versions of that phrase:

New York Times: you have people working to alleviate poverty and rid the world of the pandemic of AIDS and ways to have a clean environment
Financial Times: we are working on solving the pandemic of Aids and ways to have a clean environment
Cox News Service: we have people here who are working to alleviate poverty, to help rid the world of the pandemic of AIDS, working on ways to have a clean environment
Ireland On-Line: you’ve got people here who are working to alleviate poverty, help rid the world of the pandemic of Aids, and are working on ways to have a clean environment
Australian Broadcasting Corporation: you've got people here who are working to alleviate poverty, to help rid the world of the pandemic of AIDS, working on ways to have a clean environment
White House: we have people here who are working to alleviate poverty, to help rid the world of the pandemic of AIDS, working on ways to have a clean environment

I found an audio clip of the statement on C-SPAN, and was not surprised to learn that all of these versions are wrong.

My transcription of the relevant phrases

you got people here who are workin' to alleviate poverty
and uh help uh rid the world of the pandemic of AIDS
and they're workin' on ways to have a clean environment

Normalizing the morphology, eliminating disfluencies and adding standard punctuation -- as I'd expect a journalist to do -- we get

you've got people here who are working to alleviate poverty and help rid the world of the pandemic of AIDS, and they're working on ways to have a clean environment

None of the sources got this right, though the Financial Times was the worst, with the New York Times in second-worst position. We can line up the NYT (first line of each pair) with the truth (second line of each pair) as follows:


you    have   people               working to alleviate poverty
you've got    people here who are  working to alleviate poverty

and      rid the world of the pandemic of AIDS 
and help rid the world of the pandemic of AIDS

and                    ways to have a clean environment
and they're working on ways to have a clean environment

If Alan Cowell (the NYT reporter) were a speech recognition system, I'd score him as having 2 substitutions and 7 deletions in a phrase with 30 lexical tokens, for a WER ("word error rate") of 9/30 = 30%.
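That score can be checked mechanically. Here is a minimal Python sketch, assuming a plain minimum-edit (Levenshtein) alignment over whitespace-separated tokens, with case and edge punctuation ignored. The names tokenize and align_counts are hypothetical, chosen for illustration; real speech-recognition scoring tools normalize text more carefully than this.

```python
def tokenize(text):
    """Split on whitespace, drop edge punctuation, lowercase (a simplifying assumption)."""
    words = (w.strip(",.;:!?\"").lower() for w in text.split())
    return [w for w in words if w]


def align_counts(ref, hyp):
    """Return (substitutions, deletions, insertions) for one minimum-edit alignment."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = fewest edits needed to turn ref[:i] into hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    # Walk back from the corner, classifying each step of the alignment.
    subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            subs += ref[i - 1] != hyp[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dels += 1  # word in the reference that is missing from the hypothesis
            i -= 1
        else:
            ins += 1  # word in the hypothesis that is not in the reference
            j -= 1
    return subs, dels, ins


# Reference: the normalized transcript. Hypothesis: the NYT version.
REF = tokenize("you've got people here who are working to alleviate poverty "
               "and help rid the world of the pandemic of AIDS, "
               "and they're working on ways to have a clean environment")
HYP = tokenize("you have people working to alleviate poverty "
               "and rid the world of the pandemic of AIDS "
               "and ways to have a clean environment")

subs, dels, ins = align_counts(REF, HYP)
wer = (subs + dels + ins) / len(REF)
print(subs, dels, ins, f"{wer:.0%}")  # prints "2 7 0 30%"
```

Dividing by the length of the reference rather than the hypothesis is the usual convention, and it is what makes the denominator 30 here.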

The content of Bush's remarks is preserved, in my opinion, but the form is pretty seriously mangled. Linguists -- and Jacob Weisberg -- beware!

For the record, here's my transcript of the whole statement. I've put in a newline wherever there was a silent pause.


spent some time recently with the Prime Minister
uh Tony Blair
had an opportunity to express our
heartfelt condolences
to the
people of London
people who lost lives
I appreciate uh Prime Minister Blair's
steadfast determination
and strength
he's on his way now
uh to London
((give)) from the G8 to speak directly to the people *of* London
he'll carry a message of solidarity with him
((uh)) this morning I
been in contact with our homeland security folks
I instructed them to be in touch with
((uh)) local and state officials
uh ((uh)) about
the facts of what took place here and in London
uh and uh
to uh
be extra vigilant
uh as uh our folks
uh start heading to work
you know the contrast uh
between what
we've seen on the TV screens here
uh what's taken place in London
what's taking place here is
incredibly vivid to me
on the one hand
you got people here who are workin' to alleviate poverty
and uh
help uh
rid the world of the pandemic of AIDS
and they're workin' on
ways to have a clean environment
and on the other hand you got people killin' innocent people
the contrast couldn't be clearer
between the intentions
uh and the uh hearts
of those of us who
care deeply about human rights ((and))
human liberty
and those who kill
those who got such evil in their heart
that they uh
they will uh
take the lives of innocent folks
uh this is uh the war on terror goes on
I was much impressed by the resolve of the
all the leaders in the room
uh their resolve is as strong as *my* resolve
and that is we will
uh not yield to these people
will not yield to the terrorists
we will find them, we will bring them to justice
and at the same time we will spread uh
uh an ideology of
hope
and compassion
that will overwhelm their ideology of hate
thank you very much


Posted by Mark Liberman at 11:29 AM

Quotes from journalistic sources: unsafe at any speed

Eric Bakovic has just added a particularly cute example to the inventory of Bushisms. But before aligning the syntax and politics of President Bush's remarks on poverty, AIDS and the environment, we need to ask another question: what did he actually say? As Linda Seebach explained, commenting on posts discussing the liberties that reporters take with quotations from interviews,

...most reporters don't have transcripts; we have notebooks.

Eric Bakovic started from a story by Alan Cowell in the New York Times, July 8, 2005, according to which

In Gleneagles, Mr. Bush drew the comparison between the aims of the summit and the bombers.

"On the one hand, you have people working to alleviate poverty and rid the world of the pandemic of AIDS and ways to have a clean environment and, on the other hand, you have people working to kill people," he said.

"The contrast couldn't be clearer between the intentions and the hearts of those who care deeply about human rights and human liberty, and those who kill, those who've got such evil in their hearts that they will take the lives of innocent folks." Mr. Bush said. [emphasis added]

Eric's point had to do with the apparent incongruity of the three phrases in the italicized conjunction. But if we poke around on Google News, we can find many other different versions of the president's remarks, offering several different solutions to the conjunction problem. According to a story by Caroline Daniel in the Financial Times, 14:10 GMT July 7, Bush

said he had been in contact with "our folks to get in touch with local and state officials about the facts of what took place in London and the need to be extra vigilant as our folks start heading to work. The contrast between what we see on the television screens and what is taking place here is incredibly vivid to me...we are working on solving the pandemic of Aids and ways to have a clean environment and you have people killing innocent people. The difference between intentions and hearts between those who care about human liberty and those who kill could not be clearer." [emphasis added]

According to a story by Don Melvin carried by the Cox News Service, July 8, 2005:

Shortly after noon, Bush stood grim-faced amid a phalanx of national leaders who silently expressed solidarity with Britain as Blair read a joint statement on behalf of them all, vowing that the terrorists would never win.

Afterward, Bush spoke to reporters again – this time in a much more somber vein.

"The contrast between what we've seen on the TV screens here, what's taken place in London and what's taking place here, is incredibly vivid to me," he said. "On the one hand, we have people here who are working to alleviate poverty, to help rid the world of the pandemic of AIDS, working on ways to have a clean environment. And on the other hand, you've got people killing innocent people. And the contrast couldn't be clearer between the intentions and the hearts of those of us who care deeply about human rights and human liberty, and those who kill — those who have got such evil in their heart that they will take the lives of innocent folks." [emphasis added]

According to the version carried by Ireland On-Line, datelined 7/7/2005- 14:07:25

He said: “The contrast between what we see on the TV screens and what’s taking place here is incredibly vivid to me.

“On the one hand, you’ve got people here who are working to alleviate poverty, help rid the world of the pandemic of Aids, and are working on ways to have a clean environment, and on the other hand you’ve got people killing.

“The contrast couldn’t be clearer between the intentions and the hearts of those of us who care deeply about human rights and human liberties and those who kill, who have such evil in their hearts, they will take the lives of innocent folks.” [emphasis added]

The Australian Broadcasting Corporation offers this "TV program transcript", based on a broadcast 8/7/2005:

GEORGE W.BUSH, US PRESIDENT: I have an opportunity to express our heart-felt condolences to the people of London. People who lost lives, I appreciate PM Blair's steadfast determination, his strength. He's on his way now to London from the G8 to speak directly to the people of London to carry a message of solidarity with him.

This morning I've been in contact with our homeland security folks and I instructed them to be in touch with local and state officials about the facts of what took place here and in London and to be extra vigilant as our folks start heading to work. The contrast between what we've seen on the TV screens here, what's taken place in London and what's taken place here is incredibly vivid to me.

On the one hand you've got people here who are working to alleviate poverty, to help rid the world of the pandemic of AIDS, working on ways to have a clean environment and on the other hand you've got people killing innocent people. The contrast couldn't be clearer between the intentions and the hearts of those of us who care deeply about human rights and human liberty and those who kill, those who've got such evil in their heart that they will take the lives of innocent folks.

The war on terror goes on. I was most impressed by the resolve of all the leaders in the room. Their resolve is as strong as my resolve and that is we will not yield to these people. We will not yield to the terrorists. We will find them, we will bring them to justice and at the same time we will spread an ideology of hope and compassion that will overwhelm their ideology of hate. Thank you very much.

[emphasis added]

There is yet another version on the White House web site:

PRESIDENT BUSH: I spent some time recently with the Prime Minister, Tony Blair, and had an opportunity to express our heartfelt condolences to the people of London, people who lost lives. I appreciate Prime Minister Blair's steadfast determination and his strength. He's on his way now to London here from the G8 to speak directly to the people of London. He'll carry a message of solidarity with him.

This morning I have been in contact with our Homeland Security folks. I instructed them to be in touch with local and state officials about the facts of what took place here and in London, and to be extra vigilant, as our folks start heading to work.

The contrast between what we've seen on the TV screens here, what's taken place in London and what's taking place here is incredibly vivid to me. On the one hand, we have people here who are working to alleviate poverty, to help rid the world of the pandemic of AIDS, working on ways to have a clean environment. And on the other hand, you've got people killing innocent people. And the contrast couldn't be clearer between the intentions and the hearts of those of us who care deeply about human rights and human liberty, and those who kill -- those who have got such evil in their heart that they will take the lives of innocent folks.

The war on terror goes on. I was most impressed by the resolve of all the leaders in the room. Their resolve is as strong as my resolve. And that is we will not yield to these people, will not yield to the terrorists. We will find them, we will bring them to justice, and at the same time, we will spread an ideology of hope and compassion that will overwhelm their ideology of hate.

Thank you very much.

After these six versions, no two of them the same, I've lost patience with the process, but I haven't run out of alternative versions available from journalists on the web. I imagine that further research could turn up several dozen other variants.

Based on previous experience, my guess is that the ABC transcript is the most accurate of those I've cited, though it also might be tidied up a bit. If I can find the audio clip on the web somewhere, I'll make a correct transcript for comparison. Meanwhile, if anyone is interested in a small wager on the outcome, I'll be happy to put a few dollars behind the view that the NYT got it wrong. And not because Alan Cowell is any worse than the rest of the press corps -- when it comes to quotes, none of them tell the truth.

Posted by Mark Liberman at 09:09 AM

July 08, 2005

Maybe not so WTF after all

Here's another grammatically interesting quote from a story in today's NYT:

In Gleneagles, Mr. Bush drew the comparison between the aims of the summit and the bombers.

"On the one hand, you have people working to alleviate poverty and rid the world of the pandemic of AIDS and ways to have a clean environment and, on the other hand, you have people working to kill people," he said.

Take a close look at the three things that Bush says people are doing at the G8 in Gleneagles.

The first two appear to be conjoined like this:

1. working to [ alleviate poverty ] and [ rid the world of the pandemic of AIDS ]

But then where does the third one fit? I think the only grammatical attachments are:

2. rid [ the world of the pandemic of AIDS ] and [ ways to have a clean environment ]

3. rid the world of [ the pandemic of AIDS ] and [ ways to have a clean environment ]

We might of course assume that what Bush meant to say was something like either of the following, each filling in a word that Bush hypothetically missed:

4. working [ to alleviate poverty, etc. ] and [ on ways to have a clean environment ]

5. working to [ alleviate poverty, etc. ] and [ find ways to have a clean environment ]

Given the Bush Administration's track record on environmental issues, I'm inclined to think Bush spoke grammatically. This time.

[ Comments? ]

Posted by Eric Bakovic at 11:48 PM

Disclosing classified information, salva veritate

The Valerie Plame story is all about referential opacity and felicity conditions for speech acts and other issues in philosophy of language, it seems to me.

Here's one example among many. Michael Isikoff tells us that Karl Rove's lawyer Robert Luskin

told NEWSWEEK that Rove "never knowingly disclosed classified information" and that "he did not tell any reporter that Valerie Plame worked for the CIA." Luskin declined, however, to discuss any other details.

As a non-philosopher blogger has observed,

"[H]e did not tell any reporter that Valerie Plame worked for the CIA", may be a perfect non-denial denial - did Rove say, for example, that Joe Wilson's wife worked for the CIA, but omit her name (which was available on the internet as part of Ambassador Wilson's on-line bio, now long gone)?

There's also the question of whether confirming a rumor is "disclosing" the information involved; and if you tell someone something that you think they should already know, have you "knowingly disclosed" it? Practically every information-bearing statement from everyone associated with the case requires this level of exactness in construal. But the philosophers are falling down on the job: we're not getting an analysis from Brian Weatherson or Brian Leiter or Matt Weiner or Kai von Fintel -- or anybody else with a union card in semantics or philosophy of language.

Posted by Mark Liberman at 10:04 PM

It had to happen sometime

Reading through the coverage of the bombings in London in today's New York Times -- and trying to distract myself from the continuing coverage of Hurricane Dennis on television while I'm "on vacation" on the Southwest coast (a.k.a. "the wrong coast") of Florida -- I happened upon the following in this story:

Online photo-sharing sites and Web blogs began chronicling the attacks soon after they occurred, posting material often gathered before professional news organizations arrived on the scenes.

Web blogs? All I could think was: Wow, that was quick.

Posted by Eric Bakovic at 06:16 PM

More on tanzim qa'idat al-jihad etc., etc.

I asked Tim Buckwalter about the Arabic name of the organization claiming responsibility for the London transit bombings, which was given in transliteration in some news reports as Jama'at al-Tanzim al-Sirri, Tanzim Qa'idat al-Jihad fi Urupa, and in English translation as dozens of different forms, from "the Secret Organisation Group of al-Qaida of Jihad Organisation in Europe" to "the Secret Cell of al Qaeda of Jihad Group in Europe" and "the Secret Organization of Al-Qaeda's Jihad in Europe".

Tim observed that the various translations are tragicomically reminiscent of Garrison Keillor's National Association of Organizations, aka National Organization of Associations; or the proliferation of permutationally-named political sects in Life of Brian (the Judean People's Front, the People's Front of Judea, the Judean Popular People's Front, and so on). More seriously, Tim explained the Arabic name and explored some of its collocational associations.

I monitor some 50 [Arabic] websites daily and I was surprised to find only one citation in which al-Qa'idah and "sirri" (secret) are mentioned in close proximity, and it came from the Reuters Arabic website. The phrase is "al-tanzim al-sirri - tanzim qa'idat al-jihad fi urubba." The literal word-for-word translation is "the organization the-secret - (the) organization (of the) base (of) the-jihad in Europe."

More idiomatically: The Secret Organization - The Organization of Qa'idat al-Jihad in Europe. The word "qa'idah" ("qa'idat" in construct cases) is a neutral word meaning "base," as in "military base" (qa'idah 'askariyah) or "database" (qa'idat bayanat), and it collocates naturally and frequently with "jihad." So, al-Qa'idah is really "the Base for Waging Jihad," or "the Base" for short.

The full, and highly frequent, Arabic phrase for "al-Qa'idah in Iraq" is "tanzim qa'idat al-jihad fi bilad al-rafidain," or "(the) organization (of the) base (of) the-jihad in (the) lands (of) the-two_tributaries" (i.e., the Tigris and Euphrates).

If the word "tanzim" is treated as part of the official name, then the word "jama'ah" ("jama'at" in construct cases) or "group" is often appended, as in "jama'at al-tanzim al-sirri," or "the Secret Organization" group.

Tim notes, however, that "a Google search for the Arabic phrase 'jama'at al-tanzim al-sirri' currently turns up zero hits".

By the way, when Tim says that qa'idah is qa'idat and jama'ah is jama'at "in construct cases", he's talking about a syntactic structure typical of Semitic languages, known as the "construct state".

A term in Semitic grammar for a reduced form of a noun when it indicates a thing possessed. In European languages generally, it is the possessor who is marked by a special genitive case, and the thing possessed is untouched: the man's horse, Latin equus viri. In Semitic the word 'horse' is put in the construct state.

The construction used is schematically 'horse the-man-of'. The possession precedes the possessor (in typology this is symbolized NG, as opposed to GN), and it has no definite article on 'horse'. The horse is necessarily definite already: it is the one possessed by the noun.

When possessive relationships are nested, all but the last element are construct and all but the first are genitive: 'head horse-of the-man-of'.

Tim also sent a link to a page from the Federal Register entitled "Foreign Terrorists and Terrorist Organizations; Designation: Jam'at al Tawhid Wa'al-Jihad et al.", which is worth quoting in full as another example of the problem of variable nomenclature in this domain:

In the Matter of the Amended Designation of Jam'at al Tawhid wa'al-Jihad, also known as The Monotheism and Jihad Group, also known as the al-Zarqawi Network, also known as al-Tawhid, also known as Tanzim Qa'idat al-Jihad fi Bilad al-Rafidayn, also known as The Organization of al-Jihad's Base in Iraq, also known as The Organization of al-Jihad's Base of Operations in Iraq, also known as al-Qaida of Jihad in Iraq, also known as al-Qaida in Iraq, also known as al-Qaida in Mesopotamia, also known as al-Qaida in the Land of the Two Rivers, also known as al-Qaida of the Jihad in the Land of the Two Rivers, also known as al-Qaida of Jihad Organization in the Land of the Two Rivers, also known as al-Qaida Group of Jihad in Iraq, also known as al-Qaida Group of Jihad in the Land of the Two Rivers, also known as The Organization of Jihad's Base in the Country of the Two Rivers, also known as The Organization Base of Jihad/Country of the Two Rivers, also known as The Organization of al-Jihad's Base in the Land of the Two Rivers, also known as The Organization Base of Jihad/Mesopotamia, also known as The Organization of al-Jihad's Base of Operations in the Land of the Two Rivers, also known as Tanzeem qa'idat al Jihad/Bilad al Raafidaini, as a Foreign Terrorist Organization pursuant to Section 219 of the Immigration and Nationality Act.

Based upon a review of the administrative record assembled in this matter, and in consultation with the Attorney General and the Secretary of the Treasury, the Deputy Secretary of State has concluded that there is a sufficient factual basis to find that Jam'at al Tawhid wa'al-Jihad, also known as the Zarqawi Network and other aliases, has changed its name to Tanzim Qa'idat al-Jihad fi Bilad al-Rafidayn, and that the relevant circumstances in section 219(a)(1) of the Immigration and Nationality Act, as amended (8 U.S.C. 1189(a)(1)) still exist with respect to that organization.

Therefore, effective upon the date of publication in the Federal Register, the Deputy Secretary of State hereby amends the 2004 designation of that organization as a foreign terrorist organization, pursuant to Sec. 219(a)(4)(B) of the INA (8 U.S.C. 1189(a)(4)(B)), to include the following new names:

Tanzim Qa'idat al-Jihad fi Bilad al-Rafidayn,
The Organization of al-Jihad's Base in Iraq,
The Organization of al-Jihad's Base of Operations in Iraq,
al-Qaida of Jihad in Iraq,
al-Qaida in Iraq,
al-Qaida in Mesopotamia,
al-Qaida in the Land of the Two Rivers,
al-Qaida of the Jihad in the Land of the Two Rivers,
al-Qaida of Jihad Organization in the Land of the Two Rivers,
al-Qaida Group of Jihad in Iraq,
al-Qaida Group of Jihad in the Land of the Two Rivers,
The Organization of Jihad's Base in the Country of the Two Rivers,
The Organization Base of Jihad/Country of the Two Rivers,
The Organization of al-Jihad's Base in the Land of the Two Rivers,
The Organization Base of Jihad/Mesopotamia,
The Organization of al-Jihad's Base of Operations in the Land of the Two Rivers,
Tanzeem qa'idat al Jihad/Bilad al Raafidaini.


[Update: Karen Davis sent in a comment from an Arabic linguist friend:

"A very interesting analysis of the difficulties of translating Arabic into English. The only comment I would make is that al-tanzim al-sirri is not in construct with tanzin qa'idat al -jihad fi Urubba, but in apposition with it. And I might translate jama'at as people. It does mean group but generally refers to a group of people. So then it would be "The people of the secret organization, Al-Qaeda's Jihad Organization in Europe" That's my two cents."

]

Posted by Mark Liberman at 05:41 PM

Copy-editing terrorism

This morning, the Guardian reprinted "the full text of the statement claiming responsibility [for the bombings in London] from the Secret Organisation Group of al-Qaida of Jihad Organisation in Europe".

Their version of this statement reads:

In the name of God, the merciful, the compassionate, may peace be upon the cheerful one and undaunted fighter, Prophet Muhammad, God's peace be upon him.

Nation of Islam and Arab nation: Rejoice for it is time to take revenge against the British Zionist Crusader government in retaliation for the massacres Britain is committing in Iraq and Afghanistan. The heroic mujahideen have carried out a blessed raid in London. Britain is now burning with fear, terror and panic in its northern, southern, eastern, and western quarters.

We have repeatedly warned the British government and people. We have fulfilled our promise and carried out our blessed military raid in Britain after our mujahideen exerted strenuous efforts over a long period of time to ensure the success of the raid.

We continue to warn the governments of Denmark and Italy and all the Crusader governments that they will be punished in the same way if they do not withdraw their troops from Iraq and Afghanistan. He who warns is excused.

God says: "You who believe: If ye will aid [the cause of] Allah, He will aid you, and plant your feet firmly."

The Guardian fails to note that this is a translation from the Arabic. And the curiously redundant name attributed to the organization claiming responsibility -- "the Secret Organisation Group of al-Qaida of Jihad Organisation in Europe" -- is only one of many variant translations of what must be a single web occurrence of a single Arabic phrase. Elsewhere in the news this morning, we can find references to the name of this same organization, in stories citing its claim of responsibility for the London bombings, rendered as:

Al Qaeda of Jihad Organization in Europe
Al-Qaida Jihad Europe
Group of al-Qaida of Jihad Organization in Europe
Organisation for al-Qaeda Jihad Secret Organisation in Europe
Organisation of al-Qaeda Jihad in Europe
Secret Al Qaeda Jihad Organization in Europe
Secret Cell of al Qaeda of Jihad Group in Europe
Secret Group of Al Qaeda Jihad in Europe
Secret Group of al Qaeda - Jihad in Europe
Secret Group of al-Qaeda's Jihad in Europe
Secret Group of al-Qaida's Jihad in Europe
Secret Organisation -- al-Qaida in Europe
Secret Organisation Group of al-Qaida of Jihad Organisation in Europe
Secret Organisation Group of al-Qaida of Jihad in Europe
Secret Organisation of Al Qaeda in Europe
Secret Organisation of al-Qaeda in Europe
Secret Organisation of the al-Qaida Jihad in Europe
Secret Organization -- Al Qaeda Jihad in Europe
Secret Organization Al Qaeda in Europe
Secret Organization Group of al Qaeda of Jihad in Europe
Secret Organization Group of al-Qaida Jihad in Europe
Secret Organization of Al-Qaeda's Jihad in Europe
Secret Organization of Qaedat al-Jihad in Europe
Secret Organization of al-Qaida and Jihad in Europe
Secret Organization of al-Qaida in Europe
a "secret group" of Qaida al-Jihad in Europe
al-Qa'eda's Secret Organisation Group of Jihad Organisation in Europe
al-Qaeda Organization in Europe
al-Qaeda of Jihad in Europe

and many others.

The Age reprints a BBC story that gives more detail, specifically that "'Nur al-Iman' participant, identified as a 'new guest', posts to the jihadist website Al-Qal'ah (Fortress), a statement issued by 'The Secret Organization Group of Al-Qa'ida of Jihad Organization in Europe'", and offers this "translated text of the statement" which includes a transliteration of the Arabic version of the name of the group claiming responsibility:

The Secret Organization Group of Al-Qa'idah of Jihad Organization in Europe (Jama'at al-Tanzim al-Sirri, Tanzim Qa'idat al-Jihad fi Urupa) In the name of God, the Merciful, the Compassionate, may peace be upon the cheerful one and the dauntless fighter, Prophet Muhammad, God's peace be upon him.

O nation of Islam and nation of Arabism: Rejoice for it is time to take revenge from the British Zionist Crusader government in retaliation for the massacres Britain is committing in Iraq and Afghanistan.

The heroic mujahidin have carried out a blessed raid in London. Britain is now burning with fear, terror and panic in its northern, southern, eastern and western quarters.

We have repeatedly warned the British government and people. We have fulfilled our promise and carried out our blessed military raid in Britain after our mujahidin exerted strenuous efforts over a long period of time to ensure the success of the raid.

We continue to warn the governments of Denmark and Italy and all the Crusader governments that they will be punished in the same way if they do not withdraw their troops from Iraq and Afghanistan. He who warns is excused.

God says: " (O ye who believe!) If ye will aid (the cause of) Allah, He will aid you, and plant your feet firmly."

I think this is recognizably a less edited version of the same translation used by the Guardian, and contains essentially the same English version of the organization's name, which also seems to be roughly as redundant in Arabic as in the English translation. But where did all the other versions of the name in the news come from? Were they independent translations from the Arabic? Perhaps in some cases. But I suspect that copy editors (and journalists themselves) have also been busy improving the terrorists' nomenclature.

In this case, the many variant English versions of this group's name have all arisen in the course of less than a day from a single brief Arabic-language text source. It's a good example of the natural proliferation of terms of reference, even when it's entirely clear that there is a single referent, and people are trying to refer to it by name rather than by description.

The ontological puritans of the Semantic Web movement, confronted with this evidence of human nature, will be muttering "you see -- I told you so..." Those interested in practical techniques for analysis of natural language will find in this case an especially pure and concentrated example of the problem of tracking entity references across documents and languages.
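To give a rough sense of what that tracking problem looks like in practice, here is a minimal sketch, under my own assumptions, of the crudest possible first step: collapsing spelling variants before comparing names. The function `normalize_name` and its two rules are hypothetical and purely illustrative; real cross-document coreference needs far more than this.

```python
import re


def normalize_name(name):
    """Map a few spelling variants onto one canonical key (illustrative only)."""
    key = name.lower()
    # British vs. American spelling.
    key = key.replace("organisation", "organization")
    # "al qaeda", "al-qaida", "al-qa'eda" and similar -> "al-qaeda".
    key = re.sub(r"\bal[- ]qa'?[ei]da\b", "al-qaeda", key)
    # Collapse runs of whitespace.
    return " ".join(key.split())


# Four of the variants listed earlier in this post.
variants = [
    "Secret Organisation of Al Qaeda in Europe",
    "Secret Organisation of al-Qaeda in Europe",
    "Secret Organization of al-Qaida in Europe",
    "Secret Organization Al Qaeda in Europe",
]
keys = {normalize_name(v) for v in variants}
print(sorted(keys))
```

The first three variants fall together under one key, but the fourth, which merely drops "of", stays distinct. Spelling rules alone will not merge variants that differ in wording or word order.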

[Update: more here and here.]

[Update #2: according to this 2002 article in the Guardian, Isaac Asimov's Foundation series (the first of which was first published in 1951) may have been published in Arabic under the title Al-Qa'ida, and may have inspired Osama bin Laden to think of himself as Hari Seldon. At least, this is the claim of a Russian named Dmitri Gusev. However, scholarly searches have failed to find any evidence that an Arabic translation of Asimov's works ever existed.]

Posted by Mark Liberman at 08:01 AM

July 07, 2005

Illiterate lottery swindle spam

A staggeringly incompetent fake lottery-winnings spam email just arrived from WINNERSLOTTO@terra.es (with the Content-Language field showing "es", meaning Spanish, incidentally, a rather implausible setting for a supposedly official announcement from a British national lottery agency). Notice that not even the Subject field is free of linguistic mistakes (they only managed to type the first 8 characters correctly):

From: WINNERSLOTTO 
Subject: Congratualtion/Award anotification

UK NATIONAL LOTTERY
Support Centre
Bevan House
51 Bevan Avenue
Conwy LL28 5AF
United Kingdom

FROM:THE DESK OF THE PROMOTIONS MANAGER,
INTERNATIONAL PROMOTION/PRIZE AWARD DEPARTMENT,

REF: UNL/26510460037/02
BATCH: 24/00319/IPD

ATTENTION:Winnwers
RE/AWARD NOTIFICATION;FINAL NOTICE
We are pleased to inform you of the announcement
today, 5th July 2005, of winners of the UK NATIONAL
LOTTERY,THE UNITED KINGDOM INTERNATIONAL PROGRAMS
held on 30th July  2005 in Croydon,London.

You email address was attached to ticket number
023-0148-790-459, with serial number 5073-11 drew
the lucky numbers 43-11-44-37-10-43, and consequently
won you the lottery in the 1st category.
You have therefore been approved for a lump sum pay
of £100,000,000 british pounds in cash credited to file REF NO.
UNL/26510460037/02 . This is from total prize money of £500,00,000.00
(GBP) shared among the ten international winners in this category.

All participants were selected randomly from World Wide Website through
computer draw system and extracted from over 100,000 companies from
Austraaalia,NewZealand,America,Europe,
NorthAmerica,Africa and Asia as part of International Promotions
Program, which is conducted annually. Please note that your lucky
winning number falls within our European booklet representative office
in Europe as indicated in your play coupon. In view of this, your
£100,000,000 british pounds would be released to you by our preferred
payment center in London.

Our agent will immediately commence the process to facilitate the
release of your funds as soon as you contact him.For security reasons,
you are advised to keep your winning information confidential till your
claims is processed and your money remitted to you in whatever manner
you deem fit to claim your prize. This is part of our precautionary
measure to avoid double claiming a nd unwarranted abuse of this program
by some unscrupulous elements.Please be warned.

This is part of our security protocol to avoid
double claiming or unscrupulous acts by participants
of this program.
To file for your claim, please contact our fiduciary
agent;

AGENT:ALAN C BROWN
Claim agent/Payment coordinator
The UK NATIONAL LOTTERY
EMAIL : sir_alancbrown@myway.com or alancbrown1@yahoo.co.uk


UK NATIONAL LOTTERY
For due processing and remittance of your prize
money,please remember to quote your
reference and batch numbers in every one of your
correspondences with your agent.

Furthermore, should there be any
change of your address, do inform your claims agent
as soon as possible.
Congratulations again from all our staff and thank
you for being part of our promotional lottery program.

Sincerely,
Jox White Smith
Zonal Co-ordinator
www.national-lottery.com

From the Subject line on, it just gets worse. It is hard to count the errors. The word "winners" is misspelled; the second paragraph begins ungrammatically ("You email address" should begin with "Your") and gets worse (there seem to be two main verbs fighting each other like two weasels in a sack); the supposed total amount of money in the lottery ("£500,00,000.00") is not a well-formed number expression; "british" should have been capitalized; "Austraaalia" should have not quite so many a's in it (three or four is ample); numerous spaces are skipped after punctuation marks...
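The claim about the number expression is easy to make precise. Here is a minimal sketch, assuming the usual English convention of an optional pound sign, one to three leading digits, comma-separated groups of exactly three digits, and an optional decimal part. The names `GROUPED_NUMBER` and `is_well_formed` are my own, and the repaired amount in the last line is only a guess at what was intended.

```python
import re

# Optional pound sign, 1-3 leading digits, then groups of exactly three
# digits each preceded by a comma, then an optional decimal part.
GROUPED_NUMBER = re.compile(r"£?\d{1,3}(?:,\d{3})*(?:\.\d+)?")


def is_well_formed(amount):
    """True if the whole string follows the three-digit grouping convention."""
    return GROUPED_NUMBER.fullmatch(amount) is not None


print(is_well_formed("£500,00,000.00"))   # the spammers' total: False
print(is_well_formed("£100,000,000"))     # the individual prize: True
print(is_well_formed("£500,000,000.00"))  # a guessed repair: True
```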

The sheer pig ignorance of criminals today boggles the mind. Whatever happened to the literate masterminds and gentleman jewel thieves of yesterday? Where are the brilliant confidence tricksters going to extraordinary lengths to set up an elaborate scenario for a perfect sting? What I keep reading in the news is stories about totally dumb criminals, like the bank robber who wrote his gimme-the-money note on the back of an envelope correctly addressed to him and let the teller keep the note. The senders of the above spam are of similar intellectual caliber. Crime is apparently being taken over by complete idiots. Learn your grammar, Jox, or not many people are going to fall for the part of the caper where they hand over their bank account details. (By the way, is a British lottery administrator really likely to be called "Jox"?)

Added later: People keep mailing me to tell me smugly that I'm wrong, I didn't realize that spams contain deliberate misspellings in order to fool spam filters. But that's not it. Can anyone really believe that the people whose work is reproduced above spelled giveaway phrases like "promotional lottery" and "been approved" and "security reasons" without changes, but added extra letters to "Australia", removed the space from "New Zealand" and "North America", decapitalized "British", and wrote ungrammatical phrases like "your claims is processed", and "a lump sum pay" to try and avoid spamassassin? Nonsense. I have a better theory, one that works. The spammers were drooling idiots who slept through their English classes and are now not qualified even for crime. Old-timers will recall that I have previously written on Language Log about other evidence for the existence of spammers who can't write well enough to spam: here and here and here.

Posted by Geoffrey K. Pullum at 06:34 PM

Get 'em while they're young


Starting in 1987, Ruth Heller published a series of little books for young readers, under the series title  World of Language.  The books, delightfully illustrated by Heller, introduce topics in English grammar in brief, rhyming texts.  Of course, the content is pretty much a grade-school version of high school and college manuals.

Here's Heller on Dryden's Rule (No Stranded Prepositions), from Behind the mask: A book about prepositions (Grosset & Dunlap hardback 1991, PaperStar paperback 1998), in the book's final bit of text:


PREPOSITIONS, in this modern day
at the end of
a sentence
are sometimes okay.
So it isn't an error ... it isn't a sin
to say,
"It's the room that I was playing
in."
But those who are graced
with
impeccable taste
will insist upon saying,
"It is the room in which I was playing."

Ah, it's like CliffsNotes, only a lot cuter (and with MUCH better illustrations): the presupposition that stranded prepositions used to be absolutely banned but now can be used if you know what you're doing; the unclarity about what makes for a stranded preposition (which leads the naive and the smart-assed to think that they can fix things by adding something, like an adverbial, after the stranded preposition); the assertion that fronted prepositions are more felicitous than stranded ones.

The eagle-eyed reader will have noted that upon is not highlighted in Heller's text.  I'm entirely sure this isn't something she just overlooked. 

Back at the very beginning the kids are told, "Of PREPOSITIONS have no fear.  They help to make directions clear."  There follows a series of examples suggesting that what prepositions are for is to indicate location and direction of motion; Heller is implicitly defining prepositions by their semantics.  Eventually, she explains: "PREPOSITIONS tell you where.  They tell you how.  And when."  So upon in upon saying... doesn't count; it doesn't denote spatial location/direction, manner, or temporal location.  (Neither does the infinitive marker to in to say, though if you look it up in most dictionaries you'll find it categorized as a preposition, for historical reasons. On the other hand, the metaphorical motion in "said the spider to the fly" counts.  Go figure.)

But what of the syntax?  On the syntax Heller is pretty cagey.  She tries to convey the distinction between preposition and particle by explaining, about prepositions, "They're never alone.  They're always in phrases."  (This excludes inside, in, and around in her example "Please step inside, come in, and look around."  She doesn't rise to cases like "The staff sent up a sandwich.")  And she foreshadows Dryden's Rule early on, by allowing that occasionally prepositions don't precede their objects: "They almost always start the phrase ..." except that on occasion "at the very end they're found."  (Disastrously, to illustrate her point, she uses "The World Around" as a poetic alternative to "Around the World".)

As usual, I'm baffled as to what students -- in this case, people in the 8-12 age range -- make of any of this.

After the conceptual underpinnings of the first few pages, she moves on to specific cases and tells the kids what to do, flat out (here I abandon the bright blue highlighting):
    - into for entering, in for location;
    - be angry with a person, at a thing;
    - between for two, among for more than two;
    - different from, not different than;
    - where, not where... at or where... to;
    - near, not near to; off, not off of.

And then come a few pages about "phrasal prepositions" (in front of, etc.), after which we reach the heights of Dryden's Rule, and release.

There are also three mazes, and a "which one of these is different from the others?" puzzle.

This is a lot to cover in a book that has only 32 pages with words on them, and then mostly only a few words, sometimes just one or two.  It's a picture book, after all.

zwicky at-sign csli period stanford period edu
Posted by Arnold Zwicky at 04:53 PM

Latin quiz: everyone gets an A, especially Chris Waigl

There's no way to run a carefully monitored quiz online for the tens of thousands of readers of Language Log, is there? On serendipity, Chris Waigl rapidly posted a beautiful model answer, with careful reasoning about how one answer is probably best but another one could conceivably be argued for, and then she mailed me about it. She was the first person to submit an answer, and would have won the Dan Brown novel if I had decided to go with actual prizes. But all of you other readers could now copy from Chris's site, of course. So rather than report all of you for plagiarism for looking at it, I've decided to give you all an A. That's what grade inflation has come to. Congratulations: you are all way above average.

In fact one of you, namely Alex Smaliy at Johns Hopkins University, even found a source showing that The Economist had not been the first to think about how one might turn the familiar motto around; a scene from Stanislaw Lem's Eden welled up in his memory:

"A colony of some kind…?" the Chemist asked uncertainly. He pressed his hand to his eyes, still seeing black spots.
"E pluribus unum," the Doctor replied. "Or, rather, e uno plures, if my Latin's right. This must be the sort of multiple monster that divides in an emergency...."
"It stinks to high heaven," said the Physicist. "Let's get out of here."

To see whether the Doctor's Latin was exactly right, you go see Chris Waigl.

Posted by Geoffrey K. Pullum at 04:47 PM

Today's annoying bureaucratic noun-noun compound

I know this is like shooting fish in a barrel, but here's today's annoying bureaucratic noun-noun compound (from the city of Palo Alto), which I noticed on El Camino Real while walking back to my office from lunch. It was affixed to several newsracks on the street:

NEWSRACK ORDINANCE COMPLIANCE VIOLATION WARNING AND FIXTURE IMPOUNDMENT NOTICE; CORRECTIVE ACTION REQUIRED

A second sheet, underneath, appeared to supply details.

But what does the top sheet say that couldn't be said by "This newsrack violates city ordinances and will be removed unless the violation is fixed"? (I'd prefer "... unless you fix things", but I'm willing to admit there might be legal issues with that one.) Surely this would be sufficiently stern and formal.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 03:48 PM

July 06, 2005

E pluribus unum: a Latin quiz

Recently The Economist, in the course of an editorial arguing that the EC is too diverse a collection of nations to be politically unified, used a waggish subhead created by switching around the USA's motto E pluribus unum ("out of many, one"). The switched-around version said E unum pluribus. If that's grammatical at all in Latin, it has the same meaning as the original; it doesn't mean "out of one, many", though it was intended to suggest that thought to the English-speaking reader.

Several pedantic readers wrote in to say "Gotcha!" in various rather pompous terms ("As one of the few remaining undergraduates studying Latin, I feel compelled..."; "It might be a good idea for The Economist to stick to English..."; "Didn't anybody in your office do classics?"). But the funny thing was that the four snooty classics experts whose letters were published all proposed different corrections! A couple of weeks later another reader wrote in to correct all of them with a fifth version. All of which makes for a nice little naturally-occurring Latin quiz. Here are twelve possible translations of "out of one, many" into Latin. Six use e for "from or out of", and six choose the variant form ex. Five of these twelve were suggested in the published letters; the other seven are just decoys. At least one is correct (possibly two are).

 1. e unibus plura        7. ex unibus plura
 2. e unibus plures       8. ex unibus plures
 3. e unibus pluria       9. ex unibus pluria
 4. e uno plura          10. ex uno plura
 5. e uno plures         11. ex uno plures
 6. e uno pluria         12. ex uno pluria
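
For the combinatorially minded: the twelve candidates are simply every combination of two prepositions, two forms offered for "one", and three forms offered for "many". Here is a minimal Python sketch that rebuilds the numbered list in the same order. The variable names are my own, and the code says nothing about which combinations are actual Latin.

```python
from itertools import product

# The three slots of each candidate phrase, in the order used by the quiz.
prepositions = ["e", "ex"]
one_forms = ["unibus", "uno"]
many_forms = ["plura", "plures", "pluria"]

# product() varies the last slot fastest, which matches the quiz numbering.
candidates = [" ".join(parts)
              for parts in product(prepositions, one_forms, many_forms)]

for number, phrase in enumerate(candidates, start=1):
    print(f"{number:2d}. {phrase}")
```

Two times two times three gives the twelve entries, six with e and six with ex.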

So which is correct? I was going to have a prize for the first correct entry received on a postcard at Language Log Plaza: first prize, a Dan Brown novel; second prize, two Dan Brown novels... But I dropped the idea when I read that the percentage of Americans who cheated at least once in high school has now reached 75% and is fast rising. I'm pretty sure lots of people would cheat (not you, of course, but other less scrupulous people): they would look up The Economist or fetch out a Latin grammar or seek out a classics librarian or get a medium to channel a dead Roman or something. So no prizes. The only reward is the satisfaction of an intellectual accomplishment. Watch this space later in the week for the answer.

Posted by Geoffrey K. Pullum at 12:45 AM

July 05, 2005

Danger! Gullible fieldworker!

Unfortunately for me, the elders I work with in my efforts to understand and help document the Montana Salish language have long since figured out that I am one of the world's most gullible people. A few weeks ago I was working with the usual congenial group of speakers -- some of the last few fluent speakers of the language -- and the word for `white man, white person' came up. The language has two words for this, the descriptive piq-sqe, literally `white person' (possibly a calque from English), and the etymologically opaque suyapi, which has the standard phonetic values for the vowels, like the vowels in English boot, pot, and beat, respectively.

This language (which is a combination of two main dialects, [Bitterroot] Salish -- also known, for mysterious reasons, as Flathead -- and Pend d'Oreille) has a regular though partly optional rule that deletes everything after the stressed vowel, as long as no semantically crucial material follows that vowel. Suyapi is stressed on the second syllable, so it often appears simply as suya. When we were discussing this word, one elder remarked, `We used to just say suya. But then someone followed a white man's tracks, and saw a yellow spot in the snow, and he said: "Hm! Suya pee!" And that's how we got the longer word, suyapi.'

I hate to admit this, but it took me a moment (with all the elders already giggling) to get it. I try not to wonder whether less obvious jokes might have made their way undetected into my dictionary files. But I don't delete the bilingual puns that I do detect from my field notes: they have their own interest for any linguist, and besides, they help make our all-day sessions a lot of fun.
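
For readers who like to see a rule spelled out, here is a rough Python sketch of the truncation rule described above, applied to plain spelling. It rests on assumptions of mine: every vowel letter counts as one syllable, the stressed syllable is supplied by hand, and the condition about semantically crucial material is ignored altogether. The function name is hypothetical and this is in no way a model of Salish phonology.

```python
VOWELS = set("aeiou")

def truncate_after_stress(word, stressed_syllable):
    """Cut the word off right after the vowel of the stressed syllable.

    stressed_syllable is 1-based; each vowel letter is treated as one
    syllable, which is a simplifying assumption.
    """
    vowels_seen = 0
    for index, letter in enumerate(word):
        if letter in VOWELS:
            vowels_seen += 1
            if vowels_seen == stressed_syllable:
                return word[:index + 1]
    # Fewer vowels than expected: leave the word unchanged.
    return word

print(truncate_after_stress("suyapi", 2))
```

With stress on the second syllable, suyapi comes out as suya, the short form the elders mentioned.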

Posted by Sally Thomason at 10:13 PM

All your letters are belong to us

The following Bizarro cartoon caught my eye today -- it was on a relative's fridge, and happens to be the one currently featured on the Bizarro website (from 6/21/05).

(I found it particularly interesting in the context of Bill Poser's two posts from yesterday about trademarking and copyrighting words/letters/sentences.)

Posted by Eric Bakovic at 08:08 PM

Avoidance

Mark Liberman has pointed us to the hilariously awkward musings of "Starcreator" on the site forum.wordreference.com about stranded prepositions, which this poster deprecates.  But if you go back to the beginning of the thread, something else interesting turns up, namely a discussion of circumstances where no choice among alternatives seems satisfactory.  When people reflect on their choices, none of them seems to work, so they claim to opt for some totally different expression.  It's not at all clear that this is the way people actually behave; quite possibly, in their unmonitored moments they use one or another or both of the alternatives.  But the act of reflection itself calls up attitudes that make both alternatives problematic.


It started with "suzzzenn" (a self-described speaker of American English, from New York), who appealed to the forum:

I could use some help with a paper I am writing. I have looked at so many examples evrything is starting to sound OK, even sentences that I know are wrong! Could native speakers give me thier judgments as to which sentences sound natural and which sound strange? I know that many of us were taught in school to never end a sentence with a preposition, but please ignore that rule for these examples! All the linguists that I have read say that there are some situations where it is possible to end a sentence with a preposition and the rule is an overgeneralization.

suzzzenn asked for judgments on a collection of sentences, which are entertaining in themselves:

1. What a curvy road we are driving on!
2. On what a curvy road we are driving!
3. On the kitchen table, the man is sitting.
4. The kitchen table, the man is sitting on.
5. He's the one who I bought it from.
6. What a dirty room the children are playing in!
7. In what a dirty room the children are playing!
  He waited for the crosstown bus.
8. For which bus did he wait?
9. Which bus did he wait for?
  She left the conference after the second lecture.
10. Which lecture did she leave the conference after?
11. After which lecture did she leave the conference?

The ensuing discussion revealed respondents all over the map: people who were generally happy with stranded prepositions, people who rigidly insisted that they were always wrong, people who said you could always go either way, people who said that sometimes there were alternatives, sometimes not.  AND people who said that neither of the alternatives -- 6 vs. 7, for instance -- was acceptable; instead, they said, they insist on something like The children are playing in a really dirty room!  (Fronted prepositions are really hard to live with in exclamations.)

It's a conflict:  what the Wh Exclamation construction calls for vs. what Dryden's Rule (a.k.a. No Stranded Prepositions) insists on.  If you're consciously attentive to Dryden's Rule you're in a bind, and neither alternative will do.  So you feel you have to go for something else.

Probably there's no issue until you actually ask people which variant they would choose.  Probably, in real life they just do what they do.  (And, as linguists, we'd really like to know what that is.)  But when you ask, they're caught in a vise.

Another example/anecdote:  one of my graduate students innocently asked her mother whether she preferred How big a dog did you see? or How big of a dog did you see?  -- asking about the two variants of "exceptional degree modification" (EDM, on which there's a considerable literature; the most recent reference to the phenomenon in these precincts is here).  Her mother said: neither was acceptable.  One was too fancy, the other too nonstandard.  What you say is: You saw a dog; how big was it? or How big was the dog you saw? or You saw a dog that was how big? or whatever.

I doubt that in real life she avoids all variants of EDM.  But we can't ask her; we have to listen.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 12:54 AM

July 04, 2005

That's American

Suppose you wanted to track changes in the relative usages of syntactic variants by writers in, oh say, the past three or four decades.  You would, of course, take advantage of tagged corpora (with part-of-speech information) that have become available in the past half-century and compare rates of occurrence in earlier and later corpora.

This is just what Geoffrey Leech and Nicholas Smith (of Lancaster University) have been doing recently, looking at a collection of variables in both American and British English and using corpora made available in 1961 and 1991-92.  Overall, they find significant increases in several colloquial vs. more formal variants, with more extreme changes in American English vs. British.  On one variable, relative that vs. which, the change in American English writing has been enormous; the change in British English is in the same direction, in favor of that, but is very much smaller.

The obvious interpretation is that the forces of prescription have been winning big in the U.S., at least with respect to this one variable.  (Meanwhile, stranded prepositions, contracted auxiliaries, and contracted negatives, among other variables, have risen in frequency as against their more formal alternatives -- this despite the strictures of the advice literature on standard written English.)  American writers are, apparently, conforming more and more to what the advice books say: use that rather than which in restrictive relatives (the That Rule, a recurrent theme in the halls of Language Log Plaza, most recently discussed here).

But then a passing remark by Anne Fadiman in her valedictory editor's column in the American Scholar reminded me that there's another factor at work here, and it's hard to assess its role in these corpus statistics:  what appears in the corpora is not, exactly, what people wrote; instead, it's what got published, and in the U.S. there's an almost religious attachment to the That Rule in the editorial establishment, which intervenes between the writer's original text and the version that appears in print.

Already published is an article by Leech, "Recent grammatical change in English: data, description, theory", in K. Aijmer & B. Altenberg (eds.), Advances in Corpus Linguistics (Papers from the 23rd International Conference on English Language Research on Computerized Corpora, Göteborg 22-26 May 2002), Amsterdam: Rodopi (2004), 61-81.  Still in press is Leech & Smith, "Recent grammatical change in written English, 1961-1992: some preliminary findings of a comparison of American and British English", in Antoinette Renouf & Andrew Kehoe (eds.), The Changing Face of Corpus Linguistics, Amsterdam: Rodopi.  My summary here relies on further discussion by Leech in e-mail to Rodney Huddleston and to me.

Though restrictive vs. non-restrictive relatives are not entirely factored out, Leech did some recent quick calculations that factored out prepositional relatives (obviously a potentially important consideration, given the decline in fronted prepositions vs. stranded prepositions), and came up with, in the U.S. data, a decrease of 41.5% in the frequency of non-prepositional which relatives and a corresponding increase of 48.5% in the frequency of non-prepositional that relatives.  This is pretty stunning, and exceeds the changes in British English by roughly a FACTOR of 5.
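
For anyone unsure how figures like these are arrived at, here is a small Python sketch of the percent-change arithmetic. The counts are hypothetical: I invented them solely so that the arithmetic lands on the reported percentages, and they are not Leech's counts. The sketch also assumes the earlier and later corpora are the same size; otherwise the raw counts would first have to be normalized (per million words, say).

```python
def percent_change(earlier, later):
    """Relative change from the earlier count to the later one, in percent."""
    return (later - earlier) / earlier * 100.0

# HYPOTHETICAL counts, chosen only to illustrate the calculation.
which_earlier, which_later = 2000, 1170
that_earlier, that_later = 2000, 2970

print(f"which: {percent_change(which_earlier, which_later):+.1f}%")
print(f"that:  {percent_change(that_earlier, that_later):+.1f}%")
```

Note that the two percentages need not mirror each other: each is computed against its own 1961 baseline.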

Now, this effect is in the direction of "colloquialization", given that restrictive which is less frequent in spoken vs. written English.  But the effect is, in Leech & Smith's words, "dramatic", way beyond simple American colloquialization.  Leech & Smith conclude: "This preference [for restrictive that], amounting to an increasing taboo against which as a restrictive relativizer, is now built into grammar checking software, and we can expect it to be making even greater headway at present than in the early 1990s."

But.  But.  What we're looking at here is what comes out of the publishing enterprise.  We don't know what went into it.  Here Anne Fadiman's passing reference to copyediting suddenly becomes relevant:

"Letter from the Editor: The Thanksgiving Table", American Scholar, Autumn 2004, p. 9:

I also read through many of the folders in my twenty-two linear feet of SCHOLAR-related files.  One of them was labeled "Checking."  It contained research material faxed by Jeanie [Stipicevic, managing editor] and Sandra [Costich, associate editor], who not only format our pages and enforce the sacred distinction between which and that but also check our pieces for accuracy.

Oh dear, "the sacred distinction".  And it comes up in connection with the formatting of pages, a matter of the mechanics of publication.  As I've noted here before, U.S. publishing establishments (even those that are arms of British publishing establishments) tend to view the That Rule as a mechanical stipulation, like spellings in -or (rather than -our), rejection of the serial comma, and placing commas and periods inside (rather than outside) closing quotation marks.  It is enshrined in house style sheets, in the very influential Chicago Manual of Style, and in Microsoft Word's grammar checker.  I find this bizarre, but there it is.

What's important here is not that all these sources of advice subscribe to the That Rule -- after all, real-life writers happily, regularly, and systematically fail to adhere to the proscriptions in the manuals, as should be clear from studies like Leech's -- but that those who mediate between what people write and what gets published subscribe to the rule.  Who knows what people actually write?  Whatever you type in, Microsoft Word or a copyeditor will silently alter it, at least if you're in the U.S.  It's out of your hands.

So what do corpus linguists make of the results?

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 09:36 PM

Dangling modifier in the Declaration of Independence

Language Log wishes you a happy Independence Day. And for those who listened this morning as the staff of NPR's Morning Edition read the Declaration of Independence all the way through on the radio, notice that there is a dangling modifier in there, in the list of complaints against the king:

He has forbidden his Governors to pass Laws of immediate and pressing importance, unless suspended in their operation till his Assent should be obtained; and when so suspended, he has utterly neglected to attend to them.

Even with the preceding context, one tends to get a wrong reading on that last finite clause. Listen to it on its own: "When so suspended, he has utterly neglected to attend to them." It sounds as if it's talking about occasions on which the king has been suspended (by his thumbs, perhaps). But it's supposed to be referring to occasions on which the laws were suspended prior to royal assent. It's a dangling modifier. It's just not a very bad one — not as baffling as the one by Ivan Watson that I commented on here, or the stunningly inept example that Arnold Zwicky spotted: unlike those, it's fairly easy to interpret.

The point to take away is not that the Declaration of Independence is badly written. It is a bit wordy, but it's a masterly and beautiful piece of writing. No, the point is merely that dangling modifiers have been around in English for hundreds of years, and even the best writers sometimes write them, and to say that they are syntactically barred simply doesn't account for the facts. It is a much subtler restriction than ordinary syntactic rules like the rule that "the" precedes the head of its phrase. It's more like some kind of rule of linguistic etiquette, oft violated, as rules of etiquette usually are. One of the things that makes rules of syntax interestingly different from most kinds of rules is that they are mostly obeyed without hesitation or reflection, nearly all the time.

Posted by Geoffrey K. Pullum at 03:30 PM

Documenting snowclones, dating them

It's one thing to document a phenomenon, another thing to trace its history.  Words, idioms, syntactic constructions, verbal formulas, morphological forms, phonemic distinctions, pronunciations, etc. -- all can be documented, in their full variety, as they are now, and we can investigate the paths that led to this state.  Often investigating history will allow us to understand why some puzzling synchronic details are the way they are.  On the other hand, changes not infrequently make details of the history unrecoverable from current states.  In any case, documenting a phenomenon and dating it are two different enterprises.


My interest in snowclones (and eggcorns and syntactic variation and much else) is mostly on the documentation side, though it's often important to point out that phenomena that people think are very recent have been around for a long time.  In doing so, though, I make no claims about the "original" versions of these phenomena -- if, indeed, there can be said to be any such things.  In providing cites of the WHAT IS THIS 'X'? snowclone that go back a while, as I did here yesterday, I'm making no claims in the antedating game, just noting that it wasn't born yesterday.  So I'm somewhat annoyed when I get e-mail that presupposes that since Ben Zimmer provided cites going back to the earliest days of the Internet, he and I were claiming that the snowclone originated then.  Silly Ben and Arnold!

Still, it's intriguing to see earlier occurrences and possible antecedents, and I'll report on these here.  While noting that snowclones are often frozen versions of sentiments that people have been expressing for millennia.


Ben provided a pile of Internet cites (which are easy to come by, after all) going back to 1983 and speculated that the snowclone might have originated in cheesy science fiction flicks of the '50s and '60s.  All this merely establishes that the snowclone's been around for a while.  From John Kozak in e-mail and Grant Barrett on ADS-L comes the reference to the wonderful (and enormously popular) Hitchhiker's Guide to the Galaxy by Douglas Adams.  HGG started as a radio show (the relevant episode was broadcast on 22 March 1978) and then appeared in book form in 1979.  In Grant Barrett's re-telling from the book:

the hero, Arthur Dent, is taken to the bowels of a hyperdimensional factory floor where a new Earth is being built. He is told by a scientist named Slartibartfast that the hyperdimensional beings in charge are mice (at least, that's how they look in the factory's dimension). Arthur replies,

"Look, sorry, are we talking about the little white furry things with the cheese fixation and women standing on tables screaming in early sixties sitcoms?"

Slartibartfast coughed politely.

"Earthman," he said, "it is sometimes hard to follow your mode of speech. Remember I have been asleep inside this planet of Magrathea for five million years and know little of these early sixties sitcoms of which you speak."

This exemplifies the variant with a demonstrative ("these" in this case) and with the stilted "of which you speak", but not the interrogative form.  Barrett concludes, "I do think that both the radio version and the book were popular enough to act as the blasting cap for the larger explosion of the term's popularity, at least among the geek set."

Barrett's reference to an "explosion" is important.  Linguistic variants, of all sorts, start out small, hang around at low frequencies, and then (sometimes) spread rapidly.  From the point of view of the resulting state of the language, what's most important in this history is the point at which the variant takes off.  From the point of view of history for its own sake, what's most important is the mechanisms that gave rise to the variant in the first place. 

In the case of snowclones, the sentiments expressed are usually pretty banal, of the sort that people might have been uttering since the dawn of language.  Surely, people have been forever observing that foreigners -- space aliens included -- might be ignorant of aspects of the language and culture they are visiting and might express themselves awkwardly, or even ungrammatically, in inquiring about these aspects.  There are many ways of couching this observation.  At some point, one or more of these ways attract attention and a fashion for them takes hold.  There's an explosion.  The snowclone Big Bang.

Several correspondents have provided quotations that might come from the days before the Big Bang.  Language Hat's posting on WHAT IS THIS 'X'? on his blog elicited a cite (from "aldiboronti") from the 1943 English translation of Saint-Exupery's The Little Prince: "My little man, where do you come from? What is this 'where I live,' of which you speak?"  This diverges significantly from the original French: "D'où viens-tu mon petit bonhomme? Où est-ce 'chez toi'? "

And Cameron Majidi takes it back to the 1794 gothic horror novel The Monk, by Matthew Lewis: "'Father, you amaze me!  What is this love of which you speak?  I neither know its nature, nor if I felt it, why I should conceal the sentiment.'"

Soon, no doubt, there will be cites in Latin and Greek, maybe even a quotation from Gilgamesh.  Cultural contact goes back a long long way.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 02:21 PM

Copyrighting everything

The other day while looking for something else I chanced upon this suggestion in the archive of a discussion list about the work of the Union for the Public Domain, a non-profit organization whose purpose is "to protect and enhance the public domain in matters concerning intellectual property".

How hard would it be to program a computer to eventually write every possible sentence possible in every language, then patent AND copyright both the input AND the output of such a device?
Thus copyrighting everything possible in every language!!!

The context is a discussion of how it might be possible to use intellectual property law to subvert itself and reduce what many people consider to be an abusive privatization of what should be in the public domain.

From a linguistic perspective there are two striking things about this proposal. One is that it assumes that each language contains finitely many sentences. That most if not all human languages are infinite is one of the central observations of modern linguistics. It isn't possible to generate all of the sentences of a language because you can always construct a new, longer sentence.
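
The point about infinity can be made concrete with a few lines of Python. This is only a toy sketch under my own assumptions: one core sentence, one embedding frame ("Kim thinks that"), both invented for illustration. It is not a grammar of English. What it shows is that a program enumerating sentences this way never finishes, because each sentence it produces can be embedded in a longer one.

```python
from itertools import islice

def embedded_sentences(core="it is raining", frame="Kim thinks that"):
    """Yield an endless series of sentences, each embedding the previous one."""
    sentence = core
    while True:
        # Capitalize and punctuate the current sentence before handing it out.
        yield sentence[0].upper() + sentence[1:] + "."
        # Wrap the sentence in one more layer of embedding.
        sentence = f"{frame} {sentence}"

# The generator never runs dry, so we can only ever look at a finite prefix.
for sentence in islice(embedded_sentences(), 3):
    print(sentence)
```

However many sentences the would-be copyright holder prints out, the next one is always still waiting.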

The other point is that it suggests that there aren't all that many languages and that they are sufficiently well understood that it would be possible to write computer programs enumerating all of their grammatical sentences. The task becomes quite implausible if one knows that there are around 7,000 oral languages plus an unknown number of signed languages.

The task becomes even harder if one realizes how few languages are sufficiently well documented. Here there are no reliable statistics. There is no comprehensive survey of the state of the documentation for the world's languages. In fact, even regional surveys are hard to come by. When I did one for the native languages of British Columbia five years ago, it was as far as I can tell the first one ever done. Even in the absence of reliable figures, it is clear that the state of documentation is not very good.

For the native languages of British Columbia, I found that half had a reasonably adequate grammar and that less than a third had a decent dictionary. A recent survey of 150 of the 340 languages with over 1,000,000 speakers found no dictionary for 17. Most of the world's languages are not adequately documented.

My point here is not to berate the author of this suggestion. Both the author and many of the others in the discussion are very distinguished people. The author of the suggestion is Michael Hart, the founder of Project Gutenberg. Another participant in the discussion is Richard Stallman, founder of the Free Software movement and the GNU Project. The message is cc-ed to Brewster Kahle, creator of the Internet Archive and the associated Wayback Machine. These are intelligent, well-informed people. Yet not only did Michael Hart make what from a linguistically informed view was a wild suggestion, but not one of the points I have mentioned was picked up on in the subsequent discussion. What this goes to show is how very little most people, even intelligent and knowledgeable people, know about language.

Posted by Bill Poser at 01:50 PM

Couch Potatoes and Canola

The attempt by British potato farmers to get the term couch potato removed from the Oxford English Dictionary discussed recently by Mark has a parallel of a sort in the term canola, the name of the plant from which canola oil is made. The traditional name for this plant is rape; the oil is known as rapeseed oil. Canola is a trademark for a particular cultivar of rape developed in Canada by Keith Downey and Baldur Stefansson. Their cultivar was an improvement because it contains much lower levels of the glucosinolates, which give the oil a bitter flavor, and of erucic acid, a fatty acid, which causes heart lesions. The name Canola reflects this fact. It stands for Canadian Oil Less Acid.

Although Canola strictly speaking designates only one of many varieties of rape and is a trademark to boot, it has become very widely used in the United States as the generic term for the plant, evidently because rape was felt to be unappealing.

Etymologically, the plant rape has nothing to do with the crime. The name of the plant descends from Old English rapum "turnip", while the crime is a derivative of the Latin verb rapere "to seize".

Posted by Bill Poser at 02:34 AM

stealthistitle

According to this article in tomorrow's New York Times, Leo Stoller, a Chicago businessman, claims to own all uses of the word "stealth" and has made a practice of sending cease-and-desist letters to anyone else who uses the word. In many such cases, the recipient gives in rather than spend the thousands of dollars that litigation would cost. In some cases, the recipients of his letters have paid him thousands of dollars to avoid the threatened lawsuit.

Mr. Stoller's company Rentamark does own the trademark stealth for some products, and in those markets he is entitled to enforce the trademark. Part of the problem is that he thinks that he owns stealth in all markets. That is false. The only circumstance in which a trademark becomes universal is when it is a so-called famous mark, one that has become so widely known and so closely associated with a particular company that anyone encountering it would naturally assume that the product was made by that company, even if there is little likelihood of the products being confused. The idea underlying this is that the use of a famous mark by someone else dilutes the association between the mark and its owner. This situation arises most easily with artificial trademarks like Xerox and Reebok. When a trademark is an ordinary word, it must not only be very well known but will usually be a trademark used on a wide range of products. An example would be Sears.

Mr. Stoller's claim to universal ownership of stealth is wrong for two reasons. First, it isn't a famous mark because it isn't well enough known. I, for one, have never heard of stealth crossbows, pool cues, insurance consultations, or air conditioners. Stealth is no Coca-Cola. The second reason is that the trademark stealth is held by numerous others for a variety of other products. Indeed, the article reports, Timex won a lawsuit against Mr. Stoller for infringing its trademark on stealth watches.

Ignorance of trademark law and over-aggressive defence of trademarks are common, but this guy really takes the cake because he thinks that he owns not the word stealth but the sequence of letters s-t-e-a-l-t-h. Really. He sent a cease-and-desist letter to the InterActivist Network demanding that they take down their website http://stealthisemail.com/. Stealth is email doesn't make an awful lot of sense, and even if it did, the phrase is not the name of a product of a type for which Mr. Stoller owns the trademark. The intended name, which does make sense, is: Steal this email, a nod to Abbie Hoffman's Steal This Book.

For our younger readers, Steal This Book was a Yippie classic, written by a major figure of the 1960s counterculture. Abbie Hoffman was, among other things, one of the Chicago Seven, radicals who were charged with conspiring to incite a riot at the 1968 Democratic Convention in Chicago. At the end of a widely followed circus of a trial they were convicted but the convictions were overturned on appeal. Anyone who was a teenager or older in the US in 1968 would get the reference. Mr. Stoller was 22.

steal  th is not a word at all, and even if it were, it wouldn't be the same word as stealth because it isn't pronounced the same. The <th> at the end of stealth is the voiceless interdental fricative [θ], whereas the <th> at the beginning of this is the voiced interdental fricative [ð]. The nuclei are also different. steal  th... has a long, tense, diphthongized high vowel [i], whereas stealth has a short, lax, mid-vowel [ɛ].

Posted by Bill Poser at 02:08 AM

July 03, 2005

What is this 'snowclone' of which you speak?

More news from the American Dialect Society mailing list, this time about some snowclones (a topic most recently discussed in this space here) we haven't talked about.  From Ben Zimmer, on 24 June, a report on a snowclone with two basic variants:

"What is this 'X' (that) you speak of?"
"What is this 'X' of which you speak?"

and then a reference to the related "'X'?  What is 'X'?"

Zimmer finds examples all the way back to the early days of Usenet:

There has been a lot of net discussion about "toilet paper" recently. Just what is this "toilet paper" of which you speak?  Where can I find it?  (from net.misc, 24 August 1983 (link))

In e-mail on 28 June, Zimmer supplied a pile more:

What is This "Weblog" of Which You Speak? (link)
What is this mouse book of which you speak? (link)
What Is This "Spice Market" Of Which You Speak? (link)
What is this 'constitution' of which you speak? (link)
What is this "TimeLogic" of which you speak? (link)
-----
What is this Final Fantasy you speak of? (link)
What is this "environmentalism" you speak of? (link)
What Is This "Print" You Speak of? (link)
What is this Devil that you speak of? (link)
What is this 'life' you speak of? (link)

Zimmer's ADS-L discussion continued:

The origin seems to be in the collective memory of big-screen and small-screen science fiction from the '50s and '60s. It has the sound of a clichéd line spoken by an alien to a human exploring other planets (often the vocative "earthling" is appended). In such "first contact" scenes, aliens can of course speak perfect English yet lack certain key concepts and their associated significations, which the humans can then explain. (It's also possible to imagine the line spoken in intra-human settings involving time travel, lost tribes, unfrozen cavemen, etc.)

The fronted version with "...of which you speak" adds an extra component of alien formality (cf. Yoda's inverted syntax, as discussed on Language Log [starting here]). I haven't found any firm evidence that either version was actually used in classic sci-fi on film or TV.

Closely related to this snowclone is the line, "'Kiss'? What is 'kiss'?"-- emblematic of campy interplanetary romance, which of course is invariably between a male human and a female alien. (It was a favorite catchphrase of the crew on Mystery Science Theater 3000.) The line is often attributed to Altaira (Anne Francis) in The Forbidden Planet (1956) or to one of Kirk's conquests in the original series of Star Trek.  This was investigated on the rec.arts.sf.tv newsgroup, and they've ruled out The Forbidden Planet and Star Trek.

It's probably just a spurious quotation, along the lines of "Play it again, Sam" or "Judy, Judy, Judy" (or for that matter "Beam me up, Scotty").  Star Trek did, however, have many "What is 'X'?" scenes, most notoriously in the episode "Spock's Brain", which had the immortal line, "'Brain' and 'brain'! What is 'brain'?"

Zimmer also supplied a partial transcript of another episode featuring cross-planetary misunderstanding, "The Apple", in which an alien asks "What is love?" and, on understanding that holding and touching are involved, announces: "Vaal has forbidden this."

[And now (in e-mail) Adam Linville adds the variant "'X'? What means this 'X'?", also presumably from sci-fi contexts.  Still more foreigner-talk.]

zwicky at-sign csli period stanford period edu

[Update: more here.]

Posted by Arnold Zwicky at 08:12 PM

I'monna

In response to my post on I'ma, Carrie Shanafelt wrote in to point out that for some people this may be related to another reduced form of "I'm gonna":

In my (caucasian) family this is used as the more immediate version of "I'monna," a Deep-Southernism in which the "g" of "gonna" is dispensed with. As in:

(Rising from the couch, putting on shoes) "I'ma go run."
Response: "See you later!"
(Wearing pajamas) "I'monna go run." (More often: "I'monggo run.")
Response: "Yeah, right."

I don't think I'monna can be exclusively a Deep-Southernism, because it's the normal casual form of "I'm going to" in my own speech, and I was born and raised in rural New England. However, the further reduction (if that's what it is) to "I'ma" isn't a form that I use, as far as I know.

Carrie's note continues:

It might be Caribbean in origin, but it's widespread among plenty of white Southern Americans as well as (especially young) African Americans throughout the U.S. "I'monna" (spelled "Ahmowna") used to be one of Jeff Foxworthy's Redneck shibboleths.

Now I'm really curious about this. Given that the I'monna form was normal in east-central Connecticut in the 1950s, at least judging from my own speech and what I remember about that of my kid friends, I'd be surprised if it had any crucial connections to the Caribbean or the American South at all. Of course we pronounced it something like [ˈɐɪ.mə.nə], not [ˈa.mo.nə] or whatever Foxworthy meant by "ahmowna". And like I said, I come from rural New England, so maybe we should have passed the redneck test after all.

Carrie closed with an interestingly nonstandard idea. It's routine to worry that modern media are homogenizing speech -- or to insist that local forces are resisting successfully -- but she suggests that media are actually serving as new channels to create differentiation, though by subculture rather than by geography:

A lot of hiphop slang now, St. Louis-style ("Right thurr!") and Atlanta crunk ("I'ma..."), seems to be popularizing quirks of local dialect that then appear in strange places like New York City. I don't hear older African-Americans in NYC saying "I'ma," just hiphop kids. I guess I'm resistant to the notion of a singular African American Vernacular English in a moment in which there are so many AAVE's being creatively played with, manipulated, exaggerated, and disseminated through music and television.

Posted by Mark Liberman at 07:29 PM

Not that adjective of (a) noun

Matt Hutson asked via email:

Is the following sentence grammatically correct?

"It doesn't seem like that painful of work for $50/hour."

(Context: I was describing to my sister a freelance opportunity, and this was her reply.)

We all hate wishy-washy answers ("well, that depends on what you mean by 'grammatically correct'..."), so I'm happy to be able to tell Matt that I estimate his sister's sentence to be 424.7 times less grammatically correct than "It doesn't seem like that painful of a job for $50/hour" is. Since "that painful of a job" is already a construction limited to informal registers, it's not surprising that the sentence triggered enough of a WTF reaction in Matt to motivate him to ask about it.

I'll say a bit about the grammatical structures involved, and then give the details of my calculation.

There's a general pattern we could describe as

"not that ADJECTIVE of a NOUN"

meaning roughly

"not such an ADJECTIVE NOUN"

Examples:

It wasn't that big of a story. (~ "It wasn't such a big story")
It wasn't that big of a deal. (~ "It wasn't such a big deal.")
It wasn't that strong of a fragrance. (~ "It wasn't such a strong fragrance.")
It wasn't that bad of a layout.
It wasn't that great of a movie.
It wasn't that easy of a walk back.
It wasn't that hard of a change for me.

Now, all of these examples involve head nouns with an indefinite article. Things like "it wasn't that big of the story" feel to me like they aren't English, and don't seem to occur on the web. Also, the cited examples have singular nouns; you can find plural examples on the web, e.g.

I am a young boy who can not pull that big of jobs.

but they seem to be rather rare and feel rather odd, at least to me.

Finally, all the examples I've given are count nouns. Again, you can find mass noun examples on the web:

You're not happy, and from looking at you, I can tell you aren't used to that strong of wine.
I didn't know you have that warm of weather out there.
You will usually not use an anchor buoy system, because you are not in that deep of water.

but again they seem to me to be rarer and odder. And the thing about Matt's sister's sentence is that work is a mass noun.

To provide empirical support for my intuition that this construction is odder with mass nouns, I'd have to do a psycholinguistic experiment. But I can show that it's rarer by doing some web searches.

 
                           job        work
painful ___                6,290      10,900
that painful of (a) ___    0          0
hard ___                   154,000    13,500,000
that hard of (a) ___       3,900      805

Looking for "that painful of work" brings no joy -- Matt's sister is blazing new linguistic trails here.

However, hard is much commoner with both job and work than painful is, and as a result we can actually see a reasonable number of (Google) counts for both "that hard of a job" and "that hard of work". Furthermore, we can look at the relative frequency of "hard job" compared to "that hard of a job"

154000/3900 = 39.5/1

and of "hard work" compared to "that hard of work"

13500000/805 = 16,770/1

Thus "that hard of work" is

(13500000/805)/(154000/3900) = 424.7 times less common.
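
For readers who want to check the arithmetic, here is a minimal sketch in Python. The four hit counts are the ones reported above for hard; the dictionary layout and the function and variable names are just illustrative choices, not part of any real search tool.

```python
# Google hit counts for "hard" as reported in this post (a 2005 snapshot).
counts = {
    "job": {"hard": 154_000, "that hard of (a)": 3_900},
    "work": {"hard": 13_500_000, "that hard of (a)": 805},
}


def plain_to_construction_ratio(noun):
    """Plain 'hard NOUN' hits per 'that hard of (a) NOUN' hit."""
    row = counts[noun]
    return row["hard"] / row["that hard of (a)"]


job_ratio = plain_to_construction_ratio("job")    # count noun
work_ratio = plain_to_construction_ratio("work")  # mass noun
relative_rarity = work_ratio / job_ratio

print(f"job:  {job_ratio:.1f} to 1")
print(f"work: {work_ratio:,.0f} to 1")
print(f"'that hard of work' is {relative_rarity:.1f} times less common")
```

Swapping in counts for other adjective-noun pairs would be the obvious first step toward a less slapdash estimate.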

Now, this is a simple-minded model with slipshod parameter estimation. To make this into a serious example of Google psycholinguistics, we'd need to look at many adjective-noun combinations, and consider a wider range of functional forms for the model. Still, I'm fairly confident that we'd learn that the "[not] that ADJECTIVE of (a) NOUN" construction is a lot rarer with mass nouns than with count nouns.

Posted by Mark Liberman at 03:36 PM

Rummy's last throe

Over on the American Dialect Society mailing list, Larry Horn posted on 28 June about U.S. Secretary of Defense Donald Rumsfeld's instructions on lexicography:

"The lethality is up," Rumsfeld said. "Last throes could be a violent last throe, just as well as a placid or calm last throe. Look it up in the dictionary."

Singular throe reminded Horn of singular kudo, so he took Rumsfeld's advice to check out "the dictionary".  Which Rumsfeld should have done himself.


A fuller account of Rumsfeld's adventures in lexicography, in Horn's telling:

Rumsfeld, speaking on "Fox News Sunday," defended Vice President Dick Cheney's widely criticized remarks that the insurgency was in its "last throes," even as he predicted a possible near-term increase in violence.

The number of attacks had remained "about level," but the insurgents were becoming more deadly, Rumsfeld said. The U.S. death toll in Iraq exceeds 1,700, and last week six Americans were killed in a bomb attack in Falluja.

"The lethality is up," Rumsfeld said. "Last throes could be a violent last throe, just as well as a placid or calm last throe. Look it up in the dictionary."

Horn is on the job:

As always, one cannot be sure which dictionary is "the dictionary", but the one closest to hand, AHD4, doesn't help identify that placid throe, or indeed even the violent one, when it's used as a singular:

throe n.  1. A severe pang or spasm of pain, as in childbirth. See synonyms at pain. 2. throes A condition of agonizing struggle or trouble: a country in the throes of economic collapse.

Presumably, it's not the spasm of pain that's involved here, but the condition of agonizing struggle.  Unlike "kudos", "throes" did originate as a (Middle English) plural, but singular "throe" (e.g. of revolution) has long since gone the way of "kempt" or "couth" and thus now represents a reanalysis-cum-back-formation from "throes". I'm sure google would have provided the Secretary with many models for his usage, but it hasn't made it into "the dictionary" yet.

(P.S.  If you're keeping score, Rummy also allowed that this particular last throe may last up to 12 years.)

So much for Rummy's last throe. [Added 3 July: Ben Zimmer now points out that Jon Stewart also consulted a dictionary on The Daily Show, to considerable comic effect.  The clip can be downloaded here.]

zwicky at-sign csli period stanford period edu


Posted by Arnold Zwicky at 02:18 PM

I'ma

Ella at Cherrier takes a look at the contraction I'ma in hiphop lyrics. She observes, for example, that in Big Tymers' Get High, the -a form seems to be in call-and-response complementarity with gon':

Well I’ma smoke (I’m gon’ smoke)
Until I choke (until I choke)

There's a review of previous research on forms of going to in Shana Poplack and Sali Tagliamonte, "The grammaticization of going to in (African American) English", Language Variation and Change, 11 (2000), 315-342. The only reference there to I'ma is this:

AAVE, like other varieties of English, is reported to express future variably with will, going to, and the present (Labov, Cohen, Robins, & Lewis, 1968:250). Though AAVE is generally considered (Labov et al., 1968:250; Winford, 1998:113) to prefer forms of going to, Labov et al. (1968:250) noted that will is “quite secure” in contemporary AAVE, despite the fact that frequent word-final consonant deletion may render future forms with contracted will indistinguishable from present tense forms (Labov, 1972:24–25).

In fact, the few published observations on the expression of future in AAVE focus not on the opposition between will and going to, but on putative distinctions among the variant forms of going to (e.g., gonna, gon), the phonological reduction of which is said to be “highly characteristic” of AAVE (Labov et al., 1968:250). Some authors have associated these variant forms with different meanings. Joan Fickett (personal communication, cited by Labov et al., 1968:25) suggested that the reduced form I’ma denotes immediate future, in contrast to I’m gonna, which would be more remote. Winford (1998:113) suggested a distinction between AAVE gon and gonna parallel to the creole distinction between “pure future” go/gon and “prospective” future goin/gwine (cf. Winford, 1998:133n.14), basing this analogy on Rickford and Blake’s (1990:261) finding of more copula absence before gon than gonna.

One of the reasons for interest in this topic has been the question of the relationships among different Atlantic English-based creoles and AAVE ("African American Vernacular English") -- see John McWhorter, "Sisters under the skin: A case for genetic relationship between the Atlantic English-based creoles", Journal of Pidgin and Creole Languages 10.2:289-333, 1995. As Poplack and Tagliamonte go on to write:

More generally, the high rate of zero copula in this context has been invoked as evidence that gon(na) originated from a creole preverbal irrealis marker go (e.g., Holm, 1984; Rickford, 1998:183) or reflects the adoption of a lone preverbal form as a result of substrate influence (Mufwene, 1996:10). In contrast to the tense distinctions that characterize English, English-based creoles are said to make a basic modal distinction between realis and irrealis. Realis refers to situations that have already occurred or are in the process of occurring, while irrealis refers to unrealized states and events, including, but not limited to, predictions about the future. Indeed, future time reference is but one possible interpretation for irrealis markers (Comrie, 1985:45); they are also used to mark conditional mood (Bickerton, 1975, 1981) as well as to convey possibility and obligation (Bickerton, 1975, 1981; Holm, 1988; Winford, 1996, among others).

Interestingly, although irrealis markers differ across English-based creoles, most, if not all, derive from an English future marker: thus, sa (< shall; or possibly < Dutch zal) in Sranan (Seuren, 1981; Winford, 1996) and Ndjuká (Holm, 1988), and we/wi (< will) in Jamaican Creole English (Bailey, 1966; Gibson, 1992), Carriacouan Creole English (Gibson, 1992), 18th- and 19th-century Nigerian Pidgin English (Fayer, 1990), and Kru Pidgin English (Singler, 1990). The most widely used marker go(n)/guo/o (< going to) has reflexes in just about every attested English-based creole (Aceto, 1998; Bailey, 1966; Bickerton, 1975; Fayer, 1990; Gibson, 1992; Holm, 1988; Seuren, 1981; Winford, 1996; see also Faraclas, 1989; Hancock, 1987). Its frequency may explain the creole origin many impute to variants of going to, particularly gon(na), in contemporary AAVE and in Gullah (Mufwene, 1996:8). If AAVE gon(na) in fact derives from this creole marker, it should show at least some parallels with it as well as some differences from English. But a closer inspection of the literature on future marking in English-based creoles reveals, as in AAVE and English, a good deal of variability. For example, both Gibson (1992:64) and Bailey (1966:46) cited wi as the future marker in Jamaican Creole English but noted that the future may be expressed by “the go periphrasis” (Bailey, 1966) as well as by the progressive marker a (Holm, 1988:164). Similarly, Gibson (1992) noted variation in Carriacouan Creole English between the “more conservative” wi and guo, as did Singler (1990:207) in Kru Pidgin English. Sranan expresses future, in some cases apparently interchangeably, with both o and sa (Seuren, 1981; Winford, 1996, to appear-a). Hancock’s (1987:290–291, 301) overview of future marking in 33 anglophone Atlantic creoles likewise reveals much variability, both across and within varieties.

Here, then, is yet another case where not only the variants, but also co-variation among them, are attested in both English and English-based creoles. Only a comparative quantitative analysis of their distribution and conditioning would enable us to determine which underlying system gave rise to the surface forms in AAVE.

To our knowledge, no such analysis exists for any English-based creole, since creolists who have recognized this variability also tend to attribute to each of the variant forms a corresponding semantic function, invoking many of the same nuances that we have reviewed in connection with the English future auxiliaries, often with the same contradictory results. Thus, Winford (1996, to appear-b) ascribed to Sranan sa nuances of possibility and uncertainty as well as of posterior time, while Seuren (1981:1054) argued that it conveys “neutral predictions” and “future events or situations resulting from somebody’s insistence, order, wish, or promise,” while o “indicates a future event or situation resulting from some pre-established plan or from natural causes already at the time of speaking.”

According to Peter Patrick ("Jamaican Creole morphology and syntax", in Bernd Kortmann et al., eds., A Handbook of Varieties of English, v. 2, Mouton de Gruyter 2004), Jamaican has a progressive -a that can be optionally combined with preceding tense markers:

Progressive aspect is uniformly signalled by preverbal a (6-7), while habitual aspect is often unmarked, though at an earlier stage both were marked alike in a single imperfective category with (d)a (da and de persist in western Jamaica, Bailey 1966: 138). It is still possible to mark habitual with a+Verb, just like the progressive. Aspectual a is tense-neutral in JamC, and may be preceded by tense-markers (ben+a, did+a, ben+de, was+a etc.).

(6)   -stative   a, de             progressive        Mi a ron
(7)   -stative   ben/did + a/de    past progressive   Mi ben a ron


(6) ‘I’m running’ / ‘I was running’ / ‘I (used to) habitually run’
(7) ‘I was running’ / ‘I used to habitually run’

Completive aspect is signalled by don, which unlike other TMA markers may occur not only preverbally but after the verb phrase (8-9), or even both.

(8) Him lucky we never nyam him too, for we did done cook already. (Sistren 1987: 30)
      ‘It’s lucky we didn’t eat it too, for we had already cooked.’ [of a chicken]
(9) Dem deh-deh, till she cook and we nyam done. (Sistren 1987: 82)
      ‘They stayed there until she had cooked and we had finished eating.’

I guess that an influence from -a as a progressive marker might be responsible for the feeling that I'ma "denotes immediate future, in contrast to I'm gonna, which would be more remote".

I've quoted from a couple of recent scholarly sources just to indicate that linguists do study things like this, and that the methods and results of linguistic analysis shed some light on the problems. In this case, it looks like the verdict is uncertain, though at least it's well-informed uncertainty, with plenty of footnotes... But I don't know much about it. John McWhorter and Geoff Pullum, who have both published on related topics, can probably provide a clearer conclusion.

Anyhow, the volume of transcribed hiphop lyrics is now large enough to provide an interesting new source of linguistic examples. One problem is that the transcriptions are not always accurate and (even when accurate) not always clear. For example, I imagine that what is transcribed as "shole" in the second couplet of Big Tymers' refrain

And I'ma drank (I'ma shole drank)
Until I can't (until I can't)

is either just a funny spelling of sho', or else (?) a representation of sho' plus contracted 'll. (And the existence of a pattern
/I -m -a ADVERB -ll VERB /
might be the most interesting linguistic fact about the song...).

Of course, Ella no doubt went back to the source recordings to validate (the relevant parts of) her citations, which I haven't tried to do here.

[Update: some more lyrical evidence about auxiliaries, quoted in the NYT obituary for Luther Vandross:

But the best of all the recent Luther Vandross songs wasn't really a Luther Vandross song at all. It was "Slow Jamz," a collaboration between Kanye West, Twista and Jamie Foxx. The song was built around a snippet of one of Mr. Vandross's ad-libs from the end of "A House Is Not a Home." It's a quick little vocal run - "Are you gonna be/ Say you're gonna be/ Are you gonna be/ Say you're gonna be/ Are you gonna be/ Say you're gonna be/ Well, well; well, well" - but Mr. West made it faster to emphasize the syncopation, slyly speeding up a slow-jam specialist.

The lyrics pay tribute to the history of make-out music, with Mr. Vandross as Exhibit A. For listeners who still don't know how a Luther-enhanced seduction is supposed to work, Mr. West sums it up: "I'm-a play this Vandross/ You gon' take your pants off." This was, in its own shameless way, yet another classic Luther Vandross moment, and by no means the last. The man himself was missing, but his warm, achy voice seemed closer than ever.

]

Posted by Mark Liberman at 07:06 AM

July 02, 2005

Ethnography, journalism and interview rituals

Kerim Friedman has two interesting posts at Savage Minds discussing the role of quote-selection in ethnography and in journalism.

In the first post, entitled Interviews, he suggests that

One option is to make the source data – the interviews themselves – available to download. In fact, such “grey literature” may eventually become available as part of AnthroSource, but it will not be easy. For one thing, there are confidentiality concerns. How do we make our data publicly available while still protecting our sources? It is possible to do – but it would create a huge burden on researchers. In essence, one might be punished for being a good researcher and collecting large amounts of data, because then you would have to carefully censure much more data to make sure it is safe for public consumption.

I'd like to point out that it's not always so bad. In the first place, you can do some ethnography using material that is already public, like radio talk show recordings, blogs and so on. There are also interviews that deal with material that is in the public sphere (or at least is not intrinsically private), like oral history recordings, where anonymity is not expected and where it's normal to obtain informed consent for publication without anonymization. In other cases, it might make sense to offer "interview anonymization" as a service that could be performed using special tools and specially trained people; then the burden would be on granting agencies to fund anonymization for publication, rather than on researchers to do it all for themselves as an extra chore.

Some might argue that public-sphere recordings (like talk shows and oral history interviews) engender biases of selection and self-presentation. Probably so, but the same could be said for any other sort of interview; and Kerim implicitly addresses this point in his second post, entitled Vox Populi, where he discusses the case of Greg Packer:

In 2003 Ann Coulter suggested that the New York Times had made him up because she found over a hundred posts where he was quoted “as a random member of the public.” Well, it turned out that he is in fact a real person, and that getting quoted by the press is his hobby. NPR’s show On the Media interviewed him this past weekend, and he still seems to be doing the same thing, despite a memo by the Associated Press management telling their reporters to avoid him.

It made me think about ethnographic Greg Packers. Like reporters, anthropologists often end up speaking to those informants who like speaking to us. I know that some of my informants have since ended up meeting other anthropologists working in the area, although I don’t know if they ended up in their dissertations or not. I have also twice had the experience of suddenly recognizing the description of another anthropologist’s informant as a mutual friend.

[...]

Because anthropological sources are usually pseudonymous, it isn’t possible to trace our Greg Packers across ethnographies, so we’ll never know how many of them there are.

Kerim links to an NPR "On the Media" interview with Greg Packer (mp3, transcript).

BROOKE GLADSTONE: There's a certain kind of story that calls for a few words from the man on the street, but for every time reporters call on a local man or area resident, there are a bunch of responses you never hear, such as "Please leave me alone," or "Get lost," or "How much will you pay me?" But if finding a willing voice is a problem for journalists, Greg Packer has provided the solution for more than a decade. In the last 10 years, he's been quoted at least a dozen times by the New York Post. He's been quoted at least 14 times by the Daily News, most recently just last week. He was quoted in the Atlanta Journal-Constitution two weeks ago. And Packer has been quoted or photographed at least 16 times on separate occasions by the Associated Press. But who's counting? Actually, Packer is.

There are other Greg Packers out there among the experts that journalists use to get quotes on political, financial or other specialized topics. Journalists quickly learn who can be counted on to return a call and "give good quote". Willing sources soon learn what sort of thing journalists want to hear, and will gladly trade a few minutes of their time for yet another mention in the press as an expert on X.

Posted by Mark Liberman at 01:39 PM

The elephant fights back

A few weeks ago, I posted a few comments about Derek Bickerton's new theory that our ancestors developed language so they could cut up dead elephants better. In more dignified and scientific terms, he suggests that language emerged because of the need to recruit and coordinate crews to scavenge the carcasses of naturally-deceased megafauna. Yesterday, Derek posted a reply, in which he starts off with a wink

Thanks, Mark, for spreading the word, even if I don't altogether agree with your conclusions. Maybe you're the reason my site visits doubled last month. So if I say anything personal--it's not personal! Get it?

and then throws a few hard shots behind a snappy jab:

Of course you can compare apples and oranges (or, perhaps more relevant here, shit and Shinola) and come to the conclusion that they're both round, or both brown, so hey, what's the difference, let's move on. Sure saves thinking too hard.

Since it's time for me to head out to today's EMELD sessions, I'll have to get through this round with a rope-a-dope strategy. But just wait till I'm back in Philly!

More importantly, Derek also posted a preprint entitled "Language Evolution: a Brief Guide for Linguists", a version of a paper that will appear in Journal of Linguistics at some point in the future. It's full of interesting ideas, and fun to read. For example:

Into the middle of this confused and confusing situation there appeared in the journal Science a paper (Hauser, Chomsky & Fitch 2002) aimed at setting the scientific community straight with regard to language evolution. Its magisterial tone was surprising, considering how little work any of its authors had previously produced in the field, but no more surprising than the collaborators themselves: since Hauser was known as a strong continuist and Chomsky as a strong discontinuist, it was almost as if Ariel Sharon and Yasser Arafat had coauthored a position paper on the Middle East.

I think you can tell that this isn't a balanced and dispassionate survey, but a strong presentation of an interesting individual perspective. More on it later.

Posted by Mark Liberman at 07:35 AM

EMELD

After returning from a trip to Colorado and Wyoming, I'm now at the EMELD 2005 Workshop in Cambridge, MA. "EMELD" stands for "Electronic Metastructure for Endangered Languages Data". This particular workshop is focusing on just one aspect of EMELD, namely GOLD (the "General Ontology for Linguistic Description"). So far, I've heard about some very interesting work, which hasn't quite overcome my general worries about the application of Semantic Web ideas and tools in science (or elsewhere, for that matter).

Roughly, E-MELD aims to solve three problems: how to make the documentation of endangered languages durable (so that it can still be read and used in 20 or 50 or 100 years), how to make it interpretable as data (other than to an informed human eyeball), and how to make it interoperable (so that you could search or amalgamate data across descriptions of many different languages by many different people). I hope it's obvious that these are real problems. In terms of durability, a corpus or a dictionary in a proprietary format may be difficult or impossible to use just a few years from now. One extreme example: the archive of scripts and transcripts at the Voice of America used to be kept in the storage format associated with a now-defunct Xerox multi-language word processing system, which (as I understand it) was basically a binary dump of the run-time heap of the program. In terms of interpretability, interlinear glossed text (whether expressed in a word processor file, in a typesetting format like .pdf or as plain text) may be quite readable, but can be very difficult to transform into a database that can be searched, linked to a dictionary, etc. And in terms of interoperability, the problem is that linguists use a wide variety of terminology (e.g. some linguists might use "nominative" where others use "absolutive") and an ever wider variety of abbreviations (NOM might be short for "nominative" or "nominalization" or "nominal").

The durability problem is the most important one, and it also has the easiest solution: just use open, documented standards (and archival-quality storage methods, of course). The interpretability problem is the next most important one, and it's fairly easy to solve: use (tools that produce) well-designed descriptive mark-up in a well-defined format such as XML, rather than presentational mark-up.
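To make the difference between descriptive and presentational mark-up concrete, here is a minimal sketch in Python. The element names (`phrase`, `word`, `form`, `gloss`, `translation`) and the helper functions are my own hypothetical choices for illustration, not an E-MELD or GOLD schema, and the Latin phrase is just a familiar textbook-style example.

```python
import xml.etree.ElementTree as ET


def build_igt(words, translation):
    """Build a descriptive-markup element for one glossed phrase.

    `words` is a list of (form, gloss) pairs. The element names are
    hypothetical, chosen only for this illustration.
    """
    phrase = ET.Element("phrase")
    for form, gloss in words:
        word = ET.SubElement(phrase, "word")
        ET.SubElement(word, "form").text = form
        ET.SubElement(word, "gloss").text = gloss
    ET.SubElement(phrase, "translation").text = translation
    return phrase


def forms_glossed_with(phrase, tag):
    """Return the word forms whose gloss contains the given tag."""
    hits = []
    for word in phrase.findall("word"):
        # Glosses are assumed to be dot-separated, e.g. "rose.ACC".
        if tag in word.findtext("gloss").split("."):
            hits.append(word.findtext("form"))
    return hits


phrase = build_igt(
    [("puella", "girl.NOM"), ("rosam", "rose.ACC"), ("amat", "love.3SG")],
    "The girl loves the rose.",
)
xml_text = ET.tostring(phrase, encoding="unicode")
print(xml_text)
print(forms_glossed_with(phrase, "ACC"))
```

Because each form is tied explicitly to its gloss, a program can pull out the accusative forms directly, without having to guess at column alignment the way it would with glossed text laid out in a word processor or a .pdf.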

The interoperability problem is by far the hardest one. The proposed solutions involve connecting descriptive entities and relations to a shared ontology, or a lattice of partially shared sub-ontologies, or a set of mappings among ontologies; and using tools like RDF and OWL in all their variants to keep track of all the connections and correspondences. Some nice examples have been presented -- for instance, Scott Farrar and William Lewis discussed an experimental effort to create a cross-language database of Interlinear Glossed Text (IGT) from examples available on the web, and Gary Simons talked about how to use RDF metaschemas to combine entries from three apparently incompatible dictionary databases.
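The core idea of mapping local terminology onto shared concepts can be sketched in a few lines of Python. This is only a toy, with plain dictionaries standing in for RDF and OWL; the project names, the concept identifiers and the records are all hypothetical placeholders, not real GOLD terms or real data.

```python
# Hypothetical per-project abbreviation tables. Each description declares
# what its own tags mean in terms of shared concept identifiers.
PROJECT_TAGS = {
    "grammar_a": {"NOM": "NominativeCase", "ACC": "AccusativeCase"},
    "grammar_b": {"NOM": "Nominalizer", "NMTV": "NominativeCase"},
}


def to_shared(project, tag):
    """Translate a project-specific tag into a shared concept identifier."""
    try:
        return PROJECT_TAGS[project][tag]
    except KeyError:
        raise KeyError(f"{project} has no mapping for tag {tag!r}") from None


def find_by_concept(records, concept):
    """Return (project, form) pairs whose tag maps to the shared concept."""
    return [
        (project, form)
        for project, form, tag in records
        if to_shared(project, tag) == concept
    ]


# Placeholder records: (project, word form, project-specific tag).
records = [
    ("grammar_a", "word1", "NOM"),
    ("grammar_a", "word2", "ACC"),
    ("grammar_b", "word3", "NOM"),
    ("grammar_b", "word4", "NMTV"),
]
print(find_by_concept(records, "NominativeCase"))
```

A search for the shared concept picks up `word1` and `word4` but not `word3`, even though `word3` carries the same NOM label as `word1`. The lookup itself is trivial; the hard part is deciding what the shared concepts are and who maintains the mappings.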

I'm still thinking about all this, but I have two specific concerns. First, the ontologists' focus on terminological logic distracts attention from a number of other problems that are at least as important, such as how to represent the complex connections among recordings, transcripts, texts, analyses and lexicons (even within a single descriptive framework applied to a single language). In fact, the ontologists' methods sometimes can make it much harder to solve these other problems, for example by prescribing inappropriate structures to sets of concepts or to linguistic objects. (I'll say more about these problems in a later post.) Second, the process of "ontologizing" a linguistic description is complex, difficult and time-consuming, as exemplified in a workshop presentation by Laura Buszard-Welcher on her "experience as a field researcher mapping morphosyntactic categories of Potawatomi, an Algonquian language, to the GOLD ontology through FIELD, an ontology-based lexical database program". I'm worried that these difficulties will delay the adoption by "ordinary working linguists" of (much more accessible) tools and practices that solve the durability and interpretability problems.

Meanwhile, with all this traveling, my net access has been frustratingly erratic, and I've had frustratingly few of those little chunks of blogging time in the interstices of my schedule. As a result, my to-blog list is growing frustratingly long. To all of you who've sent me suggestions, links and other messages: sorry, I'll get to it! (I hope...)

For an introduction to the broader controversy about what semantic-web-style ontologies may or may not be good for, see Peter van Dijck's "Themes and metaphors in the semantic web discussion". It's almost two years old, which is like a decade in internet years, but it's still relevant.

Posted by Mark Liberman at 06:07 AM

July 01, 2005

[expletive discussed]

In today's NYT story on Mahmoud Ahmadinejad's role in the 1979 U.S. embassy seizure in Tehran, there's an unusually cumbersome form of bowdlerization:

At 6:45 p.m. Monday, after seeing the picture on the Web site of The Washington Post, Mr. Daugherty sent an e-mail message to three other former hostages, Charles Scott, Donald Sharer and David M. Roeder, which began: "I assume you've noticed that the new Iranian president was one of" - here he inserted an expletive - "who was behind the takeover of the embassy and our incarceration. Not to mention having expressed a determination to pursue a nuclear program that will allow them to develop a nuclear weapon." [emphasis added]

This is not only inelegant, it's almost surely wrong. I can't think of any expletive that would result in a grammatical sentence if inserted in the gap in the phrase "the new Iranian president was one of __ who was behind the takeover of the embassy". You'd have to add some sort of non-expletive determiner as well, like "the c***s*****s" or "those a**h***s".

Anyhow, what's wrong with the old-fashioned methods? If they were good enough for Richard Nixon and the press coverage of the Watergate tapes, they should be good enough for Mr. Daugherty's email about Ahmadinejad:

The two men met in Nixon's Old Executive Office Building hideaway suite on May 18, 1973, and the president distastefully recalled how Kleindienst, "that tower of jelly," and Petersen had told him April 15 that Haldeman and Ehrlichman should resign immediately. "A bunch of [expletive] stuff," the president told Haldeman, then added:

"What I mean to say is this. We're talking in the confidence of this room. I don't give a [expletive] what comes out on you or John or even on poor, damn, dumb John Mitchell. There is going to be a total pardon."

Of course, these days most of Nixon's expletives would not even be deleted, according to Stephen Ambrose:

Nor will most viewers [of Oliver Stone's movie] realize that they are getting a cruel distortion of the language Nixon ordinarily used. In Stone's movie, he has Nixon saying "fuck" throughout -- in one scene, eight times. In fact, Nixon was a shy Quaker boy who seldom used locker-room language. The bulk of the "expletive deleted" words that Nixon blocked out on his transcript version of the tapes were "hell" and "damn." I have listened many times to the available tapes, some sixty hours' worth, recording conversations between Nixon and his closest advisers when they were in deep trouble, and I never heard him say "fuck." William Safire told me that Nixon sometimes said "asshole." He used "son of a bitch" regularly. In general, Nixon's language was mild, especially in comparison with that of Harry S. Truman, Dwight D. Eisenhower, Lyndon B. Johnson, and John F. Kennedy. Stone creates the opposite impression.

I wonder if this is entirely true. The WaPo paragraph quoted above has the sentence

I don't give a [expletive] what comes out on you or John or even on poor, damn, dumb John Mitchell.

You could substitute "damn" for that deleted expletive, but that would make no sense given that "damn" is left in the clear, a mere dozen words later. Neither "hell" nor "son of a bitch" nor even "asshole" will work in that context.

I guess that particular expletive must have been "shit":

Why, Aitken agonizes, did Nixon delete the expletives, and thus highlight their presence even more? "The explanation," he says, "is that the tapes were censored with Hannah Nixon in mind." He quotes the president telling a staffer that "If my mother ever heard me use words like that she would turn over in her grave."

Aitken professes to be astounded by both the explanation and the corresponding public response, considering them examples of invincible American provincialism. He protests fervently that, after all, the worst words showing up on the tape were "shit" and "asshole," and that Nixon never vocalized "the familiar locker-room expressions for sexual intercourse..."

A search of the New York Times site suggests that the Gray Lady didn't consider Rick Perry's mofodiction fit to print, thus avoiding entirely the problem of how to describe it.

Apparently haunted by Richard rather than Hannah Nixon, the Transcript Editing Guidelines of the Presidential Oral History Project of the Miller Center of Public Affairs say under the heading of Profanity

Leave in, as these words communicate the force with which a particular point is made. Also, a transcript peppered with [expletive deleted] reminds one of the Nixon tapes and Watergate.

Posted by Mark Liberman at 10:32 AM

More comments on quotes

Several others have written in with comments on recent Language Log posts about journalists' use of quotes.

Liam Gerard Moran wrote:

On June 20th, a beat reporter covering the Reds for the Cincinnati Post blogged everything that he'd done during that day to give his readers an idea of what his job is like. It's interesting stuff in its own right, but has relevance for your recent series of posts on reporters and their quoting issues.

These posts are indexed here, and of particular interest is an excerpt from part two:

"6:05 p.m. -- Done with the meal, I'm back in the press box, theoretically to get started on transcribing. Today, though, I'm typing out this monstrosity, so I'm running a bit behind. But anyway, I use a digital voice recorder for every interview, if possible, in addition to taking notes the traditional way. Plenty of guys go strictly with a notebook and pen, but I don't feel comfortable relying strictly on that. I don't write fast enough, and considering I work for a paper with later deadlines than anyone else, I'd rather have my quotes be as accurate as possible. So I don't mind the extra time transcribing. But that explains why you'll see variations on the same quote in different papers the next day -- if you're taking notes, you're not getting every single word verbatim, and even when I record, sometimes I just am not able to hear clearly everything that is being said."

Thus some reporters -- those with enough time and enough interest -- do use digital recording technology to get accurate quotes. As my own experience shows, this is easy enough to do with current technology. After a small amount of thought, I'd like to amend my previous suggestions about possible next-generation technologies in this area. It would be nice to have a network-accessible backend for recordings, transcriptions and annotations, with user-defined control over shared access: something like Flickr or Google Video, especially oriented towards transcription and annotation of audio or of the audio tracks of video streams, with random access through convenient APIs that could be used by anybody's transcription, annotation, browsing and search tools. The front-end inputs could be fed in from pre-recorded files, real-time media streams, individual "wireless recorders", telephone calls or whatever (all consistent with the laws on audio recording, of course).

Ray Girvan (a free-lance journalist himself) wrote:

I would have commented earlier, but was in the middle of a serious deadline.

Guilty as charged over the ritual status of quotes - with qualifications. But I think it's also a matter of genre. In the area where I write (scientific computing) there's what I might call the "interview article" and the "technical article".

The former requires extensive quotes - so I'd ask permission, stick a portable recorder on the phone, and transcribe. I hate speaking to people by phone, so I don't write this kind of thing much.

For the latter, personal contact is not the primary source of information in the article. Nearly all the content comes from published texts (quite often those reporting the work of people I'd choose to quote) so an interview isn't going to add much. It's true, as you say, that the article's thrust will be the same whether the quotes are there or not.

However, the role of quotes isn't merely ritual: they serve as a vehicle for emphasis, colour, and for further background detail that I wouldn't find in the published material. Maybe Dr X has an extra unpublished anecdote, or a well-polished soundbite summarising the work, or a hint of work forthcoming.

In many kinds of reporting, the quotes really do add some extra content that the reporter could not easily present in his or her own voice. But it seems to me that the main function of quotes is usually to create at least a minimal sort of human connection between the reader and the people involved in the story, while at the same time bearing some of the weight of the exposition and giving an appearance of objectivity and factual grounding.

Richard Hershberger wrote:

The process of editing for length and clarity only works if the journalist is both unbiased and competent to judge what changes don't affect the meaning. While I expect a sports reporter quoting a post-game interview to be able to do this, any interview discussing specialized knowledge (linguistics, for example) is likely to be unintentionally butchered. And even unintentional bias is a problem. We all know that selective quoting can change the meaning of the utterance. Even if the reporter isn't setting out to do a hatchet job, a bit of editing to make the interviewee say what the reporter thought he should have, or meant to, or might have, and what is left is unrecognizable. I long ago learned that any general news story on a topic where I have particular knowledge will be wrong, often absurdly so.

A current sports story illustrates this well. Craig Biggio, of the Houston Astros, recently broke the modern (i.e. post-1900) major league record for being hit by pitches. There is a blog, http://www.plunkbiggio.blogspot.com/, that has been chronicling his bid for eternal fame, and this in turn has brought the matter to the attention of the mainstream sports media. The blogger is quite clear that this is the modern record, and the all-time record from the 1890s is still out there (though attainable). Check out the various newspapers and see how many report Biggio as simply having broken the record.

Posted by Mark Liberman at 03:35 AM