February 28, 2005

Got some dawgs runnin' around heah: a natural language processing story

The dream of computers being able to understand our spoken requests for information over the phone has been part of the artificial intelligence and natural language processing research mythos for decades. But intractable problems remain. A transcriber working for a company attempting to implement voice-driven computer directory assistance (my informant cannot be named) recently transcribed a call in which the usual "What city, please?" directory assistance greeting prompted the following from an African American woman caller in her 60s:

Well, I am in Preston, Maryland, but what I want is a number to some dawg catcher. I got some dawgs runnin' around heah, and I don't know who dawgs they is, and they pit bulls, and I want somebody to come and get 'em.

Work continues on deciding how exactly the phone number look-up system should be programmed to respond in such cases. It's probably a case for "Please wait while I connect you to an operator." Of course, it is known that when problems with animals come up, human operators can also run into intelligibility difficulties.

Posted by Geoffrey K. Pullum at 07:06 PM

The rhythm of the Remington

In response to my posts about Henry James, Ray Davis from pseudopodium.org wrote in with some biographical revelations about Henry James, and a suggestion about how to read him.

Just in case you haven't already been deluged with notes from Jamesians, here's the message I suspect you might have been deluged with: The break between "early" (or "middle", if you're an especially energetic Jamesian) James and "late" James is conventionally dated at 1896. That was when James gave up the dream of becoming a successful professional playwright, true. It was also when repetitive stress injury (probably due to his enthusiasm for the typewriter) led him to begin dictating his fiction to a secretary rather than writing the first draft himself.

He remained addicted to the rhythm of the typewriter:

"Indeed, at the time when I began to work for him, he had reached a state at which the click of a Remington machine acted as a positive spur. He found it more difficult to compose to the music of any other make. During a fortnight when the Remington was out of order he dictated to an Oliver typewriter with evident discomfort, and he found it almost disconcerting to speak to something that made no responsive sound at all."

The rhythms of his prose, gladly or sadly, however, with removal of the self-editor available to those who combine creation and self-inspection, rapidly evolved or collapsed into those sentences so offputting when attacked directly, as if firmly marching, power-tied, into a hostile board meeting, but so pleasing to those who have learned to float upon their languid, perhaps to the point of tepidity, surfaces.

Yes. Floating on languid, perhaps to the point of tepidity, surfaces. That captures the feeling exactly. I can see that I've been going about this all wrong, struggling with the arduous assembly of phrases and sentences and paragraphs. No doubt it was this old-fashioned kind of reading which led Oscar Wilde to conclude that "Mr. Henry James writes fiction as if it were a painful duty". If we just let the stream of consciousness carry us off, rather than trying to wade against the surge of words, the results will be completely different.

Let's try it:

It was not, fortunately, however, at last, that by persisting in pursuit one didn't arrive at regions of admirable shade ...

Yes. Feel those adverbs lapping languidly against the side of our little raft; and suppress your uncertainty about whether one arrives or doesn't arrive, and whether persistence in pursuit has anything to do with it, whatever the outcome. The shade is admirable in any case, and the tepid surfaces are dappled with patterns of pursuit and arrival.

Let's try again:

At this it hung before her that she should have had as never yet her opportunity to say, and it held her for a minute as in a vise, her impression of his now, with his strained smile, which touched her to deepest depths, sounding her in his secret unrest.

Now, what is it that she should have had (as never yet) her opportunity to say? That thing that hung before her? A null, unspecified object? Or is the object of say "her impression of his..."? Her impression of his what? Her impression of his now, i.e. his present moment? Or her impression of his (now, etc.) sounding her? Or was it his smile that was sounding her? By the way, what held her as in a vise? Was it the opportunity or the impression? And was it his smile or his sounding that touched her?

Hush. There are, I believe, answers to these questions. The sentence can be parsed and interpreted. But now I see that as soon as I start to ask these questions, struggling to connect subject to verb to object -- linking up appositives and backgrounding parentheticals -- reading becomes hard work. Instead, we should float on the surface of the phrasal stream: hanging before her... her opportunity to say... briefly held as in a vise... her impression... his strained smile... her deepest depths... his secret unrest. Of course. Much better.

I think I still prefer literature in which I can keep track of who did what to whom. But the only real question here is whose feelings about what are not being expressed, and the answer seems to be "everyone's feelings about everything".


Posted by Mark Liberman at 07:26 AM

February 27, 2005

Miss Gould passes

In response to an affectionate appreciation ("The Point of Miss Gould's Pencil", by Verlyn Klinkenborg, NYT 2/16/05, p. A26) of the work of Eleanor Gould Packard at The New Yorker, where she served for 54 years, Michael R. Burr (letter to NYT, 2/21/05, p. A20) elevates the magazine's "venerable arbiter of style" (Klinkenborg) to a kind of sword-wielding sainthood:

No mere proofreader or pedant, Eleanor Gould Packard was a guardian of civilization in a thankless struggle to avoid its disintegration. She upheld standards and imposed discipline, which in turn taught discipline in one's thought, and ultimately in one's actions as well.
For those of us who care about such things, Miss Gould's magnificent efforts are greatly appreciated, and she will be sorely missed.

Burr totally misses the point of Klinkenborg's appreciation (now echoed in a longer memorial by David Remnick in the 2/28/05 New Yorker, pp. 34f.) -- that what Gould was trying to do was help writers say what they were aiming for in a language with "a kind of Euclidean clarity--transparent, precise, muscular" (Remnick) -- and instead celebrates her career with ravings about the disintegration of civilization. We aim for grace and style, but somehow we get barbarians at the gates. Undisciplined barbarians, at that. Some people seem unable to think about matters of syntax, usage, logic, rhetoric, and diction except through the distorting glass of the image of the Great Decline.

Not, however, Klinkenborg and Remnick, who experienced Gould's editing first-hand.

As Klinkenborg puts it:

I learned from her neatly inscribed comments that even though I was writing correctly -- no syntactical flat tires, no grammatical fender-benders -- I was often not really listening to what I was saying. That may seem impossible to a reader who isn't a writer. But Miss Gould's great gift wasn't taking writers seriously. It was taking their words seriously.

She received the title Grammarian (a title that was retired with her), not because she was primarily concerned with grammaticality, but (presumably) because people who aren't actually grammarians use the label grammar for everything in language that is subject to regulation or judgment. She had four pet peeves, Remnick reports, two of which (failure to observe the distinction between restrictive and non-restrictive modifiers, incorrect subject-verb agreement) are matters of grammar in the narrow sense, two of which (indirection, careless repetition) are not. But it's clear from what Klinkenborg and Remnick say that her attention was almost entirely devoted to other things; after all, grammar in the narrow sense was very unlikely to be an issue in manuscripts submitted by Janet Flanner, J. D. Salinger, Pauline Kael, or Lawrence Weschler. Writers and editors valued her advice (even when they bridled at it) not because she saved them from error but because she was trying to help them realize their intentions.

I've had many experiences with editors. Some I remember with distaste even after many years; few things are quite as alarming and frustrating as an editor who comes at your manuscript like a grammar-checking program, with nothing more than a long list of Don'ts and fixes for them. But other encounters were rewarding, with editors who aimed for clarity, an effective voice, and an appreciation of the audience, and who negotiated choices and changes with me. (Most recently, Bruce Shenitz at Out magazine.) Somehow, the putative disintegration of civilization never entered into these exchanges.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 02:29 PM

Smart kids

You know those really smart schoolboys? Edward Cook at Ralph the Sacred River tracks them back to 1840:

I believe it was Thomas Macaulay (1800-1859) who discovered the Brilliant Schoolboy, as this quotation shows: "Every schoolboy knows who imprisoned Montezuma, and who strangled Atahualpa" (from his 1840 essay "Lord Clive").

I can do a bit better, finding them mentioned in Hugh Blair's Lectures on Rhetoric and Belles Lettres (1783), vol. I, Lecture XVII:

I spoke formerly of a Climax in sound; a Climax in sense, when well carried on, is a figure which never fails to amplify strongly. The common example of this, is that noted passage in Cicero which every schoolboy knows: "Facinus est vincire civem Romanum; scelus verberare, prope parricidium, necare; quid dicam in crucem tollere."

Well, strictly speaking, LION found them -- all I did was ask. LION also told me that in 1837, Catharine Maria Sedgwick introduced their female counterpart in chapter XV of Live and Let Live; or, Domestic Service Illustrated:

Every schoolgirl now acquires a certain facility at talking French. Mrs. Hartell was educated before this was considered one of the necessaries of polite life, and she set an undue value upon it.

For finicky readers who insist on an exact match of words in order to register an instance of this sort of snowclone, there's chapter III of vol. I of George Gissing's 1891 novel New Grub Street:

And he hadn't even a competent acquaintance with his paltry subject. Will you credit that he twice or thrice referred to Settle's reply to "Absalom and Achitophel" by the title of "Absalom Transposed," when every schoolgirl knows that the thing was called "Achitophel Transposed"!

I have a feeling that the brilliant schoolboy ought to occur in classical literature somewhere, but the closest thing I was able to find was a passage in a letter from Cicero to his brother Quintus (3.1.1) (English translation here), where he writes

miror tibi placere me ad eam rescribere, praesertim cum illam nemo lecturus sit si ego nihil rescripsero, meam in illum pueri omnes tamquam dictata perdiscant.

(I am surprised at your saying that you think I ought to answer it [a speech of "Calventius Marius"], particularly as, while no one is likely to read that speech, unless I write an answer to it, every schoolboy learns mine against him as an exercise.)

Just a few years ago, discovering this much would have required extraordinary scholarly dedication. Today, it's the work of a few minutes with the appropriate search engines (here LION and Perseus). While this inquiry was recreational at best, it's the sort of thing that worthwhile scholarship sometimes depends on. The fact that such research is now many orders of magnitude easier, and thus accessible to many more people, is no thanks to Michael Gorman.

[Update: Mark Goodacre explores the other end of the ranking: "schoolboy errors". One of his commenters traces the concept (though not the phrase) back to Boswell.]


Posted by Mark Liberman at 07:20 AM

Save our slogans

There's the International Database of Corporate Commands. But where's the International Database of Corporate Questions ("Got milk?" "Does she or doesn't she?" "Where's the beef?")? And the International Database of Corporate Statements ("We are driven." "It takes a tough man to make a tender chicken." "Nothing runs like a Deere."), the International Database of Corporate Exclamations ("Look, Ma, no cavities!" "Snap! Crackle! Pop!" "It's Miller time!"), and the International Database of Corporate Descriptive Fragments ("The ultimate driving machine." "The quicker picker upper." "All the news that's fit to print.")? Or -- well, you get the idea. The Advertising Slogan Hall of Fame documents some famous ones, and there are other lists here and here and elsewhere, but there's a long tail of more ephemeral slogans that collectively make up a lot of the language we see and hear every day. Is anybody documenting this stuff more systematically? Maybe the Wikipedia will?

[Update: Mike McMahon points out by email that the USPTO keeps track of those slogans that are registered as one sort of intellectual property or another. But the trademark database sees these things through funny spectacles: "Got Milk?" turns up a registration by the California Fluid Milk Processor Advisory Board for use on "ORNAMENTAL NOVELTY MAGNETS, SUNGLASSES, EYEGLASS CASES, AND HOLDERS FOR COMPACT DISCS; PRINTED MATTER, NAMELY POSTERS, PRINTED PAPER SIGNS, PAPER BANNERS, BUMPER STICKERS, FOLDERS, STICKERS, NOTE PADS, PAPER TABLE CLOTHS; AND PENCILS, ERASERS AND PENS; COFFEE CUPS, PLASTIC CUPS, DRINKING GLASSES, AND INSULATED THERMAL CONTAINERS FOR FOOD OR BEVERAGES; GOLF BALLS AND STUFFED TOY ANIMALS." Although I've seen this slogan thousands of times, it was just in newspaper and magazine ads, never on any of these products for which it's been registered with the USPTO.]

Posted by Mark Liberman at 06:52 AM

February 26, 2005

It is up to us how fast it changes

Talking (as I was) of waiting for the forces of linguistic change to take their course and surprise us all reminds me that at least one very famous philosopher thinks we have done entirely too much of this passive waiting around for linguistic change. Linguists have been content to interpret linguistic change; the point, however, is to retard it (so he might have said, though actually this perversion of Karl Marx's dictum is mine).

Sir Michael Dummett is a highly distinguished Oxford philosopher (retired since 1992). He is a very important Frege scholar, and his thinking about antirealism and intuitionism has provided some philosophers (Crispin Wright, for instance) with enough food for thought to be the basis of a substantial part of their careers. He became noted in Britain for his anti-racist activism in the 1960s. (It was a wrenching experience for him to have spent a decade or more working on Frege's philosophical contributions only to discover at a late stage, through a suppressed fragment of Frege's diaries, that Frege had been toward the end of his life a bitter anti-Semite. Dummett wrote briefly about this in the preface to his book Frege: Philosophy of Language.) But AnalPhilosopher (February 25, 2005; thanks to Paul Postal for the reference) spotted (in a 1993 grammar and style guide that Dummett wrote for British examination candidates) a passage suggesting that on linguistic change Dummett is much more of a conservative:

There is [...] a general source of resistance to the very idea that there can be such a thing as a misspelled word, a grammatical mistake or a word used in the wrong sense. A common slogan is 'You can't stop the language from changing'. It is true enough that one should not even want the language not to change; but it is we who change it, and it is up to us how fast it changes and whether it changes for the worse or for the better. In a literate community, like our own, the language does not comprise only the words spoken in conversation or printed in newspapers: it consists also in the writings of past centuries. An effect of rapid change is that what was written only a short time ago becomes difficult to understand; such a change is of itself destructive. It cannot be helped that Chaucer presents some obstacles to present-day readers; but I have been told that philosophy students nowadays have trouble understanding the English of Hume and Berkeley, and even, sometimes, of nineteenth-century writers. That is pure loss, and a sure sign that some people's use of English is changing much too fast.

(Michael Dummett, Grammar and Style for Examination Candidates and Others [London: Duckworth, 1993], pp. 8-9 [italics in original].)

So remember, it's up to us. Don't go changing things too fast now; it'll be your bad if we all forget how to read Hume. Here's a piece to practice on:

It is experience only, which gives authority to human testimony; and it is the same experience, which assures us of the laws of nature. When, therefore, these two kinds of experience are contrary, we have nothing to do but subtract the one from the other, and embrace an opinion, either on one side or the other, with that assurance which arises from the remainder. But according to the principle here explained, this subtraction, with regard to all popular religions, amounts to an entire annihilation; and therefore we may establish it as a maxim, that no human testimony can have such force as to prove a miracle, and make it a just foundation for any such system of religion.

I beg the limitations here made may be remarked, when I say, that a miracle can never be proved, so as to be the foundation of a system of religion. For I own, that otherwise, there may possibly be miracles, or violations of the usual course of nature, of such a kind as to admit of proof from human testimony; though, perhaps, it will be impossible to find any such in all the records of history.

Thus, suppose, all authors, in all languages, agree, that, from the first of January, 1600, there was a total darkness over the whole earth for eight days: suppose that the tradition of this extraordinary event is still strong and lively among the people: that all travellers, who return from foreign countries, bring us accounts of the same tradition, without the least variation or contradiction: it is evident, that our present philosophers, instead of doubting the fact, ought to receive it as certain, and ought to search for the causes whence it might be derived. The decay, corruption, and dissolution of nature, is an event rendered probable by so many analogies, that any phenomenon, which seems to have a tendency towards that catastrophe, comes within the reach of human testimony, if that testimony be very extensive and uniform.

But suppose, that all the historians who treat of England, should agree, that, on the first of January, 1600, Queen Elizabeth died; that both before and after her death she was seen by her physicians and the whole court, as is usual with persons of her rank; that her successor was acknowledged and proclaimed by the parliament; and that, after being interred a month, she again appeared, resumed the throne, and governed England for three years: I must confess that I should be surprised at the concurrence of so many odd circumstances, but should not have the least inclination to believe so miraculous an event.

Got any problems with that, examination candidates? It's your own damn fault for changing your language too fast. Get a grip.

Posted by Geoffrey K. Pullum at 06:14 PM

Waiting for the forces of linguistic change

Further to my remarks on the way two troops has been coming into currency with the meaning "two members of the armed services", John Bergmayer has informed me that the US Army appears to have decreed that in all official writings the word soldier must be capitalized (an oddly pointless typographical courtesy -- how about improving veterans' health benefits instead as a way of showing them our respect?); and Keith Ivey has pointed out to me that there are a couple of hundred Google hits for people saying things like "If we had 20 000 forces coming in over the next three months" and "the allies can put more than 100,000 forces" (just try searching for "000 forces" and you'll see). Again, the pace of change surprises me, and in this case, more so than with troop. The question is whether the change will ultimately lead to a new use of the singular. According to Mark, this has happened with troop: within the Army you really are called a troop once you've been trained to function as one of the troops (I didn't previously know that). But will we ever start referring to a soldier -- sorry, I mean a Soldier -- as a force? One can only wait with interest for linguistic change to make its next move.

Posted by Geoffrey K. Pullum at 02:08 PM

Eight Days a Week

Recently, an overstressed friend of mine bemoaned the fact that there just weren't enough days in the week to get everything done. Perhaps she should move to rural West Africa, where many language groups still use their traditional eight-day week. For instance, the Yémba people of western Cameroon have an eight-day week, with the day names shown in the standard, IPA-inspired orthography on the right (consult the Yémba dictionary [PDF] for more details). The Yémba week has five days for working in the field, two market days (four days apart), and a day of rest. It's handy to have the "extra" day each week, though when I lived there I found that the mismatch between the traditional and Western weeks caused no end of confusion. This was partly ameliorated by the popular dual-system calendars and diaries published by SIL Cameroon, showing both 7- and 8-day weeks.
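For anyone who wants to see the mismatch in numbers, here is a minimal sketch. It assumes, purely for illustration, that both weeks run as unbroken cycles counted from a shared day 0; the function names and the numeric day indices are my own inventions, not part of any Yémba calendar.

```python
from math import gcd

WESTERN_WEEK = 7  # days in the Western week
YEMBA_WEEK = 8    # days in the traditional week described above


def realign_after(a: int, b: int) -> int:
    """Days until two cycles of lengths a and b line up again (their least common multiple)."""
    return a * b // gcd(a, b)


def western_weekdays_of(position: int, cycles: int) -> list[int]:
    """Western weekday index (0-6) on which a fixed position in the 8-day week falls,
    for the first `cycles` eight-day weeks, counting from a shared day 0 (an assumption)."""
    return [(position + n * YEMBA_WEEK) % WESTERN_WEEK for n in range(cycles)]


print(realign_after(WESTERN_WEEK, YEMBA_WEEK))
print(western_weekdays_of(0, 8))
```

Under these assumptions, any given day of the eight-day week slips one Western weekday later each time around, and the two systems only line up again once every 56 days, which is roughly why a dual-system calendar is so handy.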

Posted by Steven Bird at 05:34 AM

Overgreen my bad

Caity Taylor recently noticed the expression "my bad", which is apparently rare enough in England that she's never actually heard anyone say it (!). She suggests that "if this is to become widespread, then other adjectives must be able to be used in the same way, for example, 'my good', 'his stupid' etc.". While logically sound, this reasoning seems to be empirically false, since "my bad" has been common for years in this part of the world, but I've never heard any analogous expressions in general use. An "analogous expression" would be a possessive pronoun followed by an evaluative adjective used as a noun, referring to a specific event or action.

The OED calls my bad an interjection, "U.S. colloq. (orig. Sport)". The gloss is "Esp. among high-school students, used as an exclamation acknowledging responsibility for an error: ‘(It is) my fault!’ ‘My mistake!’", and the citations start in 1986:

1986 C. WIELGUS & A. WOLFF Back-in-your-face Guide to Pick-up Basketball 226 My bad, an expression of contrition uttered after making a bad pass or missing an opponent.
1986 UNC-CH Campus Slang (Univ. North Carolina, Chapel Hill) Mar., My bad, expression to admit one has made a mistake: A: ‘You did the wrong homework set for today.’ B: ‘Oh, my bad.’
1993 Orlando Sentinel (Nexis) 3 Dec. C.3 The Litterial Green Collection... Oops, my bad... A release from the marketing hotbed of America.
1994 A. HECKERLING Clueless (Green rev. pages) 104 Cher swerves..to avoid killing a person on a bicycle. Cher: Whoops, my bad.
1997 Parenting Sept. 213 Sorry I lost your CD. It's my bad.
2000 P. BEATTY Tuff iv. 47 ‘This is the June issue of Black Enterprise.’ ‘No, not the June issue, my bad.’

I believe that I remember hearing the phrase used in basketball games in the mid-1970s, though I could be wrong. In any case, there are some much earlier precedents. For instance, Shakespeare's sonnet 112, which accommodatingly provides just the sort of additional example that Caity expects:

Your loue and pittie doth th'impression fill,
Which vulgar scandall stampt vpon my brow,
For what care I who calles me well or ill,
So you ore-greene my bad, my good alow?
You are my All the world, and I must striue,
To know my shames and praises from your tounge,
None else to me, nor I to none aliue,
That my steel'd sence or changes right or wrong,
In so profound Abisme I throw all care
Of others voyces, that my Adders sence,
To cryttick and to flatterer stopped are:
Marke how with my neglect I doe dispence.
You are so strongly in my purpose bred,
That all the world besides me thinkes y'are dead.

Of course, adjectives are often in nominal constructions ("only the brave deserve the fair"; "we are the unwilling, led by the unqualified, doing the unnecessary for the ungrateful"; "give me your tired, your poor"; "how do dolphins suckle their young?") so perhaps it's surprising that "my good" and "his stupid" have never caught on.


Posted by Mark Liberman at 12:56 AM

February 25, 2005

Psychology geek valentine

Responding to my late v-day post, Chris Brew emails that he "couldn't resist":

roses are #0000FF
violets are #FF0000
Stroop's effect
is all in your head

He added a pointer to information on the Stroop Effect.

Posted by Mark Liberman at 08:57 AM

The Log sprouts

The picture shows Language Log as visualized by Christine Sugrue's OrganicHTML, which apparently generates a pseudo-plant using "certain elements on a web site; colour, text, images, links, Flash, and primarily the table structure. This is why Flash-only sites or CSS-only sites end up generating rather sad looking plants." Too bad it doesn't get more from the words (look at the sad results from OrganicHTML-ing Language Hat). Christine's blog also led me to revisit Google Fight, where you can watch language duke it out with thought.

Posted by Mark Liberman at 08:44 AM

Language change after childhood

Until I was 21, I shared Geoff and Arnold's notion of what the word troop means. Then I got drafted, and learned different. This wasn't the most important thing that I learned in the army, but it was something that I learned early: in basic training, I was a recruit, but after that, I was a troop. It's been a long time since I even noticed uses of troop as a count noun meaning "soldier". My experience holds out hope for those who get upset about lexical innovations (or throwbacks). Though maybe there's not so much hope after all, if it takes a period of military discipline to get people to accept a new word usage after college. I can think of many new words I've learned, and new word senses as well, but it seems to be much harder to change your ideas about certain key properties of known word elements in known meanings. Perhaps the key thing is to accept membership in a culture where the new pattern is normal.

Posted by Mark Liberman at 08:44 AM

Look lively, troop!

Geoff Pullum complains about the use of the count noun troop for individual, rather than collective, referents:

Well, I heard it again today on NPR: the noun troops with a cardinal numeral. And this time with the smallest of all non-singular cardinal numerals: among the dead in one incident in Iraq today were two American troops, they said. Well, I'll tell you how that noun is in my variety of English: it's a plural-only noun that doesn't take cardinal numerals.

And in my variety of English too, but things are otherwise for other people. Some months ago, I complained on the American Dialect Society mailing list about this usage (which I too had first noticed on NPR) and was quickly informed that the count plural troops for individuals was indeed widespread. And in fact the 1993 additions to the OED have "A member of a troop of soldiers (or other servicemen)", with singular examples from 1832, 1947, and 1973.

It's actually very useful, as you'll see from the way people in the Navy, Marines, and Air Force (probably also Reservists, though I haven't actually seen this) object to soldiers as a cover term for members of the U.S. Armed Services, since they see the word as referring only to the Army. Note "or other servicemen" in the OED definition. Servicepeople, anyone?

Posted by Arnold Zwicky at 01:50 AM

Two troops killed

Well, I heard it again today on NPR: the noun troops with a cardinal numeral. And this time with the smallest of all non-singular cardinal numerals: among the dead in one incident in Iraq today were two American troops, they said. Well, I'll tell you how that noun is in my variety of English: it's a plural-only noun that doesn't take cardinal numerals. It's like alms, auspices, credentials, folks, genitals, odds, particulars, and lots of other such words listed in [34] on page 343 of The Cambridge Grammar of the English Language and discussed on pages 343-344.

I'm not prepared to say there's a mistake on anyone's part involved here, because some of these nouns show a bit of variation, but I'm surprised at the way this is catching on (I believe I have heard the usage from President Bush, which may explain the way journalists are picking it up). To me, troops is a grammatically plural way of referring to soldiers en masse (Support our troops), it's not a semantically plural version of a singular noun. For me (and NPR increasingly seems to differ), you can no more have *37 troops or *two troops than you can have *one troop — except, of course, when you're using the different lexeme troop, which means a whole group of soldiers or Boy Scouts or whatever. I can certainly accept Two troops were killed as grammatical if it means two troops of Boy Scouts, which (under the Boy Scouts of America rule that a troop must have at least five Scouts to merit its scoutmaster) implies at least ten fatalities. But if a troop of American soldiers in Iraq loses two brave men, that's two soldiers, not (mercifully) two troops. That's the way Standard English is for me. Your mileage may differ.

[Additional notes: Glen Whitman of Agoraphilia wrote about this topic last year (click here and scroll down a bit). He cites this article, in humorous mode, by ranting, tights-in-a-twist journalist Debra Lo Guercio. And Antonio Fortin has pointed out to me that the March 2003 issue of The Onion appears to take my view, giving the joke headline "Kuwait sends troop" illustrated with a lone Arab soldier in a desert.

Stephen Neale points out to me that I may be a bit hasty in saying there are two lexemes here, and I agree with him. There should be a way of saying that there is one polysemous lexeme with limitations on how you can use its singular.

Finally, Arnold Zwicky remarks that "Geoff Pullum complains...", but it seems to me that I make no complaint at all above, I merely note (admittedly, with some surprise) the increasing pace of this shift away from my own dialect on this point, and away from Standard English as The Cambridge Grammar cautiously attempted to describe it (with caveats about the possible variation). No complaints about such shifts will be heard from me (see Lo Guercio for that). I hold to the New Hampshire principle of natural languages — that a language has to live free or die. Which is not the same as the notion that anything goes, by the way. Standard English as used on NPR and in serious Anglophone newspapers has correctness conditions; and on the small point of how troop is used, those conditions seem to be in flux right now.]

Posted by Geoffrey K. Pullum at 12:58 AM

Unleash the mighty power of marketing lies!

Mr. Sun points out that Google's new movie: operator allows you to tap directly into the mother lode of blurble.

Posted by Mark Liberman at 12:30 AM

Funniest book ad of the month

... from a linguistic point of view, naturally. Found on Francis Heaney's blog:

"I am not exaggerating when I say that my book, Holy Tango of Literature, is the best book you will ever read. It's more like I'm lying than exaggerating. But it's still a pretty darn funny book, I think. Buy it, won't you?"

While you're visiting over there, check out his "six things" drawings. I'm especially fond of Green days, Why she dumped you, and Here comes the reign again. Also, Francis reports a new entry for the Fire Alarm Phrasebook: the bad news is that they have received a condition, but the good news is that the problem has been restored.

Posted by Mark Liberman at 12:29 AM

February 24, 2005

Nevertheless Hardy, however

Could it be true? Did English connective adverbs really break loose from their clause-initial moorings around 1890, and float into fashionably (post-)subjective space? Did this drift so impress Willie Strunk's youthful sensibility that he codified it as a rule of grammar? For those of you who've been on the edge of your seats since we revealed the shocking truth about Henry James' placement of nevertheless, here's another postcard from the bleeding edge of stylometry.

Thomas Hardy's novels and stories offer us a total of 2,348,934 words, published between 1871 and 1895. Across all these works, nevertheless occurs clause-initially 60 times and medially 68 times (47% initial), while however (as a connective adverb) occurs 381 times clause-initially and 436 times medially (also 47% initial!).
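For anyone who wants to check the arithmetic, here is a minimal sketch of how the two percentages fall out of the raw counts. The function name `percent_initial` is my own label for illustration, not part of any concordance tool.

```python
def percent_initial(initial: int, medial: int) -> float:
    """Share of clause-initial uses among all connective uses, in percent."""
    return 100.0 * initial / (initial + medial)

# Counts quoted above for Hardy's collected prose: (initial, medial).
hardy = {"nevertheless": (60, 68), "however": (381, 436)}

for word, (initial, medial) in hardy.items():
    print(f"{word}: {percent_initial(initial, medial):.0f}% initial")
```

Both words round to 47% initial, which is the coincidence noted above.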

So the first and clearest message is that Hardy treats both of these connective adverbs in about the same way, at least as far as clause-initial proportion is concerned. (The non-initial instances of these two words have different distributions, but that's another story -- see below.) Although Hardy must have been among the authors who formed William Strunk's sense of English prose style, there is no evidence in Hardy's works, treated as a whole, to support Strunk's notorious anxiety about initial placement of however.

However, if we inspect the beginning and end of Hardy's career separately, a slightly different pattern emerges. The table below breaks things out for his first four and last four prose works separately:

Work Date Words Nevertheless initial Nevertheless medial Percent initial However initial However Medial Percent initial
Desperate Remedies
(written 1852)
Under the Greenwood Tree
A Pair of Blue Eyes
Far from the Madding Crowd
Prose 1852-1874
Wessex Tales
A Group of Noble Dames
Tess of the d'Urbervilles
Jude the Obscure
Prose 1888-1895

By the end of Hardy's career as a novelist, his proportion of clause-initial however has declined from 60% to 44%. This is a 28% relative decline (a drop of 16.6 percentage points from 60.1%). While not nearly so spectacular as the change in Henry James' treatment of nevertheless, which dropped from 43% clause-initial to 4% clause-initial between the first and second half of his novelistic career, it provides a little bit of additional support for the view that something was happening to English connective adverbs during the Gay Nineties.
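The difference between the absolute drop and the relative decline is easy to blur, so here is a small sketch of the calculation. It uses only the two figures quoted above (60.1% and a 16.6-point drop); the variable and function names are mine.

```python
def relative_decline(before: float, after: float) -> float:
    """Decline from `before` to `after`, as a percentage of `before`."""
    return 100.0 * (before - after) / before

early = 60.1        # percent clause-initial "however" in the early works (from the text)
drop_points = 16.6  # absolute drop, in percentage points (from the text)
late = early - drop_points

print(f"late proportion: {late:.1f}%")
print(f"relative decline: {relative_decline(early, late):.0f}%")
```

The late proportion comes out at 43.5%, which is the "44%" above after rounding, and the relative decline rounds to 28%.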

On the other hand, Hardy's pattern for nevertheless goes in the opposite direction, from 38% initial to 50% initial. The counts are small, and the change in proportion is not enormous, but this may reflect a gradual change away from the original phrasal interpretation "never the less", which often occurs at the ends of clauses (these examples from A Pair of Blue Eyes):

But he was, in truth, like that clumsy pin-maker who made the whole pin, and who was despised by Adam Smith on that account and respected by Macaulay, much more the artist nevertheless.
Knight felt uncomfortably wet and chilled, but glowing with fervour nevertheless.

and towards a less compositional interpretation as a simple connective adverb, which tends to occur at or near clause beginnings.

As for the trend in however placement, could it be the result of authorial maturity, not the Gay Nineties after all? Perhaps after you publish a million words or so, some of your adverbs start floating. On the basis of the evidence so far, we can't distinguish the internal clock of individual stylistic evolution from the external clock of stylistic fashion. Stay tuned for more.


Posted by Mark Liberman at 07:29 AM

Ten days late for Valentine's day

But better late than never:

roses are #FF0000
violets are #0000FF
all my base
are belong to you

Posted by Mark Liberman at 07:24 AM

February 23, 2005

Vulcan phylogeny

In response to my post on jokey conspiracy theories about the history of the phrase "live long and prosper", Michael Gilleland suggests the Latin equivalent "Ad multos et faustissimos annos," which he translates as "To many and very prosperous years." The sense is the same, but the specific words are not. And in this case, the sense seems too diffuse to track. I imagine that many a cup of mead or barley wine was raised at many a neolithic celebration to express a similar sentiment, and if that sentiment alone is an indication of alien influence, we're pretty much all co-conspirators. A proper sense of historical paranoia, as Dan Brown knows, requires a more exclusive in-group.

Lance Nathan also sent in some questions and suggestions. One idea was to check out the nominal version "long life and prosperity":

It's hard to tell from a websearch, of course, the extent to which it's a phrase that isn't influenced by the Vulcans; but a casual scan of Google Groups gives it as a translation of Gaelic "Saol fada agus rath ort". 

Lance may be referencing this post on rootsweb.com, which translates "Saol fada agus rath ort" as "Long life and prosperity to you". This is a suggestive connection, and a search on LION turns up many interesting instances of the English version (which I'll spare you for now), but it's still not exact enough to convince me.

Another of Lance's ideas:

The web also suggests that "live long and prosper" was, like the hand gesture, an ad lib by Nimoy (can a gesture be ad-libbed?); and that the phrase, like the gesture, was taken from Jewish tradition. The gesture I can attest; that the phrase was spoken by rabbis I, pardon the expression, take on faith.

The web does indeed say that, most elaborately in a response by Jordan Lee Wagner on a religious Q&A site, in which he quotes from his book The Synagogue Survival Kit:

Another common decorative motif is a hand, with a wide spread between the middle and ring fingers; or a pair of such hands. This is the ancient handsign of the ko-ha-nim (priests) in the Temple. The priestly handsign is symbolic of divine immanence. The handsign was used by the kohanim when they pronounced The Priestly Blessing on the congregation. This ritual is still a part of the service, and is discussed later. This Jewish ritual has been popularized by the Star Trek TV show, which used it as the Vulcan ritual of greeting. The Vulcan ritual of greeting consists of the handsign accompanied by a blessing: "Live long and prosper," which is an abbreviated paraphrase of the original Jewish blessing.

and adds:

Birkat Kohanim (The Priestly Blessing) is also called duchening, as though there were an English verb "to duchen". This ritual is discussed extensively in the section of my book describing the structure of the Amidah. The commandment to duchen, and the text of the blessing, is found at Numbers 6:23-27. It is done during the repetition of the Amidah, just before Sim Shalom. It includes ritual handwashing of the kohanim by the levites. This is done privately after the K'dushah and before Modim -- after the kohanim have removed their shoes (or loosened their laces so as not to touch their shoes again). Then the (now shoeless) kohanim ascend the bima. There is a public benediction by the kohanim before they perform the mitzvah, the blessing itself, and accompanying meditations. All kohanim (descendents of Aharon) present in the synagogue participate in the duchening and make the "kohanic" handsign. Their taleisim are pulled way forward over their heads so as to cover their raised hands. Also, the congregation often does the same, and in any case does not look at the duchening. Ashkenazim duchen only on major holidays, while Sephardim duchen frequently. (Reform congregations are an exception. They generally do not recite The Priestly Blessing.)

The origin of Nimoy's "Vulcan" hand gesture seems securely kohanic, but the accompanying text is still elusive. Here's Numbers 6:22-27 in the King James version:

22 And the LORD spake unto Moses, saying,
23 Speak unto Aaron and unto his sons, saying, On this wise ye shall bless the children of Israel, saying unto them,
24 The LORD bless thee, and keep thee:
25 the LORD make his face shine upon thee, and be gracious unto thee:
26 the LORD lift up his countenance upon thee, and give thee peace.
27 And they shall put my name upon the children of Israel, and I will bless them.

Familiar words, and comforting ones, but no amount of faith can make me read in them the specific phrase "live long and prosper".

Lance continues:

Very possibly you knew this. It *does* raise the question, of course, of why Kerr and Burke gave the line to Rip van Winkle. Perhaps they were covert Jews, expressing their common bond with other Jews who would recognize the line and see that the Jews, like Rip van Winkle, were isolated from the culture around them because of their necessary adherence to traditions, which...or, well, not.

Indeed not. The plays' stage directions for Rip don't say anything about hand signs, and the words of his toast are not the words of the priestly blessing, so I'll conclude that this is a false trail, leading us into "Protocols of the Elders of Vulcan" territory...


Posted by Mark Liberman at 08:41 AM

Blogs disgoogled?

The river of information flows onward, and Language Log is no longer #1 for stupid ideas. That's fine with me. Last April, as I explained, "Google lists 1,260,000 pages in response to a query about stupid ideas, and Geoff Nunberg's post about Samuel Huntington is the first of all of them". Now there are 3,940,000 hits for the same query, and Geoff Nunberg's post on Samuel Huntington is #16. Autres temps, autres bêtises.

But has there been a more serious and systematic downgrading of (some) blogs at Google? Mithras at Fables of the Reconstruction observed on 2/15/2005 that

In the past few hours, Google apparently updated its database and moved blogs way, way down the list of search results. First Atrios mentioned it and thought it strange, and then Sisyphus Shrugged found it had happened to her. That prompted me to google my own site, and it's true for me, too: this blog was the first search result for "Fables of the reconstruction" and the third (I recall) for "Mithras" until now. Now, it is the 53rd result for "Fables of the reconstruction" and does not appear at all in the first 200 results for "Mithras" (although, strangely, my profiles on other websites do show up in those results.)

Mithras' dire prediction: "No google hits, no blogs. It's that simple."

The consensus seems to be that this is a reaction to the problem of comment spam, involving either a general downgrading of links in blog comments, or the implementation of the rel="nofollow" attribute for hyperlinks, or something along those lines. We're still #1 for Language Log, #2 for Dan Brown, and #3 for "Marriage Vowels", so we have apparently not been much affected yet. Some people have suggested that blogs on hosted sites like blogspot and typepad are the main victims. If this is true, then I imagine that the balance will be restored before long.
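To make the rel="nofollow" idea concrete, here is a rough sketch of how a link counter could set such links aside. It uses Python's standard html.parser module; the class name `LinkCounter`, the sample page and its URLs are all invented for illustration, and nothing here is a claim about how Google or any other search engine actually treats these links.

```python
from html.parser import HTMLParser

class LinkCounter(HTMLParser):
    """Collect hrefs from <a> tags, separating rel="nofollow" links."""

    def __init__(self) -> None:
        super().__init__()
        self.followed: list[str] = []
        self.nofollow: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if href is None:
            return
        # rel may hold several space-separated tokens, e.g. "external nofollow".
        rel_tokens = (attr.get("rel") or "").lower().split()
        target = self.nofollow if "nofollow" in rel_tokens else self.followed
        target.append(href)

# Hypothetical page: one link in the post body, one in a spam comment.
page = (
    '<p>Post body with <a href="http://example.org/a">a real link</a>.</p>'
    '<div class="comment"><a rel="nofollow" href="http://example.com/spam">'
    'cheap pills</a></div>'
)
counter = LinkCounter()
counter.feed(page)
print("counted:", counter.followed)
print("set aside:", counter.nofollow)
```

Under this scheme the spam link in the comment lands in the "set aside" list and adds nothing to the link count, while the link in the post body still counts.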

Paul Goyette at Locussolus has also noticed the effect,

[...] In my case, this blog is no longer the first hit for "locussolus" or "paul goyette" -- enter the latter right now and you won't find this page in the top 100 results. [...]

and looking beyond this (probably temporary) disturbance in cyberspace, has some more general thoughts:

Search engines, and in particular Google, are really becoming the gatekeepers for the world's information. [...]

What's so magical about the folks at Google is that even when they were tweaking their algorithm back in the early '90s, they foresaw the potential for these deep issues of speech and access. So instead of relying exclusively on content analysis, they built their model to incorporate the implicit views of the internet's readers and writers: they counted links, and they used their count to estimate a given site's authority on a particular search term, or in general. This was a profound and elegant achievement. Yes, it made search more accurate. But more than that, it codified the web's already democratic ethos -- tying search results to the actions of writers demystified search and gave content creators more power in the form of links. And while Google's new algorithm was somewhat prone to manipulation -- that's what's precipitated this whole crisis -- the very reason it could be manipulated was that it was transparent.

Today we have the blog, a phenomenon that's emerged largely because of the authority given to links and Google's transparency with respect to that authority. It's a phenomenon that takes that democratic ethos to the next level by removing virtually all the costs (financial, but also in terms of the required technical knowledge) associated with self-publication. Mix in Google's method of ranking search results, and you have a situation where millions of people have been moved to new acts of speech and are engaged in a worldwide discourse. If you hold freedom of expression dear, this is a monumental achievement.

Of course, there are also the spammers, who take the same democratizing elegance of Google's system and turn it on its head: by flooding my comment section (and yours) with links, they're able to increase their (or their client's) page rank, which means more traffic and presumably more sales.

For more on the same topic, read Jay Rosen's 2/20/2005 commentary on the second life of content:

"Frankly, they bring a lot of competencies to us. They're the leaders in search-engine optimization."

That's from an interview with Martin Nisenholtz, Senior Vice President for Digital Operations at the New York Times, who spoke with Staci Kramer of Paid Content about his company's recent acquisition of About.com for $410 million. In a conference call with stock analysts, Nisenholtz again mentioned search. He talked about "some very useful synergies such as cross marketing and search optimization expertise."

Why is the New York Times Company interested in acquiring this expertise with search engines that About.com is said to have? Ordinarily, I leave the analysis of deals to those who know the market, but the logic of this portion of the transaction intrigued me. They know how to show up in search; we don't. Let's buy them. Then we'll know too. "We own you now. Tell us what you know."

Jay argues that the secret has been right under their nose all along:

You rarely find New York Times articles in the top ten results of any Google search. The reason is simple: Search works by counting the quantity and quality of links to a page. In most cases, links to the New York Times expire after a week, the url's (web addresses) change, and the content moves behind a pay wall. Bye-bye Google. [...]

The second life of content, made possible by search, is of critical importance to journalists whose work is on the Web. (That's almost all journalists.) The very phrase "on" the Web tells us that things may land on the surface of the network and not get woven into it. These stand a very poor chance of surviving and having a second life, where there are probably more readers available than in the first.

Because NYT content is "on" the web but not "in" the web,

their work is lost to Google, lost to online forums and conversation, lost to the long tail where value is built up-- and in many ways lost to cultural memory.

This is also the reason why Open Access to the scholarly and scientific literature is, in the end, an irresistible idea.


Posted by Mark Liberman at 06:58 AM

February 22, 2005

Vulcan salutations to the Elfin Goblins of the Mountains

Greg James Robinson at the History News Network's blog Cliopatria reports a worrying discovery.

In his memoir PRESENT AT THE CREATION (p. 237), Dean Acheson—perhaps incautiously--reproduces the text of President Harry Truman’s personal letter of farewell to him, dated June 30, 1947, upon Acheson’s resignation as Undersecretary of State. The last paragraph reads:

“May you live long and prosper and may I always deserve your good will and friendship—you always have mine.”

We are required to consider the implications of Truman’s use of the phrase “live long and prosper,” which became popularly known through its use by Mr. Spock in the TV and movie series STAR TREK as the standard Vulcan expression of farewell--at least in English translation.

Applying the tools of historical scholarship, Robinson considers and rejects various benign explanations for this coincidence, and concludes that

Once these possibilities are discounted, we are left with the conclusion that Harry Truman was in contact with Vulcans and was using his letter to Acheson as code. It is undeniable that the letter to Acheson was dated THE DAY BEFORE the crash landing of an object in Roswell, New Mexico that was reported to be a Flying Saucer. Presumably then Truman was intent on telling Acheson, following his resignation, to enter in contact with the aliens and open diplomatic channels.

I'm afraid that I have grave news to impart: this conspiracy is much older and broader than Robinson imagines.

Using the LION database, I've been able to find two earlier sources for the phrase "live long and prosper".

One is a play attributed to "John Kerr", published in 1826 under the title


Early in Act I, Scene 1, "Rip enters U. E. R. with gun and game-bag, singing. He advances carelessly on R. as Rory and Vedder retire up to table and drink." Rip's first speech:

Rip. [Rip Van Winkle]
Rip! Rip! what will you make of it now? thou'rt a sad dog, and that's the truth on't. That thou art idle, there is no denying, and unlucky in everything you attempt, is a still more deplorable fact! Now did I start off this morning with full determination of being industrious, and filling my bag with game, to sell at to-morrow's market. On the road to the forest, who should I meet but Van Bryant, the one-eyed Serjeant, who insisted on my taking a glass with him; but plague take it, we drank half a dozen a piece---'twas all out of pure good nature; for had I not done so, the Serjeant would have been so drunk, that he couldn't have seen out of his other eye, and then who knows if ever he could have found his way home. Well, off I starts again, and who should I stumble over but old Dame Griskin, carrying a large basket of provisions; and I couldn't do less than help her home with her load, poor soul. She, too, invited me to drink a little drop: truly, though I only toomed out half a bottle, and the old woman drank no more than myself, yet she got so top-heavy, that I was obliged to tuck her up in bed, before taking my leave---then away I went to the mountains, and though I saw double, deuce a single bird could I shoot! Altogether, methinks, I've made a pretty day's work of it; and with the row and the rumpus that may be expected from my amiable rib at home, I shall finish the evening in the usual way! really I must make an alteration in things---I must reform, nor drink any more, saving when I'm dry. Yes, was any body now to offer me a cup of liquor, I'd say to him, in a polite manner,
[Vedder, who has in the interim advanced L. H., hands Rip a horn]
here's your good health, and your families' good healths, and may you all live long and prosper.
Vedd. [Nicholas Vedder]
Why, neighbour, we feared from your long stay, that some of the elfin goblins of the mountains had got hold of you; where the tarnashion have you been, friend Van Winkle?
Rip. [Rip Van Winkle]
Oh! very busy; had a hard day's work of it---nothing slipt through my fingers that were comeatable.
[Rory advances on R. of Rip.]
Rory. [Rory]
But they've slipt through your bag, for 'tis full of emptiness.
[having examined Rip's game-bag.]
Rip. [Rip Van Winkle]
Cut no jokes on my bag, or I shall give you the sack, nor take another glass at your house. Why, I'm the best customer you ever caught---egad, its enough to be bullied by one's wife at home, without having every pumpkin cutting capers at my expense abroad---but its all over; I'll never drink again.
Vedd. [Nicholas Vedder]
Till you're dry,
[fills Rip's cup,]
as you remarked.
Rip. [Rip Van Winkle]
Here's your good health, and your families' good healths, and may you all live long and prosper.
Vedd. [Nicholas Vedder]
And now, friend Rip, sit down and smoke a pipe, and make yourself comfortable.

Well, to make a long story shorter, the phrase "live long and prosper" recurs no fewer than twelve times in the course of the play, ending with the closing line:

Rip. [Rip Van Winkle]
Thank you, sir. Now I shall be able to set myself down, tell my stories, take my glass, and to all those who have patience to listen to my wonderful dream on the Catskill, I'll drink their health, and,
[stepping forward]
Ladies and gentlemen! here's your good health! and your families', and your future families' good health! and may all live long and prosper!

I draw your attention to the fact that this play was "PERFORMED IN THE LONDON AND AMERICAN THEATRES". If someone in 1826 had meant to send a message to observant aliens, what better method than this? And who might have wanted to send such a message? If we ask, what else happened in 1826? regular readers of this blog will be quick to answer, "the death of Thomas Jefferson". A coincidence? Perhaps.

The other "live long and prosper" citation is to another, later play, attributed to "Charles Burke (1822-1854)".


"Burke" uses the same technique as "Kerr" to slip the phrase "live long and prosper" repeatedly into his play, first here:

Rip. [Rip Van Winkle]
Rip, Rip, was is dis for a business. You are a mix nootse unt dat is a fact. Now, I started for de mountains dis mornin', determined to fill my bag mit game, but I met Von Brunt, de one eyed sergeant ---comma see hah, unt brandy-wine hapben my neiber friend; well, I couldn't refuse to take a glass mit him, unt den I tooks anoder glass, unt den I took so much as a dozen, do I drink no more as a bottle; he drink no more as I---he got so top heavy, I rolled him in de hedge to sleep a leetle, for his one eye got so crooked, he never could have seed his way straight; den I goes to de mountain, do I see double, d---d a bird could I shooted. But I stops now, I drinks no more; if anybody ask me to drink, I'll say to dem---
[Vedder comes down, R. and offers cup to him.]
---here is your go-to-hell, and your family's go-to-hell, and may you all live long and prosper.

Could the bizarre and erratic accent rendition be some sort of code?

The second repetition of the Vulcan farewell is in a more ominous context, after the "elfin goblins of the mountains" have condemned Rip to twenty years of sleep. Rip toasts them:

They're a deadly, lively, jolly set; but I wonder what kind of spirits dese spirits are drinking! surely, dere can be no harm in taking a drop along mit dem.---
[Fills a flagon.]
Here goes!---Gentlemen, here's your go-to-hells, and your broad chopped familly's, and may you all live long and prosper.
Omnes. [Omnes]
Ha, ha, ha!

The last repetition, of course, is the very last line of the play:

Rip. [Rip Van Winkle]
Is dat my baby? come here Rip, come here you dog; I am your father. What an interesting brat it is.
Knick. [Knickerbocker]
But tell us Rip, where have you hid youself for the last twenty years?
Rip. [Rip Van Winkle]
Ech woll---ech wool. I will take mine glass and tell mine strange story and drink the health of mine frients. Unt, ladies and gents, here is your goot health and your future families and may yo all live long and prosper.

There are no other instances of this simple four-word phrase in all of English poetry and drama, at least as known to LION. In particular, Washington Irving's short story of 1820, introducing the character of Rip van Winkle, does not contain the fateful Vulcan idiom. Thus it was introduced as a catchphrase into the plays of "Kerr" and "Burke". By whom, and for what audience?


Posted by Mark Liberman at 04:26 PM

Nevertheless in the gay nineties

The "gay nineties": dial telephones, practical electric power, subways, basketball. Victoria's Diamond Jubilee. The coronation of Russia's last Czar. Philadelphia's new city hall was the tallest building in the world. And something happened to adverbs.

Well, something happened to nevertheless, anyhow. At least in the writing of Henry James.

In modern writing, the connective adverb nevertheless occurs mostly in clause-initial position: in a sample of newspaper stories from Google News, 23 clause-initial vs. 13 non-initial (64%); in a sample of blog writing from technorati.com, 27 clause-initial vs. 15 non-initial (64%); in a sample of MEDLINE abstracts, 28 clause-initial vs. 8 non-initial (78%). Like other adverbial words and phrases, nevertheless floats around under the joint influence of meaning, syntax and style, but it usually washes up at the start of a clause.

I looked into the placement of nevertheless and some other connective adverbs in order to compare them with however, a word that had the bad luck to attract William Strunk's attention in the early years of the 20th century (see posts here and here for the backstory). Since Strunk said nothing about nevertheless, I figured it would give a useful point of comparison. Among the other texts I checked were some of the novels of Henry James.

In the end, I left the other adverbs out of my post on Strunkish however dogma. But looking over the figures I had jotted down on a napkin, I noticed an odd pattern in James' use of nevertheless:

percent initial
Watch and Ward 1871
The Europeans 1878
Daisy Miller 1879
Washington Square 1881
Portrait of a Lady 1881
The Bostonians 1886
All works checked 1871-1886
What Maisie Knew 1897
The Sacred Fount 1901
The Wings of the Dove 1902
The Ambassadors 1903
In the Cage 1908
The Reverberator 1908
The Golden Bowl 1909
All works checked 1897-1909

Even in his early work, James tends to use clause-initial nevertheless less often than the modern norm, but in the second half of his career, the proportion plummets almost to nothing. What happened?

The decade-long gap between The Bostonians and What Maisie Knew is partly due to the limited availability of digital texts for James' novels at the Victorian Literary Studies "hyper-concordance" site, but it also reflects a period (1889-1895) during which James tried unsuccessfully to establish himself as a playwright. But I don't see why trying to write plays should affect his adverb placement in novels.

Although James was born in the U.S. and ended his life in London, I don't think the change was the result of assimilation to British norms, because he took up permanent residence in London in 1876, well before Portrait of a Lady and The Bostonians.

Was this a stylistic evolution peculiar to James -- interior adverbs for interior monologue? Or does it reflect some larger linguistic (or at least literary) shift in adverbial fashion, reflected in the statistics for however that Geoff Pullum cites from the period 1897-1903, and implicated in the psychodynamics of Strunk's anxiety about that innocent word?

I agree with Geoff that the glottopsychiatry of William Strunk's usage prejudices is of limited interest (though there's some relevance to the anthropology of religious systems, just as we might learn something from delving into the reasons why Adii Ibn Mustafa prohibited eating lettuce, urinating while standing, and wearing dark blue). The nature, source and meaning of a stylistic difference between early and late Henry James will likewise be of interest mainly to specialists. But if there was really a large-scale change in adverb-placement fashions at the end of the 19th century, that would be a phenomenon worthy of serious study, as an especially accessible example of a certain kind of language change. A great deal of text of all types from that period is available (including a fair amount in digital form). And adverb placement (especially for connective adverbs) is relatively easy to study by automatic or semi-automatic means.


Posted by Mark Liberman at 07:20 AM

Fossilized prejudices about "however"

Mark establishes that William Strunk was prejudiced in favor of the word order Birds, however, can fly over the synonymous However, birds can fly. In Strunk's Horrid Little Book, the latter usage is forbidden. E. B. White revised the Horrid Little Book (which he had purchased when he took a course from Strunk at Cornell in 1919) in 1957, and kept this prohibition. I'd like to suggest that we can perhaps make a guess at where Strunk and White got their prejudices if we look at a few books that were published around the relevant time.

Strunk was born in 1869, and White thirty years later. If White ever read scary stories as a teenager, he would surely have read Bram Stoker's Dracula, first published in 1897 (when Strunk was 28 and would have very largely formed his ideas about what was good English style). We can search the text of the book for the following string to find sentence-initial occurrences of however:


And we can search for this string to get the parenthetical occurrences in second position, which White preferred:

, however,

The results: Dracula contains 79 occurrences of second-position however, and none at all of the sentence-initial ones.
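A count like this is easy to reproduce on any plain-text edition. The sketch below is only an approximation of the procedure: the pattern for sentence-initial however is my own assumption (a capitalized "However," at the start of the text or after sentence-ending punctuation), the function name `count_however` is made up, and the literal ", however," will catch any parenthetical use, not strictly second position. The sample text is Strunk's own pair of example sentences.

```python
import re

def count_however(text: str) -> tuple[int, int]:
    """Return (sentence_initial, parenthetical) counts of 'however' in text."""
    # Assumed pattern: "However," at the very start of the text, or after
    # a period, exclamation mark or question mark followed by whitespace.
    initial = re.findall(r"(?:^|[.!?]\s+)However,", text)
    # The literal string used for the parenthetical occurrences.
    parenthetical = text.count(", however,")
    return len(initial), parenthetical

sample = (
    "The roads were almost impassable. However, we at last succeeded in "
    "reaching camp. At last, however, we succeeded in reaching camp."
)
print(count_however(sample))
```

On the sample this finds one sentence-initial and one parenthetical however. Run over a whole novel, the two numbers give the kind of tally reported here, though quotation marks and line breaks in a real e-text may need extra handling.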

A year later, in 1898, H. G. Wells's The War of the Worlds was published. If White read any science fiction as a teenager, he surely read that. There are 10 occurrences of second-position however, and none sentence-initial.

Joseph Conrad's The Heart of Darkness came out when White was three, in 1902. It's just the sort of serious novel a young man headed for Cornell might have read, and Strunk would certainly have known it. There are 3 occurrences of second-position however, and none sentence-initial.

The next year, Jack London's The Call of the Wild was published, with 4 occurrences of second-position however, and none sentence-initial.

And so on. I won't continue; quantitative glottopsychiatric investigation of the wellsprings of curmudgeonly usage prejudices really does not interest me very much. But what I am suggesting is that if you look at works published around the time of White's birth and in the early years of his lifetime, works published when Strunk was in college and early in his teaching career, you find good statistical evidence that literary English really did favor however in second position but not first position in sentences.

Strunk, then, was simply insisting that the use of English by others ought to conform to the statistical patterns prevalent in the literature he knew. And fifty years later White was sticking to the same dogma. The grammar of however is not so simple, though: the word did sometimes occur sentence-initially in the 19th and early 20th century, as Mark's investigations showed; it just wasn't so frequent, and Strunk and White missed the subtlety of a word with two competing positional tendencies showing different frequencies.

The battle against the less frequent variant was ultimately lost, of course: in The Wall Street Journal by the late 1980s, despite the influence of the Horrid Little Book on journalists, we get about 60 second-position to 40 first-position occurrences of however. But it was a quixotic battle about nothing of any consequence: two men's desire for an utterly unimportant minor statistical detail of style concerning adverb placement in the literature they knew to stay as it once was. They had an option that most of us don't have: they could include a dogmatic injunction in a published work on how to write, a work that happened to turn into a bestseller. But it still didn't work. And they could just as well have included the opposite prescription, and perhaps have biased things the other way. This isn't about English grammar or about good writing style. It's about orneriness and crotchetiness and the petty conservativism of people who regard themselves as guardians of some sort of literary establishment but haven't really got a very good eye for syntactic generalizations.

Posted by Geoffrey K. Pullum at 12:49 AM

February 21, 2005

The evolution of disornamentation

In response to my posts on The Elements of Style and the social stratification of menus, Roger Depledge wrote:

Isn't there an anti-bourgeois aesthetic at work here? To quote Adolf Loos (1908): "Evolution der kultur ist gleichbedeutend mit dem entfernen des ornamentes aus dem gebrauchsgegenstande" [his caps] [the evolution of culture is synonymous with the removal of ornamentation from objects of everyday use]. And we have seen what ghastly buildings the mindless application of that rule has produced.

I'm sure it's not an accident that Adolf Loos wrote Ornament and Crime a few years before William Strunk advised us to "omit needless words" in The Elements of Style (first published in 1918). Nor is it just a coincidence that E.B. White rewrote and republished Strunk's pamphlet in 1958, a few years after Rudolf Flesch's Why Johnny Can't Read (1955) and Mies van der Rohe's Seagram Building (1957). There's more on the relationships among Viennese intellectuals, progressive politics, plain buildings and plain writing in a blog entry by Francis Morrone entitled The Word (and World) Made Flesch.

Some of the facts don't fit this theory, however, and the Strunkish creed about however is a prime example. Says Strunk:

However. In the meaning nevertheless, not to come first in its sentence or clause.

The roads were almost impassable. However, we at last succeeded in reaching camp.
The roads were almost impassable. At last, however, we succeeded in reaching camp.

When however comes first, it means in whatever way or to whatever extent.

However you advise him, he will probably do as he thinks best.
However discouraging the prospect, he never lost heart.

This rule allies Strunk with the most elaborated varieties of High Victorian style, exemplified in American writing by Henry James, and places him squarely in opposition to the plain style of Mark Twain.

Using the wonderful Henry James Concordance (established and maintained at Nagoya University by Mitsuharu Matsuoka), I checked the placement of various types of howevers in The Ambassadors, The Bostonians, Daisy Miller, The Europeans, and The Golden Bowl. In these works, there were a total of 521 instances of however as a connective adverb. Of these, 488 (93.7%) were placed as Strunk dictates, while 33 (6.3%) were clause-initial. This means that Henry James would get red-penciled by Strunk 6.3% of the time, however-wise -- but this is nothing compared to the floods of red ink that would deface the works of Mark Twain.

I checked some pieces of Twain's writing (Life on the Mississippi, Innocents Abroad, A Connecticut Yankee in King Arthur's Court, Pudd'nhead Wilson, The Prince and the Pauper, and Tom Sawyer) at the Mark Twain Search site (kudos due to Stephen Railton and the Electronic Text Center at the University of Virginia). There were 161 instances of (the appropriate kind of) however. 52 (32%) of these were positioned according to Strunk's rule, while 109 (68%) were clause-initial. In A Connecticut Yankee in King Arthur's Court, written in the voice of Hank Morgan, a plain-spoken Yankee factory superintendent, however occurs 35 times as a connective adverb, and every single one is clause-initial.

There's a lot more to be said about where however goes when it's not clause-initial. For those of you with a taste for eXtreme adverbing, here are a few choice Jamesian samples:

"No--it's exaltation, which is a very different thing. Courage," he, however, accommodatingly threw out, "is what YOU have."
Some of the real ones, however, precisely, were what she knew.
It only meant, however, doubtless, that she was gravely and reasonably thinking--as he exactly desired to make her.

Isn't that exhilarating? OK, hang on, here's some serious fun:

It was not, fortunately, however, at last, that by persisting in pursuit one didn't arrive at regions of admirable shade: this was presumably the asylum the poor wandering woman had had in view--several wide alleys in particular, of great length, densely overarched with the climbing rose and the honeysuckle and converging in separate green vistas at a sort of umbrageous temple, an ancient rotunda, pillared and statued, niched and roofed, yet with its uncorrected antiquity, like that of everything else at Fawns, conscious hitherto of no violence from the present and no menace from the future.

It puzzles me that Strunk in effect chose Henry James over Mark Twain. Compare the way that Twain deploys however, in a characteristic passage from chapter 9 of Innocents Abroad, about a visit to Tangier:

We visited the jail and found Moorish prisoners making mats and baskets. (This thing of utilizing crime savors of civilization.) Murder is punished with death. A short time ago three murderers were taken beyond the city walls and shot. Moorish guns are not good, and neither are Moorish marksmen. In this instance they set up the poor criminals at long range, like so many targets, and practiced on them -- kept them hopping about and dodging bullets for half an hour before they managed to drive the center.

When a man steals cattle, they cut off his right hand and left leg and nail them up in the marketplace as a warning to everybody. Their surgery is not artistic. They slice around the bone a little, then break off the limb. Sometimes the patient gets well; but, as a general thing, he don't. However, the Moorish heart is stout. The Moors were always brave. These criminals undergo the fearful operation without a wince, without a tremor of any kind, without a groan! No amount of suffering can bring down the pride of a Moor or make him shame his dignity with a cry.

Perhaps Strunk's modernism -- his "ornament is crime" credo -- covers a basically Victorian sensibility?

It's less puzzling that White followed Strunk on this point, since he believed that rules of usage are as arbitrary as the rules of a game or a religion, and that examining the writing of admired authors is despised "descriptivism" and should be allowed no role whatsoever in deciding what these rules should be.

Anyhow, in modern times, it turns out that Mark Twain has won -- except among the acolytes of the Strunkish religion.

To check bloggers, I searched technorati.com for "however" and looked at the first (most recent) 42 hits (of the 1,054,678 that technorati found). Of these, 26 (62%) were clause-initial and 16 (38%) non-initial, proportions close to Twain's.

A very different pattern can be found in some publications. On the New York Times site, limiting myself to stories that have been through the NYT editorial process (as opposed to newswire copy), I found 32 non-initial instances of however, and only one lonely clause-initial case:

[link] However, 53 percent of those surveyed oppose the new stadium, which would be on the West Side and would be used for both the Jets and the Olympics.

This singleton -- no doubt the copy editor was asleep at the switch -- amounts to a mere 3% of the cases, even less than James, and suggests that the house style mandates Strunk's dictum. Among Reuters and AP stories on the NYT web site at the time I checked, there were 11 clause-initial howevers and 12 non-initial ones, a ratio similar to what I found in a search of major English-language newspapers via Google News: 16 of 35 howevers (46%) were initial and 19 (54%) non-initial.

I also checked a sample of recent MEDLINE abstracts, and found that 18 of 34 howevers (53%) were clause-initial, as opposed to 16 non-initial instances (47%).

My guess is that the natural rate for clause-initial however in English prose is in the 60-70% range displayed by Mark Twain and the bloggers. Certain publications prescribe Strunk's rule, while other formal prose, living in the penumbra of Strunkish dogma and subject to occasional pressure from fundie copy editors, shows somewhat lower rates.
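
For anyone who wants to check the arithmetic, here is a minimal Python sketch that recomputes the clause-initial share from the counts reported above. The counts are the ones given in this post; the source labels and function names are hypothetical, chosen only for illustration, and nothing here reproduces the searches themselves.

```python
# Counts of connective "however" reported in this post, as
# (clause-initial, non-initial) pairs. The labels are informal names
# invented for this sketch, not official names of the sources.
COUNTS = {
    "Henry James (5 novels)": (33, 488),
    "Mark Twain (6 works)": (109, 52),
    "Technorati blog sample": (26, 16),
    "NYT staff stories": (1, 32),
    "Reuters/AP on NYT site": (11, 12),
    "Google News newspapers": (16, 19),
    "MEDLINE abstracts": (18, 16),
}


def initial_share(initial, non_initial):
    """Return the percentage of instances that are clause-initial."""
    return 100.0 * initial / (initial + non_initial)


def ranked_shares(counts):
    """Return (label, percentage) pairs, highest clause-initial share first."""
    shares = [(label, initial_share(i, n)) for label, (i, n) in counts.items()]
    return sorted(shares, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for label, pct in ranked_shares(COUNTS):
        print(f"{label:26s} {pct:5.1f}% clause-initial")
```

Sorting by share puts Twain and the bloggers at the top and the NYT staff stories at the bottom, with the wire-service and MEDLINE samples in between. Several of these samples are only a few dozen instances, so the percentages should be read as rough indications rather than precise rates.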

[Note: I didn't count the placement of howevers in Huckleberry Finn, in case you're wondering, because the word does not occur at all in that work. Nor does nevertheless, for that matter. Huck just mostly uses but.]

[In case it wasn't obvious, I should confess that I don't enjoy reading Henry James very much. I find him very hard going, and once I've puzzled out what he seems to mean, I'm rarely convinced that it was worth the effort. Mark Twain, in contrast, I read again and again for the fun of it.]


Posted by Mark Liberman at 02:05 PM

February 20, 2005

Google defies Europe?

So says Jean-Noël Jeanneney, head of the Bibliothèque Nationale de France (BNF) in a Le Monde op-ed. He warns in apocalyptic terms of the "domination écrasante de l'Amérique" ("crushing domination of America"). It's not Google's domination of internet search that bothers him, but something that he thinks is much more important: Google Print and its deal with five major research libraries.

Pour l'instant, la nouvelle n'a guère attiré l'attention que des bibliothécaires et des informaticiens. Et, pourtant, je gage qu'on ne va pas tarder à en mesurer la portée culturelle, donc politique : vaste.

(For now, the news has scarcely attracted the attention of anyone other than librarians and computer scientists. And still, I believe that we will soon measure its cultural (and thus political) impact: enormous.)

This is the problem he sees:

Voici que prendrait forme, à court terme, le rêve messianique qui a été défini à la fin du siècle dernier : tous les savoirs du monde accessibles gratuitement sur la planète entière. Donc une égalité des chances enfin rétablie, grâce à la science, au profit des pays pauvres et des populations défavorisées.

(Here is taking form, in the short term, the messianic dream that was defined at the end of the last century: all the world's knowledge freely accessible all across the planet. Thus equality of opportunity finally established, thanks to science, to the benefit of poor countries and disadvantaged populations.)

But this would be a good thing, right?

Il faut pourtant y regarder de plus près. Et naissent aussitôt de lourdes préoccupations. Laissons de côté la sourde inquiétude de certains bibliothécaires préoccupés, sans trop oser le dire, à l'idée de voir se vider leurs salles de lecture [...]

(We must nevertheless look at it more closely. And immediately serious concerns emerge. Let's leave aside the mute dismay of librarians who are worried, hardly daring to say so, about the idea of seeing their reading rooms emptying [...])

Why, in the end, is this giving Europe a wedgie? Apparently because the future wretched of the earth will be learning Anglo-Saxon attitudes:

Le vrai défi est ailleurs, et il est immense. Voici que s'affirme le risque d'une domination écrasante de l'Amérique dans la définition de l'idée que les prochaines générations se feront du monde. [...] les critères du choix seront puissamment marqués (même si nous contribuons nous-mêmes, naturellement sans bouder, à ces richesses) par le regard qui est celui des Anglo-Saxons, avec ses couleurs spécifiques par rapport à la diversité des civilisations.

(The real challenge is different, and it is huge. Here we have the risk of a crushing domination by America in defining the idea that later generations will have of the world [...] the criteria of choice will be powerfully marked (even if we contribute ourselves, naturally without sulking, to these riches) by the perspective which is that of the Anglo-Saxons, with its specific coloration with respect to the diversity of civilizations.)

So what is to be done? Jeanneney complains about the relatively small scale of the BNF's virtual library efforts, which

... ne vit que de subventions de l'Etat, forcément limitées, et de nos ressources propres, difficilement et vaillamment mobilisées. Notre dépense annuelle ne s'élève qu'à un millième de celle annoncée par Google. Le combat est par trop inégal.

(... live only on the contributions of the Government, necessarily limited, and our own resources, painfully and valiantly mobilized. Our annual expenditures are barely a thousandth of those announced by Google. The fight is not fair.)

and (you may not be shocked to learn) he suggests a "plan pluriannuel" with a "budget généreux":

Une autre politique s'impose. Et elle ne peut se déployer qu'à l'échelle de l'Europe. Une Europe décidée à n'être pas seulement un marché, mais un centre de culture rayonnante et d'influence politique sans pareille autour de la planète.

L'heure est donc à un appel solennel. Il revient aux responsables de l'Union, dans ses trois instances majeures, de réagir sans délai - car, très vite, la place étant prise, les habitudes installées, il sera trop tard pour bouger.

Un plan pluriannuel pourrait être défini et adopté dès cette année à Bruxelles. Un budget généreux devrait être assuré. C'est en avançant sur fonds publics que l'on garantira aux citoyens et aux chercheurs - pourvoyant aux dépenses nécessaires comme contribuables et non comme consommateurs - une protection contre les effets pervers d'une recherche de profit dissimulée derrière l'apparence d'un désintéressement.

(A different policy is required. And it can only be deployed at the scale of Europe. A Europe determined to be not only a market, but a center of radiating culture and of political influence without parallel around the planet.

This is the hour for a solemn appeal. It falls to those responsible for the Union, in its three major branches, to act without delay -- for soon, the place will be taken, habits will be established, it will be too late to move.

A multi-year plan could be defined and adopted this year in Brussels. A generous budget should be assured. It's by going forward with public funds that we will guarantee to citizens and to researchers -- providing needed expenses as taxpayers and not as consumers -- protection against the perverse effects of profit-seeking hidden behind the appearance of disinterested service.)

As someone with a couple of decades of experience in negotiating information-sharing arrangements with European agencies in general, and French ones in particular, I'm enjoying a quiet chuckle at the thought of the "protection against perverse effects" that the people serving in such entities can be trusted to provide.

I think that I wish M. Jeanneney well in his campaign. An intercontinental competition to see whose library resources can be more interesting, attractive and open -- how could that be bad? (Well, since I asked: if all European digital library funding, along with various special IPR privileges, were to become the exclusive territory of an agency that is skilled in protecting its mandate, but sclerotic or incompetent in carrying it out. Could this happen? Let's say that there are precedents... It's not only in the private sector that more selfish motives can hide behind the appearance of disinterested service.)

There's an English translation of parts of Jeanneney's essay on the Union for the Public Domain's list Upd-discuss, along with some interesting commentary.

[via Viewropa]

[Note: I wonder if French défier has the same connotation of consciously confronting a hostile force that English defy does? If so, then Jeanneney and Le Monde's headline writer are guilty of begging the question, since I see no evidence that Europe and Google now view one another in a hostile light, in advance of any future effects of Jeanneney's crusade and similar attempts to stir up resentment. If not, then perhaps a better translation would be challenge.]

[Update: Chris Waigl confirms the suspicion that défier means something closer to "challenge" in this context, as a footnote to a post (in French) full of interesting thoughts.

Steve (Language Hat) had earlier suggested the same thing via email. I expressed some uncertainty on the point, based on various dictionary entries and a small survey of usage on the web, and Steve insightfully observed that "défier is often used when we'd say 'defy,' so that can be a valid translation -- but 'defy' inherently has hostile connotations lacking in the French word, which gets them from context". Given the context in this case, it seems to me that Le Monde and M. Jeanneney are floating somewhere in the interval between English "challenge" and "defy".]

[Update 3/27/2006 -- Other relevant posts:

The progress and prospects of the digital BNF (3/8/2005)
France defies Google (3/19/2005)
Eugoogle advances (3/23/2005)
Europe's response to Google to be managed by ... Microsoft? (3/26/2005)
Tomorrow was yesterday (3/27/2005)
News flash: European national libraries are willing to take EU money (4/28/2005)
Open access eh (5/2/2005)
Realistic surrealism (5/10/2005)
Anxious and pleistocene musings (8/1/2005)
Google purge (8/31/2005)]


Posted by Mark Liberman at 10:29 AM

Desperate entomologists

Mark Isaak's Curiosities of Biological Nomenclature, which Bill Poser cited here last fall, made the NYT today. The article (by Henry Fountain) includes an interview with Neal Evenhuis, president of the International Commission on Zoological Nomenclature, who is said to have personally named more than 500 species of insects. Evenhuis is fond of translingual puns, including the flies Pieza rhea, Pieza pi and Pieza deristans. (His bibliography suggests that he named the genus Pieza himself -- Evenhuis, N.L. 2002. "Pieza, a new genus of microbombyliids from the New World (Diptera: Mythicomyiidae)." Zootaxa 36, 28 p. -- no doubt with humor aforethought.) Evenhuis identifies Phthiria relativitae as his "personal favorite", and says

"It's not that I'm desperate. I just have this streak of levity. Not all names have to necessarily be kind of boring."

Still, I suspect that Linnaeus might raise an eyebrow. Not to speak of Willie Strunk, who would have red-penciled in the margin of that last sentence III.12, V-"kind of" and perhaps something about the infield fly rule.

The thing that caught my eye on Isaak's index page was the link to a 1999 paper in PNAS by John Alroy, which estimates that "24-31% of currently accepted names eventually will prove invalid", an estimate that is said to be "conservative compared with one obtained by using an older, more basic method". These proportions seem surprisingly high, but the result makes sense if the cost of a mistake is low compared to the benefit of identifying an otherwise unnamed species. Anyhow, I like to see a field that is able to discover its mistakes, and willing to admit them.


Posted by Mark Liberman at 07:59 AM

February 19, 2005

A Hobbesian choice

The anonymous blogger at G as in Good, H as in Happy points out a mistake in a New York Magazine column by Kurt Anderson:

Each of us has a Hobbesian choice concerning Iraq; either we hope for the vindication of Bush’s risky, very possibly reckless policy, or we are in a de facto alliance with the killers of American soldiers and Iraqi civilians. [emphasis added]

As GH observes, Anderson almost certainly meant a Hobson's choice rather than a "Hobbesian choice". GH wonders whether Anderson's dilemma is strictly a Hobson's choice, which has traditionally been taken (as the OED puts it) to mean "the option of taking the one thing offered or nothing".

Her full measure of scorn, however, is reserved for the substitution of Hobbesian:

How on earth can Hobbes, author of Leviathan and other works that intended to show the English the dangers of democracy and demonstrated the need for absolute sovereignty in society, be the reference here?

Perhaps Anderson was thinking of Hobbes' observation about the inadequacy of unilateral action to yield a state of peace:

For as long as every man holdeth this right, of doing anything he liketh; so long are all men in the condition of war. But if other men will not lay down their right, as well as he, then there is no reason for anyone to divest himself of his: for that were to expose himself to prey, which no man is bound to, rather than to dispose himself to peace.

Or perhaps Anderson had some vaguer notion that Hobbes described individuals as powerless to reach any just outcome at all, facing a set of unattractive choices because of "the ill condition which man by mere nature is actually placed in".

But here we're discussing whether Anderson's substitution is a classical malapropism, a mere "humorous confusion of words that sound vaguely similar", or an eggcorn, where the substitution represents a plausible but historically incorrect re-analysis of a familiar word or phrase.

The key linguistic point is that Hobson's blocks Hobbesian here. Even if there is a valid and coherent reason for Anderson to see his choice as a "Hobbesian choice", he can't use that phrase without taking literate readers aback, and leading some of them to make fun of him. Unless, of course, he can convince them that the whole thing was a clever pun all along.

In April of 2003, Peter Wood at NRO discussed a lawyer's use of the same phrase in oral arguments before the U.S. Supreme Court, in the (affirmative action) case of Gratz v. Bollinger. Wood writes that

I understand that lawyers arguing before the Supreme Court have only a few precious minutes, and many of those are taken up with answering questions. Thus they have to give pithy answers and sometimes depend on sly allusions that may convey a lot to the Court but can puzzle laymen. Mr. Payton appears to be a master at this kind of coded communication.

And therefore he may have intended to combine this erudite reference to Thomas Hobbes with his comic-strip namesake, the skeptical plush toy tiger who shares the adventures of Calvin, the six-year-old boy with the hyperactive imagination. In the strip, the dubious Hobbes repeatedly gets into the transmogrifier with Calvin knowing it will propel them into mischief and disaster. The Hobbesian choice, in this context, is to assume unwanted adult responsibility -- and I can understand why Mr. Payton would object to that. He would rather transmogrify unqualified kids into college students.

As I said, if you use a phrase like that, people are going to make fun of you. However, it seems to me that Peter Wood is not being fair and balanced here. In order to use the Hobbesian slip to mock advocates of affirmative action, Wood attributes it to John Payton, who argued the case on behalf of the University of Michigan. And indeed Payton used the phrase first, according to the transcript at www.oyez.com:

JUSTICE SCALIA: You don't have to be the great college that you are. You could be a lesser college if that value is important enough to you.

MR. PAYTON: I think uh that decision uh which would say that we have to choose uh would be a Hobbesean [Hobson's] choice here.

But Kirk Kolbo, who argued the case on behalf of Jennifer Gratz and Patrick Hamacker, takes the Hobbesian ball and runs with it in his closing remarks, using the phrase six times in his final 100 words:

With respect to the Hobbesean [Hobson's] choice that Mr. Payton has talked about, they have resolved a different Hobbesean [Hobson's] choice. The University has decided that they are willing to lower their academic standards to get their critical mass.

They've resolve that Hobbesean [Hobson's] choice that way. But they've resolved the other Hobbesean [Hobson's choice] how to get those objectives and stay selective, they've resolved that Hobbesean [Hobson's] choice on the backs of the constitutional rights of individuals like Jennifer Gratz and Patrick Hamacker. They are the ones who are paying for the Hobbesean [Hobson's] choice that the university has resolved by the use of a two-track admissions system.

Wood says that "diversity's defenders came across as stridently self-righteous and pretty sloppy about the details. Mr. Payton's aversion to making a 'Hobbesian choice' captures that perfectly." But Kolbo's repeated accommodation to Payton's error was hardly a model of terminological precision. It reminded me of Pat Buchanan's accommodation to Ali G's talk about Saddam Hussein's "weapons of mass destruction or whatever, or as they is called, BLTs".

In any event, it seems that there is a sporadic tendency to use Hobbesian choice to mean a choice among flawed alternatives, especially where basic principles come into conflict. I'd call this a "citational eggcorn".

[Update: John Lawler points out by email that if I had looked a little further down Google's list of hits for {"Hobbesian choice"}, I would have found at #17 an explicit explanation of the folk etymology behind the eggcorn:

George's current dilemma is a classic Hobbesian choice, which is no choice at all, the name of which derives from Thomas Hobbes' belief that man must choose between living in a state of nature (a life which is "solitary, poor, nasty, brutish, and short") or suffering under an arbitrary and absolute government.

In fairness to Hobbes, he does hold out the hope that collective action can establish the rule of law, and therefore the possibility of justice. Anyhow, this reference adds plausibility to the idea that other authors of Hobbesian slips are making a similar semantic reanalysis.

It's a curious coincidence, by the way, that Thomas Hobbes' Leviathan was published in 1660, the very year of the OED's first citation for "Hobson's choice":

1660 S. FISHER Rusticks Alarm Wks. (1679) 128 If in this Case there be no other (as the Proverb is) then Hobson's choice..which is, chuse whether you will have this or none.]


[Update #2: Chris Waigl has additional information and interesting thoughts over at serendipity, including a link to an ADS-L posting by Arnold Zwicky, and a discussion of the French phrase choix cornélien.]

[Update 3/6/2005: Anderson says it was a clever pun all along.]


Posted by Mark Liberman at 07:57 AM

No smooth ride is as valuable as a rough ride

Several readers have written to defend E.B. White, and I feel bad about overstating his adjective content, so I've invited him here to speak on his own behalf.

(From a letter to J.G. Case, White's editor at Macmillan for The Elements of Style, dated 17 December 1958.)

I was busy working a lot of your and Miss N's suggestions into the text, wherever we were in agreement, when your letter came along (December 12th date) and stopped me cold. Do you remember that wonderful moment in the McCarthy hearings when Mr. Welch turned to Mr. Cohn and in his high, friendly voice asked, "And now, Mr. Cohn, when you found that one-third of the photograph was missing, were you saddened?" (Such a wonderful verb for little Mr. Cohn.) Anyway, I was saddened by your letter -- the flagging spirit, the moistened finger in the wind, the examination of entrails, and the fear of little men. I don't know whether Macmillan is running scared or not, but I do know that this book is the work of a dead precisionist and a half-dead disciple of his, and that it has got to stay that way. I have been sympathetic all along with your qualms about "The Elements of Style," but I know that I cannot, and will-shall not, attempt to adjust the unadjustable Mr. Strunk to the modern liberal of the English Department, the anything-goes fellow. Your letter expresses contempt for this fellow, but on the other hand you seem to want his vote. I am against him, temperamentally and because I have seen the work of his disciples, and I say the hell with him. If the White-Strunk opus has any virtue, any hope of circulation, it lies in our keeping its edges sharp and clear, not in rounding them off cleverly.

In your letter you are asking me to soften up just a bit, in the hope of picking up some support from the Happiness Boys, or, as you call them, the descriptivists. (I can write you an essay on like-as, and maybe that is the answer to all this; but softness is not.) I am used to being edited, I like being edited, and I have had the good luck and the pleasure of being edited by some of the best of them; but I have never been edited for wind direction, and will not be now. Either Macmillan takes Strunk and me in our bare skins, or I want out. I feel a terrible responsibility in this project, and it is making me jumpy. I ask your forgiveness and your indulgence.

The above, written by the below, are, of course, fighting words, and will, I am sure, bring you out of your corner swinging. But I think it is best that I get them down on paper. I want to get back to work, make progress, and make a good book; and until we get this basic thing straightened out, there isn't much chance. It is ghostly work, at best; and surrounded as I have been lately by a corps of helpers, all of them trying to set me on the right path, it is unnerving work [footnote: Case had commissioned three or four grammarians well versed in the textbook field to submit suggestions to White]. Your letter did unsettle me on a number of counts.

All this leads inevitably to like-as, different than, and the others. I will let them lay for the moment, sufficient unto this day being the etc. My single purpose is to be faithful to Strunk as of 1958, reliable, holding the line, and maybe even selling some copies to English Departments that collect oddities and curios. To me no cause is lost, no level the right level, no smooth ride as valuable as a rough ride, no like interchangeable with as, and no ball game anything but chaotic if it lacks a mound, a box, bases, and foul lines. That's what Strunk was about, that's what I am about, and that (I hope) is what the book is about. Any attempt to tamper with this prickly design will get nobody nowhere fast.


E.B. White

P.S. When I said, above, that Macmillan would have to take me in my bare skin, I really meant my bare as.

Lane Greene, "trying to channel a dead man", wrote to me yesterday that

I bet that if you could get a few drinks into EB White, he'd defend himself against the charge of hypocrisy with this: S & W is intended for bad writers and young ones, not for talented adult ones like himself. Its strictures, including its style strictures, are simple and unbending because that's what non-professionals understand. One characteristic of immature writing is definitely needless words, and to get college freshmen to churn out a decent 3-point, 5-paragraph essay making a tight argument, it's probably good advice. On the other hand, a good writer knows just when to be wordy and when to be sparse. That kind of writer, I'll bet White would admit, is not going to blindly take (and should not blindly take) stylistic prescriptions from Strunk and White...

I responded that I object much less to the book, which is a fine specimen of the genus of idiosyncratic usage rants, than to its reception as the foundational scripture of a strange stylistic fundamentalism.

A few weeks ago, I met a lawyer who tried to enlist me in support of his crusade to get the other members of his firm to banish sentences beginning with however-in-the-sense-of-nevertheless. He was convinced that this must be a valid principle, since he had once read it in Strunk and White, but when his colleagues asked him for some better argument than an appeal to that authority, he came up blank. On learning that I'm a linguist, he eagerly begged me to reveal to him the reasons behind this rule, trusting naively that they must exist.

No, I told him, as far as I know it's just something that Willie Strunk made up. Classic English essays are full of examples like this one from Edmund Burke: "However, they did not think such bold changes within their commission." If you want to treat writing as a game that is more fun for those who obey Strunk's whims, suit yourself; but it won't do to pretend that either logic or custom compels you.

Just for the record, in the passage from White's letter quoted above, I count 47 adjectives in 646 of White's words, or 7.3%. This is a mere 22% more than the norm for English prose as a whole, and about 9% less than the norm for academic writing. Fair is fair.


Posted by Mark Liberman at 06:46 AM

February 18, 2005

Abu what again?

Tim Buckwalter forwarded to me his email exchange with Omar Johnstone on further subtleties of the etymology of Abu Ghurayb/Ghraib/Ghreib (see here for the backstory). The bottom line, for me at least, is that three native speakers of Arabic, asked by Tim, all agreed that it [i.e. "ghurayb/ghreib"] means "small gharb" (small something relating to the west or place where the sun sets), and not "small ghuraab" (small crow). None of them were Iraqis, however, and one held out the possibility that things might be different in Iraqi Arabic.

Below the jump, Omar and Tim discuss the matter in spectacularly philological detail.

First Omar Johnstone:

You say "the diminutive of ghuraab "crow" ought to be ghurayyib, suggesting that those who offered the "small crow" analysis for Abu Ghurayb (or Abu Ghraib, etc.) must be mistaken."

Wright, in his extensive discussion of the diminutive which begins on i 166 B gives numerous examples that would appear to contradict your case. Of particular interest is Sadiiq / Sudayyiq (i 166 D) [ghariib] and ghulaam / ghulayyim (i 167 A)[ghuraab]. While Wright does not transliterate his examples, they do in both cases indicate a shaddah over the yeh following the second radical. The absence of a shaddah for the first example would confirm your contention that the diminutive of /gharib/ ought to be /ghuraib/ rather than /ghurayyib/. However, if Wright is correct, the diminutives of ghariib and ghuraab are identical in spelling.

Dozy, in his Supplément aux dictionnaires arabes (Librairie du Liban, 1881, 1991) offers /ghuraib aS-SaHrah/ with a sukkun over the yeh. He glosses this as pyrrhocorax graculus, or alpine raven and cites a Western source as his authority. To this he adds, "Peut-être la forme correcte est-elle /ghurayyib/ [shaddah over the yeh], dimin. de /ghuraab/". Here, he seems to agree with Wright on the correct spelling of diminutive forms. Dozy's incorrect spelling may reflect the pronunciation of diminutives in Arabic. I live in Riyadh, and throughout the Abu Ghuraib business, I do not recall ever hearing this name pronounced /ghurayyib/, nor, indeed, have I ever noticed the diminutive ending /-ayyib/ in any other word; granted, diminutives are not commonly heard. This complex vowel coming at the end of a word would only be distinct if the stress were to fall on it - and even then, this may only result in a lengthening of the vowel rather than the introduction of another.

Islamists and other linguistic purists often do stress the final syllable of derived masculine adjective forms, saying things like /islamii'(ii)/ and /Bayhaqii'(ii)/. This sounds slightly ridiculous. I suppose it is a literary conceit deliberately contrived to convince us of their authority in all things or as a shibboleth for mutual recognition amongst the rigidly righteous, but I do not recall hearing /Abu ghurayyib/, even from them.

Tim Buckwalter's response:

Thanks for all the citations and examples from Wright and Dozy. In my second e-mail to Mark I mentioned that words with the pattern CvCvvC take the CuCayyiC diminutive pattern, so this explains why "ghuraab" and "ghariib" map to the same diminutive form, "ghurayyib." Yesterday I asked three native informants (two North African, one Sudanese, all women) for the meaning of "ghurayyib" and they immediately responded "small ghariib." I asked them what they thought about "small ghuraab" and they immediately changed their first opinion! After much discussion they concluded that "ghurayyib" is not used (maybe because of the ambiguity?). As for "ghurayb/ghreib" they all agreed that it means "small gharb (small something relating to the west or place where the sun sets?)" although one of them said that perhaps Iraqi Arabic follows a different derivational rule for diminutives.

The spelling of CuCayyiC words without diacritics can lead to misreadings such as ghurayb for ghurayyib (both spelled gh-r-y-b), as you found in Dozy's work. Other homographs have been known to cause problems even for native speakers. I first heard of Abu Ghreib from the western media, so when I first saw the name in the Arabic media I knew it was ghurayb and not ghurayyib. For what it's worth, the MSA lexicon doesn't have many CuCayyiC words. I found only these:

  `uqayyib  = small eagle (dim. of `uqaab)
  kutayyib = booklet  (dim. of kitaab)
  'uwayyil = proton (dim. of 'awwal, which is CaCCaC)
  Huwayyin = small animal (dim. of Hayawaan, which also does not follow the CvCvvC pattern!)

And from the dialects:

   kuwayyis = good, nice
   sughayyar (possibly a modification of sughayyir?) = small 

Also, if I understood you correctly, your argument is that Abu Ghreib is very likely Abu Ghurayyib in Iraqi pronunciation? I'm familiar with the /'islaamii/ vs. /'islaamiyy/ issue you mentioned, but I think that's a reflection of the pausal form pronunciation of case endings (i.e., in formal situations you can change word-final -ii to -iyy to signal that you're using case endings, albeit in pausal form). I'll check the Clive Holes book to see if he says anything about that...


Posted by Mark Liberman at 09:34 PM

E.B. White: now with 60% more adjectives

In Geoff Pullum's first Language Log post on adjective use, he quoted Doug Biber's finding that about 6% of the words in English prose will be adjectives "whether you write novels or news stories, whether they're good or bad", with "academic prose" rising to an average of about 8% adjectives.

Since then, we've documented instances of fine writing at adjective rates of 15% and even 40%. Still, E.B. White's own adjective content of 13% is apparently more than double the English prose average, and about 60% higher than typical academic texts. [Update: Geoff points out to me that I read his post carelessly, and was therefore unfair to White: the 13% count includes adverbs as well as adjectives, and his adjective proportion is merely 8%, or about the same as typical academic prose. I therefore apologize to White's memory, and to any readers who have been misled. I think, though, that the basic point stands: White uses modifiers at a similar rate to everyone else.]

Does this mean that Strunk and White invented their anti-adjective animus out of free-floating intellectual crankiness, with no connection at all to the stylistic properties of actual texts? It's possible. However, an investigation of menu sociolinguistics led me last summer to another hypothesis: "the impulse to pile up fancy words and extra modifiers, and the admonition to write simply and avoid adjectives, are both expressions of the same social anxieties, seen from slightly different places on the social scale". On this view, Strunk and White's little book is not a manual of prose style, it's a self-help book for social climbers.

Posted by Mark Liberman at 07:58 AM

The blowing of Strunk and White's rules off

One additional word on Mark's bedtime reading ruminations, which are on their own a magnificent brief for the prosecution concerning the charges against E. B. White of being a linguistic hypocrite. One of the sternest strictures delivered in Strunk & White's stupid little book is the prohibition on the use of adjectives and adverbs. Simply do not use them, they say: "Write with nouns and verbs, not with adjectives and adverbs" (The Elements of Style, p. 71). Now, Mark happens to quote exactly 406 words from the book of White's essays that he fell asleep over. I have been over those 406 words and carefully identified the adjectives and adverbs. To be scrupulously fair to White, I omitted the New that occurs in every occurrence of The New Yorker, and I did not count items that would traditionally be classified as adjectives or adverbs where The Cambridge Grammar provides evidence that those classifications are wrong. Despite this lowering of the count (full details on request), there are 52 adjective and adverb tokens in White's 406 words. That's almost 13 percent of the total word count (the adjectives alone make up about 8 percent of the word tokens). As I have said before (and it has made many people quite edgy), it is not just that Strunk & White offer crappy usage advice; it's that they demonstrate that their advice is crappy whenever they write, because they are utterly unable to follow their own rules, even on a bet. And as Mark says, nor should they. White isn't at all a bad writer. But the dimwitted ukases that his book with Strunk promulgates have nothing to do with good writing or elegant style.

Posted by Geoffrey K. Pullum at 01:18 AM

The blowing of each other up

Elwyn Brooks White expanded William Strunk's original pamphlet to produce The Elements of Style, which Geoff Pullum has called "a horrid little compendium of unmotivated prejudices (don't use ongoing), arbitrary stipulations (don't begin a sentence with however), and fatuous advice ("Be clear"), ridiculously out of date in its positions on appropriate choices among grammatical variants, deeply suspect in its style advice and grotesquely wrong in most of the grammatical advice it gives".

Late last night, I chose at random from a pile of old and unread books, and found myself drifting off to sleep over The Wild Flag, which turned out not to be the exciting war novel that I thought it might be, but rather a 1946 collection of White's editorials "on Federal World Government and Other Matters". These pieces are vivid and readable, though White's political beliefs have not held up well -- his theme is that the solution to humanity's problems is world government, to be achieved through expansion of the power of the United Nations. Since the political content is so thin, not to say silly, I found myself interpreting his conceits in reference to his ideas about prose style instead of his ideas about world politics.

White announces himself in favor of irrational and incoherent opinions:

Most publications, I think, make rather hard demands on their editorial writers, asking them to be consistent and sensible. The New Yorker has never suggested anything of the sort, and thus has greatly eased a writer's burden — for it is easier to say what you think if you don't feel obliged to follow a green arrow. The New Yorker is both aloof and friendly toward its opinionated contributors, and I am grateful for this. I am reasonably sure that if some trusty around the place were to submit an editorial demanding that the George Washington Bridge be moved sixty feet further upstream and thatched with straw, the editors would publish it, no questions asked.

You can see from this why he felt comfortable with the unmotivated prejudices, arbitrary stipulations and fatuous advice that Geoff complains about. You can also see that White was fond of the "needless words" that he and Strunk advised everyone else to omit: "I think", "rather", "reasonably" and so on. It's clear that he likes the sound and rhythm of his own voice, and sometimes throws in a few extra words just for the sheer resonant fun of it.

He also exhibits periodic flashes of unprovoked aggression:

An editorial writer refers to himself as 'we,' but is never sure who the other half of the 'we' is. I have yet to encounter the other half of 'we,' but expect to nail him in an alley some day and beat his brains out, to see what sort of stuffing is behind such omniscience.

"Nail him in an alley some day and beat his brains out?" This is himself whose skull he's promising to crush, mind you. Doppelganger-bashing or otherwise, the cranky, abusive tone that Geoff detected in Strunk and White is often in evidence in these little editorials.

Along with the resonant redundancies, almost every paragraph exhibits some bizarre image or some bit of grammatical whimsy. White wields the English language like an elderly gardener using his 9-iron as a weed wacker:

Nationalism is young and strong, and has already run into bad trouble. [As opposed to "good trouble"?] We take pains to educate our children at an early age in the rituals and mysteries of the nation, infusing national feeling into them in place of the universal feeling which is their birthright; but lately the most conspicuous activity of nations has been the blowing of each other up, and an observant child might reasonably ask whether he is pledging allegiance to a flag or to a shroud.

"The blowing of each other up"? Cool, I thought to myself as I drifted off, but is that actually English? I was reminded of the time a four-year-old of my acquaintance referred to the remains of a local fire as "the burned house down". White's phrase is less eccentric, but it still rates a boggle. And "pledging allegiance to a flag or to a shroud"? Strong stuff, but has it really ever occurred to any child to ask that?

That brings us halfway through the second paragraph on the third page of the preface, which is roughly where I fell asleep. However, as I read through the rest of the book this morning over coffee, other oddities leaped off nearly every page:

A world government, were we ever to get one, would impose on the individual the curious burden of taking the entire globe to his bosom — although not in any sense depriving him of the love of his front yard. The special feeling of an Englishman for a stream in Devonshire or a lane in Kent would have to run parallel to his pride in Athens and his insane love of Jersey City.

Isn't "taking the entire globe to his bosom" one of those hackneyed metaphors we're supposed to avoid? And trust a New Yorker to suggest that affection for New Jersey is evidence of insanity.

Nations are less candid than children, and their state departments have a less good prose style, ...

And humans in the state of nature are all honest and kind, and Dorothy Parker is Marie of Roumania.

Behold the platform speaker! He grasps the microphone as coolly as though it were a broom handle in his mother's kitchen and warns you (a thousand miles away) to beware of fantastic schemes. Standing there, speaking in a natural tone of voice, he is of the very nature of fantasy. His words leap across rivers and mountains, but his thoughts are still only six inches long.

"His thoughts are still only six inches long"? That's a short damn broom handle, Elwyn.

Now, I certainly don't mean to suggest that E.B. White is a bad writer. There are not many writers who could have kept me going through 188 pages on World Government, even on behalf of a Language Log post. But it seems to me that he's a good writer in spite of his own stylistic advice, not because of it.


Posted by Mark Liberman at 12:01 AM

February 17, 2005


On Language Log we spend quite a bit of our time analyzing one or another kind of bullshit about language. It is good to know that emeritus Princeton philosopher Harry Frankfurt has now provided us with a book-length analysis of the meaning of this important technical term. Frankfurt's On Bullshit argues that falseness of the bovine excremental variety is actually a greater danger to intellectual discourse than lies. For an article that provides a sampling of the book, click here (the latter link via Paul Postal).

Posted by Geoffrey K. Pullum at 02:56 PM

School of squander

According to a story in this morning's Philadelphia Inquirer, Bill Clement had this to say about the cancellation of the NHL season:

"It is such a day of squander and a day of waste that anybody involved in both sides should be ashamed of who they are right now."

When I read this, I thought that Clement was creatively nouning a verb, but it turns out that the nominal form of squander is well established. The OED has citations from 1709:

1709 MRS. MANLEY Secret Mem. (1736) I. 27 Will he one Day set it all at Stake upon a Royal Cast, an Imperial Squander? Or descend to his Grave, choak'd with greediness of Gain?

It's easy to check a case of word usage like this, to see whether an example that strikes me as odd is an innovation or just something that I've missed in my experience of the language so far. It's harder to check something like Clement's use of anybody and both.

When he says "anybody involved in both sides", Clement clearly means that all the participants, regardless of which side they're on, should be ashamed of their fatal unwillingness to compromise. He's not slamming fence-sitters or double agents -- he's not even suggesting that any members of these categories exist. The negotiation between the NHL owners and the players' union has been a polarizing dispute, and if there is any individual who's consequentially involved with both sides at once, he's keeping a low profile.

However, when I read this, I first interpreted "anybody involved on both sides" as referring to people with split allegiance. [Yes, I know, he said "in both sides", but one oddity at a time, please...]

So the question is, did Clement make a mistake in saying this? Or did I make a mistake in understanding it? Or do we speak slightly different dialects of English? As in the case of the AHD's definition of patriot, this is not an easy kind of question to answer. In general, people's judgments about what they could or couldn't say or write, and what things can and can't mean, are not a reliable predictor of how they actually speak, write and understand. This is partly because of the influence of perceived norms -- people tend to underestimate their use of stigmatized variants like "g-dropping" and cluster simplification, for instance -- but there is also apparently an influence of what we might call "mind set". Sometimes you just get stuck in a certain way of interpreting an ambiguous expression, especially if you're operating in an analytic mode.

Research 30-odd years ago on "quantifier dialects" found this pattern (as I recall) in the interpretation of (sentences like) "All the arrows didn't hit the target". There are two meanings, in this case roughly "all the arrows missed" and "it's not the case that all the arrows hit". Some people can understand such sentences either way, but others are quite sure that only the first meaning is possible, while others are equally strong partisans of the second meaning. The curious thing is that these judgments, including the partisan ones, are apparently unstable over time. If you test people again after a suitable lapse of time, some of the partisans of one reading have turned into partisans of the other one.
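The two readings differ only in where the negation sits relative to "all", which is easy to see if you spell them out. What follows is a minimal sketch in Python, not anything taken from the quantifier-dialect research itself; the function names and the list-of-booleans encoding of the arrows are assumptions made for illustration.

```python
def all_missed(hits):
    """Reading 1: every arrow failed to hit (negation inside the scope of 'all')."""
    return all(not hit for hit in hits)


def not_all_hit(hits):
    """Reading 2: it is not the case that every arrow hit (negation outside 'all')."""
    return not all(hits)


# One boolean per arrow: True means that arrow hit the target.
mixed_volley = [True, False, True]
total_miss = [False, False, False]

for volley in (mixed_volley, total_miss):
    print(volley, all_missed(volley), not_all_hit(volley))
```

The readings come apart only in the mixed case, where some arrows hit and some missed; when every arrow misses, both readings make the sentence true.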

As far as I can tell, neither prescriptive norms nor random mind-set is influencing my interpretation of "anybody involved on both sides". But maybe I'm wrong.

It's certainly easy to find other examples, out on the web, of Clement-style interpretations of "any ... both":

They couldn't track it throughout the family. They couldn't remember anybody on both sides that ever had it.
Here we are a year later and it is evident to anybody on both sides that they lied to invade Iraq.
Most anybody on both sides of this debate can copy and paste articles by others.
And I'd like to apologise to anybody on both sides for the 1000 charicter limit, that's beyond my control, that's a Haloscan policy.

In the first cited example, for instance, there's no one (absent incest) who can be "on both sides" of a family. In the second example, I'm sure that the writer was not talking about people who flip-flopped on the war, and thus were (as individuals) somehow "on both sides" of the relevant dispute.

There's some (weak) evidence from relative frequency that many people agree with me about the interpretation of "any ... both":

  both sides either side
anybody on __
anyone on __
everybody on __
everyone on __

But I'm still uncertain how to distinguish between the "production error" and "dialect variation" theories about the people whose behavior indicates that they disagree.


Posted by Mark Liberman at 12:55 PM

February 16, 2005

Southern accent reduction courses cropping up from Texas to Kentucky?

A few days ago, David Donnell sent a link to an article by Noelle Landers in the Collegiate Times about "Southern accent reduction courses cropping up from Texas to Kentucky". These courses are apparently billed as a service to actors whose "accent might hold them back", but the story's author suggests that they are really aimed at upwardly mobile southerners who want to distance themselves from their regional roots.

We Language Loggers have been appropriately scornful in the past about people who think that people from the American south have "lazy mouths", or "sound as if you just woke [them] up", or signal low status and lack of intelligence by using regional pronunciations, words and syntactic constructions. It's been a while since Shaw had his phonetician Henry Higgins respond to Liza Doolittle's cockney "oh" by saying

A woman who utters such depressing and disgusting sounds has no right to be anywhere—no right to live. Remember that you are a human being with a soul and the divine gift of articulate speech: that your native language is the language of Shakespear and Milton and The Bible; and dont sit there crooning like a bilious pigeon.

adding to his upper-class male colleague Colonel Pickering

You see this creature with her kerbstone English: the English that will keep her in the gutter to the end of her days. Well, sir, in three months I could pass that girl off as a duchess at an ambassador's garden party. I could even get her a place as lady's maid or shop assistant, which requires better English. Thats the sort of thing I do for commercial millionaires. And on the profits of it I do genuine scientific work in phonetics, and a little as a poet on Miltonic lines.

We linguists don't do that anymore, in general. And as I explained in a post last summer, there's something heart-warmingly American about the (completely incoherent) view that a regional accent is a condition like obesity or a swollen ankle, which can be "reduced" to get closer to a "normal" way of talking:

In the U.S., the traditionally standard radio or television voice is perceived as being maximally bleached of all marked characteristics ("having no accent"). Linguistically this is nonsense, of course, but it does reflect a democratic set of values, in which the desired reference value is viewed as being at the middle or zero point of the descriptive space, rather than being at one extreme corner.

As I understand it, traditional BBC English, in contrast, is perceived by most people as being a marked value.

However, I'm not sure about the facts behind Noelle Landers' story. There's only one specific course cited:

Martin Childers, managing director of the Jenny Wiley Theatre in Prestonsburg, Ky., said his theater offers a course for middle and high school students who want to reduce their accent for acting purposes.

“We have very talented local actors who have the ability to work outside of this area, but their accent might hold them back,” he said. “A lot of people have to wait until they’re older to (take a course like this), so we’re offering it to them now.”

Although the original course was designed with teenagers in mind, Childers said that many adults have taken an interest as well and not for acting purposes.

“We’ve had quite a few requests from people who are not actors,” he said. “We’re going to extend the course to adults as well, because we’ve had so many requests for it.”

The main purpose of this course is to increase actors’ or professionals’ marketability in a world that might judge them on the way they speak, Childers said.

But the Jenny Wiley Theatre's educational outreach page doesn't mention any accent reduction courses.

The phrase {"southern accent reduction"} gets no hits on Google. Looking for {"accent reduction"} gets 33,200, but of the top-ranked ten sites, only one is (barely) below the Mason-Dixon Line (McLouth Kansas, La Jolla CA, Plymouth MA, Carlsbad CA, Towson MD, Ann Arbor MI, Lyndonville VT, Pullman WA, Los Angeles CA, Royal Oak MI). This doesn't look like a specifically southern-states thing to me.

[Update: the starting point here seems to be this 2/4/2005 CNN story, based on an interview with Martin Childers of the Jenny Wiley Theatre in Prestonsburg KY about "a new class that seeks to teach youngsters how to lose their Appalachian accents". There's no indication in the CNN story about other such classes springing up across the South; if I find out what Noelle Landers bases that claim on, I'll add the information here. ]


Posted by Mark Liberman at 05:32 PM

February 15, 2005

Eggcorn database

Chris Waigl of Serendipity has announced a database full of eggcorns. You can search, browse, and read about the history of the term and of her collection. It's a good idea and an elegant implementation.

And Kelly wrote in this morning with another contribution: parody for parity, in talk about sports leagues.

"Where is the parody? NFL playoff teams pretty much the same as last year."
""I've heard a lot of different sports anchors, jouralists and reporters pose the question: which is better for sports, parody througout the league or dominance by one or a few teams?"

This one might better be considered a misspelling type of classical malapropism, though, since the substituted word doesn't seem to have a semantic justification, nor is there a re-analysis of part of a word or phrase. Chris' about page explains why the term "eggcorn" was invented in the first place, and what some of the alternative kinds of lexical substitutions are.

[Update: Linda Seebach writes in with a link to a post by Erin O'Connor entitled "Malapropisms and other fun things". As Linda observes, many of the fun things that Erin describes (in a quotation from Susanna Moore's In the Cut or in her own voice) are examples of what we would call eggcorns, including for example "autumn furlage", and diseases such as "very close veins" and "Screaming Mighty Jesus" (for "cerebral meningitis"). ]


Posted by Mark Liberman at 07:07 AM

February 14, 2005

Abu Ghraib is not about ravens after all (?)...

Back in May, I quoted an authoritative-looking web page to the effect that "Abu Ghurayb ('the preferred NIMA transliteration' of Abu Ghraib)" means "father of the raven". But a few days ago, Tim Buckwalter (whom I trust when it comes to Arabic lexicography) mentioned that "ghuraib (as in abu ghuraib, or abu ghreib), is the diminutive of ghariib". The context was a complicated pun involving the words gharb "west" and ghariib "strange". So I wrote back with the ghurayb=raven link, and I asked Tim whether the Arabic words for west, strange and raven are all related.

His answer:

Hmm, that looks like a mistake: ghuraab is crow/raven, even in Iraqi
I wouldn't be surprised if there is an etymological connection
between "west" and "weird," kind of like left/sinister?
Apparently, there is an Abu Ghurab in Iraq:
Good place to start for more Iraqi cities:

So Abu Ghraib is "father of the oddling", not "father of the raven".

And Arabic gharb "west" and ghariib "strange" are probably related, but ghuraab "crow/raven" probably isn't. Unfortunately there is no Arabic equivalent of the OED. If I find out more, I'll add it.

[Update: Pekka Karjalainen wrote in with a link to a posting on sci.lang:

> >  *gh*urayb may be a diminutive of *gh*ura:b "crow". 'abu: literally means 
> > "father of" ("father" in the construct state), here inthe meaning of " 
> > place abundant in". so it probably indicates "a place abundant in small 
> > crows"
> It's of some possible interest that the meaning is not dissimilar to
> Kosovo Pole(field of blackbirds).  Yes, I know that a crow is just a

incidentally "Kosovo" underwent a change in turkish (adopted by
Albanians) to Kosova through false etymology, << ova >> means "plain"
in turkish.

> black bird, not a blackbird ...

which also points to an eggcorn phase in the history of the Albanian name Kosova.]

[Update #2: I asked Tim what the rules are for forming diminutives in Arabic. He answered as follows:

Good question! Here are the rules for generating the diminutive:

CvC(v)C  --> CuCayC(a)    
gharb --> ghurayb
Hasan --> Husayn  (Hussein)
baHr  --> buHayra ("sea" --> "lake")

CvCvvC --> CuCayyiC
ghuraab --> ghurayyib
kitaab  --> kutayyib  ("book","booklet")

CvCCvvC --> CuCayCaaC
salmaan --> sulaymaan  (Salman, Suleiman)

Early on in my work with Ken Beesley (1989-91) we discussed whether we
could automatically recognize diminutives not explicitly entered in the
lexicon. We didn't have the time to investigate that, and I'd love to
revisit it someday. Since our application was primarily for MT, we also
needed to investigate whether all diminutives could reliably be glossed
as "small of X."

So the diminutive of ghuraab "crow" ought to be ghurayyib, suggesting that those who offered the "small crow" analysis for Abu Ghurayb (or Abu Ghraib, etc.) must be mistaken. ]
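Tim's three rules are mechanical enough to try out in code. The following is only a toy sketch in Python, and emphatically not the Buckwalter/Beesley system: the tokenizer, the list of consonant digraphs, and the function names are all assumptions made for illustration. It works on the ad hoc transliteration used in this post, leaves off the optional -a ending in the first rule (so it will not produce buHayra), and returns None for any word whose shape matches none of the three patterns.

```python
import re

# Assumed transliteration conventions: a few consonant digraphs, doubled letters
# for long vowels, and any other single letter (or ' or `) as one consonant.
TOKEN = re.compile(r"gh|kh|sh|th|dh|aa|ii|uu|[a-zA-Z'`]")
VOWELS = {"a", "i", "u", "aa", "ii", "uu"}


def skeleton(word):
    """Return the C/v shape of a word (e.g. 'CvCvvC') and its list of consonants."""
    tokens = TOKEN.findall(word)
    shape = "".join(
        ("vv" if len(t) == 2 else "v") if t in VOWELS else "C" for t in tokens
    )
    consonants = [t for t in tokens if t not in VOWELS]
    return shape, consonants


def diminutive(word):
    """Apply the three patterns quoted above; None if the word fits none of them."""
    shape, c = skeleton(word)
    if shape in ("CvCC", "CvCvC"):      # CvC(v)C --> CuCayC (optional -a omitted)
        return f"{c[0]}u{c[1]}ay{c[2]}"
    if shape == "CvCvvC":               # CvCvvC --> CuCayyiC
        return f"{c[0]}u{c[1]}ayyi{c[2]}"
    if shape == "CvCCvvC":              # CvCCvvC --> CuCayCaaC
        return f"{c[0]}u{c[1]}ay{c[2]}aa{c[3]}"
    return None


for word in ["gharb", "Hasan", "ghuraab", "ghariib", "kitaab", "salmaan"]:
    print(word, "-->", diminutive(word))
```

Run on ghuraab and ghariib, the sketch gives the same string for both, which is the ambiguity at issue; gharb comes out as ghurayb, the form in the place name.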


Posted by Mark Liberman at 03:16 PM

By the power vested in all of us

Happy Valentine's Day from Language Log. Appropriately enough, I just got back from the wedding of a dear friend in Oakland, California, on Sunday afternoon. To be more precise, the legal wedding had actually taken place in a rush a year ago without a chance for friends and family to be gathered together, so strictly what I just got back from was a renewal of wedding vows already taken, combined with a full-scale reception. It was a warm and affectionate occasion. The room (the wonderful Soizic Bistro) was full of happiness; teardrops moistened many a smile. It was a very conventional ceremony and reception, from the largely Episcopalian wording of the preliminaries and vows right down to the staged cutting of a huge cake followed by the married couple feeding each other chunks of it (in what I assume is a symbolization of the way married couples commit to taking care of each other's bodily needs). I don't think anything much would have raised an eyebrow if someone from rural Kansas had stopped by to witness the event — except that the two people renewing their wedding vows to each other were both women.

Other than that, it was just a modern wedding, and a particularly happy and successful one. Each woman had a proud and beaming father by her side taking part in the ceremony; each had a younger brother there to give a speech to the guests after the main course; the two families welcomed each other warmly as in-laws and sat intermingled, visibly enjoying getting to know each other and like each other, while their young kids played on the floor or ran around in the next room. I never in my life saw ordinary family values so deeply and peacefully affirmed at any kind of event. If family values is your political bedrock, this was for you.

Ian MacKaye, the musician (founder of the band Fugazi, and of The Evens, and of the Dischord record label), was there to support his sister Susannah in the life choice she was making. Anyone who witnessed the sincerity and directness with which Ian welcomed his new sister-in-law Kirstin into the MacKaye clan would have lost most of their doubts about what the future holds for gay marriage. No one with a still-functioning mind and heart could have been at this event without seeing that it was a glimpse of the future. Our culture cannot continue forever to prevent gatherings as happy and socially affirmed as this, to deny the status and benefits of marriage to people as committed as this couple (who have been increasingly in love for fourteen years now). Not when their families and circles of friends and whole communities wish to recognize them as married and to support them in marriage.

Now, there were some linguistic obstacles to be negotiated (hey, c'mon, this is Language Log, we don't do merely social blogging here). But what struck me was how tiny, how trivial those linguistic difficulties in phrasing things turned out to be. Certainly, the person presiding can't say we are "gathered here to join this man and this woman"; so that is replaced by "join these two people". There certainly was no "Who giveth this woman to be married to this man?" (both women were in fact presented to be married via brief speeches by long-standing women friends of theirs); so the "giveth" piece of Church-of-England language was just deleted. Likewise the line about "love, honor, and obey" (jettisoned from nearly all heterosexual marriage ceremonies today anyway). The phrase "I now pronounce you man and wife" has to be replaced, of course; the replacement was "I now pronounce you spouses for life." And at the end, there was no "You may now kiss the bride" (with its assumption that the male instigator is the subject and the blushing bride the object). (Susannah and Kirstin were bursting with happiness, and once the rings were on and the words were said they didn't need to be prompted to kiss each other's smiling faces.) But basically everything else about the language of a wedding ceremony — in sickness and in health, as long as you both shall live, and all that good stuff — can be used unchanged. The alterations needed in the language of a typical wedding ceremony in order to accommodate gay and lesbian marriages are so minor as to be utterly negligible, and comparable to the minor adjustments that are made these days in quite conservative heterosexual marriage services in churches all over America.

One technicality involving a subordinate clause functioning as adjunct to a performative clause provided the biggest departure from weddings I've seen in churches and courthouses. Performatives are utterances that, instead of merely saying something, do something. Conferring degrees, making promises, or causing people to be legally married, for example. The performative "I now pronounce you..." is normally preceded by an adjunct like "By the power vested in me by the City and County of San Francisco" or whatever, to make clear the source of the legal authority to marry people. And there was something of a problem about what to say there.

Susannah and Kirstin had already been (as they then saw it) legally married in City Hall across the bay in San Francisco back in mid-February 2004, when for a period of a few days Mayor Gavin Newsom had decided that it looked to him like the Constitution didn't permit him to deny what so many gay and lesbian couples were asking him for, and the officials doing the marriages at that time understood themselves to have powers vested in them by appropriate civil authorities of California that made the marriages real in law. A month later a Supreme Court judgment stopped same-sex weddings in California. And months after that, the court told Mayor Newsom that he had overstepped the bounds of his authority in issuing the marriage licenses, and they declared Susannah and Kirstin, and 3,954 other same-sex couples, to be in possession of marriage certificates that are invalid. No legal authority now exists for a wedding of two women to take place. So what did they do about that adjunct, "By the power vested in me by..."?

Well, the woman presiding over the ceremony simply said: "By the power vested in all of us," and declared that by the will of the whole community, as represented by all those present, Susannah and Kirstin were married. (This is in fact modelled on Quaker practice: the Friends also hold the view that it is the community, not some priestly authority figure, that causes a couple to be married.) Everyone in the room seemed to approve heartily. In fact every single person there signed a large certificate to say so (another Quaker custom).

That certificate doesn't alter the law, of course. But I want (not for the first time) to counsel everyone to pay heed to the words of our President, George W. Bush. On February 24, 2004, President Bush made a firm and outspoken statement on the topic of relevance here. He objected to the "activist judges who are defining marriage", and he said plainly what he thought:

"Marriage ought to be defined by the people, not by the courts."

I think I agree with him. Anyway, on Sunday afternoon I definitely saw his plan being put into effect. Suddenly it became very hard to see how there could possibly be a future in which things would be otherwise. As I remarked on Language Log last year, when Susannah and Kirstin were newly married, dictionary entries show that the word marriage is certainly defined broadly enough in everyday English to cover same-sex marriages as well as the different-sex kind. But you really have to attend a same-sex marriage to get a real sense of just how natural it is going to be. Recalcitrant states may resist for a few years; perhaps there will even be a serious move to whip up a ban by constitutional amendment (they tried that with alcohol once, but it didn't last; and alcohol isn't as good for you as marriage). But I no longer think the resistance can last forever. The wisdom of our President quoted above, taken together with a few thousand gatherings like the one in Oakland today, will sort things out within a few decades, by the power vested in all of us.

Posted by Geoffrey K. Pullum at 12:44 AM

February 13, 2005

Who loves whose country?

Arnold Zwicky objects to the American Heritage Dictionary's definition of patriot as "[o]ne who loves, supports, and defends one's country". He explains that "[t]he subject ... is an indefinite pronoun one, in alternation with the indefinite pronoun someone, while the possessive ... is a generic pronoun one's, in alternation with the generic pronoun your".

Reading his post, I agreed with Arnold's objection -- the definition seemed weird to me too, suggesting that the one who loves and the one whose country is loved are different, as they are in the phrase "someone who loves, supports and defends someone's country". But thinking about it further, I began to wonder whether Arnold and I might both have taken a wrong turn here.

The definition of patriot is not an isolated example -- the AHD also includes

traitor: One who betrays one's country, a cause, or a trust, especially one who commits treason.
witness: 3c. One who signs one's name to a document for the purpose of attesting to its authenticity.
nailbiter: 1. One who bites one's fingernails as a nervous habit.
acrobat: 2. One who changes one's viewpoint on short notice in response to the circumstances.
pedant: 2. One who exhibits one's learning or scholarship ostentatiously.

So apparently this "mistake" is a matter of editorial policy at the AHD, a circumstance that ought to give us pause.

This is also not some new-fangled editorial intervention to avoid gendered pronouns. The 1913 Webster's 2nd has

parricide: 1. Properly, one who murders one's own father; in a wider sense, one who murders one's father or mother or any ancestor.
matricide: 2. One who murders one's own mother.

And if we look on the internet for similar examples (e.g. via the pattern {"one who * one's"}), it seems that many on the net agree with this usage:

One who seeks one's own happiness by inflicting pain on others, entangled by the bonds of hate, will never be delivered from hate.
One who knows one's former births, who sees heaven and hell, who has reached the end of births and attained to the perfection of insight, the sage who has reached the summit of spiritual excellence--such a one do I call a holy person.
The mighty individual is the one who conquers one's passions.
I thought the whole point of being a true American, one who loved one's country, was having the guts to take a look at the whole truth and nothing but the truth.
What sort of "anti-Semite" would be one who risks one's own life to save Jewish children?
[The one who] finds one's life will lose it and [the one who] loses one's life [for my sake] will find it.
...the Lord Himself always recompenses one who expends one's life in His way.
[Be like] one who loves one's fellow creatures and brings them close...
refugee, one who leaves one's native land either because of expulsion or to escape persecution.

[emphasis added, square brackets original]

So apparently quite a few people allow one's to be a possessive indefinite pronoun, in Arnold's terminology. And some of these people write glosses for well-respected dictionaries. On the other hand, the pattern {"one who * his"} has 233,000 hits, and {"one who * their"} has 36,900, compared to 610 for {"one who * one's"}, suggesting that Arnold and I are not alone in being uneasy about this.
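For anyone who wants the proportions spelled out, here is a minimal Python sketch that uses only the three hit counts quoted above. The variable names are invented for illustration, and the counts are raw search-engine estimates that surely include matches where the possessive is not anaphoric to "one who", so the result is a rough comparison and nothing more.

```python
# Hit counts exactly as quoted in the post; treat them as rough estimates.
hits = {
    "one who * his": 233_000,
    "one who * their": 36_900,
    "one who * one's": 610,
}

# Share of each possessive among the three patterns taken together.
total = sum(hits.values())
shares = {pattern: count / total for pattern, count in hits.items()}

# How many "his" hits there are for each "one's" hit.
ratio_his_to_ones = hits["one who * his"] / hits["one who * one's"]

for pattern, share in shares.items():
    print(f"{pattern!r}: {share:.2%} of the three patterns combined")
print(f"'his' outnumbers 'one's' by roughly {ratio_his_to_ones:.0f} to 1")
```

On these figures the one's pattern accounts for well under one percent of the combined hits, which is the sense in which it is the minority choice.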

There's more to say here -- as often, when you start pulling on the loose ends it unravels a lot of the fabric of grammar, so to speak -- but this much will have to do for now.


Posted by Mark Liberman at 10:47 PM

Patriots and pronouns

On Friday I finally got my own set of Patriots stickers, which a few weeks ago were the source of a half an hour's amusement for my daughter and son-in-law and me. This was a game, over dinner, of Guess the Patriots: which ten personages are going to be celebrated as patriots on stickers? My daughter knew the answers, since she'd bought the stickers; her husband and I got to guess. You can play too.

Relevant background: the stickers (VA429 Patriots) are a product of the Hayes School Publishing Co. in Pittsburgh, who also produce stickers of the following sorts (among others):

Praise Words, Happy Faces, Winning Words, Apples, Happy Birthday, Smiling Stars (all these in English, Spanish, and French), various U.S. holidays, Careers, Good Health Habits, Dinosaurs, Pets, Music Masters, Famous African Americans, Reading Achievement, Bunnies, Hip Words, Teddy Bears, Race Cars

The patriots stickers are meant for use in U.S. schools, which means they're not about the New England Patriots (a pro football team of some note), with upper case, but about lower-case patriots, specifically American patriots. (My son-in-law, an Australian who's lived in the U.S. for only just over a year, was at an instant disadvantage.) Since there are political sensibilities at issue here, you can pretty much bet that the ten are going to include at least one woman and at least one African American. Ok: guess away!

Once I'd worked my way through dozens and dozens of guesses to the truly patriotic ten, I was puzzled at what the hell patriot meant to the folks at the Hayes School Publishing Co. (See, there's some linguistic content in this.) So I did what ordinary people do when puzzled by words: I looked in the dictionary, specifically the American Heritage Dictionary 4, which told me that a patriot is:

(1) One who loves, supports, and defends one's country.

But then I was puzzled not only about patriot but also about the AHD4's bizarre use of the pronoun one's, which seems to have resulted from someone's rejection of the anaphors his (sexist), his or her (clunky), and their (plural). I'll look at that puzzle first.

It's easy to see how the people at AHD4 ended up with (1), but it won't do. The subject of (1) is an indefinite pronoun one, in alternation with the indefinite pronoun someone, while the possessive in (1) is a generic pronoun one's, in alternation with the generic pronoun your. These two don't fit together. We can see how bad the fit is by trying to slot the alternatives into (1):

(1.1) Someone who loves, supports, and defends one's country.
(1.2) One who loves, supports, and defends your country.
(1.3) Someone who loves, supports, and defends your country.

These are all grammatical, and roughly synonymous with one another and with (1), but they don't mean what the editors of AHD4 were after, which requires that the possessive be anaphoric to the subject: someone who loves etc. their own country. Not (necessarily) mine, not yours, not the reader's, but their own.

The New Shorter Oxford English Dictionary (1993 ed.) gets this just right, opting for the clunky but accurate his or her and even adding a bit of complexity to the support/defend part of the definition:

(2) A person devoted to his or her country; a person (claiming to be) ready to support or defend his or her country's freedom and rights.

But enough of warnings to the AHD staff. Who counts as a patriot? More to the point, who will schoolteachers think counts as a patriot?

Well, on service to and defense of the country, military leaders will score high. So will notable presidents, especially those who were military men themselves or presided in wartime. (Patriotism and war turn out to be more closely associated than I would have thought.) And of course there are the Founding Fathers. One person scores in all three areas, and indeed G. Washington -- I cite the names as they appear on the stickers -- is one of the ten Patriots. Of the remaining Founding Fathers, I would have expected Jefferson (who accompanies Washington on Mount Rushmore) to get a sticker, but no; instead, Benjamin Franklin gets the nod. Statesmen and diplomats don't seem to rank high on schoolteachers' patriotism scale; maybe Franklin is the token Patriot from this category, and gets a boost from his Founding Father status. (Paul Revere seems to be insufficiently Great.)

On to other great presidents, especially those with military or wartime connections. The prime candidate after Washington is Abe Lincoln, and then we might as well fill out Mount Rushmore with Teddy Roosevelt. They are indeed sticker Patriots. The two world wars are not yet covered, so we add Woodrow Wilson and F. D. Roosevelt to the set. Six down, four to go. Except for the Jefferson/Franklin thing, it's been pretty straightforward so far.

Any military candidates who weren't elected president? I ran through quite a few American generals -- George Marshall was my first choice -- before stumbling across the one who made the sticker set: D. MacArthur.

Still no African Americans and no women, and only three slots left. If you think of the task as being to name Great Americans rather than Patriots, then one African American clearly stands out: Martin L. King. Bingo.

Choosing a woman wasn't easy. My candidate was Eleanor Roosevelt, but I realized that she'd be too controversial, and anyway her husband was already in the set. My guess was that schoolteachers would go for Julia Ward Howe, as a combination of the artistic and the military (and comfortably back in history). But no. Instead, the sticker choice was a two-for-one, an African American woman: Harriet Tubman. (My son-in-law admitted he'd never heard of her, and I soothed him by confessing that I could hardly name any famous Australians outside of the arts and noting that the outlaw Ned Kelly, one of the few names that came to my mind, would scarcely count as an Australian patriot, no matter how far you expanded the meaning of patriot.) Tubman isn't a bad choice as a Great African American, but she's a bit of a stretch as a Patriot.

Now the really hard part, Patriot #10. This one would never ever have occurred to me if my daughter hadn't explained, in answer to my question about whether #10 was still living, that, strictly speaking, #10 was neither living nor dead. Ok, a fictional Patriot. Or, at least, a fictional Great American. Even then, it's a stretch: Uncle Sam, the (bellicose) embodiment of the nation.

Presumably, schoolchildren are expected to learn the meaning of patriotism -- in particular, American patriotism -- from the lives and deeds of these ten personages. I wonder. What, exactly, has Uncle Sam done?

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 01:25 PM

Sphere is here

In today's NYT, Frank Rich discusses the PR problems of Clint Eastwood's latest movie:

"What really makes these critics hate "Million Dollar Baby" is not its supposedly radical politics - which are nonexistent - but its lack of sentimentality. It is, indeed, no "Rocky," and in our America that departure from the norm is itself a form of cultural radicalism. Always a sentimental country, we're now living fulltime in the bathosphere. Our 24/7 news culture sees even a human disaster like the tsunami in Asia as a chance for inspirational uplift, for "incredible stories of lives saved in near-miraculous fashion," to quote NBC's Brian Williams." [emphasis added]

I spent a few seconds puzzling about our national exploration of the oceanic abyss, before I got it. Oh, he's coining a new word, based on bathos, senses 2a and 2b:

1a. An abrupt, unintended transition in style from the exalted to the commonplace, producing a ludicrous effect. b. An anticlimax. 2a. Insincere or grossly sentimental pathos: “a richly textured man who . . . can be . . . sentimental to the brink of bathos” (Kenneth L. Woodward). b. Banality; triteness.

The etymology is Greek bâthus "deep, thick, strong, violent, copious, abundant", from which we also get bathysphere:

A spherical deep-diving chamber in which persons are lowered by a cable to study the oceans and deep-sea life.

Hence my confusion.

The -sphere (or -osphere) part is a hot morpheme these days, with blogosphere one of the spreading recent coinages. The OED explains that -sphere is a suffix "[u]sed in names of more or less spherical structures or regions forming part of or associated with the earth (or any celestial object)", giving the examples barysphere, biosphere, ionosphere, magnetosphere. Others include hydrosphere, lithosphere, noosphere.

In case you've forgotten, or never knew, there is apparently some controversy about the invention of the term blogosphere, with Brad Graham (1999) and William Quick (2001) the claimants.

As far as bathosphere is concerned, Frank Rich is late to the party: there's already bathosphere.org (an artists' website), and bathosphere.com ("e-commerce solutions", featuring a picture of the inside of a bathysphere...), and a picture album from 2003 entitled "!!Hot Bathosphere Action!!" (safe for work, but not really worth linking: the subtitle is "Some funny, embarassing, and slightly odd pictures of your friendly neighborhood otaku").

I suspect that credit for the contemporary exploration of this morpheme's possibilities (the spherosphere?) belongs to old Teilhard de Chardin and his concept of the successive evolution of lithosphere, biosphere and noosphere.

Checking the spherosphere via Google, I find that another blogger commented on Frank Rich's use of bathosphere, back last Thursday 2/10/2005. Which is strange, since Rich's column is dated 2/13/2005. Cue Twilight Zone theme. Could this be transnoospheric blogleakage from a slightly desynchronized parallel blogiverse? Resulting from quantum entanglement in Cisco's new Bose-Einstein switches? This may be a premise of Pynchon's next novel, but in this universe, it's just that you can't trust cyberdatelines.

[Update: several people have written in to observe that Frank Rich's column, printed in the Sunday edition and datelined accordingly, normally goes online on the previous Thursday. That's what I meant about cyberdatelines and all. And the parallel blogiverse business was a joke... ]


Posted by Mark Liberman at 08:19 AM

February 12, 2005

Reaching your own constructions

In the Wall Street Journal on 2/10/2005, Bret Stephens wrote about Eason Jordan that "I'll leave it to others to draw their own verdicts". I thought this was a weird turn of phrase when I quoted him, but I held back from saying anything about it, in order not to worsen my reputation for intellectual ADHD. As I was re-reading the post for typos, this phrase nagged at me again. What you do to a verdict is reach it, not draw it, right?

I mean, think about it. "The jury has reached a verdict." Fine. "The jury has drawn a verdict." Out of a hat? On a napkin?

So this morning I sat down to scratch this little itch by blogging about it. Now, I'm not going to sit here in my blogging pajamas and pontificate strunkishly about what people should or shouldn't say or write, even with compelling examples like these, so I thought I'd check with Norma Loquendi. Or at least the best approximation that I (and you) have easy access to, namely Google.

I suspected that Stephens was influenced by the rough semantic equivalence, in his context, of verdict and conclusion. You should reach a verdict, I reasoned, but draw a conclusion. Stephens wrote that he would "leave it to others to draw their own verdicts", so I decided to check the patterns

{"reach * own conclusion|conclusions"}
{"draw * own conclusion|conclusions"}
{"reach * own verdict|verdicts"}
{"draw * own verdict|verdicts"}

where '*' mostly ranges, as expected, over {"my|your|her|his|our|their|one's"}. Here are the results, presented as a 2x2 table:

  conclusion|conclusions verdict|verdicts
reach * own
draw * own

This is hardly convincing. In fact, it's a pitiful excuse for empirical confirmation. True, conclusions are being drawn about 8 times more often than they're being reached, while verdicts are being reached about 5 times more often than they're being drawn. But to my ear, both reaching and drawing seem like fine things to do to conclusions:

Having the students reach their own conclusions is key to making this lesson effective.
For more information we strongly recommend that you read the full article and draw your own conclusions.

In contrast, the examples in which verdicts are drawn all seem weird to me:

Each Cabinet member will be able to draw their own verdict: Mr Brown’s views will be one of many.
You can draw your own verdict on me in the comments.

There are not really even 46 of these examples -- Stephens' recent WSJ sentence is replicated many times, for example, and there are some other repeats. However, the pattern of Google counts so far doesn't come near to justifying my intuitions.

One hypothesis is that others are confused in the same way that Stephens was. Maybe the basic construction here is really {draw|reach|come to [one's] own conclusion(s)}, and all the examples with verdict are just epiphenomena. Constructional malapropisms, so to speak.

So let's get rid of the "one's own" part, and just check

{"reach|reached|reaches|reaching a conclusion"}
{"draw|drew|draws|drawing a conclusion"}
{"reach|reached|reaches|reaching a verdict"}
{"draw|drew|draws|drawing a verdict"}

  a conclusion a verdict

Much better! This time, the conclusion is reached more than twice as often as it's drawn, for some reason. (Why is there a factor of 16 difference in the relative frequency of reach and draw between the two pattern sets I tried? That's a topic for another post.) But now a verdict is reached a reasonably large number of times, and drawn 2,281 times less often. This is a satisfactory statistical image of my intuitions about Stephens' turn of phrase.
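The "factor of 16" can be recovered from the two rounded ratios reported for conclusion. The sketch below is only an illustration: it assumes "about 8 times" and "more than twice" are taken as exactly 8 and 2, and the variable names are hypothetical, so the true shift is somewhat larger than the figure computed here.

```python
# With "[one's] own": conclusions are drawn about 8 times as often as reached.
draw_per_reach_with_own = 8.0

# Without "own": a conclusion is reached more than twice as often as drawn,
# taken here as exactly 2 for the sake of the arithmetic.
reach_per_draw_plain = 2.0
draw_per_reach_plain = 1.0 / reach_per_draw_plain

# How far the draw-to-reach ratio moves between the two pattern sets.
shift = draw_per_reach_with_own / draw_per_reach_plain
print(f"draw:reach shifts by a factor of about {shift:.0f}")
```

Because the second ratio is "more than" 2, the computed value is best read as a lower bound on the shift.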

Of course, I've just committed a common fallacy of empirical research. I don't know if it has a name -- it's a little like begging the question, but maybe it should be called "waiting for the answer". You have a conclusion in mind, and you keep trying empirical tests until you find one that gives you the answer you want. An Indian friend once told me that his parents had dealt with astrologers this way, in the matter of his marriage: the first one they asked found his wife astrologically unsuitable, and so did the second one, but the third one found the stars to be in favor.

Then again, astrology is bunk, whereas Google counts are a reliable indicator of English usage :-).


Posted by Mark Liberman at 11:07 AM

Internet targets Eason Jordan, CNN paves him over

The MSM, which mostly ignored the Eason Jordan story until now, have reported extensively on his resignation, sometimes with interesting additional information, or at least an interesting spin.

The AP story recycles the word target, perhaps simply due to lexical priming:

But the damage had been done. He [Jordan] was the target of an Internet campaign that was beginning to rival the one launched against CBS' Dan Rather after the network's story on President Bush's military service. [emphasis added]

In this case, though, there was no ambiguity about intent. Jordan was definitely the target of the blogpack, in a sense that reporters in Iraq have not been the targets of the U.S. military.

The BBC story, interestingly, chooses to group Jordan's resignation with a recent event in which a journalist in Iraq was specifically targeted for death -- by an insurgent group:

The latest journalist to die in Iraq was Abdul Hussein Khazal, 40.

Khazal, a correspondent for US-funded Arabic TV station al-Hurra, was killed by gunmen on Wednesday as he was leaving his house in the southern city of Basra.

His three-year-old son also died in the attack, claimed by a previously unknown rebel group.

This information is prominently placed in the third graf of the story. It evokes the fact that most of the journalists killed in Iraq have been killed by the other side, and have been killed on purpose because of who they are (though the BBC story does not go so far as to say this). This is a kind of juxtaposition that I don't expect from the BBC.

John Cook in the Chicago Tribune gets a sprightly quote from an anonymous source:

Jordan's resignation marks the end of a 23-year career at CNN as one of its most powerful and influential executives; much of his time at CNN was spent building and running the network's global newsgathering operations. But Jordan's power was severely circumscribed in a corporate reorganization in 2003, and insiders said Jordan was unhappy with a job that essentially made him a figurehead.

"It seems like a dumb thing to have said," said a highly placed CNN source who spoke on condition of anonymity. "Mostly he had the role of being the embodiment of CNN's journalism domestically and internationally, and having said the things he apparently said, it would have been difficult to go forward. But in a sense, he'd already been paved over. He was a chief of state, as opposed to an operating officer." [emphasis added]

He'd already been paved over. Nice. I wonder whether that "highly placed CNN source" was Jonathan Klein, who has a track record for memorable turns of phrase:

"You couldn't have a starker contrast between the multiple layers of checks and balances [at 60 Minutes] and a guy sitting in his living room in his pajamas writing."

The LA Times story by Ned Martel tells us that

... a website called Easongate.com, featuring the executive's corporate portrait on its home page, offered a clearing-house of criticism related to Jordan's statements. The website linked to 25 other sites in its "Blogroll," with mainstream columnists such as Roger L. Simon and more obscure bloggers such as "Red State Rant" and "Winds of Change." [emphasis added]

I've always thought of Roger Simon as a mystery novelist and screenwriter who took up blogging on the side after 9/11, as the title of his blog and his about page explain. Has he acquired a newspaper column somewhere? Or does 15,528 visits a day qualify him as a "mainstream columnist", while Winds of Change at a mere 5,303 is "obscure"?

Perhaps Martel is trying to say, in a confusing way, that the blogpack pursuing Jordan was diverse in terms of number of readers. But surely this is not news -- how could it be otherwise? And he didn't pick the extreme examples, which would be Instapundit at 158,487 visits per day on the high end. Or maybe Martel just liked the contrast between the dignified appellation "Roger L. Simon" and the edgy "Red State Rant" and "Winds of Change"?

[Update: No, it turns out that the "Politics Editor" for U.S. News and World Report is also named "Roger Simon". He's a completely different person from Roger Simon the blogger; but we can't expect an LA Times reporter to know that... ]

Perhaps the most preposterous spin was in a story on IslamOnline.net:

Lamis Awad, a Tunisian journalist who attended the event, said Jordan criticized the US forces for targeting reporters working for Al-Jazeera news channel in particular.

“The US administration would not allow any journalist working in a heavyweight American channel like CNN to publicly criticize its policies in Iraq,” she told the Doha-based broadcaster commenting on the resignation.

A few congresscritters have spoken up, including Barney Frank and Chris Dodd, who were there at the session in Davos; but no one else has suggested so far that the Bush administration has played any role whatsoever in the whole imbroglio.

The sad thing is that the Tunisian journalist, if not misquoted, was perhaps not lying, but rather just reasoning by analogy from her own situation. A couple of decades ago, when I was working in an industrial research lab, I was visited by a bright young telecommunications engineer from a North African country. During a dinner party at my house, he had a long conversation with a financial journalist, and was politely puzzled by the notion that her job included finding stories and discovering facts. In fact, he was incredulous, since he was convinced that the selection and content of magazine stories must largely be determined by the size of the bribes paid to the publisher by their subjects, always of course under constraint from the interests and sensitivities of relevant government ministers. The other Americans at dinner were shocked by this view, a reaction which he attributed to their surprising naïveté.

[Update 1:00 p.m.: The NYT web site still has no article on the Jordan affair, other than an automatic reprint of the AP wire story on his resignation. I guess this is an example of their tendency to take the view that a story broken by others doesn't exist. Though perhaps they will come through with a package of relevant revelations later on.]

[Update 1:45 p.m.: There's a story there now, by Jacques Steinberg and Katherine Seelye. It still isn't indexed by their search function, so maybe it was up earlier today as well. They quote Stephens' "defamatory innuendo" remarks.]

[Note: Sam Bayer notes that I managed to spell "sprightly" as "spritely". Oops. We'll have to transfer to Sam's account a suitable fraction of what we usually pay Geoff Pullum for his function as copy editor. ]

Posted by Mark Liberman at 09:14 AM

February 11, 2005

Blogs push Eason Jordan past the tipping point

According to ABC News and other outlets:

"CNN chief news executive Eason Jordan quit Friday amid a furor over remarks he made in Switzerland last month about journalists killed by the U.S. military in Iraq. Jordan said he was quitting to avoid CNN being "unfairly tarnished" by the controversy. "

When I wrote about this a couple of days ago, I thought it was pretty obvious that Jordan was tottering, even though it was unclear exactly what he had said, much less what he had meant, and even though the (mainstream) media coverage up to that point had been very limited and mostly gentle:

"Since this will get a little complicated, let me put my conclusions up front. (In the court of public relations, the Red Queen's rule applies anyhow: "first the sentence, and then the evidence".) Eason Jordan made a big mistake. He said something whose natural interpretation is incorrect and indefensible. He may have meant to convey the natural interpretation of his remarks, either because he believes it's true, or because he chose to exaggerate in order to express animus against the U.S. forces in Iraq. He may have meant to imply the natural interpretation but to leave himself interpretive room to back up, as politicians often do when they want to play to the prejudices of one group while preserving deniability for another. Or he may not have intended to convey the obvious interpretation of his words at all, despite their inflammatory effect and the prominence of the setting. None of these options is acceptable communicative behavior for the Chief News Executive of CNN."

A similar note was sounded by Bret Stephens (who was actually present at Davos), yesterday in the WSJ:

"I'll leave it to others to draw their own verdicts, but here's mine: Whether with malice aforethought or not, Mr. Jordan made a defamatory innuendo. Defamatory innuendo--rather than outright allegation--is the vehicle of mainstream media bias. Had Mr. Jordan's innuendo gone unchallenged, it would have served as further proof to the Davos elite of the depths of American perfidy. Mr. Jordan deserves some credit for retracting the substance of his remark, and some forgiveness for trying to weasel his way out of a bad situation of his own making. Whether CNN wants its news division led by a man who can't be trusted to sit on a panel and field softball questions is another matter."

That other matter has now been settled in the obvious way. The ABC article explains something that made Jordan's position even more untenable than it would otherwise have been:

"After several management restructurings at CNN, Jordan actually had no current operational responsibility over network programming. But he was CNN's chief fix-it man overseas, arranging coverage in dangerous or hard-to-reach parts of the world."

Stephens, though no ideological friend of Jordan's, seems a little apprehensive about packs of bloggers running wild through the nets, howling for journalistic blood:

"There's a reason the hounds are baying. Already they have feasted on the juicy entrails of Dan Rather. Mr. Jordan, whose previous offenses (other than the general tenor of CNN coverage) include a New York Times op-ed explaining why access is a more important news value than truth, was bound to be their next target."

Well, perhaps blogpacks have their role to play in maintaining a healthy intellectual ecosystem, just as the wolves are good for Yellowstone. Or so I read.

While we're on the subject of defamatory innuendo, let me point out a lovely example in Stephens' editorial. He describes Jordan's NYT op-ed as "explaining why access is a more important news value than truth". I was guilty of something similar when I wrote that "CNN had a pre-war deal with Saddam Hussein not to report his regime's atrocities, in return for protection and access".

CNN's "deal" with Saddam was apparently unspoken -- Jordan just figured, no doubt correctly, that if he reported Saddam's atrocities then CNN's news bureau would be expelled from Iraq. And by "protection" I meant that he avoided telling the story of how CNN staffers and sources had been treated, because he believed (again plausibly) that they and their families would be horribly killed if he publicized what had happened to them.

As for Stephens' phrase about access and truth, what Jordan actually wrote was that he "made 13 trips to Baghdad to lobby the government to keep CNN's Baghdad bureau open", and that he learned about "awful things that could not be reported because doing so would have jeopardized the lives of Iraqis, particularly those on our Baghdad staff". He didn't actually say that "access is a more important news value than truth" -- instead he said that preserving the lives of CNN's Iraqi staff was more important than telling the truth about Saddam's atrocities. I suspect that if you asked Jordan whether "access is a more important news value than truth", he'd deny it.

Now, in fact I stand by my description of the pre-war situation in Iraq: "CNN had a ... deal with Saddam Hussein not to report his regime's atrocities, in return for protection and access." That's effectively what the situation was, even if that way of describing it is slanted towards a certain evaluation, namely that CNN's decision was unethical and even immoral. And I'm similarly comfortable with Stephens' neat formulation: I'm convinced that Jordan did indeed believe that "access is a more important news value than truth", even though he didn't say that and might well claim that he doesn't believe it. He acted as if he believed it, and that's good enough for me.

But those descriptions ("a deal not to report atrocities in return for protection and access"; "access is a more important news value than truth") are mine and Stephens', not Jordan's. And we might have done more to make that clear.


Posted by Mark Liberman at 11:55 PM

Orientalist meets occidentalist

As part of a recent blog exchange (blogversation?) with Juan Cole, Martin Kramer told this joke, which he credits to Charles Issawi:

A Western orientalist goes to Egypt, and strikes up a conversation in Arabic with his taxi driver. The poor driver, after straining to understand his passenger, plaintively asks him how he came to know Arabic. Ana mustashriq! the orientalist answers proudly. In reply to which, the taxi driver mutters: Wa'ana mustaghrib...

Kramer doesn't offer a translation -- "here's an addendum," he writes, "but only if you know Arabic." I hate to miss a joke, so I asked Tim Buckwalter, who explained it to me.

Sharq = orient, and the denominal istashraq = approx. to orientize or orientacize, hence the active participle mustashriq = orientalizer or more properly orientalist.

Gharb = west, but the parallel form istaghrab is a denominal from ghariib (= strange), hence istaghrab = to regard as strange, to be puzzled (lit., to find something ghariib), and the active participle is mustaghrib, which parallels mustashriq perfectly.

"I'm an Orientalist!"
"Oh, yeah? I'm puzzled." [i.e., I can barely understand you!]

In this context the term mustaghrib could also be interpreted literally to mean someone who studies the West, but the term Occidentalist is not used.

BTW, ghuraib (as in abu ghuraib, or abu ghreib), is the diminutive of ghariib.

In my opinion, the term Occidentalist deserves wider currency. I think of it only in connection with Buruma and Margalit's recent book Occidentalism, which according to Amazon's "search inside the book" feature uses the word "occidentalist" 35 times in its 160 pages. Checking Google for occidentalism turns up 718 instances, including a review of Occidentalism by Eric Kenning, under the cute title "Occidents Happen", which contains this even cuter passage:

The most intriguing response [to French Enlightenment universalism] came from philosophical historian Johann Gottfried von Herder, whom Buruma and Margalit discuss as if he fit their Occidentalist profile, though he doesn't. Herder wasn't a nationalist who believed in the superiority of German or any other culture. He was a kind of libertarian traditionalist with, in Isaiah Berlin's words, "his acute dislike for political coercion, empires, political authority, and all forms of imposed organization." He believed that cultures had a specific individuality, and that to understand other cultures (or periods of history) you needed sympathetic imagination, not just facts and analyses. Once you understand them, you see that they're not just defective versions of yourselves. Different cultures should be valued and preserved as great works of art are valued and preserved, for their unique creativity and individuality. If Herder were around today, he'd be a familiar figure, making documentary films about South American rain forest tribal cultures and displaying STOP GLOBALIZATION and FREE TIBET and U.S. OUT OF IRAQ bumper stickers on his Volkswagen.

I'm not sure -- maybe today's Herder would be an earnest young intern working with Samuel Huntington over at the Olin Institute for Strategic Studies (some background for this conjecture is here). Of course, Kenning's lefty filmmaker and my Huntington intern might just be two sides of the same anti-globalization coin.

Anyhow, the mustashriq/mustaghrib exchange is a good joke, with several layers of contemporary meaning. I'm glad I asked.

[Update 2/12/2005: an anonymous reader writes to suggest that a better translation for mustaghrib in this joke might be "disoriented". Cute, but it misses the east/west thing. If only the word were "disoccidented"...

It's curious that in English to be "oriented" is to know where you are and where you're going, whereas in Arabic to be "occidented" is to be confused by the strangeness of things. ]


Posted by Mark Liberman at 05:47 PM

Metonymy and Dean Wormer

I was giving directions to my house this morning, on the phone to a friend who was coming round for coffee (Chris Barker, the brilliant semanticist from UC San Diego, who is giving a talk here at UC Santa Cruz later today). "At the end of the cul de sac," I told him, "you'll see a driveway with a pile of mulch under a blue tarp; that's us." And I heard him giggle. I saw why. We are not a pile of mulch under a blue tarp, Barbara and me; rather, we live in the house behind it. This, of course, is a real-life occurrence of what the literary types call metonymy: as Webster puts it, "using the name of one thing for that of something else with which it is associated (as in spent the evening reading Shakespeare, lands belonging to the crown, demanded action by City Hall, ogling the heavily mascaraed skirt at the next table) : use of one word for another that it may be expected to suggest." And I was suddenly reminded of another amusing on-the-fly metonymy. A fine character actor, John Vernon, died this week. He played Dean Wormer in the movie Animal House (1978; technically called National Lampoon's Animal House). And of the grimly serious lines Dean Wormer delivered with such aplomb, my very favorite was a wonderful piece of unintentionally inept metonymizing:

"The time has come for someone to put his foot down. And that foot is me."

Posted by Geoffrey K. Pullum at 04:28 PM


In this morning's news:

The National Science Foundation has named eminent linguistic scientist and veteran administrator David W. Lightfoot to head its Directorate for Social, Behavioral and Economic Sciences. He will take office on June 1.

Lightfoot, Dean of the Graduate School for Arts and Sciences at Georgetown University and a professor in its Department of Linguistics, will oversee NSF’s $197 million annual investments in fields such as anthropology, psychology, cognitive studies, political science, linguistics, risk management and economics.

Pardon us a bit of disciplinary kvelling.

[By the way, to give credit where due, the "Word of the Day" feature at randomhouse.com, where I linked for the information on kvelling, was devised and written over a period of years by Jesse Sheidlower, who is now editor-at-large of the Oxford English Dictionary. See his page Jesse's Word of the Day for more information.

I note that http://www.randomhouse.com/wotd/ now leads to an error page which tells me that "There has been an error; Could not find a word to display in folder "index_home/words/" which is less than 999 days old." Indeed. ]


Posted by Mark Liberman at 08:15 AM

Slander in Seattle

Washington state is in the news over an effort by Democrats (who probably have the votes to do it) to repeal a special law with purely linguistic focus: it specifically forbids slander against women. That's right: call a woman a slut in Seattle and you're breaking the law, buddy.

Unless, oddly, she's a prostitute. Now, a rational person might have thought that was exactly the kind of woman that would most need the protection of such a law, because people can slander the virtue of a working girl and really mean it. I think these profession-targeted insults are pernicious. The last thing you want when you're finally off duty is to have someone start hurling epithets at you on grounds of the work you do. When I finish work and head off for a margarita at some Santa Cruz hostelry, I don't want to have people who recognize me shouting "Pedant!" or "Correctness freak!" in my face. But in fact there's no law to prevent people accusing an honest grammarian, such as I, of being innumerate, female, jejune, sexless, or even postmodernist. Well, soon there also won't be any law against insulting the virtue of a woman, of any kind, in Washington state. In this country where free speech has gone from principle to mantra and from mantra to mania, it looks like speech is about to get even more free.

Posted by Geoffrey K. Pullum at 01:57 AM

February 10, 2005

What am I, Lewis Carroll?

Mark is too right (see the very end of his post) about Amazon.com's strange picture of my tastes. Because of one search I am typecast as a man interested solely in a strange, unholy, promiscuous mix of (i) sets and logic, of which I have enough; (ii) early 20th century children's novels, especially for young women, which I have never needed to know about at all; and (iii) postmodernist gender stuff I absolutely never read. The last time I visited the site, the welcoming message greeted me with a list of these books that it thought I might like (I color them blue for math, pink for children's novels, and lavender for gender studies; the green one is cognitive linguistics, also not in my sphere of competence, and I have no explanation for it):

1. Schaum's Outline of Logic
2. Basic Set Theory (Azriel Levy)
3. Axiomatic Set Theory (Paul Bernays)
4. Undoing Gender (Judith Butler)
5. Introduction to Logic and to the Methodology of Deductive Sciences (Alfred Tarski)
6. Volatile Bodies: Toward a Corporeal Feminism (Elizabeth Grosz)
7. The Jungle Book (Rudyard Kipling)
8. The Swiss Family Robinson (Johann Wyss)
9. The Adventures of Tom Sawyer (Mark Twain)
10. Anne of Green Gables (L. M. Montgomery)
11. Axiomatic Set Theory (Patrick Suppes)
12. Schaum's Outline of Group Theory
13. Schaum's Outline of General Topology
14. Schaum's Outline of Modern Abstract Algebra
15. Schaum's Outline of Discrete Mathematics
16. Cognitive Linguistics (William Croft)
17. Heidi (Johanna Spyri)

Please, Amazon! You don't know me! I have broad interests, really I do. I'm into Neal Stephenson (Cryptonomicon; maybe Snow Crash next), David Sedaris (Dress Your Family in Corduroy and Denim), Dan Brown (all right, I lied about Dan Brown), politics and philosophy (Sam Harris, The End of Faith; Daniel Dennett, Freedom Evolves), scientific biographies (Anita and Sol Feferman's new biography of Tarski; Anita's earlier biography of Van Heijenoort), literary biographies (Barbara Belford's Bram Stoker), earth sciences (Simon Winchester's Krakatoa), ecology (Jared Diamond's Collapse if Barbara will ever stop monopolizing it and let me get my hands on our copy)... books on computer science, rock 'n' roll, Oscar Wilde, economics, Unix, law, Coleridge, biology, ... Believe me! Why would I lie to you? I am not what you think! Axiomatic set theory and stories for little girls? What am I, Lewis Carroll or something?

Posted by Geoffrey K. Pullum at 01:17 PM

Anne of Green Gables and conversational optimization

A couple of days ago, Geoff Pullum asked amazon.com for a book by J. A. Green on the theory of sets and groups, and got answers that included one of the Anne of Green Gables novels. That very same evening, I happened to be reading a passage in William Gibson's 2004 novel Pattern Recognition where something similar takes place in a conversation between two human beings. Although both of Anne's intrusions were unexpected and unwelcome, they were the result of conversational strategies that make a lot of sense in general.

Pattern Recognition is about Cayce Pollard's search for the source of "the footage", a set of mysterious film clips spread anonymously around the internet. The footage has attracted an international cult, and Cayce's sub-cult of footageheads hangs out at a site called Fetish:Footage:Forum, run by someone named Ivy from her apartment in Seoul. One of Cayce's F:F:F friends, Parkaboy, has been doing some of what he calls "kanji-cruising" with his pal "Darryl, AKA Musashi, a California footagehead fluent in Japanese." In email to Cayce, Parkaboy explains that

Darryl and I, burrowing deep into back posts on an Osaka-based board of quite singular tediousness, had happened across what seemed to be a reference to #78 having been discovered to be watermarked.

Parkaboy and Darryl spread some "genderbait" on the Japanese forum:

... to get on with the narrative of Parkaboy and Musashi in deep kanji-space, we came back to the present, and our own language, with this one glancing and highly cryptic reference -- which I at first was convinced might be nothing more than an artifact of Darryl's translation. I returned to Chicago, then, and Darryl and I, curiosity's cats, began to lovingly generate a Japanese persona, namely one Keiko, who began to post, in Japanese, on that same Osaka site. Putting her cuteness about a bit. Very friendly. Very pretty, our Keiko. You'd love her. Nothing like genderbait for the nerds as I'm sure you well know. She posts from Musashi's ISP but that's because she's in San Francisco learning English. Very shortly, we had one Takayuchi eating out of our flowerlike palm. Taki, as he prefers we call him, claims to orbit a certain otaku-coven in Tokyo, a group that knows itself as 'Mystic,' though its members never refer to it that way in public, nor indeed refer to it at all. It is these Mystic wonks, according to Taki, who have cracked the watermark on #78. This segment, according to Taki, is marked with a number of some kind, which he claims to have seen, and know.

Cayce, who earns her living as a coolhunter and logo consultant, has been hired to find the source of the footage by the Belgian marketing genius Hubertus Bigend. She travels to Tokyo, posing as Keiko's English teacher on a tourist visit, where she is supposed to meet Taki and exchange a signed picture of Keiko for the Mystic-decrypted number from footage #78.

She meets him in a bar, "one of those apparently nameless little red-lantern pub-analogs they have here, ... set into ground-floor walls in back lanes like this one."

What she's confronted with here, she decides, is an extreme example of Japanese geek culture. Taki is probably the kind of guy who knows everything there is to know about one particular Soviet military vehicle ...

Their conversation does not go very well.

"Keiko's told me a lot about you," she says, trying to get into character, but this only seems to make him more uncomfortable. "But I don't think she's told me what it is that you do."

Taki says nothing.

Parkaboy's faith, that Taki has enough English to handle the transaction, may be unfounded.

And here she is, halfway around the world, trying to swap a piece of custom-made pornography for a number that might mean nothing at all.


He isn't, as Parkaboy has indicated, the best-looking guy she's recently had a drink with. Though that, come to think of it, would be Bigend. She winces.

"I do?" Responding perhaps to the wince.

"Your job?"

The barman places her beer on the table.

"Game," Taki manages. "I design game. For mobile phone."

She smiles, she hopes encouragingly, and sips her Asahi Lite. She's feeling more guilty by the minute. Taki -- she hasn't gotten his last name and probably never will -- has big dark semicircles of anxiety sweat under the arms of his button-down shirt. His lips are wet and probably tend to spray slightly when he speaks. If he were any more agonized to be here, he'd probably just curl up and die.


"That's interesting," she lies. "Keiko told me you know a lot, about computers and things."

Now it's his turn to wince, as if struck, and knocks back the remainder of his beer. "Things? Keiko? Says?"

"Yes. Do you know 'the footage'?"

"Web movie." He looks even more desperate now. The heavy glasses, lubricated with perspiration, slide inexorably down his nose. She resists an urge to reach over and push them back up.

"You ... know Keiko?" He winces again, getting it out.

She feels like applauding. "Yes! She's wonderful! She asked me to bring you something." [...]

Taki fumbles in his sport coat's side pocket, coming up with a crumpled pack of Casters. Offers her one.

"No, thank you."

"Keiko sends?" He puts a Caster between his lips and leaves it there, unlit.

"A photograph." She's glad she can't see her own smile; it must be ghastly.

"Give me Keiko photo!" The Caster, having been plucked from his mouth for this, is returned. It trembles.

"Taki, Keiko tells me that you've discovered something. A number. Hidden in the footage. Is this true?"

His eyes narrow. Not a wince but suspicion, or so she reads it. "You are footage lady?"


"Keiko like footage?"

Now she's into improv, as she can't remember what Parkaboy and Musashi have been telling him.

"Keiko is very kind. Very kind to me. She likes to help me with my hobby."

"You like Keiko very much?"

"Yes!" Nodding and smiling.

"You like ... Anne-of-Green-Gable?"

Cayce starts to open her mouth but nothing comes out.

"My sister like Anne-of-Green-Gable, but Keiko ... does not know Anne-of-Green-Gable." The Caster is dead still now, and the eyes behind the dandruff-flecked lenses seem calculating. Have Parkaboy and Musashi blown it, somehow, in their attempt to generate a believable Japanese girl-persona? If Keiko were real, would she necessarily have to like Anne of Green Gables? And anything Cayce might ever have known about the Anne of Green Gables cult in Japan has just gone up in a puff of synaptic mist.

Then Taki smiles, for the first time, and removes the Caster. "Keiko modern girl." He nods. "Body-con!"

"Yes! Very! Very modern." Body-con, she knows means body-conscious: Japanese for buff.

Cayce does get her number, in the end, and Taki gets not only Keiko's photo but also, miraculously, Keiko herself (or at least Judy Tsuzuki, the Japanese-American barmaid that Darryl got to pose for the picture).

So why did Anne of Green Gables intrude into these conversations among Geoff, Amazon, Cayce and Taki? Geoff suggested that it happened to him because Amazon now searches on the contents of books as well as on their titles and authors. And he's right, insofar as "set" and "theory" occur in the text of Anne of Avonlea but not in its title or author (though "Advanced Search" allows you, as always, to limit search to specified fields, and searching for author="green" and subject="set theory" returns only one result, the book that Geoff wanted).

But there's another factor: the Anne novels are quite popular. The Complete Anne of Green Gables boxed set has an Amazon sales rank of 3,223. In comparison, the paperback edition of Sets and Groups by James A. Green -- the book that Geoff was looking for -- has an Amazon sales rank of 3,147,553, or almost 3 orders of magnitude lower. Actually, this is the worst Amazon sales rank I've ever seen.
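For anyone who wants to check the "almost 3 orders of magnitude" figure, here is a minimal sketch in Python. It uses only the two sales ranks quoted above; the variable names are invented for illustration, and it compares ranks only, since Amazon does not publish how rank maps to actual sales.

```python
import math

# Sales ranks as quoted in the post (lower rank = more popular).
anne_rank = 3_223        # The Complete Anne of Green Gables boxed set
green_rank = 3_147_553   # Sets and Groups, paperback

# How many times further down the list is the set theory book?
ratio = green_rank / anne_rank

# Orders of magnitude = base-10 logarithm of that ratio.
orders = math.log10(ratio)

print(f"rank ratio: {ratio:.0f}x")
print(f"orders of magnitude: {orders:.2f}")
```

The ratio comes out a little under 1,000, which is why "almost 3" is the right way to put it.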

And Amazon is no doubt ordering its search results, in part, according to the popularity of the items in the list. This makes sense, since more popular items are, other things equal, more likely to be what a searcher is looking for. If you have too many things to say to someone, why not start with the things that you think they're most likely to find interesting?

And since Amazon is keeping track of what items users look at online, as well as what they buy, there's another aspect of conversational dynamics here as well: one way to decide what to say to someone is to ask how much information you can expect to gain from the response. If you're trying to figure out who someone is, the ideal question would be one that would cut the set of people they might be exactly in half, in which case you'd learn one bit of information from their response. If you ask a question that 99.5% of the relevant population will answer in a predictable way (e.g. "are you interested in set theory?"), then you can expect to gain less than a quarter of a bit of information from the response. Of course, some bits are more interesting than others, so you'd also want to weight the expected information gain by its expected value to you.
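The "bits" arithmetic here can be checked with the standard binary entropy formula, which gives the expected information from a yes/no answer. This is a minimal sketch, assuming the question has exactly two possible answers; the function name binary_entropy is made up for illustration and says nothing about what Amazon actually computes.

```python
import math

def binary_entropy(p):
    """Expected information, in bits, from a yes/no answer
    when one of the two answers has probability p."""
    if p in (0.0, 1.0):
        return 0.0  # a certain answer carries no information
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# A question that splits the population exactly in half.
even_split = binary_entropy(0.5)

# A question that 99.5% of people answer the same way.
lopsided = binary_entropy(0.005)

print(f"50/50 question:    {even_split:.3f} bits")
print(f"99.5/0.5 question: {lopsided:.3f} bits")
```

The even split yields exactly one bit, while the lopsided question yields well under a quarter of a bit, consistent with the estimate in the text.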

I'm not sure whether Amazon is also using this second principle of conversational dynamics, but I'm pretty sure that (we're supposed to think that) Taki is. He knows that young women in Japan have traditionally made a cult of Anne of Green Gables, ever since the American occupation, but he also knows that this is now changing. Apparently "modern girls" are different, in ways that interest him. And Cayce's lack of response to his question is enough of a response for him.

Unfortunately, Geoff clicked on the link, and even searched inside the book. He thereby informed Amazon that he is an old-fashioned type who will also be interested in Heidi, Black Beauty, Little Women, The Secret Garden, and so on.


Posted by Mark Liberman at 07:30 AM

February 09, 2005

Nowhere to really else go

Language Hat quotes a TV announcer saying (about Tom Brady's pass to David Givens) that "He had nowhere to really else go", and observes:

That is perhaps the single most astonishing sentence I've heard a native speaker of English utter (in terms of grammaticality, I hasten to add); it's so bizarre I had to retype it because I automatically moved the "really" as I was copying it.

After a bit more discussion, he adds "I'd love to hear one of the Language Log mavens or other linguabloggers try to account for how it got there", observing that "[t]his is the kind of thing that makes me very skeptical of efforts to derive sentences from little NP-VP nodules that get lexical items inserted before being extruded from the assembly line and out of our mouths".

Well, time is short this afternoon -- and you'll pay for that "maven" dig, Hat -- but as always, I'm happy to oblige.

In this case, I suspect that the explanation has more to do with the psychological complexities of real-time composition than with the logic of grammar, generative or otherwise. In other words, it was a speech error.

1. Starting point #1: "nothing else", "no one else", "nowhere else". Normal stuff.
2. Modification with really: "nothing really else", "no one really else", "nowhere really else".

Umm, it's yellow and Horizontal... nothing really else I can say about it.
There’s nothing really else around to which The Nameless Uncarved Block can usefully be compared.
My heavenly father loves me for me and noone really else matters.
Or was the situation so bad that there was nowhere really else to go?

This is non-standard, but no big shock. Note that "no one really else matters" (if it's grammatical) means something different from "no one else really matters" (or any of the other strings resulting from moving really around). I think that the people who use these expressions intend to intensify the else rather than the negative (though I'm not entirely sure, because I don't think I'm one of them, and I haven't studied the construction).

3. Starting point #2: "really to VERB" <-> "to really VERB". Different orders are preferred for different verbs in different constructions and meanings, and as a result, there is a lot of confusion as sequential probabilities fight against structural preferences. In particular, "really to go" has 6,970 hits, while "to really go" has 48,300. Often the "split infinitive" is the only order that makes sense:

Travel is the Only Way to Really Go Places.
But nobody wants to really go back to basics of diet and exercise as the fountain of youth.
The use of our armored assailant Instructors in these dynamically realistic street attack scenarios finally allows you to really go flat out.

I mean, how else are you going to say those things?

4. Observation #3: "nowhere to" is a really common bigram: 1.59M whG (even typed twice).

5. OK, now we're ready to go. The announcer started to put together the simple cliche "He had nowhere else to go" (689 whG). He decided to modify else with really: "He had nowhere really else to go". Then in the excitement of the moment, his sequential preferences ("nowhere to", "to really") pulled "really else" over past "to".


Posted by Mark Liberman at 03:00 PM

Another peace <-> piece shift

Last August, Arnold Zwicky listed "an assortment of eggcorns from [his] files", including:

"say one's piece > say one's peace, peace of mind > piece of mind. The first was noted by me on ADS-L, 5/21/03, from Mike Thomas (and others) on soc.motss, who queried my spelling of "I said my piece". Garner's A Dictionary of Modern American Usage reports this widespread reanalysis, as well as the less common "piece of mind"; MWDEU notes various "confusions" of peace and piece, even going so far as to employ the verb "botch" in this connection. But say one's peace is now so common among younger speakers (who are baffled by the claim that the original noun was piece) that it begins to rival have another thing (for original think) coming as a newly dominant variant."

Yesterday, Fernando Pereira wrote in with another example of the lexical trade between peace and piece: "make [one's] piece with":

Read an instance today, only 11 ghits [for "made their piece with"] vs 6,940 for "made their peace with". "made his piece with": 23 ghits vs 11,300 for "made his peace with". "made her piece with": 3 ghits vs 2,030 for "made her peace with".

Putting it all together (perhaps unreliable given Google's boolean oddities?), {"made my|your|his|her|our|their piece with"} has 274 ghits, while {"make|makes|making my|your|his|her|our|their piece with"} has 164. That's a total of 438, versus 70,200 for {"made|make|makes|making my|your|his|her|our|their peace with"}, so that the total is only 0.6% eggcorn. This one is there in the meme pool, but it's a long way from taking over. In comparison, {"say|says|said|saying my|your|his|her|our|their piece"} gets 88,200 whG, while {"say|says|said|saying my|your|his|her|our|their peace"} gets 18,100, or 17% of the total. So as Arnold noted, "say one's peace" has a significant chunk of mindshare and may be on its way to domination.
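The two percentages above are each variant's share of the combined hits. Here is a minimal sketch of that arithmetic in Python, using only the counts quoted in this post; the helper name share is hypothetical, and the counts are web hit estimates, not a controlled corpus.

```python
def share(variant_hits, standard_hits):
    """Percentage of all hits (variant + standard) that are the variant."""
    return 100.0 * variant_hits / (variant_hits + standard_hits)

# "make one's piece with": 274 (made) + 164 (make/makes/making) variant hits,
# against 70,200 hits for the standard "peace" spelling.
make_piece_hits = 274 + 164
make_share = share(make_piece_hits, 70_200)

# "say one's peace": 18,100 variant hits against 88,200 for "say one's piece".
say_share = share(18_100, 88_200)

print(f"make ... piece with: {make_share:.1f}% of the total")
print(f"say ... peace:       {say_share:.1f}% of the total")
```

Note that the spelling that counts as the eggcorn flips between the two idioms: "piece" is the variant in the first and "peace" in the second.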

There's a more self-conscious example of this lexical trade in the name of the local fast-food outlet Peace a Pizza ("sorry, we're open"). Piece out.

[Update: Abnu at Wordlab points out another example: "A Separate Piece" by John Knowles. {"a separate piece" knowles} has 660 hits, as opposed to 24,000 for the same query with "peace". I was pleased to see that the second-ranked hit is an invitation to academic dishonesty from 123helpme.com, whose first sentence is a marvelous combination of pretentiousness and incompetence:

"People are often vain of their most criminal passions; but envy is one passion so mean and low that nobody will admit it" Francois de la Rochefoucauld(1613-1680), a French philosopher, once stated and that statement summarizes the undertone of A Separate Piece by John Knowles. ]


[Update #2: Jonathan Mayhew writes:

I'm wondering whether the idiom

"to hold one's peace"

leads people to assume that the opposite idiom should be

"to speak one's peace"

instead of the more correct "to speak one's piece"

Do people write: "I'm going to give him a peace of my mind" ? Most of the hits that show up for that are making a deliberate pun, like "just desserts" as a name of a pastry shop.

You would probably find an eggcorn for "holding one's piece" as well, if the confusion were working the other way around. Try to peace that one together. ]



Posted by Mark Liberman at 07:21 AM

February 08, 2005

Type twice for truth?

Jean Veronis at Technologies du Langage has done some additional measurement and modeling of Google count oddities. I haven't had a chance to work through his latest stuff in detail, but it looks like he has hold of something. Jean's bottom line recommendation: "if you want to know the real index count for any word, simply type it twice:"

Word                     Count
stuttering               749,000
stuttering stuttering    452,000

Posted by Mark Liberman at 01:43 PM

Pernicious ambiguity at Davos

That disturbance you feel in the blogosphere, if you're back late from a weekend on Mars, is Easongate. Two weeks after the event, the MainStream Media has picked up the story: in a WaPo piece datelined 2/8/2005, Howard Kurtz explains that "What CNN chief news executive Eason Jordan said, or didn't say, in Davos, Switzerland, last month has become a burgeoning controversy among bloggers and media critics."

One reason that it's unclear what Jordan did or didn't say is that although the session was videotaped, the organizers of the World Economic Forum apparently consider it to have been held under the Chatham House Rule, so that neither clips nor transcripts have been released. But there's another reason as well: Jordan's remarks, whatever they were, set some kind of record for consequential ambiguity. That's a linguistic as well as a political problem. So what you need is not only a transcript -- or better yet, a recording -- you also need a linguist.

Actually, what you need is a semanticist, someone who knows about meaning, not a phonetician like me. Well, as the old song says, I ain't no semanticist, ain't no semanticist's son, but I can resolve your ambiguities til your semanticist comes...

According to the original account of the session by Rony Abovitz (on 1/28/2005):

At a discussion moderated by David R. Gergen, the Director for Public Leadership, John F. Kennedy School of Government, Harvard University, the concept of truth, fairness, and balance in the news was weighed against corporate profit interest, the need for ratings, and how the media can affect democracy. The panel included Richard Sambrook, the worldwide director of BBC radio, U.S. Congressman Barney Frank, Abdullah Abdullah, the Minister of Foreign Affairs of Afghanistan, and Eason Jordan, Chief News Executive of CNN. The audience was a mix of journalists, WEF attendees (many from Arab countries), and a US Senator from Connecticut, Chris Dodd.

During one of the discussions about the number of journalists killed in the Iraq War, Eason Jordan asserted that he knew of 12 journalists who had not only been killed by US troops in Iraq, but they had in fact been targeted. He repeated the assertion a few times, which seemed to win favor in parts of the audience (the anti-US crowd) and cause great strain on others.

Due to the nature of the forum, I was able to directly challenge Eason, asking if he had any objective and clear evidence to backup these claims, because if what he said was true, it would make Abu Ghraib look like a walk in the park. David Gergen was also clearly disturbed and shocked by the allegation that the U.S. would target journalists, foreign or U.S. He had always seen the U.S. military as the providers of safety and rescue for all reporters.

The various accounts of the event (see below for some links) agree that Jordan said something about U.S. forces "targeting" or "deliberately killing" journalists. And in doing so, he ran afoul of at least three kinds of ambiguity so common that they have conventional names: de re vs. de dicto, attributive vs. referential, and specific vs. generic. These interpretive distinctions may not be taught in journalism school, but they should be.

Since this will get a little complicated, let me put my conclusions up front. (In the court of public relations, the Red Queen's rule applies anyhow: "first the sentence, and then the evidence".) Eason Jordan made a big mistake. He said something whose natural interpretation is incorrect and indefensible. He may have meant to convey the natural interpretation of his remarks, either because he believes it's true, or because he chose to exaggerate in order to express animus against the U.S. forces in Iraq. He may have meant to imply the natural interpretation but to leave himself interpretive room to back up, as politicians often do when they want to play to the prejudices of one group while preserving deniability for another. Or he may not have intended to convey the obvious interpretation of his words at all, despite their inflammatory effect and the prominence of the setting. None of these options is acceptable communicative behavior for the Chief News Executive of CNN.

Let's assume for simplicity's sake that what Jordan said was

U.S. forces in Iraq have intentionally killed 12 journalists.

The key interpretive question is how the meaning of "journalists" interacts with the meaning of the rest of the sentence. As soon as what someone wants or intends comes into the picture, we confront the issues of belief attribution known as the de re/de dicto distinction. It may be true that Oedipus wants to marry his mother, because he wants to marry Jocasta and she is (known to us as) his mother, but false that Oedipus knows that Jocasta is his mother. Under these circumstances, the statement that Oedipus wants to marry his mother -- whatever its philosophical status -- is an egregious violation of elementary journalistic ethics.

As Jay Rosen, from NYU's department of journalism, put it:

"The original account was too ambiguous for me. It had him saying United States soldiers targeted journalists, and then claiming that's not what he meant. He later explained it as: the soldiers were trying to kill these people, but did not know they were shooting at journalists."

Curiously, Rosen seems to think that he's using this reasoning to defend Jordan.

In fact, the ambiguities of Jordan's remarks were more extensive and more subtle. One issue is whether particular individuals (who happened to be journalists) were individually and specifically targeted, rather than being killed as a side effect of some more general action, like dropping a bomb on an enemy position. Another issue is whether the soldiers in question knew that the individuals they were aiming at were journalists, and if so, fired because of this knowledge or in spite of it. Finally, there's the question of whether such targeting (if any) was sporadic and individual, or general and a matter of (official or unofficial) policy.

From the discussions of the event by participants, it appears that Jordan played these ambiguities like a xylophone. Here's Michelle Malkin's report of her conversation with U.S. Congressman Barney Frank, who was on the panel:

Rep. Frank said Eason Jordan did assert that there was deliberate targeting of journalists by the U.S. military. After Jordan made the statement, Rep. Frank said he immediately "expressed deep skepticism." Jordan backed off (slightly), Rep. Frank said, "explaining that he wasn't saying it was the policy of the American military to target journalists, but that there may have been individual cases where they were targeted by younger personnel who were not properly disciplined." [...]

I asked Rep. Frank again if his recollection was that Jordan initially maintained that the military had a deliberate policy of targeting journalists. Rep. Frank affirmed that, noting that Jordan subsequently backed away orally and in e-mail that it was official policy, but "left open the question" of whether there were individual cases in which American troops targeted journalists.

After the panel was over and he returned to the U.S., Rep. Frank said he called Jordan and expressed willingness to pursue specific cases if there was any credible evidence that any American troops targeted journalists. "Give me specifics," Rep. Frank said he told Jordan.

Rep. Frank has not yet heard back from Jordan.

And from Malkin's report of her conversation with David Gergen, the panel's moderator:

First, Gergen confirmed that Eason Jordan did in fact initially assert that journalists in Iraq had been targeted by military "on both sides." Gergen, who has known Jordan for some 20 years, told me Jordan "realized as soon as the words had left his mouth that he had gone too far" and "walked himself back." Gergen said as soon as he heard the assertion that journalists had been deliberately targeted, "I was startled. It's contrary to history, which is so far the other way. Our troops have gone out of their way to protect and rescue journalists."

Gergen mentioned that Jordan had just returned from Iraq and was "caught up in the tension of what was happening there. It's a raw, emotional wound for him."

Gergen said he asked Jordan point blank whether he believed the policy of the U.S. military was to sanction the targeting of journalists. Gergen said Jordan answered no, but then proceeded to speculate about a few incidents involving journalists killed in the Middle East--a discussion which Gergen decided to close down because "the military and the government weren't there to defend themselves."

Gergen also echoed Rep. Frank's recollection that Jordan asserted that there were cases involving journalist deaths where "not enough care was taken by U.S. troops." (Gerard Van der Leun takes a closer look at this spin here.) Gergen said he was approached after the session by European journalists who expressed the belief that American troops were "roughing up" journalists and Iraqi nationals. He also said people left the event "concerned and wanting to know more."

The (mostly right-wing, pro-war) bloggers who have focused on this story have emphasized the fact that Eason Jordan has made similar accusations in the past, which appear to have turned out to be untrue or exaggerated; and that CNN had a pre-war deal with Saddam Hussein not to report his regime's atrocities, in return for protection and access (more commentary here; Jordan's complete NYT essay is apparently reprinted here).

Mickey Kaus at Slate accuses Howard Kurtz of "doing CNN's damage control". We'll see if it's effective: Trent Lott went down, Dan Rather went down, and it looks to me like Eason Jordan is tottering. This being Language Log, I'll just point out that a good semantics course might have saved him.

Well, probably not. But it would have enabled him to analyze his downfall more perspicuously.

[Update: The first MSM discussion of this story was a 2/6/2005 article by Jack Kelly in the Toledo Blade and the Pittsburgh Post-Gazette. There is also a story by Roderick Boyd in today's New York Sun. Mark Jurkowitz comments in the Boston Globe on a statement from CNN, quoting from it at some length. However, searching for "Eason Jordan" on the cnn.com website does not turn it up. ]


Posted by Mark Liberman at 08:24 AM

February 07, 2005

Blogroll J and K

Jen's WebLog notes that the British Computing Society, in urging UK researchers "to investigate implementing more human-like behavior in computers", failed to mention speech and language. Nassira Nicola at Journal Extime points out that in French, an allophone is a kind of person, whereas in English, it's a kind of sound. Trevor at kaleboel speculates about ethnohistory, as he often does, and offers a brief sketch of his journey by bicycle from Barcelona to Cadiz, which reads roughly like the Cliff's Notes version of a random selection from Cervantes as re-written by Henry Miller. Kerim Friedman at Keywords provides some nice Groucho Marx quotes, in relation to the death of Robert Dwan, producer of You Bet Your Life ( Dwan: Anyway, he's from Hawaii … Groucho: From where? Dwan: Hawaii. Groucho: I'm all right, how are you?). Erika at Kittenishly Doomy Thoughts explains why "[j]ust as Gamera is the friend of all children, Wikipedia is the friend of all linguists"; documents the eggcorn long and sorted (as in "tale")[and see this echo as "sorted past" on the Gear Page]; and quotes John Simon to the effect that descriptive linguists are "a curse upon the race", while swooning over Jesse Sheidlower. (That's Erika doing the swooning, not -- as far as I know -- John.) Ah, the glamour of lexicography!

Posted by Mark Liberman at 09:32 AM

Metapragmatic apologia

Never mind fact checkers and theory checkers, out here in the blogosphere we've got joke checkers.

Let's start with the backstory. Yesterday I referred to an overheard snippet of conversation this way:

Over at In Passing, eve documented a new metapragmatic variable, quoting "a girl to a guy walking down Fulton st" who said "Sure, for values of 'neat' that involve you not getting your security deposit back."

The link on metapragmatic goes to the Jargon File entry for metasyntactic variable, glossed as "[a] name used in examples and understood to stand for whatever thing is under discussion, or any random member of a class of things under discussion". The Jargon File goes on to explain:

Metasyntactic variables are so called because (1) they are variables in the metalanguage used to talk about programs etc; (2) they are variables whose values are often variables (as in usages like “the value of f(foo,bar) is the sum of foo and bar”). However, it has been plausibly suggested that the real reason for the term “metasyntactic variable” is that it sounds good. To some extent, the list of one's preferred metasyntactic variables is a cultural signature.

The "sounds good" part also describes my invention-in-passing of the variant "metapragmatic variable". My reasoning, such as it was, involved several steps, or at least the interaction of several vague ideas. First, the phrase " for values of * that" indexes a subculture that includes programmers as well as mathematicians and others. Second, Michael Silverstein has already used the word metapragmatic to refer to folk reasoning about (reasoning about) meaning in context (in the phrase "metapragmatic ideology", which also sounds good, though I couldn't figure out how to use it in the space and time available). Finally, the originally cited remark was based on treating the interpretation of the word neat as if it were the instantiation of a variable. (I presume, FWIW, that it was in response to some remark like "That was a neat party last night!").

Anyhow, within two hours of posting the item in question, I got a note from Jim Apple:

I don't believe "Sure, for values of 'neat' that involve you not getting your security deposit back," is a use of "neat" as a variable. I think there are two possibilities here:

1. The speaker means "not neat at all", like the old engineers' joke that 2+2=5 for large values of 2.

2. The speaker wants the listener to understand that "neat" occupies a large enough semantic space so that some of the values are not neat enough to get the deposit back.

In either case, the reference to neat depends on its use earlier and on its value as a word. Variables have values that are not so bound. Compare two metasyntactic uses here:

A: I love parties of Canadians.
B: Sure, for large enough n.

A: He uses "adequate" every time he complains to me. "Your foo is not adequate! I am not adequately quxed with your bar! The adequacy of your baz is in question!"

Here, the use of n, foo, bar, qux, or baz is not limited by variables in previous scope. Computer people would call them "implicitly declared."

This is like when the guy on the next barstool taps you on the shoulder and says "I don't believe..." -- sometimes it's best just to say "You're probably right" and move on to another watering hole. Then again, the blogosphere would be a sere and lonely place without intense discussion among strangers about obscure points of interpretation. So...

Look, Jim, here's the first thing. A party is not necessarily any less "neat" just because it puts the host's security deposit in jeopardy. Security deposits are good, but so is conviviality, and sometimes good things conflict. Engineers deal with this sort of thing all the time under the heading of "joint optimization", right? You and I, as sober adults, may inhabit a world in which apartment-wrecking parties are ipso facto not neat, but this may not be the world of random girls and guys walking down Fulton St.

And in the second place, it's normal in logic and in mathematics for variables to be specified initially as taking on only certain types of values, or only values in a certain range. The same thing is true for "variables" in most programming languages, which may be restricted as to type or have other conditions placed on their values. I admit that it's not normal to limit the instantiations of a variable to the contextual interpretations of a word -- but the young woman overheard on Fulton St. was using the language of mathematics to express an insight about an aspect of her life not defined by any prior formalism. Our colleagues in the humanities call this a "metaphor", or sometimes a "joke".

And anyhow, you're probably right.

[Update: Jim wrote back with some further thoughts about words and variables, and ended by linking to an older blog post of his own on the "pedantic" setting of some compilers, and remarking that

Oh, bother. My pedantry has surpassed that of the professional pedants. I had a feeling this day would come . . .

Ipse dixit. ]


Posted by Mark Liberman at 07:51 AM

February 06, 2005

Set theory at Green Gables

Looking around for a cheap primer on sets, relations, and functions that might be useful for some of my students in Mathematical Foundations of Linguistics next quarter, I was struck by how the notion of small textbooks has just vanished in modern America (800 pages is commonplace, 400 is positively wimpy, and under 200 is close to unknown). So I tried using the Amazon.com search engine to try and find a text I remembered from decades ago: a little tiny book, very cheap, called Sets and Groups. It was by someone called Green, I recollected. I thought "set theory" and Green might track it down. And as the long list of relevance-sorted results came up, I stared in slack-jawed horror at the screen. Number 9 on the list was the Illustrated Junior Library edition of a 1909 novel for "young adults": Anne of Avonlea, by Lucy Maud Montgomery.

Why? What the hell had gone wrong? That was simple to answer. It was a result of the disastrous shift Amazon.com has made to a default behavior of searching full text content of books instead of standard data like title, author, and subject. On page 146 of Anne of Avonlea, Mrs. Allan has just told Anne to hold on to her ideals, and the story continues:

"I shall try. But I have to let go most of my theories," said Anne, laughing a little. "I had the most beautiful set of theories you ever knew when I started out as a schoolma'am, but every one of them has failed me at some pinch or another."
"Even the theory on corporal punishment," teased Mrs. Allan. But Anne flushed.
"I shall never forgive myself for whipping Anthony."

I can't deny that the word set does occur in that passage, and so does the word theory. (Amazon.com's search engine seems to completely disregard quotation marks.) And green? Simple enough if you know anything about the book. Anne of Avonlea is, of course, one of the Anne of Green Gables novels.

I actually happen to have a plain-text copy of Anne of Avonlea in a little collection of classic novels that I keep online for corpus searching. The book contains 41 occurrences of the lexeme set with or without the suffix -s, 7 of theory, and 74 of green. No denying the presence of the terms I searched for. But of course the value of searching on Amazon.com has been reduced for me, not enhanced. These hilarious unintended results padding out my search result list are just wasting my time.
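For anyone who wants to run the same sort of count on a plain-text file of their own, here is a minimal sketch in Python. It assumes that "the lexeme with or without the suffix -s" can be approximated by a case-insensitive whole-word pattern; the function name `count_lexeme` and the sample string are made up for illustration, and this is not the tool that produced the figures above.

```python
import re

def count_lexeme(text, stem, suffix="s"):
    """Count whole-word occurrences of stem, with or without the suffix."""
    pattern = re.compile(
        r"\b" + re.escape(stem) + r"(?:" + re.escape(suffix) + r")?\b",
        re.IGNORECASE,
    )
    return len(pattern.findall(text))

# Invented sample text, standing in for a full plain-text novel.
sample = ("I had the most beautiful set of theories you ever knew. "
          "Sets of green gables set off the green fields.")

counts = {word: count_lexeme(sample, word) for word in ("set", "theory", "green")}
print(counts)
```

Note that a pattern this simple misses "theories", because the plural of theory is not formed by just adding -s. A serious count would need a lemmatizer or a hand-written list of forms for each lexeme.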

Even though, I concede, the search did rapidly track down the book I wanted (of course, in the latest edition it has bloated up to 270 pages), and most of the books listed were about math, the full list of 3,595 book results is nonetheless grossly contaminated. I got at least two feminist theory readers ("the failure of the Green Revolution in the Third World, ... ecofeminists eager to add Shiva's theory of the relationship of Western colonialism to their parallel analyses"), a political science text, a book on manufacturing, a book on digital aerial surveying, a reference book on myth and magic, a reader on film... I gave up looking when I found that my list included Judith Butler's Bodies That Matter: On the Discursive Limits of "Sex". I'm a broad-minded fellow, but really. Judith Butler not only puts the word "sex" in scare quotes (if ever a word did not need scare quotes IMHO, it is the word sex), she also uses imaginary as a noun ("The Lesbian phallus and the morphological imaginary") and queer as a verb ("Passing, queering: Nella Larsen's psychoanalytic challenge"). Include me out. Sure, she has plenty of references to theory; and green shows up in a reference to the green carnations Oscar Wilde used to wear (a Victorian code via which homosexuals used to identify each other, it is said). I don't really care where she uses set (I'm sure she does). I started out looking for an inexpensive math textbook, remember? How did we get to queer studies from there? (Butler's book now shows up as top on my list of recently viewed items at Amazon, of course. Next time I log in there will be a cheery "Hello, Geoffrey K. Pullum!" and a list of books on embodiment and phallocentrism for me to choose among.)

I'll tell you how we got to the lesbian phallus from the theory of sets: by upgrading our technology willy-nilly, that's how. Without me ever suggesting that I wanted this, in fact without even a by-your-leave or providing a way to switch this awful feature off, Amazon started a few months ago to make searching the entire text of books, as well as data about them, the default behavior for the search engine. And on searches with common words, this is a catastrophe. Amazon.com forgot that every upgrade is a downgrade; they were unaware of my view that one of the very last things I want to have to endure is experiencing tomorrow's technology today.

[Added later: I was wrong only to a very minor extent about Amazon's "These-May-Interest-You" selections. The next time I connected to Amazon's site, I was recommended to take a look not only at numerous books on math but also at Heidi, the story of a young girl who has to go and live with a grumpy uncle in the Swiss Alps, plus (just as I predicted) one of Judith Butler's other books about sex and gender. I told you so. The prosecution rests.]

Posted by Geoffrey K. Pullum at 07:24 PM

Never pronouncing east Thursday?

The web site of El Sol de Zacatecas, a Mexican newspaper, appears to serve up automatically translated English versions of Spanish-language wire service reports. Whatever MT system they're using, it needs a better model of the contingencies of Spanish/English correspondence. According to the English version of the AFP story on Mel Martinez' historic first official use of Spanish on the floor of the Senate, his words "constituted the first speech in foreign language never pronouncing in the Senate by one of their members" ("el primer discurso en idioma extranjero jamás pronunciado en el Senado por uno de sus miembros"), attributing this to "the official transcription published east Thursday" ("la transcripción oficial publicada este jueves").

Now, the meaning of a text may sometimes be hard to pin down, and the correctness conditions of a language may sometimes be difficult to reduce to simple prescriptions, but I hope we can all agree that in this case the English translation is syntactically incorrect, stylistically aberrant, semantically incoherent and also not an accurate reflection of the content of the Spanish original. I'm assuming that this was a computer-generated translation rather than simply a bad translation, because I can't imagine that any well-intentioned human being with access to a Spanish-English dictionary and a working knowledge of the English language would translate "este jueves" as "east Thursday".

As Geoff Pullum has pointed out, nearly all strings of words are ungrammatical. Also semantically incoherent and stylistically aberrant. The College Board should have no trouble finding genuine problems for its "sentence error" questions, even without resorting to MT output.

The (original) Spanish version of the story:

Primer discurso en idioma extranjero de un congresista en el Senado de EEUU
Por: AFP
Publicado: Viernes, 4 de Febrero de 2005 8:39 AM

WASHINGTON - Unas palabras en español pronunciadas por un congresista estadounidense el miércoles, constituyeron el primer discurso en idioma extranjero jamás pronunciado en el Senado por uno de sus miembros, según la transcripción oficial publicada este jueves.

El discurso fue pronunciado por el senador de origen cubano Mel Martinez durante el debate sobre la designación de Alberto Gonzales como fiscal general.

"El juez Gonzales es uno de nosotros, el representa todos nuestros sueños y esperanzas para nuestros hijos...", afirmó Martinez, uno de los dos legisladores de origen hispano en el Senado estadounidense.
Martinez explicó que con sus palabras quiso dirigirse directamente a los electores hispanos para defender a Gonzalez, de origen mexicano, acusado por la oposición demócrata de formular una doctrina jurídica permisiva que habilitó las torturas a prisioneros.

Según el historiador Donald Ritchie, citado por el diario The New York Times, fue la primera vez en la historia que un senador pronunció en Cámara un discurso en un idioma que no sea inglés.
Estados Unidos no tiene lengua oficial, y la minoría hispana es la más numerosa del país, por lo que muchas organizaciones políticas, oficinas y servicios a los consumidores ofrecen servicios bilingües.

The English version:

First speech in foreign language of a congressman in the Senate of the U.S.A.

The WASHINGTON - words in Spanish pronounced by a American congressman Wednesday, constituted the first speech in foreign language never pronouncing in the Senate by one of their members, according to the official transcription published east Thursday. The speech was pronounced by the senator of Cuban origin Mel Martinez during the debate on the designation of Alberto Gonzales like general prosecutor. "judge Gonzales is one of us, represents all our dreams and hopes for our children...", affirmed Martinez, one of both legislators of Hispanic origin in the American Senate.

Martinez explained that with his words it wanted to go directly to the Hispanic voters to defend to Gonzalez, of Mexican origin, accused by the democratic opposition to formulate a permisiva legal doctrine that qualified the tortures to prisoners.

According to the historian Donald Ritchie, mentioned by the newspaper The New York Times, it was the first time in the history that a senator pronounced in Camera a speech in a language that is not English.
The United States does not have official language, and the Hispanic minority is most numerous of the country, reason why many political organizations, offices and services to the consumers offer bilingual services.

[In fairness to the anonymous MT system, the English version is readable, at least to the point of permitting the probable nature of the event to be inferred. But "east Thursday"? Give me a break. ]

[Update: Google's Spanish/English translation is identical, presumably because Google licenses the same MT technology. Whatever that technology is, we can be pretty sure it doesn't use sensible statistical techniques, since in deciding whether "este" should be translated as "this" or "east", when followed by "jueves", a large enough corpus gives a reasonable approximation to common sense just by counting: 693,000 whG for the English-language string "this Thursday", vs. 8,740 for "east Thursday". More sophisticated models are available, but should not be needed in this case.]
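The counting argument can be sketched in a few lines of Python. This is only an illustration: the two hit counts are the ones quoted above, while the function name `pick_translation` and the table layout are hypothetical, and no real MT system is being described.

```python
# Hit counts quoted in the post for the two candidate English strings.
BIGRAM_COUNTS = {
    ("this", "Thursday"): 693_000,
    ("east", "Thursday"): 8_740,
}

def pick_translation(candidates, next_word, counts):
    """Return the candidate that forms the most frequent bigram with next_word."""
    return max(candidates, key=lambda c: counts.get((c, next_word), 0))

# Spanish "este" can be rendered as "this" or "east"; the following word decides.
choice = pick_translation(["this", "east"], "Thursday", BIGRAM_COUNTS)
print(choice)
```

With these counts the choice is "this", by a margin of roughly 79 to 1. Candidates with no recorded count are treated as having zero hits, which is a crude assumption; a real statistical model would smooth the counts rather than trust raw frequencies.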

[Update #2: Ray Girvan says that the translation in this case (also the same as Altavista's Babelfish) was done by a SYSTRAN engine, as can be seen here. SYSTRAN, as I understand it, is an old-fashioned rule-based transfer system; though what rule maps "este" to "east" in front of "jueves" I can't imagine.]


Posted by Mark Liberman at 04:44 PM

Blogroll F to I

Fernando Pereira at Fresh Tracks suggests that "'Waste not, want not' may not be a good policy for getting out of local maxima". He's talking about neurons, in response to an article about the economic value of irrational exuberance, but there's a linguistic application as well: why a rationally (re-)constructed language would be inefficient. John Yunker at Going Global quips that Microsoft's proposed Windows XP Reduced Media Edition should be called the Windows XP Passive Aggressive Edition: the frame wars come to European software branding! Over at In Passing, eve documented a new metapragmatic variable, quoting "a girl to a guy walking down Fulton st" who said "Sure, for values of 'neat' that involve you not getting your security deposit back."

Posted by Mark Liberman at 08:36 AM

Syntactic and notional number

In reference to my posts on one of the SAT's "sentence error" questions (here and here), Maryellen MacDonald sent email with a sketch of relevant psychological research about number in nouns, verbs and pronouns. This led me to wonder about the difference between underpants and underwear. I'm not referring to Ken Keeler's theory that "the word 'underpants' is 20% funnier than the word 'underwear'" -- as far as I know, this plausible hypothesis still awaits psycholinguistic validation. Rather, Maryellen's note suggested an insight into the contrast between "these underwear" and "these committee".

Maryellen wrote:

The question of why verbs and pronouns tend to vary in the extent to which they agree with collective nouns is a topic of some interest in psycholinguistics. Probably the most relevant paper is this one:

Bock, J. K., Nicol, J., & Cutting, J. C. (1999). The ties that bind: Creating number agreement in speech. Journal of Memory and Language, 40, 330-346.

The claim goes roughly as follows: collective nouns are syntactically singular but semantically (or "notionally" in this literature) plural. Subject-verb agreement is largely a syntactic process, described as percolation of number features over a phrase structure during utterance planning. There's little or no influence from the meaning (or notional plurality) of the words. Antecedent-pronoun agreement, however, is more influenced by semantics. For most nouns, there are similar patterns of agreement for both verbs and pronouns, but in the case of collectives, there's a conflict in number between syntactic number and notional number, and thus the verbs and the pronouns have different distributions of agreement patterns with collectives.

I've described this approach briefly enough that I probably haven't made it sound much better than a redescription of the data, but in fact there's a very large agreement literature in language production, and Bock and colleagues' views capture quite a lot of what turns out to be very complex patterns of greater or lesser tendencies to agree across different situations. Still, my own views lean more toward a constraint satisfaction view of subject-verb agreement, rather than the more purely syntactic view. A paper about collectives from this perspective, is

Haskell, T.R. & MacDonald, M.C. (2003). Conflicting cues and competition in subject-verb agreement. Journal of Memory and Language, 48, 760-778.

We didn't discuss verb vs. pronoun agreement in this paper, but Bock's idea about greater weight to the notional plurality in pronoun agreement could be incorporated. Of course there needs to be a story (on anyone's account) for why pronoun agreement is more semantically-influenced than is verb agreement, and I at least don't know of a satisfying answer as of yet.

Here's the abstract from Bock, Nicol and Cutting 1999:

Coherence in language relies in part on basic devices like number agreement. To assess meaning-based (notional) versus form-based (morphological) control of number agreement, we examined how speakers created number agreement for collective nouns, which can carry conflicting notional and morphological number. The agreement targets were verbs and two types of pronouns, produced in the course of a sentence-completion task. Comparisons of the verbs and pronouns indicated that verbs tended to reflect the morphological number of the collective controller, whereas pronouns were more likely to reflect the notional number. This argues that the number features of pronouns may be retrieved under control from the speaker's meaning, while the number features of verbs are more likely to be retrieved under control from the utterance's form. The implication is that the retrieval of words during language production is influenced by two distinct types of information, consistent with an inflectional account of agreement.

The experiments described in this paper also connect with the recent discussion here about the "agreement with nearest" phenomenon, because the materials used include subject phrases like

The actor(s) in the soap opera(s)
The cast in the soap opera(s)

The theory in Bock, Nicol & Cutting -- that verb agreement is basically syntactic, while pronouns are "notional" or semantic -- suggests that pluralia tantum should be considered both syntactically and "notionally" plural, since pants, scissors, clothes etc. usually take both plural verbs and plural pronouns. I would say "these pants are missing their price tag", not "*this pants is missing its price tag". But what about the contrast between underpants and underwear? These seem pretty much the same to me "notionally", but they generally take different pronouns as well as different verbs, at least according to my intuitions:

If your underpants are (*is) dirty, you should wash them (*it).
If your underwear (*are) is dirty, you should wash (*them) it.

Being an empirical sort of person, I did a few quick web searches to check my intuitions on this point. Most of the examples of "underwear are" or "underpants is" are irrelevant:

Marbles in my Underpants is definitely a much darker book than The Soap Lady, but in many ways it's almost a companion piece.
Demonic Underpants is the single most important development in the history of music since ANkST & ANkHS.
Whether Mr Morrison insists his mother wears racing driver's underpants is a matter that still requires clarification.
The silk fibres of pure silk in the silk long underwear are a natural, breathable insulator, so you won't overheat.

However, there are plenty of examples suggesting that "notional" number (and/or random fluctuations) play a role here as well:

Bravado Underwear are non-returnable and non-exchangeable, for hygienic reasons.
The Rugged Bear Girls Long Underwear are flame retardant, 100% Cotton and machine washable.
I drove home thinking about how these underwear would feel under my cargo shorts.
Sexy silk underwear are great at the gym.
This underpants is made of Polartec® Power Dry®, a patented fabric with two unique surfaces.
Yeah, well short underpants is better than no underpants at all, Chewie.
Dude. Girls' underpants is where Cooties COME FROM!
MAYOR: (Nick Maloney) Aaargh! Oh, my God! Get this underpants off me! [Burst of theme music]. ANNOUNCER: Don't miss Attack of the Killer Italian Y-Fronts.

Note that there is variation not only in verb agreement, but also in agreement with demonstratives -- this vs. these. This is an interesting difference between collective nouns and pluralia tantum. As far as I can tell, no competent speaker of English is ever seriously tempted to use these committee or these family as ad sensum plurals, while it's relatively easy to create contexts in which these underwear or this underpants are natural outcomes.

The complexity of interactions between form and context in this area (even leaving out undergarments) motivates the framework proposed in Haskell and MacDonald 2003, whose abstract explains:

Traditional theories of agreement production assume that verb agreement is an essentially syntactic process.
However, recent work shows that agreement is subject to a variety of influences both syntactic and non-syntactic, which raises the question of how these different sources of information are integrated during agreement production. We propose an account of agreement production in which several information sources contribute activation to singular and plural verb forms. Conflict between cues leads to competition which can in turn magnify the influence of subtle cues. Three fragment completion experiments tested key predictions of this constraint satisfaction approach. Experiment 1 demonstrated competition effects on verb choice and sentence initiation latencies. Experiments 2 and 3 demonstrated that conflicts between semantic and grammatical cues allow morphological regularity to exert a small but detectable effect on agreement. These results suggest that the constraint-satisfaction framework may provide a productive approach for understanding agreement production.

Haskell and MacDonald also have some interesting new things to say about the "agreement with nearest" issues:

One important issue in the study of agreement production is how local nouns come to influence agreement. In the traditional approach to agreement, attraction errors occur when features originating on the local noun are mistakenly allowed to percolate to the top of the subject noun phrase (Vigliocco & Nicol, 1998). In the constraint satisfaction approach, however, factors should influence agreement to the extent they exhibit a reliable correlation with verb marking. For correct agreement, local noun number is largely independent of verb marking, which would seem to predict that local noun number should play no role. However, there is an interesting subset of cases for which local noun number, rather than head noun number, appears to govern agreement: compare a bunch of marbles were rolling around the floor to a bunch of sand was blowing around. In such expressions, the head noun (which is often a collective) acts much like a quantifier (Michaux, 1992).

Thus, this particular construction (a <singularnoun> of <pluralnoun>, hereafter the 'a number of' construction) is correlated with the use of a plural verb. In the constraint satisfaction framework, the more similar a given construction is to this one, the more strongly the plural verb form will be activated. For a highly dissimilar phrase such as my winter jacket, the plural verb form will not be activated at all, and plural verbs should be produced very rarely. In contrast, a more similar construction such as the key to the cabinets will result in the plural form being slightly activated, and plural verb forms should occasionally be produced—which is in fact the case (Bock & Miller, 1991). Greater similarity to the 'a number of' construction should make the production of plural verbs even more likely. Note that this proposal essentially amounts to the claim that distributional information, which has been shown to play a prominent role in language comprehension, also has an impact in production as well. This possibility was considered in more detail by Thornton, Haskell, & MacDonald (2001).

Read the whole thing!


Posted by Mark Liberman at 06:59 AM

February 05, 2005

Blogroll A to E

OK, I made it through E. In our blogroll, that is. Having a couple of minutes to spare, I thought I'd check out the latest that the linguablogosphere has to offer, and here are the fruits of my labors on the first five letters of the alphabet.

Claire at Anggarrgoon has her sights on "on accident", as well as a fine on-air agree-with-nearest example (".. amongst them are Australian citizen P___ P_______."); Ryan at The Adhumlan Conspiracy draws our attention to the fact that Jesse Sheidlower mentioned HPSG binding theory in Slate; Marc at bLing Blog wonders if Sturm and Drang might be a calquecorn; Blogos discusses the history of localization of propaganda; C. Callosum dissects a curious piece of public relations ("...all utterances made in the entire world have been catalogued within a 400 phoneme range..."); Marc at Close Range lists the Top 10 works in the philosophy of language in the 20th C., rounds off to 15, and then adds 14 more for good measure (and Jerry Fodor still doesn't make it? I'm just asking, is all...); Des von Bladet wonders about isoglyphs in Central Asia (there's some historical information here, Des); The Discouraging Word has started usage polls (88% preferred the new-fangled sense of notorious); Grant at Double Tongued Word Wrester rassles with unbwogable, HoYay and Patel shot, among many others; Rethabile at On English highlights a new (to me) blog, Langue sauce picante, thereby adding another step to the long blogroll I have yet to climb; at Experimental Linguistics, Lo discusses the use of nicknames in Brazil and the fate of a Canadian rap group called Fatal Phonetics, and W1ll13 30% Hacker debunks the legend of the brass monkeys.

21 letters to go -- but I have to go shopping for dinner. I'll continue the journey tomorrow. Apologies to anyone I've missed; this dense, scholarly blogging stuff is hard to do in a hurry.


Posted by Mark Liberman at 03:29 PM

Collective nouns with singular verbs and plural pronouns

Some people at alt.usage.english recently discussed my post on the SAT's "sentence error" questions, dealing with the example

After (A) hours of futile debate, the committee has decided to postpone (B) further discussion of the resolution (C) until their (D) next meeting. No error (E)

R H Draney argued that

"the committee" may be either singular or plural according to the customs of one's land...but the die is cast before letter (D) when the writer chooses "has decided"; this tells us that the sentence lives in a world where collective nouns are grammatically singular, and "their next meeting" conflicts with this information...the correct response must therefore be (D)....

This plausible-sounding perspective agrees with the American Heritage Dictionary's usage note on collective nouns:

In American usage, a collective noun takes a singular verb when it refers to the collection considered as a whole, as in The family was united on this question. The enemy is suing for peace. It takes a plural verb when it refers to the members of the group considered as individuals, as in My family are always fighting among themselves. The enemy were showing up in groups of three or four to turn in their weapons. In British usage, however, collective nouns are more often treated as plurals: The government have not announced a new policy. The team are playing in the test matches next week. A collective noun should not be treated as both singular and plural in the same construction; thus The family is determined to press its (not their) claim. Among the common collective nouns are committee, clergy, company, enemy, group, family, flock, public, and team. [emphasis added]

However, I'm going to venture to disagree with both Draney and the AHD, at least in part, although I share most of their analytic assumptions.

Like most Americans, I prefer singular verb agreement for collective nouns like family and committee, unless the meaning of the phrase emphasizes semantic multiplicity, as in "My family all live in North America". When the meaning is neutral or emphasizes unity, I strongly prefer the singular: "My family is gathering in Philadelphia for Thanksgiving". However, I can't imagine writing or saying "#My family is gathering in Philadelphia for Thanksgiving, and I'm preparing a traditional Thanksgiving meal for it." The problem is not that the sentence is ungrammatical, but rather that it doesn't say what I mean. I prepare the meal for them, not for it. I object strongly to a "rule" that gives me only two choices:

#My family is gathering in Philadelphia, and I'm preparing a Thanksgiving feast for it.
#My family are gathering in Philadelphia, and I'm preparing a Thanksgiving feast for them.

Neither of these sentences expresses what I would have wanted to say last November, which was:

My family is gathering in Philadelphia, and I'm preparing a Thanksgiving feast for them.

If someone's logically-concocted "rule" -- an external critique in Glen Whitman's Hayekian terminology -- tries to stop me from saying what I mean in this case, I perceive it not as a principle to be learned and obeyed, but as a tyranny to be resisted. I might choose to sidestep the issue by writing "The members of my family are...", but that is a cowardly if convenient accommodation.

A bit of web search suggests that most Americans share my patterns of usage, while also offering some small comfort to Draney and the AHD.

The contingency table for the various instantiations of the pattern "family is|are * to * its|their" suggests that the web prefers is to are and their to its, and also that there is an interaction in the direction the AHD recommends:

                to * its    to * their
family is *
family are *

I suspect, however, that the interaction is not mainly due to effective belief in a "rule" about consistency of syntactic treatment, but rather is a side effect of consistency of semantic intent. And in any case, mixed-number cases are commoner than consistent-number cases, 5,135 to 4,382.

The counts are from Google, so caveat lector. Here are the patterns so that you can check the hits yourself (and of course I do know that some of the hits in each category are irrelevant to the question):
1. {"family is * to * its"}
2. {"family is * to * their"}
3. {"family are * to * its"}
4. {"family are * to * their"}

The typically American pattern of singular verb and plural pronoun (case 2) is commoner than any of the other three cases, and many of the examples strike me as unexceptionable:

This family is able to reduce their college expenses by over $85,000 dollars.
Food grows scarce, and the family is forced to slaughter their ox and eat it.
This family is thrilled to have their baby girl home from Kazakhstan!
Summer rolls around, and a working family is able to get their child into a decent all-day summer camp program.
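
For anyone who would rather repeat this sort of count on a text collection of their own than trust Google's numbers, here is a minimal sketch in Python. It rests on one assumption, namely that each * in the search patterns stands for one or more words. The names PATTERN, classify and contingency are made up for illustration, and the counts reported in this post came from Google, not from this code.

```python
import re
from collections import Counter

# Rough regex analogue of the search pattern "family is|are * to * its|their".
# Assumption: each "*" stands for one or more whitespace-separated words.
PATTERN = re.compile(
    r"\bfamily\s+(is|are)\s+(?:\S+\s+)+?to\s+(?:\S+\s+)+?(its|their)\b",
    re.IGNORECASE,
)

def classify(sentence):
    """Return (verb, pronoun) for the first match, or None if there is none."""
    m = PATTERN.search(sentence)
    if m is None:
        return None
    return m.group(1).lower(), m.group(2).lower()

def contingency(sentences):
    """Count how many sentences fall into each verb/pronoun cell."""
    return Counter(c for c in map(classify, sentences) if c is not None)

if __name__ == "__main__":
    examples = [
        "This family is able to reduce their college expenses by over $85,000 dollars.",
        "Food grows scarce, and the family is forced to slaughter their ox and eat it.",
        "This family is thrilled to have their baby girl home from Kazakhstan!",
        "The family is determined to press its claim.",
    ]
    for cell, count in sorted(contingency(examples).items()):
        print(cell, count)
```

Like the web searches, this only matches strings. It cannot tell whether the pronoun actually refers back to family, so the hits still have to be inspected by hand.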

It's plausible to argue that verb-agreement number and pronoun number should logically be the same within a given passage (the AHD says "construction", but this seems to be a matter of gradient salience, not grammatical principle). However, Norma Loquendi doesn't agree with this notion, no matter how logical it may seem, and my intuitive reactions are with her. In such a case, we have a choice: logic or custom? elite theory or common practice? rational reconstruction or spontaneous order? I'll stand with Hayek in siding with the spontaneous order of common practice, whose logic is usually more subtle and effective than some armchair expert's superficial rationalization.

Counts from contexts where the syntax is reasonably well constrained ("so my family is|are..." etc.) suggest that family takes singular verb agreement about 90% of the time, ceteris paribus:

is all are all was all were all
so my family __
so your family __
so her family __
so his family __
so our family __
so their family __
so the family __

*102 copies of a movie review ignored.

Inspection of the hits makes it clear that the British/American divide is an important factor. As a result, if we limited the counts to American pages, the preference for singular verb agreement would be even stronger. But the effect of adding "all" at the end -- doubling or tripling the proportion of plural agreement -- does show that semantic factors are also relevant, just as the AHD's usage note says.

I haven't found a satisfactory way to estimate Americans' quantitative preference for plural pronoun usage in reference to collective nouns like committee and family, but I suspect it's almost as strong as our (more than) nine to one preference for singular verb agreement in the same cases. These two strong preferences can be seen not only in everyday discourse, but in essays, novels and poems by respected authors. Here are a few relevant samples of poetry from LION (emphasis added throughout):

They don't play good soldiers
unless at attention or lying dead, rusting
behind his grandfather's tool shed. No wonder
everyone gave them up. Behind the glass
the peanuts have turned green.
A few green pennies jam the works.
He thinks of the family joke, his uncle's fortune.

One holds an ant colony. Shined up
it's worth a nickle to see. His family
crowds into the tool shed, amazed
at the thousands of ants moving under the glass.
They wonder how he has done it.
It's a secret, he says. You have to train them.
He bangs on the glass and they all go crazy.

Bensko, John, 1949-: Uncle Robert's Peanut Vending Machines [from Green Soldiers (1981), Yale University Press]

This poem is not improved by changing crowds to crowd; and changing "they wonder how he has done it" to "it wonders how he has done it" is a disaster.

What can I say of the house now that the house
is over---what can I sing of the bridge
now that my family is on the other side,
where the birds finally tune the shadows
with their songs, and the lights need only
brighten for a moment, for there is no darkness
in their house, only light, the causes of
light, the moment of memory when the
past pronounces the future, "so long," the leaves
wave, the sea waits for someone and someone
else ...

Burkard, Michael, 1947-: The Moment of Memory [from Fictions from the Self (1988), Norton]

Again, changing is to are after family would be unidiomatic at best, while writing "there is no darkness/in its house" would be bizarrely dehumanizing.

But now, once more, and face to face,
In happiness we meet, wife;
And through your care and God's sweet grace
Our family is complete, wife!
From valleys, mountains, snows, and sands,
From city streets and forest lands,
They come to clasp your yearning hands.

Carleton, Will, 1845-1912: The Festival of Family Reunion. [from City Festivals (1892)] Scene III, lines 56-62.

In this case, "our family are complete" would be nonsense, and "from city streets and forest lands/it comes to clasp your yearning hands" turns the poem into something out of H.P. Lovecraft. That might be an improvement, but it's hardly what the author had in mind.

There are several thousand other examples in LION's archive of American poetry. If there's someone out there who hasn't had enough of this yet, please feel free to classify and count them. For my part, I'm done.

I'll give Walt Whitman the last word. He never used family in the relevant kind of structure, but here's his take on group:

On my northwest coast in the midst of the night, a fishermen's group stands watching;
Out on the lake, that expands before them, others are spearing salmon;
The canoe, a dim shadowy thing, moves across the black water,
Bearing a Torch a-blaze at the prow.

Whitman, Walt, 1819-1892: THE TORCH. [from Leaves of grass (1872)]

You could argue that them refers to fishermen rather than fishermen's group -- though that is a violation of the equally spurious "genitive antecedent" prohibition -- but in my opinion, the passage is fine as it stands, and remains fine if "fishermen's" is changed to "fisherman's", or omitted altogether.

[Update: I originally wrote alt.usage.english instead of alt.english.usage -- or was it the other way around? Anyhow, I think it's correct now, thanks to Benjamin Zimmer, who wrote:

Just a heads-up on your blog entry, "Collective nouns with singular verbs and plural pronouns"... You say, "In a discussion on alt.english.usage...", when in fact the newsgroup in question is alt.usage.english. Those are actually two different newsgroups, and the regulars take the distinction between AUE and AEU quite seriously (the narcissism of small differences and all that). ]


[Update #2: Abnu at Wordlab writes:

I wrestled with this problem earlier this week when writing a post titled "Dumb and Dummer" over at Wordlab. I settled on the "ungrammatical" construction:

"...the board of directors of the academy thinks they're smarter..."

knowing full well that the "rules" gave me only two choices:

"...the board of directors of the academy thinks it's smarter..."
"...the board of directors of the academy think they're smarter..."

For the reasons argued in your post, I considered this a "tyranny to be resisted" myself, and used the contrarian construction that expressed my meaning. I have to admit, the whole thing gave me pause at the time, and I haven't slept well since. If only I'd had the benefit of your analicious.

Thus conscience does make cowards of us all. ]


Posted by Mark Liberman at 12:30 AM

February 04, 2005

For linguists only

Like other specialists, linguists have their song parodies, mostly incomprehensible to outsiders. Penn's (grad student) Linguistics Club has collected examples for several years, and I'm sure there are many others out there.

Among Dave Perlmutter's many talents are a fine singing voice and a remarkable facility for extemporizing song parodies. A few years ago, I asked Dave if he'd ever noticed that semanticists tend to wear leather -- pants, vests and even hats as well as the more conventional coats and jackets. Without hesitation, he responded (to the tune of The Streets of Laredo):

"I see by your outfit you're into semantics,"
These words he did say as I slowly walked by.
"Come set down beside me and talk about meaning,
For I'm a young linguist and I know I must die."

I believe that there were other verses, but neither of us can remember them.

At the recent LSA meeting in Oakland, I asked Dave if he'd adapted any other songs, and he sang me one that he wrote to the tune of "I'm dreaming of a white Christmas":

I'm dreaming of the old syntax,
Just like syntax I used to know,
Where the trees all glisten
And students listen
To hear where moved NP's go.

I'm dreaming of the old syntax
With each construal rule I write.
May your days be merry and bright,
And may all your hypotheses be right.

Dave explained in email: "It was written at the time when rules of construal were the latest thing. That reference should be changed to something more relevant now."

It's a different sort of cultural adaptation to re-interpret existing works without changing any of the words. In reference to a discussion of encounters with celebrities, Jason at Finches' Wings recently quoted Sonnet for Minimalists by Mona Van Duyn. Neither Mona nor Jason had linguists in mind, and the season for this poem has not quite arrived yet, but I'll echo it to end this post all the same:

From a new peony,
my last anthem,
a squirrel in glee
broke the budded stem.
I thought, where is joy
without fresh bloom,
that old heart’s ploy
to mask the tomb?
Then a volunteer
stalk from sour
bird-drop this year
burst in frantic flower.

The world’s perverse,
but it could be worse.

[Update 2/5/2005: Q Pheevr supplies several other linguistic filk, including one, previously unknown to me, in which I'm mentioned though not used. ]


Posted by Mark Liberman at 08:11 AM


I now see, and this surprised me, that Geoff Pullum, though he may frequently break new ground on the incivility front, has yet to take his rantmanship to the highest level. Indeed, the linguist who can master the rant genre is a rare beast, for as a group we have an unfortunate tendency to attempt reasoned argument when wild stacking of unrelated quotations and pseudo-factoids would have done the job so much more admirably.

Ladies and gentlemen, it is courtesy of Frankie Roberto and his excellent weblog which first introduced the wider world to a latin lover without whom our lives were ordinary that I am able to bring you today's distinguished perorator, a denizen of the far off blighty ol' shores for which Geoff and I weep through every wretchedly sun-drenched California day, and a man for whom, as you will see, no position on the value of descriptivism as against prescriptivism is too simplistic, for whom any distinction between linguistics and deconstructivist social studies is unnecessary, a man for whom no sequitur is too non, and so on and so forth, until finally this sentence, nay paragraph, comes to a close.

I must ask you to select appropriate audio background, be upsitting, and give it up, absolutely every last frothy globule of it, for my personal ranter of the year Ian Bruton Simmonds, and his wild christmas rant originally bombasticulated before the preposterously named The Queen's English Society, a festival of trouser stuffing entitled A Criticism of Modern Linguistics with Suggestion for Improvement of English through the BBC.

Posted by David Beaver at 01:40 AM

February 03, 2005

Phrasal prepositions in a civil tone

Oh, dear, I've made a copy editor irritable. This isn't going to be a good day. You see, in a recent discussion of Bryan Garner's sadly tradition-mildewed chapter in the magisterial Chicago Manual of Style, I said, in my lofty ex cathedra tone:

on page 188 he describes word sequences like with reference to as "phrasal prepositions" (they aren't)...

Peter Fisk at A Capital Idea promptly bristles:

...if "with reference to" isn't a phrasal preposition, what is it? Apparently, the only people privy to the "correct" terminology are those who plunk down $160 for the 1,800-page Cambridge Grammar of the English Language.

He's dismissing me testily as just a venal terminology-monger! And he adds to this slap a telling extra point in Garner's defense: "He strikes a reasonable balance between the prescriptive and descriptive. And he writes in a civil tone." He's really got me there, hasn't he? I've been accused of a lot of dreadful things, but never of maintaining a civil tone. You should see my first drafts:

Why the hell is it that peopl A lot of morons out there seem to think th It has been drawn to my attention that there is among the ignorant and the unwashed a certain amount of stupid quibbl disagreement concerning...

But let me try. Let me wrestle with my rhetorical demons (they whisper in my ear, in red, "Call him a loony!"). I want to try and provide a civil response to Peter's very reasonable question.

First, Peter: it's not about terminology. I read "phrasal preposition" as a technical description, not just an arbitrary tag with no syntactic import. I take it to embody the claim that things like with reference to and large numbers of others (Garner lists "according to, because of, by means of, by reason of, by way of, contrary to, for the sake of, in accordance with, in addition to, in case of, in consideration of, in front of, in regard to, in respect to, in spite of, instead of, on account of, out of, with reference to, with regard to and with respect to", but big grammars like the one by Quirk and his colleagues list hundreds) are prepositions with a phrasal character, or phrases that function as prepositions. And "phrase" means something here: is on the mat is a phrase, and on the mat is a phrase, but cat is on the is not a phrase.

The Cambridge Grammar takes the trouble to point out that the "phrasal preposition" claim has certain consequences, and those consequences reveal the claim to be false.

These word sequences are not prepositions (they are not words at all), and they are not phrasal (they are sequences of independent words that are commonly kept adjacent, and in some cases they are associated with special meanings, but they don't make up a single part or constituent of a sentence: some bits are in one phrase and some in the next). If that does not suggest to you that talking about "phrasal prepositions" is the wrong way to talk about them, then I hardly know what to say, given this new and unfamiliar policy of keeping a civil tongue in my head.

Second, the discussion in The Cambridge Grammar is not arcane knowledge limited only to those who can slap down $160 on the barrelhead. For those who cannot get to a library or office where it is available, I am always happy to recount and explain, at least a bit. I can do that very briefly here for the case at hand (though I can really only give a smattering of points; the topic is a rich and interesting one, and would suffice to make a lecture of at least an hour).

The case that the sequences in question are not prepositions is overwhelming.

  1. If you call according to a preposition, what on earth is going on in according, I am told, to most authorities in the field ? A clausal parenthetical in the middle of a word? Interrupting a preposition? Parentheticals ("supplements", they are called in The Cambridge Grammar) occur between words, not inside them. (That's exactly why the misnamed "split infinitive" occurs and is — as Garner rightly notes — fully grammatical: to be is not a word, it's two words. So is according to.)

  2. If you call because of a preposition in examples like because of his injury, why is there also a suspiciously similar word of identical meaning in because he was injured ? A word of the same meaning that looks just like the first 7 letters of the alleged word because of ? Aren't we overlooking something here?

  3. And another thing: if you call because of a preposition, what about the fact that there is also a preposition spelled of ? An alleged preposition in which both the first part and the rest look separately like prepositions we already have? This is getting too weird for me.

  4. If you call for the sake of a preposition, you're actually ignoring an occurrence of the definite article inside it. The sake absolutely has to be a definite noun phrase. It can even occur in other noun phrase contexts, e.g., for the sake of peace and harmony. It's a peculiar noun phrase (it has to be definite, and can only occur (a) with a genitive noun phrase determiner, or (b) with the as determiner if the phrase is in complement of for and sake has a preposition phrase complement headed by of), but it's a noun phrase.

No, if you're serious about the notion that "preposition" is the name of a class of words, these sequences cannot possibly be prepositions. So that leaves the notion that they are phrases that function just like prepositions. The trouble is, they aren't phrases at all.

I'll give just one exemplifying argument. Consider in front of. If that's a phrase that functions as a preposition, and has a meaning that is the opposite of behind, then the following contrast is baffling:

[1] Is your car in front of the building, or behind?

[2] *Is your car behind the building, or in front of?

What we actually say is this:

[3] Is your car behind the building, or in front?

Why is that? Because in front of is not a phrase at all. The of goes with the following noun phrase in [1], forming the preposition phrase of the building (the of contributes no meaning of its own, but it's syntactically a part of that phrase, not of what precedes it). In [3], we don't bother to repeat the redundant part, so we leave out the whole phrase of the building. If in front of were a phrase, we would expect [2] to be grammatical, but it isn't.

There are more such arguments. (Consider, for example, why when someone says We can do it by means of intensive lobbying someone else can object, I don't think it's ethical to do it by those means: clearly means is a noun, quite separate from the following (underlined) of phrase, which the second speaker does not repeat.) And I'm not just trying to drum up trade for Cambridge University Press, or to advocate any idiosyncratic terminological replacements, when I say that Chapter 7 of The Cambridge Grammar includes more detailed argumentation: the least a scholar can do when putting forward a claim that some appear to dispute is to say what the claim is, give a sense of the sort of argumentation that supports it, and give a reference to a more serious treatment where full justification may be found. I can't come round to everyone's house with a copy of The Cambridge Grammar and a whiteboard and markers and lay the whole thing out on an individual-instruction basis. I don't know where you all live. (And I have a day job, teaching at the University of California, Santa Cruz.)

The bottom line: things have been discovered about English grammar in the last hundred years. I'm not pressing for new names for time-worn concepts, I'm objecting to the way people treat English grammar as if it were a frozen collection of eternal truths like Pythagorean geometry. The analogy is inapt: Pythagoras's theorem about right-angled triangles is true, and his proof of it is sound. Things are very different with grammar. Mistakes were made in the analysis of English syntax in the 18th and 19th centuries. Bryan Garner's presentation (though it is, I agree absolutely, very reasonably balanced between prescriptive and descriptive approaches) sadly reflects none of the progress that has been made toward correcting those mistakes. His description of English morphology and syntax is point for point the same as what you can read in a little book that is beside me as I write: J. C. Nesfield's Outline of English Grammar, published in February 1900, exactly a hundred and five years ago. I'm saying, very civilly, that we can do better than this in the matter of grammatical description, and it's about time major publishing houses and dictionary makers started trying to instead of continuing to repeat earlier centuries' errors.

And Peter Fisk is a loony Stop that.

Posted by Geoffrey K. Pullum at 02:23 PM

SOV emerges in the Negev

The February 1 NYT has an article by Nicholas Wade entitled "A New Language Arises, and Scientists Watch It Evolve".

The language, known as Al Sayyid Bedouin Sign Language, is used in a village of some 3,500 people in the Negev desert of Israel. They are descendants of a single founder, who arrived 200 years ago from Egypt and married a local woman. Two of the couple's five sons were deaf, as are about 150 members of the community today.

The clan has long been known to geneticists, but only now have linguists studied its sign language. A team led by Dr. Wendy Sandler of the University of Haifa says in The Proceedings of the National Academy of Sciences today that the Bedouin sign language developed spontaneously and without outside influence. It is not related to Israeli or Jordanian sign languages, and its word order differs from that of the spoken languages of the region.

There does not seem to be any article on this topic in the Feb. 1 issue of PNAS, nor could I find anything in the "PNAS Early Edition" papers available on the PNAS web site, so perhaps the press release jumped the gun a bit.

Wade presents more details via quotes from Mark Aronoff, who is certainly a reliable source:

Two special opportunities to study a new language and identify its innate elements have recently come to light. One is Nicaraguan sign language, a signing system developed spontaneously by children at a school for the deaf founded in 1977 in Nicaragua. The other is the Bedouin sign language being described today. Sign languages can possess all the properties of spoken language, including grammar, and differ only in the channel through which meaning is conveyed.

Two features of the Bedouin sign language that look as if they come from some innate grammatical machinery are a distinction between subject and object, and the preference for a specific word order, said Dr. Mark Aronoff of Stony Brook University, an author of today's report. The word order is subject-object-verb, the most common in other languages. Dr. Aronoff said that the emergence of a preferred order was the critical feature, and that it was too early to tell if subject-object-verb is the particular order favored by the brain's neural circuitry.

Of course, even in the New York Times, quotations are not always an accurate reflection of the communicative intent of the person quoted -- see here for a previous example.

In particular, it's not clear what Mark (as quoted by Wade) means by "a distinction between subject and object" coming from "innate grammatical machinery". Wade quotes Ann Senghas as saying that "the preferred word order in Nicaraguan sign language kept changing with each cohort of children", but that Nicaraguan sign "has now acquired the signed equivalents of case endings, the changes used in languages like Latin to show if a word is the subject or object of a sentence", while the Bedouin sign language "has not yet acquired case endings". Since Spanish doesn't have case endings, the Nicaraguan development seems to be an example of spontaneous emergence of this type of morphological marking, independent of cultural transmission.

But I'm not clear why one would think that the subject/object distinction in Al Sayyid sign language "is internally generated with the help of genetically specified neural circuits that prescribe the elements of grammar", as opposed to being "transmitted just through culture, as part of the brain's general learning ability". That argument might apply with respect to the emergence of SOV word order, which is not typical of any of the other languages spoken in the area (as I understand it, Standard Arabic is viewed as VSO, and the modern Arabic colloquials as SVO; classical Hebrew is also viewed as VSO, and modern Hebrew as SVO; although there are many variant orders possible in all cases...). But surely many instances of the subject/object distinction are locally available for cultural transmission.

Since Mark Aronoff is a clear-headed, sensible and careful scholar, I'm inclined to apply my usual rule in cases of attributional abduction: blame the journalist. I'll have more to say after checking with Mark and/or finding the PNAS publication.

[Note: the usual method for producing stable links to NYT articles doesn't work for Wade's piece, so the link given here will expire after a few days.]

[Update 2/6/2005: Mark Aronoff writes:

Sorry not to have replied earlier, but we were so inundated that, well you know the life of a star. Actually, we were all in San Diego working on ABSL.

The paper will appear online next week, I believe. Just keep checking the PNAS website, as we have been doing daily. Once it is up, it will be freely downloadable.

The long and the short of the story is that the language does have sentences with >1 argument (unlike early Nicaraguan SL) and that the arguments are almost always in the order S IO O followed by V. One could say that the arguments are ordered by a semantic hierarchy, but then one would have to map that hierarchy onto some linear order anyway, with the higher argument having linear precedence. We feel that this is equivalent to positing S, IO, and O, though others may differ. What really matters to us is the consistent linear order, which is what we say in the PNAS piece. We don't care that it is this particular order, and we are certainly saying nothing about neurons (that is the NYT talking).

As for external influence, these people are profoundly deaf from birth and appear to have no interaction at all with spoken language (except, of course, for those who are hearing, but the language began among a group of deaf cousins).]


Posted by Mark Liberman at 06:42 AM

February 02, 2005

The Chicago Manual of Style --- and grammar

In the 1890s a proofreader working for the University of Chicago Press prepared a single sheet of guidance on typographic fundamentals and house style. It was augmented over time, and grew into a full style manual. The latest version was published in 2003 as the 15th edition of The Chicago Manual of Style. From the first sheet with printing on it to the last it has xviii + 958 = 976 pages, an increase in bulk of almost three orders of magnitude from that original information sheet. I finally ordered the 15th edition at the LSA book exhibit in January, when I saw that it included a new 93-page chapter on ‘Grammar and Usage’. My copy just arrived. Unfortunately, I now see, the new chapter does not represent an improvement.

The Chicago Manual of Style (CMS) is an unparalleled resource for those engaged in publishing, particularly of academic material. But the Press decided to farm out the topic of grammar and usage, and the writer they selected was Bryan A. Garner, a former associate editor of the Texas Law Review who now teaches at Southern Methodist University School of Law and has written several popular books on usage and style. His chapter is unfortunately full of repetitions of stupidities of the past tradition in English grammar — more of them than you could shake a stick at.

Presenting a representative sample would take a long time. Suffice it to say that on page 177 he appears to claim that progressive clauses are always active (making clauses like Our premises are being renovated impossible); on page 179 he states that English verbs have seven inflected forms, including a present subjunctive, a past subjunctive, and an imperative (utter nonsense); on page 187 he reveals that (although he agrees, like every other grammarian, that the misnamed "split infinitive" is grammatical) he thinks that the adverb is "splitting the verb" in this construction (it isn't; it's between two separate words); on page 188 he describes word sequences like with reference to as "phrasal prepositions" (they aren't); and so it goes on and on. (I'm not asking you to just accept my word that these are analytical mistakes. Full argumentation on these points, and alternative analyses that make sense, can be found in The Cambridge Grammar of the English Language, a work that was available in published form a full year before the Preface was added to the 15th edition of CMS. A few days of revision would have sufficed to remove the blunders from Garner's chapter.)

When the University of Chicago Press started on the revisions that led to CMS 15, they could have lifted the phone and made an on-campus call to the late, great James McCawley, a professor in the Department of Linguistics there throughout his long career, and an author of many books with the Press. They could have asked him for advice. They did not, clearly. McCawley knew the field of English syntax as well as anyone alive, and would perhaps have offered to do the chapter himself, or to read and critique the chapter when it was submitted, or to advise them on who might be chosen to write it. But once again, people who had ample opportunity to get expert help in dealing with a quintessentially linguistic question of great importance made their decisions without (it seems) consulting anyone in the one field focused on matters linguistic. (I say "once again" because I'm thinking of Mark's recent masterful critique of the College Board and its ignorant policies in designing putative tests of grammar knowledge.) They commissioned a tired rehash of traditional grammar repeating centuries-old errors of analysis instead of trying to obtain a more up-to-date presentation. A real lost opportunity that has lessened the authority of a wonderful reference book, one that on topics from punctuation to citation to indexing to editing can really be trusted. Just avert your eyes from the grammar chapter; while not completely without merit (it moves on from Strunk and White), it just isn't trustworthy in the way the rest of the book is.

Posted by Geoffrey K. Pullum at 12:40 PM

Are linguists natural-born Hayekians?

Geoff Pullum recently argued for a "third position" between the "two extremes: on the left, that all honest efforts at uttering sentences are ipso facto correct; and on the right, that rules of grammar have an authority that derives from something independent of what any users of the language actually do." Glen Whitman at Agoraphilia suggests that Geoff's position (which he says "strikes me as clearly correct with respect to language") is similar to Hayek's attempt to "find organic principles ... that we can use to understand what is 'right' or 'wrong'" in "spontaneous orders other than language, such as markets, etiquette, morality, and common law".

Glen writes:

What kinds of criticisms are valid in the context of spontaneous orders, if we accept Pullum’s argument? One interpretation is narrow: as scientists who wish to understand the social world, we are bound to accept internal principles of the systems we wish to explain as a positive matter. But as a normative matter, we are free to make external critiques as long as we’re honest about what we’re doing. Thus, a linguist might recognize a particular rule of English grammar as “valid” (inasmuch as it describes the language as it is), but then criticize that rule of grammar because it fails to satisfy some external criterion such as logical clarity.

Another interpretation is broader: only internal critiques are valid. In the context of the rules of just conduct, Hayek takes this position in The Mirage of Social Justice ...

Hayek’s argument hinges on two aspects of his thought – first, his severe doubts about the ability of human beings to fully comprehend the functionality of their social norms (an epistemological position); and second, his belief in an imperfect but usually beneficial process of cultural evolution. If one doubts either of these positions, external critique might seem more sensible.

My inclination, which I cannot fully justify here, is that both internal and external critiques can be valid and useful, but internal critiques are safer and more trustworthy, because they don’t require superhuman cognitive abilities.

I believe that most linguists agree with Geoff's "third position", and also tend to take the view that Glen attributes to Hayek, that the only valid critique of a "spontaneous order" such as language is "immanent criticism". Thus it seems fair to say that in their area of professional expertise, linguists have a natural affinity with the most important philosopher of the political right.


Posted by Mark Liberman at 08:45 AM

I just can't phantom it

Amanda Seidl sent in a classic malapropism:

I was in NYC with a friend having breakfast at one of those diners where you sit at communal tables and this woman sitting next to us kept going on and on about how she could not phantom this or that. When we left I turned to my friend and said, "Isn't that an interesting way to use phantom? I never knew it meant anything like fathom." She said (of course), "you idiot! She meant fathom!" I've heard it a few times since then and google turns up quite a few more --maybe now that Phantom of the Opera (or if you prefer Fathom of the Opera) is now a movie we'll get even more.

Net search indeed turns up quite a few examples of this creative mis-hearing:

When I told people I only walked and never ran, they always had the same reaction: "I just can't phantom it."
I apologize but it angers me so much... I just can't phantom it. I should have read a bit more carefully before accusing.
... I can't phantom why she is allowed to endorse a product she can't use with hair that isn't hers.
However, I can't phantom why he would support a conspiracy theory spouting terrorist sympathizer like Kucinich.
Personally I can't phantom why anyone could take the US educational system seriously enough to want to study there.
New features are fine with me and should be with everyone, I like features - and I for one can't phantom why this discussion is even happening.
If image exports carry transparency (and I can't phantom why they wouldn't), you can make some hella-serious titles from this which can then be loaded into Imaginate or directly into your NLE.
Personally, I can't phantom going without a shower every morning.
I'm new to this site and I hope talking with others will help,because people who have never had phobias or fears with the likes of me/you,or people who have never had the extreme horror of a panic attack just can't phantom,or begin to comprehend what we have to deal with every day.
Dawn jumps at the sound and looks at the object, not able to phantom how it reached its current destination.
A very few children might be able to phantom the nuances of this book.
Actually, I am not able to phantom why a person wouldn't be grafted back to salvation.
He could talk in Latin, English, Portuguese, and Russian; was able to phantom great mysteries.
Yes, a ring and a certificate of marriage means a lot, a lot more than you will ever be able to phantom, my dear, until you have experienced wearing that shoe.
I startled up with surprise, eyes glazed as I tried to phantom what possibly could have gone wrong.
Sometime it's hard to phantom how search engines work.
I would expect a better judgement and leadership from the good Dr Beyene. I find it hard to phantom him being opposed to the idea of national reconciliation.
Twenty people joining together in the move to such a preposterous state of mind is hard to phantom-but this does happen.
The perpetuation of Ukraine's mafia state is a formula for disaster that could lead to protracted violence, possible fracturing not necessarily along the East-West fault-line, and scores of nightmarish scenarios difficult to phantom.

And many more. As is often the case with such errors, the original (fathom) doesn't make any modern-day sense. Phantom isn't a lot more transparent as a colorful way to say "understand", but maybe there is some association with things that you think you see but can't grasp concretely.

[Update: Keith Ivey points out that "phanthom" (8,150 whG), "fanthom" (9,590 whG), "phathom" (3,430), etc., may be relevant intermediate forms.]


Posted by Mark Liberman at 07:20 AM

February 01, 2005

More dubious grammar testing

In response to my post on the SAT's badly-designed grammar questions, Ray Girvan points out "a recent MetaFilter thread about the online test for the Blue Book of Grammar and Punctuation (http://www.metafilter.com/mefi/39080), whose author similarly makes particular choices on disputable constructions".

Genuine grammatical or semantic errors exist even in well-edited texts, and finding such errors can be a real test of grammatical and stylistic acuity. It's even easier to find uncontroversial typos, grammatical errors, malapropisms and other infelicities in unedited texts written by less skilled writers, or by writers in a hurry. Examples could be selected to test any desired level of verbal ability and familiarity with linguistic norms. And you could even test the ability to identify features that are inappropriate for a given genre or register, from easy things like contractions and slang, to more subtle things such as the use of anarthrous noun phrases in fiction as opposed to journalism.

Given the wealth of indubitable solecisms littering every textual streetcorner, it's strange that the people who make up grammar tests insist so often on picking examples where the norms are at best subject to dispute, and the "rule" that selects one of the alternatives is doubtful if not totally bogus.


Posted by Mark Liberman at 12:31 AM

Test taking as audience design?

In my post on the questions that the SAT calls "Identifying Sentence Errors", I complained that the "decision about how to answer becomes a judgment about the linguistic ideology of the College Board, not a judgment about English grammar and style". In response, Mike Albaugh wrote:

Long ago and far away, I took a California Civil Service Exam, with an eye toward getting a summer job as a computer operator or some such. By the fourth or fifth question, it was clear that the test had not been updated in quite a while. OK by me, as I had taken my first C.S. classes at a bit of a backwater community college, and so, in 1970, could answer the test "correctly" for 1960.

The wheels of Civil Service ground too slowly to match me up with a job before the end of that summer, but did grind finely enough to offer me a position at the then-princely sum of $14K/year. I pondered a while before deciding that continuing on to college would be a better long-term prospect, although it was nearly 1978 before I matched that salary in inflation-adjusted dollars.

The situation arises often. When I am interviewing for a job, or presenting some technical information, I try very hard to determine the level of expertise of my questioner, lest I give the "wrong" answer for that level of expertise.

This "theory of mind" reasoning is familiar to all of us, and not just in the context of a job interview or a technical presentation, because it's at the core of all human communication. Another relevant term of art is "audience design", which was used by Clark and Murphy (1983) in reference to reference, and by Bell (1984) in reference to style. As Peter Patrick puts it in his online lecture notes about the sociolinguistics of style, Bell's idea was that

speakers adjust their speech ... towards that of their audience in order to express solidarity or intimacy with them, or conversely away from their audience’s speech in order to express distance.

Recent work continues to emphasize a communicative view of style, in which speakers and writers make stylistic choices partly in view of their relationship with an audience.

A certain amount of "theory of mind" and "audience design" reasoning in test taking is inevitable, but it seems to me that we ought to minimize such factors in tests like the SAT. To evaluate students on knowledge of the norms of standard American English puts that kind of English in a privileged position, and rewards those who are familiar with it, but I agree with those who think it's right and proper to do so. I suppose that it's even plausible to grade students on their understanding of dubious made-up "rules" such as the prohibitions about "singular their" and "split infinitives", since they will need to deal with people who believe such things. (I'd want the question to make the context clear, however -- "...this sentence contains what some people think is an error in number agreement, despite the fact that many respected writers and grammarians..."). I can see no value at all, however, in ranking students on how well they can predict whether or not ETS graders put credence in such phony "rules". At best this adds noise to the measurement; at worst, it measures an educationally irrelevant kind of cultural affinity.

Clark, H. H., & Murphy, G. L. (1983). "Audience design in meaning and reference." In J. F. LeNy & W. Kintsch (Eds.), Language and comprehension (pp. 287-299).
Bell, Allan (1984). "Language style as audience design." Language in Society 13(2): 145-204.


Posted by Mark Liberman at 12:26 AM