Language Log: December 2006 Archives

December 31, 2006

Busy tongues

The latest instance of "Everyone already knows this" commentary on The Female Brain comes from Cinnamon Stillwell in the SF Examiner, 12/29/2006, under the headline "Experts Discover Men And Women Are Different!":

When it was revealed that scientific studies published in the new book "The Female Brain" demonstrate that women talk more than men, many of us responded with a collective shrug. Anyone who has ever been in a relationship with a member of the opposite sex -- whether romantic, familial or friendly -- knows that women talk more than men. A lot more.

"The Female Brain" indicates that not only do women talk three times as much as men, but they also get a chemical rush in their brains from hearing their own voices. This may explain why women describe "feeling better" after talking about problems or issues in their lives, beyond the mere relief of getting it off their chest.

The "revelation" behind the hyperlink is the Daily Mail article that I discussed around Thanksgiving ("Regression to the mean in British journalism", 11/28/2006). Brief recap: Louann Brizendine neither did nor cited any "scientific studies" about sex and talkativeness, but just invented some numbers out of thin air -- or maybe quoted someone else who invented the numbers. She's semi-retracted the claim. And the "chemical rush" business is apparently just as bogus -- see the links collected here for some discussion.

But let's light a scientific candle instead of cursing the journalistic darkness.

In an earlier post ("Gabby guys: the effect size", 9/23/2006), I discussed some data from the Fisher English Corpus Part 1 (FECP1), a collection of 5,850 telephone conversations lasting up to 10 minutes each, recorded in 2003. Of the 11,700 conversational sides, 10,950 involved native speakers of American English for whom information about sex, age and years of education is available; I've posted a summary of data from those calls here. Each line presents information about one conversational side, laid out like this:

sex   no. of turns  no. of words  total time  words per min.   ID   age  years of edu
 m       149           773          208.5        222.446      2602   34     16
 f       138           876          212.26       247.621      1790   24     16

One simple thing we can do with this data is to fit a linear regression model. A script for the free software statistics package R, which will read in the data and fit such a model, is here. Type the R expression

summary(M1 <- lm(words ~ sex + age + edu + sex:age + sex:edu - 1))

(it's in the script as well) and you'll learn:

Coefficients:
            Estimate Std. Error t value Pr(>|t|) 
   sexf     731.1017    21.7214 33.658   < 2e-16 ***
   sexm     808.0448    26.2205 30.817   < 2e-16 ***
   age        2.5811     0.2945  8.765   < 2e-16 ***
   edu        4.3731     1.2402  3.526  0.000424 ***
   sexm:age  -0.2608     0.4285  -0.609 0.542853 
   sexm:edu  -1.6424     2.0538  -0.800 0.423915 
   ---
   Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 288.2 on 10944 degrees of freedom
   Multiple R-Squared: 0.909, Adjusted R-squared: 0.909 
   F-statistic: 1.822e+04 on 6 and 10944 DF, p-value: < 2.2e-16

The effects of sex, age and education were highly "significant" in the technical statistical sense. Whether these effects were significant in the ordinary language sense, you can judge for yourself. The sex-age and sex-education interactions were not significant.

Your basic modeled male put out about 77 more words per conversation than your basic modeled female did -- 808 vs. 731, or about 10% more. And independent of sex, each additional year of age was worth about 2.6 additional words per conversation, while each additional year of formal education was worth about 4.4 additional words. A light-hearted way to put this would be that being male is worth about 30 years of experience (76.9/2.58 = 29.8) or 18 years of formal education (76.9/4.37 = 17.6). In terms of word count in a 10-minute phone conversation, that is... [Emily Bender has written to warn me that irony is dangerous, and I risk having my numbers quoted by the BBC or the AP, along the lines of "From the point of view of verbal facility, being male is worth 30 years of practice or 18 years of formal education, according to research published this month". We'll see...]
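
For anyone who wants to check the arithmetic by hand, here is a rough worked version of the fitted model, using only the coefficients printed above. The bracketed indicators [f] and [m] are my own notation for "1 if the speaker is female/male, else 0", and the two example speakers are simply the two sample rows from the data layout shown earlier; none of this is output from the script. Note also that the "baseline" gap compares speakers at age 0 with 0 years of education, which is an extrapolation of the model rather than a description of anyone in the corpus. (The block assumes the amsmath package.)

```latex
\begin{align*}
\widehat{\text{words}} &= 731.1017\,[\text{f}] + 808.0448\,[\text{m}]
    + 2.5811\,\text{age} + 4.3731\,\text{edu} \\
  &\quad - 0.2608\,[\text{m}]\,\text{age} - 1.6424\,[\text{m}]\,\text{edu} \\[4pt]
\text{female, age 24, edu 16:}\quad
  & 731.1017 + 2.5811(24) + 4.3731(16) \approx 863.0 \\
\text{male, age 34, edu 16:}\quad
  & 808.0448 + (2.5811 - 0.2608)(34) + (4.3731 - 1.6424)(16) \approx 930.6 \\[4pt]
\text{sex gap at the baseline:}\quad
  & 808.0448 - 731.1017 = 76.9431 \\
\text{gap in years of age:}\quad
  & 76.9431 / 2.5811 \approx 29.8 \\
\text{gap in years of education:}\quad
  & 76.9431 / 4.3731 \approx 17.6
\end{align*}
```

The two sample speakers actually produced 876 and 773 words, so the model misses them by about +13 and -158 words respectively. Neither miss is surprising given a residual standard error of 288.2.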

Once again, young men talk like old women. But we already knew that, right?

Here's a boxplot to help you judge the size of the sex and education effects. The top and bottom of the box are the 75th and 25th percentiles; the whiskers extend out to the edges of the range, once some statistical outliers have been trimmed. The four boxes show males and females over the age of 25 with a high school education or less, vs. with a college degree or more. (The script also includes the code to generate this plot -- you can make your own plot for the sex and age effects...)

[Update -- several people, including Geoff Pullum, have copied me on email sent to Cinnamon Stillwell, clueing her in to the non-existence of the "science" confirming what she thinks everyone "knows", and to the contrary results of such studies as do actually exist. She hasn't responded yet, but in fact Stephen Colbert has already scripted the response that I expect:

And on this show, on this show your voice will be heard... in the form of my voice. 'Cause you're looking at a straight-shooter, America. I tell it like it is. I calls 'em like I sees 'em. I will speak to you in plain simple English.
And that brings us to tonight's word: truthiness.
Now I'm sure some of the Word Police, the wordanistas over at Webster's, are gonna say, "Hey, that's not a word." Well, anybody who knows me knows that I'm no fan of dictionaries or reference books. They're elitist. Constantly telling us what is or isn't true, or what did or didn't happen. Who's Britannica to tell me the Panama Canal was finished in 1914? If I wanna say it happened in 1941, that's my right. I don't trust books. They're all fact, no heart.

And in their hearts, everybody knows that "women talk more than men. A lot more." And blacks are lazy, jews are avaricious, celts are drunks, southerners are stupid... You can't fight truthiness -- or the pop psychology books that promote it. Don't be fooled by the long list of scientific-looking references -- The Female Brain and its ilk might be books that invoke the authority of science, but they're all heart, no fact.]

Posted by Mark Liberman at 09:50 AM

December 30, 2006

Cool Hwip: the culture of a cluster

Family Guy is contending with The Simpsons as a source of materials for linguistics instruction. Last time the subject was uptalk ("Satirical cartoon uptalk is not HRT either") -- this time it's [h] before semivowels:

The classic reference on this topic is Raven McDavid Jr. and Virginia Glen McDavid, "H before Semivowels in the Eastern United States", Language 28(1) 41-62 (1952):

Although the pronunciation of /h-/ before vowels does not constitute a social shibboleth in the United States, there is evidence that the presence or absence of /h-/ in words like whip and humor is often considered a test of social acceptability. Thus when Thomas Pyles recently remarked that in his dialect (of Frederick, Maryland) the cluster /hw-/ does not occur, despite the efforts of well-meaning schoolteachers to impose it on generations of students, a reader immediately commented that nowhere had she observed a person of true culture who did not possess that cluster. Such responses are not confined to laymen. T. R. Lounsbury and William Dwight Whitney, and more recently C. K. Thomas and A. G. Kennedy, have insisted that there is a social stigma attached to those who do not pronounce /h-/ in words of these types. H. L. Mencken, on the other hand, considers the pronunciation of /h-/ in whip etc. an affectation.

I'm with Thomas Pyles and H. L. Mencken on this one -- "the baby whales" and "the baby wails" are homophonous in my speech. And at least some of Family Guy's target demographic is way beyond Mencken, considering [hw-] not affected but just plain hweird.

Here's McDavid & McDavid's hypothesis about the history:

By the time of the American Revolution neither the restoration of /h-/ in humor as a spelling-pronunciation nor the simplification of /hw-/ to /w-/ had been carried out in the cultured speech of southern England. Consequently it is easy to understand both the overwhelming preference of American speakers for humor with /j-/, and the fact that the areas with /w-/ in whip, wheelbarrow, whetstone, and whinny center around the ports, where contact with England was longest maintained by the mercantile class.

A (nonlinguist) guest brought /hw-/ up at dinner last night, and responses around the table made it clear that plenty of Americans still preserve this feature, hweird as it may sound to some.

[Hat tip: Vishy Venugopalan]

[For more on this, see Roger Shuy's post "Wut? Wen? Wich?", 9/17/2006.]

[Update -- Tiago Tresoldi writes:

a great post, but the video was cut and people who did not watch the show are probably not getting the "you are eating hair": in fact, Meg (the sister) had put some of her hair inside the pie. That is why Stewie (the baby) is eating "hair" and not "air".
The beginning is available here: http://www.youtube.com/watch?v=uDH7ASzdQ7k

]

Posted by Mark Liberman at 12:57 PM

Phonics

We've never had a Swedish cartoon before. This Jan Stenmark example, sent in by Anastasia Nylund, didn't seem very funny at first, but it's been growing on me:

"Daddy, how do you spell 'steam train'?"
"The way it sounds."
"Choo-choo-choo?"

Francis Strand offers some advice about how to pronounce Swedish spelling. Sample:

G - same as English before an A, O, U or Å; but before an E, I, Y, Ä or Ö it is pronounced more or less like a y; it's like in English before consonants, except when at the end of words such as berg or borg, where it sort of disappears as you almost make a y sound but don't really; the other consonant exception is when it comes before an N, such as in barnvagn - baby carriage - the combination of gn becomes like ngn. Finally, it sometimes doesn't follow these rules at all.

Francis has been blogging since August, 2001, under the title "How to learn Swedish in 1000 difficult lessons", providing one Swedish word or phrase in each post.

This might be our first Swedish cartoon, but it's not our first phonics cartoon, which was Rob Balder's "Boy Reading" (from Partially Clips, 7/21/2002), and is worth posting again:

For some excellent (though unfortunately cartoon-free) background, read this.

Posted by Mark Liberman at 09:29 AM

Vocabulary size and country music

The Economist ("Middle America's soul", 23 December 2006, 45-47) quotes a contemptuous Bob Newhart joke about country music:

"I don't like country music, but I don't mean to denigrate those who do. And for the people who like country music, denigrate means ‘put down’."

Never mind the target of the joke (Newhart is probably satirizing the familiar blue-state bi-coastal snooty attitude toward c&w rather than endorsing it); let's think about its basis. Once again, it's vocabulary size as the measure of intelligence and wisdom and culture, isn't it?

Despite the fact that we have virtually no idea of how to measure vocabulary size rigorously and fairly (which is one thing differentiating vocabulary size from penis length), nobody cares: people are prepared (it would seem) to accept imaginary facts about how many words are known by groups of people about whom they know nothing (or about themselves, as with the Payack claims concerning English) as a reliable assay of intelligence level, or even the sophistication level of a whole language or culture, and to accept any kind of raving nonsense anyone comes up with by way of vocabulary counting. The Reader's Digest word quiz is headed "It Pays to Increase Your Word Power": just sock those words away like cash in a bank. And Will Shortz's puzzles on NPR's Weekend Edition Sunday (how I hate that puzzle segment) are nearly always about rapid lexical access. It's the central stereotypical yardstick for how smart you are among ordinary people: how many words you have, and how quickly you can come up with the right one to name the right kind of snow or whatever. Newhart's joke reminded me again of how superficial lexicon size measurement is as a surrogate for intelligence, and how common it is to find journalists writing things (of either the many-words-for-X or the no-word-for-X variety) that suggest they accept it.

Posted by Geoffrey K. Pullum at 01:21 AM

December 29, 2006

The silence of the men

The pseudo-scientific urban legends about sex differences in talkativeness are mutating slightly as they spread around the world. The most recent variants can be found in a 12/22/2006 article in Die Welt by Heike Stüvel, "Das Schweigen der Männer" ("The Silence of Men"):

Amerikanische Forscher haben herausgefunden: Männer sprechen im Durchschnitt ein Sechstel weniger als Frauen. Die einzige Ausnahme sind Telefonate mit dem Handy.

Frauen sprechen im Schnitt 30.000 Wörter am Tag, Männer 25 000. Nur ein Viertel redet über Sorgen und Probleme. Am Telefon werden Männer redseliger. Sie benutzen ihr Handy häufiger als Frauen und führen damit im Schnitt 88 Telefonate die Woche.

American researchers have discovered: Men speak on average a sixth less than women. The only exceptions are mobile telephone calls.

Women speak on average 30,000 words a day, men 25,000. Only a quarter [of men] discuss their concerns and problems. On the telephone, men become more talkative. They use their mobile phones more than women and make on average 88 calls a week.

The 30,000/25,000 is a pair of numbers that I haven't seen yet -- there are dozens of pairs (and ranges) of numbers out there among the replications of this meme, among them 20,000/7,000; 30,000/15,000; 7,000/2,000; 30,000/12,000; 50,000/25,000; 25,000/12,000. But I haven't seen 30,000/25,000 before.

Who are the "Amerikanische Forscher", I wonder, and where do the words-per-day and the cell phone counts come from? This Swiss study supports the view that college-age males, at least, might make more cell phone calls:

The Die Welt article might not tell us where its words-per-day and cell-phone usage numbers come from, but a few paragraphs in, some familiar names show up:

Männer und Frauen sind - was das Kommunizieren betrifft - komplett verschieden. Obwohl Frau und Mann häufig dieselben Worte verwenden, meinen Sie selten das Gleiche. Woher kommt das?

Die US-Neurologin Louann Brizendine fand heraus: Das weibliche Gehirn hat im Sprachzentrum elf Prozent mehr Nervenzellen als das männliche - besonders im Bereich, der für Gefühle und Erinnerungen zuständig ist. Warum Männer oft nicht zuhören, haben Forscher auch herausgefunden: "Frauenstimmen sind aufgrund der Stimmbänder und des Kehlkopfs komplexer und melodiöser als Männerstimmen", so Michael Hunter von der Universität Sheffield. [...]

Das Gehirn wird durch die verschiedenen Schallwellen stärker beansprucht - das fordert viel Konzentration und führt bei Männern zur Ermüdung.

Where communication is concerned, men and women are completely different. Although women and men often use the same words, they seldom mean the same thing. Why is that?

The American neurologist Louann Brizendine has discovered: The female brain has 11% more nerve cells in the language center than the male does -- especially in the area that is responsible for feelings and memories. Researchers have also discovered why men often do not listen: "Because of the vocal cords and the larynx, women's voices are more complex and more melodious than men's voices", says Michael Hunter of the University of Sheffield.

The brain is more strongly stressed by varying sound waves -- this demands concentration and makes men tired.

Right. The "11% more nerve cells" part is bogus, though again I'm not sure exactly where it comes from. For a discussion of some of the parts of Louann Brizendine's book about the language-related areas of the brain, see here and here. (Dr. Brizendine hasn't done any research of her own on this topic.) None of Michael Hunter's research, as far as I'm aware, establishes anything about degrees of concentration or men getting tired or (in fact) anything about differences in men's perception vs. women's perception. For a discussion of Michael Hunter's work on (men's) perception of different sorts of voices, see here and here.

The Die Welt article ends with familiar stuff about how women express feelings, and men express facts, and that's all because of the paleolithic division of labor, in which men were responsible for hunting bison and "Frauen waren eher für Kinder, Küche, Kirche... oops I mean Hege, Pflege und Gefühle" ("women were instead for nurturing, care and feelings").

I think of Die Welt as a serious, responsible publication. Wikipedia calls it "the flagship publication, within the so-called quality newspaper market, of the Axel Springer empire". It's disconcerting to see such a paper spreading apparently fabricated numbers without any serious attempt at attribution or any fact-checking at all.

There's a lot of hand-wringing these days about the sinking fortunes of print media. It's usual to blame competition from new sources of information. But I wonder how much of the problem is epitomized in articles like this one. Maybe the public has more sense than the journalists do.

Posted by Mark Liberman at 03:06 PM

December 28, 2006

Two ways to look at the passive

Since we last looked at the injunction Avoid Passive in any detail (in a posting by Mark Liberman that has links back to a pile of earlier postings), I've looked at some more treatments of the passive in books of advice. Here I'm going to report on two extremes: at the low end, Toni Boyle and K.D. Sullivan, The Gremlins of Grammar: A Guide to Conquering the Mischievous Myths That Plague American English (2006), and at the very high end, Virginia Tufte, Artful Sentences: Syntax as Style (also 2006). The two books share a semantic characterization of the passive, but otherwise they could scarcely be more different.

I'm going to follow both of the books in talking about the "passive voice", though older writers on English grammar (especially in the 19th century) regularly object to this term, on the grounds that voice and tense and mood and aspect and the like are names of grammatical categories realized in inflectional morphology. Latin has a passive voice, these writers explain, because it has a system of inflected verb forms that are primarily devoted to use in constructions of a certain sort. English, on the other hand, has constructions of this sort, but no verb form primarily devoted to use in them; the English passive (as in This book was written by a friend of mine) uses the "past participle" (to give it its traditional, and very opaque, name), which is also used in perfect-aspect clauses (like Kim has written many books) and adjectivals (like Written instructions are better than oral ones and When we arrived, at 5, the door was closed and locked). To put things another way, the work that's done by inflectional morphology in Latin is done in English "analytically", or "periphrastically", that is, by syntactic constructions.

What is this work? Simplifying a lot, the passive provides a way to treat what is normally the direct object of a verb (or, occasionally, the object of a preposition) as a subject. Note that my characterization is framed entirely in syntactic terms: the syntactic category verb and the syntactic functions subject, direct object, and prepositional object. There is no talk of actions, actors, agents, performers of actions, or recipients of actions. (These semantic notions are not irrelevant, because they're tied, in very complex ways, to the syntactic notions of subject, direct object, etc. They're just not identical to them.)

English has a number of constructions that do this work. Among them are several that use the past participle form of the verb and optionally allow the expression of the normal subject in a prepositional phrase with by (in actual writing and speech, the by-phrase is much more often omitted than not). I will refer to all of these as "passive constructions". Among them are the "BE-passive" in (1) and the "GET-passive" in (2):

(1) Kim was attacked by wolves.

(2) Kim got attacked by wolves.

There's now a problem in using the technical term "active (voice)". It contrasts with "passive (voice)", but how? Is it narrowly contrasted, so that active VPs are only the ones that can have passive counterparts? If so, then enormous numbers of verbs are neither active nor passive: in particular, intransitives of several types, as in (3), and unpassivizable transitives of several types, as in (4).

(3a) Kim slept.
(3b) Sandy disappeared.
(3c) Terry seemed unhappy.
(3d) Chris became a detective.
(3e) Three hours elapsed.
(3f) Terry screamed.

(4a) Kim resembles Sandy. (*Sandy is resembled by Kim.)
(4b) The play concerns poverty. (*Poverty is concerned by the play.)
(4c) I realized the answer. (*The answer was realized by me.)
(4d) These movies star Freddy the Pig. (*Freddy the Pig is starred by these movies.)
(4e) I have two houses. (*Two houses are had by me.)

Or is "active" broadly contrasted to "passive", so that anything that's not passive is active? If so, then intransitives and unpassivizable transitives are all active. In this case, we might as well abandon the misleading technical term "active" completely: VPs are either passive or not; the non-passive VPs don't necessarily have anything in common with one another, beyond not being passive.

This is all background. Now a word about the attitude we take at Language Log to the passive, which is that passive constructions have their uses and that a blanket injunction to avoid them, or even to avoid them as much as possible, is silly. Good writers, including Strunk and White themselves, use them with some frequency, as we have pointed out many times here on Language Log. In fact, most of Tufte's discussion of the passive (pp. 78-89) is devoted to its virtues, with many well-chosen examples.

(Quite often, people have written me to say that in their experience active clauses are usually, or even almost always, clearer than their passive counterparts. These are, of course, impressions, not the results of systematic studies of passive use; they are subject to the effects of selective attention and confirmation bias. When people have looked at polished writing to count passive clauses -- not an easy task, and subject to some judgment calls -- they find that 10-20% of the clauses are passive. And when you look at specific examples, very few of them would be improved by conversion to actives, and many would be changed for the worse.)

It is true that some writers seem to be overfond of the passive, and can use some encouragement to re-word. My impression, from working with students, is that the problem is rarely a simple fondness for passives, but usually involves a more complex set of difficulties in organizing discourses for an audience. The ineffective passives are just a symptom of a larger problem.

Now to the two books. Gremlins has a very brief treatment, less than a page (pp. 77-8). The section, titled "Verbs Have Voices", starts with an explanation of the voices of English:

Verbs have two voices to choose from, active and passive. If a verb is in the active voice, the subject is doing the action.

The matador confronted the bull, stared him in the eye, flicked his cape, then ran back to the side of the arena.

(The usual confusion between expressions and the things they denote, between words and the world. Subjects of sentences are linguistic expressions, and expressions don't do actions; denotations of subjects might sometimes do actions, however. This usage is so widespread that it might seem churlish to complain about it. But I think it's useful for students to keep the distinction between form and meaning in mind; remember that this is a book for ordinary people, not professional linguists or philosophers. Once that's well established, there's no problem in using the looser locution, since things will be clear in context.)

Ok, Gremlins talks about "verbs", period, suggesting that they hold to the view that all verbs are either passive or not, and they use "active" to refer to the non-passive ones. That's just a terminological choice. Then they give a version of the standard semantic characterization, in which verbs denote actions and subjects of active verbs denote the agents in those actions, and one example. The example has four active VPs in coordination, sharing the subject the matador. The first two of these can be seen as denoting actions only by stretching the notion of "action" considerably; confronting something and staring something in the eye are not caused changes of state. It is, of course, easy to find much more extreme examples, of active VPs that transparently do not denote caused changes of state: many of those in (3) and (4) above, plus things like:

(5a) Kuwait lies to the south of Iraq.
(5b) The tank holds 14 gallons.
(5c) Everyone appreciates fine wines.
(5d) Fine wines please everyone.
(5e) Picnics attract ants.

When you look at polished writing and ask how many clauses have verbs denoting actions and subjects denoting the agent of those actions -- again, not an easy task and subject to judgment calls -- the figures are once more in the 10-20% range. Action verbs with agentive subjects are certainly not in the majority.

I'm dwelling on these very familiar points because the characterization and the example appear in a book of advice; they're SUPPOSED TO BE HELPFUL to writers. I can't imagine how they could be. The semantic characterization is no more than recitation of a piece of a catechism, reproduced without understanding; a reader who takes it to be a claim about English (or languages in general) and tries to test it will quickly come upon examples like those above and conclude that the claim is false, while everyone else will just memorize it as a definition and pass on, no wiser. But why do semantic characterizations persist, in the face of such abundant counterevidence?

I suspect that the answer is in fact that they are treated as dogma. They are seen as being so fundamentally true that action and doer of the action have come to be understood as 'meaning of a verb' and 'meaning of a subject in an active clause', respectively. Plenty of people have responded to examples like those in (4) and (5) by patiently explaining to me that they do indeed describe actions, in some extended or metaphorical interpretation of the word action. For them, the semantic characterizations couldn't possibly be false. If so, then including them in an advice book is nothing more than instruction in the catechism.

In any case, Gremlins passes immediately to the passive, leading with:

The passive verb always uses some tense of to be.

The book that was written in four weeks was made into a movie in four years. [(I)]

A quibble: "some tense" should be "some form". Is written and was written have tensed forms of BE (present and past, respectively), but be written (base form), being written (present participle), and been written (past participle) do not, yet all of them are passive. A small point, true, but also another instance of the often shocking laxness in the use of standard grammatical terminology in popular writing ABOUT GRAMMAR.

More important, we've already seen that passive verbs don't always use some form of BE; there's also the GET-passive, as in (2). In fact, there's a whole lot more -- in particular, BE-less passives in various verb-complement constructions, as in (6), and in various free adjunct constructions, as in (7).

(6a) The fiends had Kim attacked by wolves.
(6b) We saw Kim attacked by wolves.

(7a) Attacked by wolves, Kim fled.
(7b) With Kim attacked by wolves, everyone was terrified.
(7c) Once attacked by wolves, you'll never feel the same about the forest.

And there are many constructions with the verb BE in them that are also not passives -- the progressive, in (8), and an assortment of copular constructions, sampled in (9).

(8) Wolves are attacking Kim.

(9a) Terry is unhappy.
(9b) Superman is Clark Kent in disguise.
(9c) There are penguins on the porch.

I mention all this because Gremlins has, for some reason, taken the occurrence of a form of BE as criterial for passives, when in fact it is neither necessary nor sufficient.

Meanwhile, there are several constructions involving subjects that are understood as objects of verbs, but are in fact NOT passive constructions, for example the four illustrated in (10), in which the subjects are understood as objects of the verbs read, skim, lift, and wash, respectively.

(10a) This book reads easily.
(10b) This book is easy to skim.
(10c) This box is too heavy to lift.
(10d) My shirt needs washing.

Two of these -- in (10b) and (10c) -- do have a form of BE in them, and they all have non-agentive subjects, so at least two of them are problematic for the way Gremlins delineates passives.

But all of these details are as nothing in the face of the fact that this section of the book is the first place in it where passives are mentioned, and the fact that the two short passages above (three sentences of text in all) are the whole of the book's treatment of the nature of active and passive voice. Obviously, no one could make any sense of this if they didn't already know how to recognize actives and passives, at least in the easy cases, so what is this section for?

The point is to trumpet Avoid Passive (which is what comes next); the stuff about be is there, I think, just as a demonstration that serious grammatical issues are somehow involved. The tactic here is one I've seen in a number of popular advice books (I hope to post on some other examples eventually): the goal of a section of the book is to proscribe some usage, but first there are some ornamental technicalities, which serve to suggest that the proscription is somehow grounded in Real Grammar and therefore should be taken seriously. The ornamental technicalities are, typically, one or more of the following: truncated (Gremlins on the passive might be a new record here); therefore desperately incomplete; inaccurate on factual details; illustrated by flawed examples; discussed with technical terms used inappropriately; and not entirely relevant to the proscription. Oh yes, and the examples are almost always invented and almost always given without context.

In any case, the punch line is:

The subject of a passive verb never acts--which gets pretty boring. It's like listening to music that's always in a minor key. Dreary. So writing or talking in the active voice is best.

The truly remarkable part of this is its framing as an objection to ALWAYS using the passive (where it's available), something no one has ever even come close to suggesting. (Even in advice to use the passive in describing the design of experiments and tests in the scientific literature, the manuals don't tell you to use the passive everywhere. But, anyway, Gremlins isn't addressed to people writing scientific journal articles.)

Then there's the bad-mouthing of music in minor keys. Undeniably, minor scales and chords are popularly associated with melancholy, but there's plenty of minor music with other emotional tones (Beethoven's Fifth Symphony is in C minor, a key that many have seen as characteristically "stormy" and "heroic" for Beethoven), and most music of any length modulates between minor and major (sometimes shorter compositions do too; as Daniel Levitin notes in This Is Your Brain on Music, p.38, "Light My Fire" by the Doors has the verses in minor chords, but the chorus in major chords).

And then the analogy between passive syntax and minor music, which seems to turn on perceived associations between passivity, in the real world, and, on the one hand, passive syntax, and, on the other, minor keys -- in combination with a celebration of activity, energy, control, etc. in the real world, which are associated with active syntax and major keys. There's a lot to be said on the topic -- why, for example, is the contrast not between restiveness (bad) and placidity (good)? -- but, as far as I'm concerned, none of it belongs in a book like Gremlins. There might indeed be some metaphorical associations, between grammatical voice and extralinguistic matters, that have some psychological reality for at least some speakers, but they're likely to be subtle in their effects, much more subtle than other factors that I'll take up below.

Finally, a comment on "the subject of a passive verb never acts". For passivizable verbs with non-agentive subjects, the passives are just as (metaphorically) "active" as the corresponding actives, as is the case for (5c) and (5d) and their passives:

(5c - active) Everyone appreciates fine wines.
(5c - passive) Fine wines are appreciated by everyone.
(5d - active) Fine wines please everyone.
(5d - passive) Everyone is pleased by fine wines.

As far as I know, there are no verbs with agentive direct objects -- there is, after all, SOME significant connection between the syntactic functions in sentences and the participant roles in situations -- but you can concoct passives in which the subject denotes an agent, just not the agent of the verb that is passivized. What I have in mind are things like:

(11) I was moved/impelled/inspired to sing the national anthem.

Here the impulse or inspiration is internal to the speaker of (11). The effect of the sentence is to assert that the speaker sang the national anthem -- performed an action -- and did so as a result of this internal impulse or inspiration.

Back to the Gremlins text. The activity connection is pursued further in its final part:

[So writing or talking in the active voice is best.] To see why, let's take the last example [(I)] and turn it around.

She wrote the book in four weeks, but it took four years to make the movie. [(II)]

You want others to remember what you say and write, so keep it active. The exercise will do you good.

Taking it from the end: the activity connection is there in the pun on exercise; and the preceding sentence introduces a new (and unsubstantiated) claim, that active sentences are easier to remember than passive sentences. Now consider the passive example, (I). It is indeed awkward, but that's at least in part because the book that was written in four weeks is hard to contextualize. (It's only too easy to invent awkward examples, especially out of context.) If the referent of the book is given in the context (as it must be in (II)), then the following (which also makes the contrast explicit) is something of an improvement:

(III) The book was written in four weeks but was made into a movie in four years.

In (III), the restrictive relative clause (modifying the book) that makes the original hard to contextualize has been turned into the first conjunct of a coordination; the Gremlins rewriting, (II), does the same. That is, (II) is not a simple "turning around" of passives into actives; that would produce something like

(IV) The person who wrote the book in four weeks made it into a movie in four years.

in which, as in (I), the contrast between four weeks and four years is poorly expressed, because four weeks is inside a relative clause and four years is in the main clause. A minimal fix would put the two NPs in parallel positions:

(V) X wrote the book in four weeks and made it into a movie in four years.

(where X is some subject NP). Converting a passive with no by-phrase into an active requires supplying material not in the original; in this case, Gremlins supplies, without comment, a subject she.

But (V) implies (almost surely incorrectly) that the person who wrote the book also made the movie of it. Version (IV) shares this defect, but there's no such problem with (III), since (III) contains no NPs denoting the writer of the book or the maker of the movie. That's one of the virtues of the passive: it allows you to omit any expression of the subject of its active counterpart. In any case, fixing the problem with (V) requires you to supply different subjects for wrote and made:

(VI) X wrote the book in four weeks and Y made it into a movie in four years.

The first lesson here is that rewriting to avoid some proscribed usage often requires rewording other parts of the sentence, sometimes substantially. Advice manuals almost always do this subsidiary rewriting without comment, though if readers need advice on using actives and passives they almost surely need help in the rewriting process.

Now look at the first conjuncts in (III) and (VI): the book was written in four weeks (passive) vs. X wrote the book in four weeks (active). These clauses are not interchangeable in discourse, because the passive version is about the book, while the active version is likely to be understood as being about X; in general, a subject is likely to be understood as denoting something that is both topical in the sentence (what the sentence is about) and topical in the discourse (what the discourse is about at this point). That's another of the virtues of the passive: it allows you to convey that a certain discourse referent (denoted by the subject of the passive) is topical. Gremlins, like most advice on the choice between active and passive, fails to even hint at the enormous importance of topicality in this choice.

Next, look at the second conjunct in (VI) -- and Y made it into a movie in four years -- and compare it to the active Gremlins version, (II), and the improved passive version, (III). As I've already pointed out, (II) and (III) bring out the contrast between four weeks and four years by using but instead of and. This is another way in which rewriting can introduce material not explicit in the original. That's the second lesson here: advice manuals very often make alterations in the original that are not required by a straightforward undoing of the proscribed usage; they "improve" the original in other ways as well and so heighten the contrast between the "bad" original and its rewriting (almost always without comment or explanation, of course).

In fact, the second conjunct of the Gremlins version, (II) -- but it took four years to make the movie -- goes way beyond the minimal rewriting in (VI). Strikingly, (VI) has an action verb and an agentive subject in this conjunct, but (II) does not! The verb in (II), took, is indeed active voice, but in the sense here TAKE belongs with the verbs in (5) above, which don't even come close to denoting actions. In fact, in this sense, TAKE is unpassivizable (with either of the two available objects):

(12a) It took four years to make the movie.
(12b) *Four years were taken (by it) to make the movie.
(12c) *The movie was taken (by it) four years to make.

As for the subject, it's a "dummy" it, a place-holder with no denotation of its own (certainly not as an agent in an action); instead, in this construction to make the movie 'making the movie' is interpreted as the subject of took four years. The verb TAKE in related constructions, as in (13) and (14), is equally unpassivizable:

(13a) Making the movie took four years.
(13b) *Four years were taken (by making the movie).
(13c) Making the movie took Allen four years.
(13d) *Allen was taken (by making the movie) four years.

(14a) The movie took four years to make.
(14b) *Four years were taken (by the movie) to make.
(14c) The movie took Allen four years to make.
(14d) *Allen was taken (by the movie) four years to make.

((14a) and (14c) illustrate further constructions, like those in (10), which have a subject understood as the object of a verb but which are nevertheless not passive.)

What's happened here is that the Gremlins version of the second conjunct introduced an entirely new construction, not in the original (again, without comment or explanation). On top of that, the construction totally fails to fit the Gremlins characterization of active clauses, and indeed suppresses any mention of the maker(s) of the movie, just the way an agentless passive does. Goodness knows what readers are supposed to make of all this for practical purposes.

Now I'm not claiming that there's something wrong with the second conjunct of (II). In fact, I think it's pretty good. There are several variants or expansions of it that might also do:

(15a) ... it took four years to make the movie. [in (II)]
(15b) ... it took four years for Y to make the movie. [with mention of the maker(s)]
(15c) ... it took four years to make the movie of it. [with explicit reference to the book]
(15d) ... it took four years for Y to make the movie of it. [combo of (b) and (c)]

(16a) ... making the movie took four years.
(16b) ... making the movie took Y four years.
(16c) ... making the movie of it took four years.
(16d) ... making the movie of it took Y four years.

No doubt you can imagine still other possibilities. What's good about all of these is that they bring out the two relevant contrasts, between the movie and the book and between four years and four weeks.

The problem with (II) is its first conjunct, specifically the subject of this clause. Version (II) treats the writer of the book as topical, and that's possible (if so, then (II) conveys a topic shift, away from the writer of the book to the book itself, in contrast to the movie), but it's likely, especially when the sentence is viewed out of context, that the book is topical, in which case we want the book to be the subject of this clause -- that is, we want a passive. The book's writer can then be downgraded in its discourse status (by being mentioned in a by-phrase), or you can suppress mention of the writer entirely, depending on your wider aims in the discourse:

(VII) The book was written (by X) in four weeks, but it took four years to make the movie.

(or with any of the other variants for the second clause, or with one of the constructions in (14) in the first clause). And if you want to treat X as topical, then there are further possibilities, with active verbs in both clauses, for instance:

(VIII) X wrote the book in four weeks, but Y took four years to make the movie.

To sum up: the Gremlins treatment of the passive is appalling, but in detailing just what is appalling about it I've tried to bring out some important points. What's especially disheartening, though, is that The Elements of Style (right back to the Strunk 1918 original) -- cited approvingly in the Gremlins reading list, by the way -- gets some of this right. In particular, Strunk appreciates the significance of topicality in choosing between active and passive. Here's his summary:

The need of making a particular word the subject of the sentence will often, as in these examples [given just before this], determine which voice is to be used.

(Note the nominal of making a particular word... instead of the verbal to make a particular word... and the passive is to be used instead of the shorter and active to use. Strunk wasn't very good at following his own advice.)

Now we get to Virginia Tufte. Tufte just assumes her readers are acquainted with the concepts and terminology of traditional grammar; her aim is to show you what you can do with the resources of English. The section on "the passive verb" begins with the usual semantic characterization:

Otters eat clams. The verb is in the active voice: the subject performs the action. Clams are eaten by otters. The verb is in the passive voice: the subject receives the action.

Oh dear. But then she jumps right into a discussion of discourse organization:

Which form you use depends on whether you have previously been writing about otters or clams. One of the uses of the passive is to shift the topic or the emphasis. Another is to move the noun phrase that was the subject of discussion to a new location in the sentence, usually toward the end...

She also notes that "for good or ill" the passive allows you to omit this noun phrase entirely.

More generally,

There are, as with other inversions, many reasons for turning to the passive, including the need for special emphasis or rhythm, for strategic rearrangements of different kinds to aid modification or to increase cohesion, for adjustments in a parallel series, and for certain more thematic effects, often providing a contrast with the active verbs.

These points are illustrated with pages of examples, extensively and sensitively discussed. Real examples, with contexts. On occasion, passives are rewritten into less effective actives. In the middle of the section she turns to the sorts of passives that critics most often complain about, especially in chilly officialese and impersonal reporting, though she notes that the coldness and impersonality of her examples might well have been intended by those who wrote them; it's not at all clear that the syntax is the problem.

There is no hymn to the energetic activity of the active, no castigation of the boring submissiveness of the passive. It's all about what you can do with the two voices.

If you don't know how to recognize a passive (at least in the easy cases), then you'll need some background before you can tackle Tufte. Don't, however, try to get it from Gremlins.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 10:42 PM

VPE on the edge

Our very own John McWhorter wrote the following yesterday:

And yet NOO is not "slang" -- it's grammar. One could write a whole paper on it (and, as it happens, one is!).

No doubt John knew just what he was doing here: producing an instance of so-called Verb Phrase Ellipsis (VPE) -- in "one is ___" -- where the missing material is to be understood as "writing a whole paper on it" (a present participle VP), even though the antecedent is "write a whole paper on it" (a base-form VP). I think many readers would have a moment (up to a few centiseconds, maybe) of pause while they worked that one out, and possibly they would have had a small spike in their P600 ERP responses (but nothing special for N400), indicating that they were noting a syntactic surprise. He was playing with us, making us do a little bit of interpretive work and, maybe, giving us some enjoyment in the process.

[Added 12/29/06: Several readers note that part of the surprise effect is in the shift from truly generic one to the pseudo-generic one that refers to the speaker.]

Two things here: what counts as a legitimate VPE (some things are definitely on the edge); and how to draw the line between creative language use that stretches the boundaries of grammar a bit and plain unacceptability (again, there are things on the edge).

Background about VPE: this is an English construction in which the complement of an auxiliary verb (a modal, BE, or perfect HAVE, plus a few other things for some speakers) or infinitival TO is omitted:

(1) I can't juggle knives, but Dmitri can ___.
(2) I'm not going, but Dmitri is ___.
(3) I was attacked by the wolves, but Dmitri wasn't ___.
(4) I'll be unhappy, and Dmitri will be ___, too.
(5) I've finished my work, and Dmitri has ___, too.
(6) I don't want to eat the sashimi, but Dmitri wants to ___.

(The "remainder" elements are bold-faced here, and the missing complements are indicated by underscores.)

Though the construction is usually known as Verb Phrase Ellipsis (sometimes Verb Phrase Deletion), the omitted phrase is not always a VP. In (4), it's an AdjP. "VPE" isn't a bad name, but it doesn't tell you everything. The slogan is: Labels Are Not Definitions.

VPE requires a linguistic antecedent -- it's not enough that the appropriate verbal semantics be "in the air" -- but it doesn't require that the omitted complement match the antecedent perfectly. Infinitival TO as remainder will have an omitted bare-form VP, but the antecedent can have a different non-finite form:

[present participial antecedent] Stanford University and the city of Palo Alto are opening up their joint meetings to the public for the first time in three decades "because there's no reason not to ___,"...

or a finite form:

[finite antecedent] Mercedes ranks high for dependability but not as high as it used to ___.

(The head verbs in the antecedent phrases are italicized here.)

Various other mismatches between the omitted phrase and its antecedent are possible. But some mismatches are edgy, and John McWhorter's -- present participial omitted VP, base-form antecedent VP -- is one of them. Here's a parallel example that I copied into a file because I lingered over it for a slice of a moment:

"We cannot allow energy to divide Europe as Communism once did," José Manuel Barroso, the European Commission president, told The Financial Times. But it is ___. (Thomas L. Friedman, "The Really Cold War", op-ed piece in the NYT, 10/25/06, p. A19)

It's not hard to collect even more extreme mismatches, which some people judge to be acceptable, while others do not. Here's one Ron Hardin reported on in the newsgroup sci.lang on 9/28/06, from an NYT editorial:

Those men could have been tried and convicted long ago, but President Bush chose not to ___.

Here the antecedent is passive, while the omitted VP is active ("try and convict those men").

Even further out -- well over the line, for me -- is this one:

Domagk, for his part, believed that he had run his tests flawlessly. Almost every time they tested an azo dye with a sulfa side chain, it killed strep; almost every time it did not ___, the effect [of killing strep] was absent or greatly reduced. (Thomas Hager, The Demon Under the Microscope (Harmony Books, 2006), p. 174)

Here the antecedent VP isn't explicit, but is suggested by the prepositional phrase "with a sulfa side chain": "have a sulfa side chain".

My response to most of the imperfectly matching VPE examples, however, is that they are either straightforwardly acceptable (and so escape notice unless I'm specifically looking for such examples) or edgy in their syntax but interpretable -- much like novel verbings:

Roughly 20 percent of men sexing other men and catching syphilis indicated only oral sex exposure. (Instinct magazine, December 2004)

I really gangbustered to get it [the project report] out. (Overheard by Tyler Schnoebelen, March 2005)

Updating my web site to reflect new movie scripts, DVDs and musical score CDs that the studios have freebied me with ... (Ken Rudolph on soc.motss, February 2002)

(plus an enormous number based on proper names: the verbs Bork, Winona, Martha Stewart, (James) Frey, Wal-Mart, etc.)

Novel verbings are all over the place; people invent them all the time. Some critics object to those that have become widespread, like access and dialogue, but as far as I can tell the objections are really about the tone of these words (they are administrativese or pretentious, or in the case of consequence 'punish', euphemistic) rather than about morphological conversion itself. Otherwise, verbings are just part of the artistry of everyday language, and like other artistry, require a bit of work by the audience. They frustrate the audience's expectation for a split second; resolving the surprise can then provide pleasure. (For the record, I found John McWhorter's VPE sentence satisfying.)

But: writing advice routinely counsels against surprising your audience, against making your readers work. Interpretation is supposed to be seamless and smooth. It is, of course, only too easy to find sentences that require far too much work, even in their context; we comment here fairly often on various sorts of ineptness that make the reader's task onerous. Still, there ought to be room for a certain amount of artistry in all sorts of writing; people shouldn't have to wait until they get their Fine Writer Certificate to play with some of the available effects.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 09:03 PM

On the trail of "the new black" (and "the navy blue")

In our occasional roundups of those phrasal formulae we call snowclones, one of the most fertile templates has been "X is the new Y" — most recently discussed in three posts by Arnold Zwicky (1, 2, 3), but extending back to the early days of snowclonology (1, 2, 3, 4; see also this pioneering post by Glen Whitman). The Wikipedia page on snowclones even gives "X is the new Y" as its very first example. Wikipedians and some other observers have suggested that the original model for this snowclone is the supposed fashion-industry motto, "Pink is the new black," which first got extended to "X is the new black" before becoming abstracted even further as "...the new Y."

So, to whom can we attribute what Arnold Zwicky calls "the ur-New-Y expression"? Turning again to Wikipedia:

The phrase is commonly attributed to Gloria Vanderbilt, who upon visiting India in the 1960s noted the prevalence of pink in the native garb. She declared that "Pink is the new black", meaning that the color pink seemed to be the foundation of the attire there, much like black was the base color of most ensembles in New York.

The attribution of "pink is the new black" to Vanderbilt has been dutifully repeated in a number of places in recent months, including The Ottawa Citizen, The Taipei Times, kottke.org, Eric Zorn's Chicago Tribune blog, and right here on Language Log. There's only one small problem with the Vanderbilt attribution: it's completely unsubstantiated. It looks like Diana Vreeland should get the credit instead, though she didn't quite say "pink is the new black" either.

[Update, 12/29: The Wikipedia entry for "the new black" has already been revised to give Vreeland credit rather than Vanderbilt. The uncorrected version is archived here.]

The attribution to Vanderbilt first popped up on the Wikipedia page for "the new black" in a revision on Sep. 13, 2005 by "DropDeadGorgias." The source for this information is uncredited, but some more digging finds that "DropDeadGorgias" wrote a post on Plastic.com on Jan. 22, 2004 mentioning a Guardian headline, "Gay is the new black." (The post was about early reports of the filming of Brokeback Mountain.) In the comments section, "HWheel" offered this explanation for the origin of "the new black":

In the swinging '60's, Gloria Vanderbilt visited India. Everybody was wearing wild colors, but there was lots of pink. She said "Pink is the new black." It's now a fashion cliche: "_________ is the new black," which changes every year.
I got this from the one-woman play, "Full Gallop," which was the wit and spirit of Ms. Vanderbilt.

So it looks like "DropDeadGorgias" took the commenter's word for it and amended the Wikipedia page for "the new black" to say that the expression is "commonly attributed to Gloria Vanderbilt." Unfortunately, this bit of information fails Wikipedia's usual standards of verifiability. Nobody else attributes "the new black" to Vanderbilt, except for people relying on the faulty Wikipedia entry.

A little more research zeroes in on the source of the misinformation. That one-woman play referred to by the Plastic.com commenter, "Full Gallop," is not about Gloria Vanderbilt — it's actually about Diana Vreeland. It's easy to see how one could get the two women confused: they're both chic fashion divas and high-society types with last names beginning with V. Vreeland, however, is the one who evidently deserves a place in the annals of snowclonology.

So what did Vreeland actually say? Turns out she didn't call pink "the new black," or even "the black of India," but rather "the navy blue of India." (Note also that calling pink "the navy blue of India" is actually more akin to snowclones of the "X is the Y of Z" model.) Vreeland's original wording is preserved in the script of "Full Gallop" by Mark Hampton and Mary Louise Wilson (published in 1997 but first performed in 1995 with Wilson in the role of Vreeland):

Actually, pale-pink salmon is the only color I cannot abide.
Although, naturally, I adore PINK. I love the pale Persian pinks of the little carnations of Provence, and Schiaparelli's pink, the pink of the Incas.
And, though it's so vieux jeu I can hardly bear to repeat it, pink is the navy blue of India.

This passage, like others in the script, is taken verbatim from Vreeland's 1984 memoirs, D.V. (p. 106 of the 1997 Da Capo Press edition, which incidentally has a foreword by Mary Louise Wilson). By that time, near the end of her life, she seemed quite bored with her famous catchphrase, considering it so vieux jeu (lit. 'old game') that she could barely stand repeating it. Indeed, a Nov. 28, 1980 profile of Vreeland in the Washington Post referred to "Pink is the navy blue of India" as "her most frequently quoted statement."

The quote had, in fact, been traveling with Vreeland ever since she burst into the public eye as the editor of Vogue in early 1962. In March of that year, Carrie Donovan wrote a long New York Times profile of Vreeland that included this anecdote:

A designer tells of the time he showed Mrs. Vreeland a swatch of bright pink silk of Eastern influence.
"I ADORE that pink!" she exclaimed. "It's the navy blue of India."
("Diana Vreeland, Dynamic Fashion Figure, Joins Vogue," New York Times, Mar. 28, 1962, p. 30)

In her 2002 biography Diana Vreeland, Eleanor Dwight identifies Donald Brooks as the designer who shared the story with the reporter. (Vogue photographer Norman Parkinson also recalls Vreeland's saying the line to him, as recounted in his 1983 book Fifty Years of Style and Fashion, reviewed here.) There must have been something particularly striking about Vreeland's bold formulation, since it would often be repeated in profiles of her — as in "The Vreeland Vogue," a Time Magazine piece from May 10, 1963.

Some claim that Vreeland's comment was inspired by a trip to India (as the Wikipedia entry claims was true of Vanderbilt), but I haven't found any evidence of this. I doubt this was the case, since in D.V. she writes that as a fashion editor she herself didn't travel, instead living vicariously through international fashion shoots: "I couldn't take off for a few weeks to see, say, a bit of India. But I could send groups of photographers, editors, and models, and they'd be there the next day." So I'd imagine that she came to the conclusion that "pink is the navy blue of India" based on one of these shoots that she arranged from the comfort of her New York office.

But when did "the new black" enter the picture? I have yet to find any usage before the 1980s, when various colors were anointed "the new black":

Colors are slated to be somber and muted, say most of the designers who previewed their collections for Fashion83. For example, Ferre says gray is the new black. (Los Angeles Times, Mar. 4, 1983, p. V6)

"There is a tremendous range to the color brown," says [textile and color specialist Elaine] Flowers, who expects brown to look updated because of the way it is paired with other colors, and used in varied textures. "It is the new black." (Washington Post, Mar. 15, 1984, p. D9)

Navy is the new black in Paris; in London and Milan, brown is the preferred alternative. (Washington Post, Apr. 3, 1984, p. C6)

"We're very strongly navy for the season," he [sc. merchandising agent Joseph Martinez] said. "Navy is the new black." (Los Angeles Times, Oct. 26, 1984, p. IV15)

Nearly 4,000 fashion professionals filled the New York Hilton's ballroom for two runway shows spotlighting fall trends: trumpet skirts, swing dresses, styles the moderator said were ''for the woman whose bank account is equal to her self-assurance,'' belts (''the accessory of the year''), velvet, gray (''the new black''), boots and big coats. (New York Times, May 27, 1986, p. C12)

Diana Vreeland is not mentioned in any of these early cites, and her use of navy blue as a standard fashion color had been replaced by black (thanks to Donna Karan and other designers of the day). It's hard to know exactly what influence Vreeland had on the "new black" pronouncements of the '80s, but perhaps for fashionistas of the era the old line about pink being "the navy blue of India" was such common knowledge that it was easy to mold into the "X is the new black" template. Or perhaps there are still some missing steps between the Vreelandism and the later snowclones. Either way, it doesn't look like Gloria Vanderbilt had anything to do with it.

[Update: Barry Popik has tracked down an intermediary step on the way to "the new black" of the '80s — "the new neutral":

Colors are the new neutrals. Find a color you like and wear it with everything. (New York Times, Sep. 16, 1979, p. NJ16)

Pearl gray is the new neutral, navy and black are everywhere, alone or with anything. (Chicago Tribune, Nov. 12, 1979, p. B3)

Lila Schneider, another New York designer, said, "Pink is the new neutral — a change from the stark white of the last few years." (Chicago Tribune, Oct. 19, 1980, p. 13-1)

No one knew how to interpret all this color experimentation until one New York observer finally blurted: "It looks like red is the new neutral." (Toronto Globe & Mail, Nov. 24, 1981, p. F6) ]

Posted by Benjamin Zimmer at 05:10 PM

Onomastic malice

I don't have a clue about what my parents were thinking when they gave me my middle name, Wellington. Since childhood, I've tried to bury it by using only an initial, W, and I've been even more diligent now that W has taken on a, well, more pejorative meaning. But think for a minute about the public relations problem Barack Obama has these days with his own middle name, Hussein. Or, for that matter, with his family name, Obama. And Barack may not be so helpful either. David Wallis (note the omission of his middle name, Robin, or even his initial) writes about this in a recent Slate article.

Wellington is some sort of national hero in England, at least, but not for a working class kid growing up in industrial northeastern Ohio, where it signified only uppity stuffiness and pretense. It even served as a mocking insult when I missed a crucial shot in an important high school basketball game and my classmates in the stands shouted out, "Wellington," to show their disapproval--one of those memories that I want to erase but can't quite purge.

Already Republican strategist Ed Rogers and right wing screeder Rush Limbaugh have started a political and bigoted onomastic attack on Obama. So far, at least, Obama has tried to use only his first and last names, not even suggesting that there is an H lurking there some place. But middle initials are said to sound presidential, like John F. Kennedy, Franklin D. Roosevelt, Gerald R. Ford, or Richard M. Nixon, and I wonder if Obama eventually will need to admit that he's forever stuck with an H there. Like my (sigh) W.

Posted by Roger Shuy at 05:00 PM

Not-so-worrisome details

The BBC may still be peddling its nonsense about cow dialects, but at least one comic strip character has rightly decided not to worry about this factitious factoid.

Here's today's "Sylvia":

Good instincts, Woman Who Worries About Everything!

(Hat tip Joel Berson.)

Posted by Benjamin Zimmer at 04:44 PM

Factoids of the Year

Today the BBC News "Magazine Monitor" posted "100 things we didn't know last year", introduced like this:

Each week, the Magazine chronicles interesting and sometimes downright unexpected facts from the news, through its strand 10 things we didn't know last week. Here, to round off the year, are some of the best from the past 12 months.

One of the featured items was very familiar:

45. Cows can have regional accents, says a professor of phonetics, after studying cattle in Somerset

Truly, they have no shame -- see "It's always silly season in the (BBC) science section" (8/26/2006) for the hilarious details.

One of the "10 things" in this week's "strand" will also be familiar to our readers:

1. Just 20 words make up a third of teenagers' everyday speech.

What fraction of the other 100 "interesting and ... downright unexpected facts" do you suppose are equally bogus? I'm not sure, but I'll bet at least that the presentation by BBC News is careless and misleading. Let's check another factoid with linguistic connections:

57. The word "time" is the most common noun in the English language, according to the latest Oxford dictionary.

This is a reference to a news item from June 22, "The popularity of 'time' unveiled", which is basically a re-write of an item from the "English Uncovered" supplement to the Concise Oxford Dictionary. "The hundred commonest English words" was posted on the AskOxford.com site on January 6, 2006, so there was plenty of time for research.

And the BBC got the main point right: the commonest noun in Oxford's billion-word corpus was indeed listed as time. But the story does manage to botch the background reasoning:

OUP project manager Angus Stevenson said much of the frequency of the use of words such as "time" and "man" could be put down to the English love of phrases, such as "time waits for no man."

I doubt very much that Angus Stevenson uttered any such preposterous violation of common sense. A quick Google search suggests that "time waits for no man" contributes only 67,800 hits towards the 2.23 billion pages containing time, and the 1.04 billion pages containing man. What the "English Uncovered" supplement actually says about this is:

Another reason for a word's high position on the list is that it forms part of many common phrases: most of the frequency of time, for example, comes from adverbial phrases like on time, in time, last time, next time, this time, etc.

Indeed, this version of the assertion is intuitively plausible, and Google counts for the listed phrases -- 59.9m, 192m, 45.4m, 71.7m, 267m respectively -- confirm the intuition. We're not asking for higher mathematics here -- just a bit of common sense, basic logic, and elementary care for the facts.

Here's a recent fact that I didn't know ("BBC loses license fee battle", 12/27/2006):

In what amounts to a major blow to the credibility of BBC director general Mark Thompson, the government has reportedly decided to go ahead with a far lower license fee settlement than called for by the BBC.

According to sources close to the settlement, Treasury Secretary Gordon Brown has settled on a 3% increase in the BBC's £3.3 billion ($6.5 billion) per year license fee in 2007, followed by an increase of 2% per year over the following three years.

The figures fall far short of the BBC's call for a 5.7% increase each year through 2012, a figure it said would take account the rate of inflation, currently running at 3.9%.

The news was broken by Channel 4 News and widely picked up by news organizations here.

When the BBC made its original license fee bid at the start of the year, it called for a license fee hike amounting to 6.4% a year for seven years. This was later revised downward to 5.7% after the government's independent auditors rejected the BBC's own financial analysis and cost projections.

I don't think that very much of that $6.5 billion per year goes to BBC News. But still, you'd think they spoke one of those languages without a word for accountability.

Posted by Mark Liberman at 01:56 PM

Like, a Christmas gift card

The Zits strip for Christmas Eve offered five gift cards a teenage boy can give his parents. All five are gifts of "communication", in a broad sense, and four of the five are specifically about language:

My favorite, of course, is the one that gets central billing in the cartoon, about the word like. I like that one [yes, that was intentional] because I'm part of the Stanford ALL Project -- initiated by John Rickford and also involving Isa Buchstaller, Elizabeth Traugott, Tom Wasow, and me, plus a supporting cast of students, both undergraduate and graduate -- which looks at innovative uses of all (in particular, intensifier all and quotative all) and ends up looking at quotative like as well:

I'm like "Yeah," and she's all "no" [From the song of the same name by the Mr. T Experience]

(Look for Rickford, Buchstaller, Wasow, and Zwicky on all, to appear soon in American Speech. Manuscript available here.)

Now, the thing about like is that, even if you exclude the verb like, it has so very many uses -- at least: as a preposition, a subordinator, a discourse particle, a quotative, and a sentence-introducing element, in an ironic assertional use:

Like I care about what you think. 'I don't care what you think'

And there are subtypes of the prepositional, subordinator, and discourse particle uses. We've looked at a number of these, in an unsystematic way, here on Language Log. Back in May 2005, Mark Liberman assembled a list of postings up to that point, with pointers to another blog and to Muffy Siegel's 2002 paper on like as a discourse particle (which includes references to the earlier literature on the subject).

In any case, teenagers have been fond of discourse-particle uses of like for quite some time, at least 50 years; some people now in their 50s and 60s still use like this way. Meanwhile, quotative like has risen in 25 or 30 years to become the dominant quotative in the speech of young people (and some older speakers use it too). The result is that some young people are indeed heavy users of like in functions that some of their elders do not use it in. And many of these older speakers are annoyed as hell about that.

This strongly negative response deserves some attention and analysis. Here I'm just going to open up the issues a bit.

When people complain to me about discourse-particle and quotative like, I ask them why they dislike it so, and they usually say that kids are just sprinkling a meaningless word (discourse-particle like) all over their sentences and are inexplicably choosing to use a preposition (quotative (be) like) instead of the perfectly good verb say. They characterize these uses as "bad habits"; they are very resistant to the idea that people who use like as a discourse particle or quotative are actually DOING THINGS by their linguistic choices (though the functions of these choices are what linguists have mostly been interested in); and they are offended by teenagers' rejection of older standard usages in favor of innovations. That is, they make no attempt to figure out what people who use a somewhat different variety from their own are conveying (they are uncooperative in their interpretation of other people's speech), and they refuse permission to other people to have varieties of their own (they demand conformity).

Uncooperativeness and demands for conformity attend responses to other inter-group linguistic differences, of course, especially when the groups differ socially, in power or prestige. I have met people who simply REFUSE to understand "double negation" (I didn't see no dogs 'I didn't see any dogs') in non-standard varieties, for example. But young people seem to suffer especially from these responses. No doubt that's because they are, after all, OUR children (for some sense of our) and we are distressed that they refuse to be just like us.

Note that discourse-particle and quotative like have both linguistic value (they can be used to convey nuances of meaning) and social value (they're part of the way personas and social-group memberships are projected). I'm not denying that there are fashions in these things; a major part of the Stanford ALL Project's recent work, in fact, has treated changes over time (some of them huge) in the details of the way people use all and its competitors. When I talk to those who object so strongly to "innovative" uses of like, I try to hit both the linguistic and the social points: the kids are doing things with these usages, and they're also following fashion (and there's nothing intrinsically wrong with that, especially if you're 15). And: nobody is saying that YOU should be talking that way.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 01:32 PM

A little more of The New Y

I don't intend to post new sightings of the snowclone The New Y as they come in -- there are just too many of them -- but I've recently come across two examples that strike me as of more than routine interest.

Meanwhile, as New Year's Day approaches, I've been hoping to unearth instances of "X is the new year": the decade is the new year, decades are the new years, the month is the new year, months are the new years, etc. No luck so far.

New sighting 1: "Pink is the new gold." This cleverness from Peggy Orenstein in a New York Times Magazine piece "What's wrong with Cinderella?" (12/24/06, p. 36). The story is about the "princess" trend for little girls, with everything in the color pink, a trend that is making huge profits (the gold) for Disney, Mattel, and others. This one echoes the ur-New-Y expression "Pink is the new black" in having pink as the subject and (what can be used as) a color word, gold, in the predicate, while punning on that word. Anyone unfamiliar with the snowclone would probably have a lot of trouble interpreting the sentence.

New sighting 2: "Doubt is the new religion, but does doubt doubt itself?" In a letter to the New York Times (12/26/06, p. A26) from Peter McFadden, writing about a recent upsurge in commentary critical of religious belief. This one is interesting because it can be read literally, parallel to "Doubt is the new trend" and understood as conveying 'Doubt is a new religion', or as an instance of the snowclone, conveying something stronger, roughly 'Doubt has replaced religion'. My first guess was that McFadden intended the latter, but then I'm disposed to see snowclones everywhere.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 10:49 AM

An apology to our readers

This morning, I was suddenly seized by the compulsion to correct an injustice. Geoff Pullum and I were the perpetrators, and our motives were pure. The victim was the BBC News organization, which amply deserved what we did, and worse. But an injustice it was nonetheless: we used an unfair and misleading argument.

The BBC's reporters and editors might be lazy and credulous, scientifically illiterate, bereft of common sense, dishonest, and given to promoting dubious products, but they deserve to be confronted with sound arguments based on solid facts. (A bit of humor is OK as well -- they're an easy target -- but that's not the point here.) More important, our readers expect and deserve sound arguments and solid facts from us. But when Geoff and I took the BBC to task for their 12/12/2006 story "UK's Vicky Pollards 'left behind'", we used an invalid argument. It may have cited genuine facts and reasoned to a correct conclusion, but a step in between was, well, fudged.

I felt a little bad about it, but the general outlines of the argument were right, and I didn't have time to do a better job. My conscience has been nagging at me, though, and so this morning I'm going to set the record straight. Or at least, I'll set it as straight as I can. I'm handicapped by the fact that the particular batch of nonsense that the BBC was serving up on December 12 was based on their misinterpretation of an unpublished, proprietary report, prepared by Tony McEnery for the conglomerate Tesco.

So I can't do the calculations that would allow a fair version of the argument. I'll do what I can for now, and I'll ask Tony if he'll do the corresponding calculations on the data that's unavailable to me. In any case, there's some conceptual value in the discussion, I think, even if we never learn the whole truth about this particular case.

The thing that set it off was the second sentence of the BBC story:

Britain's teenagers risk becoming a nation of "Vicky Pollards" held back by poor verbal skills, research suggests.

And like the Little Britain character the top 20 words used, including yeah, no, but and like, account for around a third of all words, the study says. [emphasis added]

Arnold Zwicky, who still likes to think of the BBC as run by sensible and honest people, commented in passing ("Eggcorn alarm from 2004", 12/14/2006):

[T]his could merely be a report on the frequency of the most frequent words in English, in general. If you look at the Brown Corpus word frequencies and add up the corpus percentages for the top 20 words (listed below), they account for 31% of the words in the corpus. But that would be ridiculous, and it wouldn't distinguish teenagers from the rest of us, so what would be the point?

The point, I figured, was to pander to their readers' stereotype of lexically impoverished teens. To help readers understand this, I made the same general argument that Arnold did, at greater length and with some different numbers ("Britain's scientists risk becoming hypocritical laughing-stocks, research suggests" 12/16/2006):

The Zipf's-law distribution of words, whether in speech or in writing, whether produced by teens or the elderly or anyone in between, means that the commonest few words will account for a substantial fraction of the total number of word-uses. And in modern English, the fraction accounted for by the commonest 20 orthographical word-forms is in the range of 25-40%, with the 33% claimed for the British teens being towards the low side of the observed range.

For example, in the Switchboard corpus -- about 3 million words of conversational English collected from mostly middle-aged Americans in 1990-91 -- the top 20 words account for 38% of all word-uses. In the Brown corpus, about a million words of all sorts of English texts collected in 1960, the top 20 words account for 32.5% of all word-uses. In a collection of around 120 million words from the Wall Street Journal in the years around 1990, the commonest 20 words account for 27.5% of all word-uses.

I should have pointed out that the exact percentage will depend not only on the word-usage patterns of the material examined, but also on

the details of the text processing (the treatment of digit strings and upper case letters makes a big difference, as does the frequency of typographical errors);
the size of the corpus -- larger collections will generally yield smaller numbers;
the topical diversity of the corpus -- new topics bring new words.

(The first factor, by the way, explains why Arnold got 31% and I got 32.5% for the Brown corpus.)

Without controlling carefully for those factors, citing the percent of all words accounted for by the 20 commonest words is almost entirely meaningless. That was the BBC's second mistake. Their first mistake was to imply that a result like 33% is in itself indicative of an impoverished vocabulary.

And in my desire to demonstrate in a punchy way the stupidity of this implication, I did something unfair -- I cited the comparable proportion for the 1190-word biographical sketch on Tony McEnery's web site:

And in Tony McEnery's autobiographical sketch, the commonest 20 words account for 426 of 1190 word tokens, or 35.8% . . .

In fact, Tony used 521 distinct words in composing his 1190-word "Abstract of a bad autobiography"; and it only takes the 16 commonest ones to account for a third of what he wrote. News flash: "COMPUTATIONAL LINGUIST uses just 16 words for a third of everything he says." Does this mean that Tony is in even more dire need of vocabulary improvement than Britain's teens are?

Now, I knew perfectly well that this was an unfair comparison, since the BBC's number was derived by unknown text-processing methods applied to unknown volumes of text on an unknown range of topics. I considered going into all of that -- but decided not to, partly for lack of time, and partly because it weakened the point. So I decided to put forward my own little experimental control instead:

In comparison, the first chapter of Huckleberry Finn amounts to 1435 words, of which 439 are distinct -- so that Tony displayed his vocabulary at a substantially faster rate than Huck did. And Huck's commonest 20 words account for 587 of his first 1435 word-uses, or 40.9%. So Tony beats Huck, by a substantial margin, on both of the measures cited in the BBC story. (And just the 12 commonest words account for a third of Huck's first chapter: and, I, the, a, was, to, it, she, me, that, in, and all.) We'll leave it for history to decide whose autobiography is communicatively more effective.

This evened the playing field, since Huck gets a higher 20-word proportion than Tony, for the same text-processing methods applied to a slightly larger text. And I thought it was a good way of underlining the point that Geoff made back on Dec. 8 ("Vocabulary size and penis length"):

Precision, richness, and eloquence don't spring from dictionary page count. They're a function not of how well you've been endowed by lexicographical history but of how well you use what you've got. People don't seem to understand that vocabulary-size counting is to language as penis-length measurement is to sexiness.

Geoff Pullum was then inspired to give the BBC a dose of their own medicine ("Only 20 words for a third of what they say: a replication"), and observed that in the 402-word "Vicky Pollard" story itself, the top 20 words account for 36% of all words used -- more than the 33% attributed to Britain's teens.

This is a great rhetorical move -- the BBC's collective face would be red, if they weren't too busy misleading their readers to pay attention to criticism. But at this point, in fact, Geoff and I may have misled our own readers.

I'll illustrate the problem with another little experiment.

A few days ago, I harvested a couple of million words of news text from the BBC's web site. I then wrote some little programs to pull the actual news text out of the HTML mark-up and other irrelevant stuff, to divide the text into words (splitting at hyphens and splitting off 's, but otherwise leaving words intact), to remove punctuation, digits and other non-alphabetic material, and to map everything to lower case. As luck would have it, the first sentence, 23 words long, happens to involve exactly 20 different words after processing by this method:

the
father
of
one
of
the
five
prostitutes
found
murdered
in
suffolk
has
appealed
for
the
public
to
help
police
catch
her
killer

This of course means that the 20 commonest word types -- all the words that there are -- account for 100% of the word tokens in the sample so far. If we add the second sentence, 19 additional word tokens for 42 in all, we find that there are now 36 different word types, and the 20 commonest word types occur 26 times, thus covering 26/42 = 62% of the words used. The third sentence adds 24 additional word tokens, for 66 in all; at this point, there are 53 different word types, and the 20 commonest ones cover 33/66 = 50% of the words used. After four sentences, the 20 commonest words cover 35/75 = 47% of the words used; after five sentences, 48/105 = 46%.

By now you're getting the picture -- 100%, 62%, 50%, 47%, 46% ... As we look at more and more text, the proportion of the word tokens covered by the 20 commonest word types is falling, though more and more gradually.
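For readers who want to try this at home, here is a minimal Python sketch of that kind of calculation. It is not the program I actually ran: the function names `tokenize` and `top_n_coverage` are made up for illustration, and the regular expression is just one plausible reading of "splitting at hyphens and splitting off 's, but otherwise leaving words intact".

```python
import re
from collections import Counter

# Assumed tokenization rule: lower-case everything, keep runs of letters
# (allowing an internal apostrophe, as in "don't"), emit a trailing 's as
# its own token, and drop hyphens, digits and other punctuation.
TOKEN_PATTERN = re.compile(r"'s\b|[a-z]+(?:'(?!s\b)[a-z]+)?")

def tokenize(text):
    """Return the list of word tokens in `text` under the rule above."""
    return TOKEN_PATTERN.findall(text.lower())

def top_n_coverage(tokens, n=20):
    """Share of all tokens accounted for by the n commonest word types."""
    if not tokens:
        raise ValueError("need at least one token")
    counts = Counter(tokens)
    covered = sum(count for _, count in counts.most_common(n))
    return covered / len(tokens)

# The first sentence, exactly as listed above (already lower-cased).
first_sentence = ("the father of one of the five prostitutes found murdered "
                  "in suffolk has appealed for the public to help police "
                  "catch her killer")
tokens = tokenize(first_sentence)
print(len(tokens), len(set(tokens)), top_n_coverage(tokens))
```

With only 20 word types in play, the 20 commonest types cover everything, which is the 100% starting point of the series.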

What happens as we increase the size of the sample? Well, the proportion continues to fall in the same sort of pattern. Here's a plot showing the values from 500, 1k, 5k, 10k, 100k, 500k and 1m words in this same sample of BBC text:

So the first 500 words -- roughly one story -- yield a value of about 33%, or about the "one third" that the BBC story cited for Britain's Vicky Pollards. But the million-word BBC sample winds up with a value of about 29%, which is somewhat lower. And if we went to a billion words of text, the number would go a little lower still. So we were in the position of countering a misleading statistic with another misleading statistic.
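The same effect shows up in artificial data. The sketch below does not use the BBC text: it draws 20,000 tokens independently from a made-up 5,000-word vocabulary with Zipf-like weights (1/rank), which is an assumption chosen only to mimic the general shape of word frequencies, and then measures top-20 coverage on longer and longer prefixes. The name `coverage_by_prefix` is hypothetical.

```python
import random
from collections import Counter

def coverage_by_prefix(tokens, sizes, n=20):
    """Map each prefix size to the share of that prefix covered by its n commonest types."""
    result = {}
    for size in sizes:
        prefix = tokens[:size]
        counts = Counter(prefix)
        covered = sum(count for _, count in counts.most_common(n))
        result[size] = covered / len(prefix)
    return result

# Synthetic stand-in for real text (an assumption, not the BBC sample):
# 5,000 word types with weight 1/rank, sampled with a fixed seed.
rng = random.Random(0)
vocab = [f"w{rank}" for rank in range(1, 5001)]
weights = [1.0 / rank for rank in range(1, 5001)]
tokens = rng.choices(vocab, weights=weights, k=20000)

curve = coverage_by_prefix(tokens, [20, 50, 500, 5000, 20000])
for size, share in curve.items():
    print(f"{size:>6} tokens: top 20 types cover {share:.1%}")
```

The exact percentages depend entirely on the invented distribution, so only the downward trend is worth looking at.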

In fact, the whole notion of using the coverage of the 20 commonest word-types as a measure of effective vocabulary size is not a very good one. The result depends completely on the relative frequency of a few words like "the", "to", "of", "and", "is" and "that". Small differences in style can have a big impact on this value. Dropping or retaining all optional instances of "that", writing in the historical present (which boosts the relative frequency of "is"), systematically choosing phrases like "France's king" instead of "the king of France", etc. -- none of these have any real connection to effective vocabulary size, in any intuitive sense, but they can have a large impact on the relative frequency of the 20 commonest words.

Instead, we'd like a measure that depends on the whole word-frequency distribution.

Just measuring the total number of different words is not the right answer -- this number has many of the same difficulties. In particular, it very much depends on the size and topical diversity of the corpus surveyed. As the following plot shows, over the course of a million-word sample of BBC news, the overall vocabulary size, measured in terms of number of distinct word-forms used, continues to increase steadily:

This increase will continue up through much larger corpora than we are likely ever to collect from individual speakers or writers. It makes some sense to quantify active vocabulary in terms of the rate of growth -- this raises some interesting mathematical issues about how to parameterize the function involved, which would be a good (if a bit geeky) topic for another post. Unfortunately, however we quantify it, this will depend on a bunch of very delicate decisions about what a "word" is. Consider the following plot, which follows "vocabulary" growth in the same way through 50 million words of newswire text in English, Spanish and Arabic:

So, it seems, the Arabic vocabulary is about five times richer than the English vocabulary. (Please, people, don't tell the BBC!) No, that's not the right conclusion -- the difference shown in the plot is due to differences in orthographic practices and differences in morphology, not differences in vocabulary deployment. Arabic writing merges prepositions and articles with following words, as if in English we wrote "Thecat sat onthemat" instead of "The cat sat on the mat". And Arabic has many more inflected variants of some word classes, especially verbs. (And Arabic-language newswire also appears to have more typographical errors than English-language newswire, in general.)

This problem makes comparisons across languages difficult, but it also arises for within-language comparisons. You'll get an apparent overall "vocabulary" boost from a wide variety of topical, stylistic and orthographical quirks, such as writing compounds solid; frequent use of derivational affixes like -ish, -oid and -ism; eye dialect; use of proper names; spelling variants and outright bad spelling. As corpus size increases, such factors -- even if they are relatively rare events -- can come to dominate the growth in the type-token curve.
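A type-token curve of this kind is easy to compute, and a toy example shows how much it depends on what counts as a "word". In the sketch below, `vocabulary_growth` is a hypothetical helper, and the two token lists are invented: the same dozen words written the English way, and written with articles and prepositions merged onto the following word, in the spirit of "Thecat sat onthemat".

```python
def vocabulary_growth(tokens, sizes):
    """Return [(size, distinct types among the first `size` tokens)] for each requested size."""
    checkpoints = sorted(set(sizes))
    seen = set()
    growth = []
    next_checkpoint = 0
    for position, token in enumerate(tokens, start=1):
        seen.add(token)
        if next_checkpoint < len(checkpoints) and position == checkpoints[next_checkpoint]:
            growth.append((position, len(seen)))
            next_checkpoint += 1
    return growth

# Invented toy data: same content, two orthographic conventions.
split_tokens = "the cat sat on the mat the dog sat on the rug".split()
merged_tokens = "thecat sat onthemat thedog sat ontherug".split()

print(vocabulary_growth(split_tokens, [6, 12]))   # types after 6 and 12 tokens
print(vocabulary_growth(merged_tokens, [3, 6]))   # types after 3 and 6 tokens
```

The merged spelling yields 5 types in 6 tokens, against 7 types in 12 tokens for the split spelling, although nothing about the underlying vocabulary has changed.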

As this suggests, the total count of words that someone (or some group) ever uses is not a very helpful measure of what their ordinary word usage is like. We'd like a measure that reflects the whole distribution of word frequencies, not just the total number of word-types ever spotted, which is almost as arbitrary a measure as the relative frequency of the top 20 words.

One obvious choice is Shannon's measure of the information content of a probability distribution -- its entropy. This gives us a measure of the information acquired by seeing one additional word from a given source. If we express this as perplexity -- 2 to the power of the entropy -- we get a measure of the vocabulary size that would be associated with a given amount of per-word information, if all vocabulary items were equally likely.
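In code, the unigram version of this is only a few lines. This is a sketch under the simplest possible assumptions: probabilities are estimated as raw relative frequencies with no smoothing, entropy is measured in bits, and the function names are mine.

```python
import math
from collections import Counter

def unigram_entropy(tokens):
    """Shannon entropy, in bits per word, of the observed word-frequency distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def unigram_perplexity(tokens):
    """2 to the power of the entropy: the size of an equally-likely vocabulary
    that would carry the same per-word information."""
    return 2 ** unigram_entropy(tokens)

# Four equally frequent words carry 2 bits each, so the perplexity is 4.
uniform = ["a", "b", "c", "d"] * 5
# Four words again, but one dominates, so the "effective vocabulary" is smaller.
skewed = ["a"] * 17 + ["b", "c", "d"]

print(unigram_entropy(uniform), unigram_perplexity(uniform))
print(unigram_entropy(skewed), unigram_perplexity(skewed))
```

Both toy samples use four word types, but the skewed one has a perplexity well below 4, which is the sense in which this measure reflects the whole distribution rather than just the count of types.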

The trouble is, simple measures of entropy (or perplexity) also grow with corpus size. Here's what happens to (unigram) entropy and perplexity over the course of the first million words of BBC news that I harvested the other day:

Now, we could pick some plausible standard corpus size, say 100,000 words, and determine the perplexity of the word-frequency distribution for that corpus (using some well-documented text processing method), and use that as our measure of effective vocabulary. Conclusion: "a 100,000-word sample of BBC newswire has an effective vocabulary, in information-theoretic terms, of 950 words". If we knew what the corresponding number was for Tony McEnery's samples of British teen speech transcripts and weblogs, we'd have some sort of fair comparison.

That's a nice bit of rhetoric, but it's not really what we want. There are at least two sorts of problems. One is that we ought to be measuring information content in terms of how well we can predict the next word in a sequence, not how well we can predict a word in isolation. Another problem is how to find a measure that isn't so strongly dependent on corpus size.

There's a simple (and very clever!) solution, though you have to be careful in applying it. However, it takes a little while to explain, and this post is already too long, so we'll come back to it another day.

BBC News functioned true to type here -- lazy and credulous, scientifically illiterate, bereft of common sense, and the rest of it. You can add class prejudice and age stereotyping in this case as well. And the basic point that Geoff and I made was correct -- the BBC was bullshitting when it implied that British teens are lexically impoverished by noting that in their conversations, "the top 20 words used ... account for around a third of all words". Although the teens probably do need vocabulary improvement, the cited statistic doesn't address that question one way or the other.

But what Geoff and I did was, in effect, to use the journalists' own techniques against them, and our readers deserve better than that.

Posted by Mark Liberman at 09:42 AM

Surging vocabulary

Geoff Nunberg is right to point out the semantic novelty of surge in the sense of "a prolonged deployment of additional troops in Iraq," as the Bush administration and others have used the term in recent weeks. But there's another innovative aspect of surge as it has been deployed (so to speak) in the political discourse surrounding the war in Iraq since the midterm elections. It's not just the noun form of surge that's getting reshaped — the verb is getting a makeover too.

In an op-ed piece in the Wall St. Journal on Nov. 10, former CIA analyst Reuel Marc Gerecht pessimistically previewed the forthcoming release of the Iraq Study Group's final report. (Gerecht was one of the few neoconservative voices on the ISG's panel of experts.) Gerecht's widely quoted assessment went as follows:

We either declare defeat and withdraw completely tout de suite, or we surge troops into Baghdad and fight. The ISG will surely try to find some middle ground between these positions, which, of course, doesn't exist.

A week later, Tim Harper wrote in the Toronto Star (abstract here, full text here) about the "new buzzword" in U.S. policy in Iraq: "a so-called surge, in one last bid to win a war that looks more and more unwinnable." (In the past I've criticized Harper's inaccurate claims about another putative military buzzword: "transfer tubes," supposedly a euphemism for "body bags." But in this case his reporting is right on the money.) In exploring the talk of a possible "surge" of troop levels in Iraq, Harper quotes Sen. John Cornyn (R-Tex.) in a Fox News interview:

I've resisted the call by Senator McCain and some others that we needed to surge troops on a temporary basis, but, you know, I'm beginning to think that he's got a point.

Since then, there's been plenty of talk about plans to "surge troops" (or "troop levels") in Iraq. This transitive usage of surge, meaning 'to introduce (something, esp., troops) quickly, forcefully, and in large numbers (into a region),' has yet to be recorded by the major English dictionaries. The OED lists a poetic transitive sense, meaning "to cause to move in, or as in, swelling waves or billows; to drive with waves," as in this couplet from James Lowell's "A Parable, 'Said Christ Our Lord'" (1873):

Great organs surged through arches dim
Their jubilant floods in praise of him.

In Lowell's verse, organs are "surging" metaphorical floods of praise for Jesus, which only vaguely resembles current usage, in which the administration hopes to "surge" floods of soldiers into Iraq to rebuff insurgents. (Is there some sort of cross-pollination between surging and insurgency going on here?) Other than the 'drive with waves' poeticism, the only transitive usage noted by the OED and other dictionaries is a specialized nautical sense, 'to let go or slacken (a rope or cable) gradually,' which doesn't seem related at all.

The military sense of transitive surge may be new to most readers (as it was to me), but it turns out it's been kicking around defense circles since the '90s. The earliest relevant citation I've found so far is a Sep. 7, 1993 article from Defense Daily, referring to ships that would "surge troops and equipment to crisis areas." And here is a selection of subsequent cites, mostly from military brass and civilian officials:

NATO has said that it will embrace these countries through interoperability, as they upgrade their forces they will make sure that they -- the plug goes into the West wall rather than the East wall -- and through reinforcement; that is, the ability to surge troops into an area of trouble, as opposed to permanently stationing large numbers of combat forces in these new countries. (Briefing by National Security Advisor Sandy Berger, May 31, 1997)

Jackson was worried Slobodan Milosevic might try to retaliate against Macedonia after NATO began its bombing campaign by surging troops across the border. (USA Today, Apr. 2, 1999)

The bases will be used as training grounds for U.S. forces on six-month rotations, hubs for intelligence gathering, and marshaling yards when the Pentagon needs to "surge" troops to a specific region. (U.S. News & World Report, Oct. 6, 2003)

Now, there is no question that there's parts of Iraq that we need to surge troops into, and that there's parts of Iraq that may not need the number of troops that at earlier times were in there. (Testimony by Gen. Peter J. Schoomaker, Senate Armed Services Committee, Nov. 19, 2003)

Abizaid asked his staff for options for surging troops to Iraq last week as violence in the Sunni triangle stepped up. (UPI, Apr. 12, 2004)

We might need to surge troops in for short periods and to bring them out again to achieve an effect. (Testimony by Maj. Gen. Freddie Viggers, House Armed Services Committee, May 17, 2004)

For example, during the March riots in Kosovo, NATO was able to surge an additional 3000 troops within a few days, the first arriving in less than 24 hours. (Testimony by Acting Assistant Secretary of Defense Mira Ricardel, Senate Committee on Foreign Relations, July 14, 2004)

I think there's enough troops right now to do the job. And what they'll do is, they'll surge troops on a temporary basis, Americans, if there's not enough Iraqis trained to cover the elections. (CNN interview with Brig. Gen. David Grange, Sep. 25, 2004)

The United States can always surge troops for specific needs by altering rotation rates or using the theater reserve. (Testimony by Dr. Anthony H. Cordesman, Senate Committee on Foreign Relations, July 18, 2005)

We can surge one, two, three brigades for the election. We will probably do that. (Testimony by Gen. Barry R. McCaffrey, Senate Committee on Foreign Relations, July 18, 2005)

If we had to surge troops, we could. (Meet The Press interview with Gen. Montgomery Meigs, Aug. 28, 2005)

So if you have in today's world 18 to 20 brigade combat teams deployed, we can surge, with the army force generation model, another 18 to 20 brigade combat teams. (AFP interview with Army Secretary Francis Harvey, Jan. 18, 2006)

It's been a gradual buildup over the years, but the floodgates have officially opened for the new sense of surge. It might even be a good contender for Word of the Year, though a novel transitive usage of a preexisting verb doesn't exactly have the same P.R. magic as, say, truthiness.

Posted by Benjamin Zimmer at 01:06 AM

December 27, 2006

Surge projection

The phrase "brief surge" gets 28,000 reported hits on Google. The phrase "long surge" gets 14,000. Not an overwhelming disproportion, at first blush, but do the latter search while excluding the items "troops" and "Iraq" and the hit-count falls to under 900, a substantial portion of them involving phrases like "week-long surge," "day-long surge" and the like, where long is simply a kind of measure classifier. All of which seems to support the observations by Spencer Ackerman at The New Republic and Steven Benen at Washington Monthly that the administration and its supporters are breaking new semantic ground when they use surge to refer to a prolonged deployment of additional troops in Iraq ("The only 'surge' option that makes sense is both long and large," say Jack Keane and Frederick Kagan in a Washington Post op-ed today, in what Matthew Yglesias describes as the double entendre of the day). The mot juste here would be "escalation," of course, but as Ackerman points out, that item is liable to call up unpalatable associations.
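A rough check of the proportions above, as a minimal sketch: the three counts are the ones quoted in this post, "under 900" is treated as an upper bound, and the variable names are purely illustrative. Two caveats apply. Google's reported hit counts are estimates, and the "brief surge" figure was not filtered in the same way, so the second ratio is only indicative.

```python
# Hit counts as reported in the post (Google's figures are estimates).
brief_surge_hits = 28_000
long_surge_hits = 14_000
# "under 900" after excluding "troops" and "Iraq": treat 900 as an upper bound.
long_surge_filtered_max = 900

# Before filtering, "brief surge" is only twice as common as "long surge".
raw_ratio = brief_surge_hits / long_surge_hits

# After filtering, the ratio is at least this large, since the true
# filtered count is below the bound used here.
filtered_ratio_min = brief_surge_hits / long_surge_filtered_max

print(f"unfiltered ratio: {raw_ratio:.1f} to 1")
print(f"filtered ratio: at least {filtered_ratio_min:.1f} to 1")
```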

Posted by Geoff Nunberg at 09:26 PM

Reader response

Today's mailbag has a couple of interesting reactions to my little dramatic exercise, "Shakespearing the reader's brain". In response to the discussion of denominal verbs in Act I, Susan H. wrote:

Although there were many things that bothered me about the Presbyterian Church in which I was raised, the final straw was the minister's use of denominal verbs. Specifically, he would say things like, "we shall fellowship together."

I found it so offensive that I became an Episcopalian as soon as I arrived at university.

I hate to tell you, Susan, but you might need to migrate to another denomination. The recent Christmas Message of the 26th Presiding Bishop of the Episcopal Church, the Most Reverend Katherine Jefferts Schori, begins this way:

God loved us so much that he came to dwell among us, to tent among us in human flesh... [emphasis added]

The bishop's message ends by reminding us that "God continues to be birthed in fragile opportunities that will need to be nourished and tended by others".

Now it's true that Shakespeare used tent as a verb ("The smiles of Knaues / Tent in my cheekes, and Schoole-boyes Teares take vp / The Glasses of my sight", Coriolanus, Actus Tertius). And the OED lists birth as a transitive verb in U.S. dialectal use, with citations from 1906. But that kind of authority has not been enough to redeem the verbal forms of dialogue, access, and so forth -- this is a matter of individual conscience, after all, not susceptible to earthly principalities and powers. One suggestion: a quick scan suggests that Richard Dawkins' writing is largely free of unexpected denominal verbs.

But perhaps post-collegiate maturity has brought you enough tolerance to turn the other cheek, linguistically speaking, and to await other denominal verbs in a spirit of ecumenical acceptance. We might even say, purely in the interests of exciting our brains and helping us stave off old-age forgetfulness, that you could "cheek it".

Meanwhile, Jim G. was inspired by Act II to imagine a bright future for event-related potentials in the arts:

I am delighted with Mark Liberman's report of the neuroscientific evidence of creative wordsmithing, which reinforces what I tell myself we always knew about good and bad art: P600 events without N400 reactions.

Now, if only you guys will add more of the search for quality to your usual discussions of the mechanics and the history.... Maybe we can get the eegheads to find specific ERP events for triteness and coarseness... Just imagine: the Academy of Motion Picture Arts and Sciences might award Oscars based on EEGs of a sample audience watching the various films. Imagine real quality triumphing over the buzz factor.

Wasn't that in one of Woody Allen's movies?

Seriously, there's an interesting idea implicit in Jim's joke, which is that when you measure cerebral blood flow or neurologically-generated electromagnetic fields, you learn something that is truer, or more reliable, or in some other way of higher quality, than what you learn by asking people questions or observing their behavior. So far, I'm afraid that the facts are the opposite way around. You can get much more reliable and also much more nuanced information about how people deal with linguistic form and meaning from a wide variety of old-fashioned methods, starting with introspection and self-report, and continuing with (direct and indirect) experimental methods such as reaction-time measurements, gaze tracking, corpus statistics and many others.

That's not to say that functional brain imaging isn't worth doing -- it enriches the available evidence in important ways. But EEG/ERP, MEG, fMRI and PET are all relatively crude tools, with poor signal-to-noise ratio and poor temporal resolution (fMRI, PET) or spatial resolution (EEG, MEG), usable only in highly artificial situations, and incapable of measuring many key features of neural activity.

Jim's joke also depends on another fallacy, which is that measuring people's brains somehow elevates their responses out of the domain of social science -- where we're dealing with a sample from a socially and culturally variable population -- and into an empyrean realm of psychic absolutes. I commented on this widely-held view in an earlier post with respect to a study of sex differences in humor ("Flacks and hacks and Hitchens", 12/14/2006):

... the subjects in this study were 10 males and 10 females, average age 22, recruited at Stanford Medical School. [...]

But the paper's conclusions aren't about how Stanford med students' brains work. Instead, the conclusions are about what "males and females share" and what "females ... activate .. more than males" and so on.

Would anyone accept a characterization of Americans' political or religious opinions, or their product preferences, based on a sample of 10 first-year Stanford medical students? Would a newspaper try to predict a national election from such a sample? Would a network executive rely solely on such a sample in estimating the response to a new comedy show? You'd have to be unusually stupid or gullible to believe predictions about the population at large that are based on ten 20-somethings enrolled at Stanford.

So why are Dr. Reiss and his colleagues willing to treat such a sample as accurately characterizing the nature of the brain responses to humor of human females and human males, taken as a whole? And why does a savvy political journalist like Hitchens accept this extrapolation as truth?

There's an implicit assumption here that from the point of view of humor a brain is a brain -- or rather, a male brain is a male brain, and a female brain is a female brain. Age, education, personality, cultural background, occupation -- none of that matters, and so none of that needs to be controlled for. We neuroscientists don't need no demographically balanced samples, we're measuring brains. Determining men and women's responses to humor is treated like determining the melting points of bismuth and antimony -- all you need to do is to measure a pure enough sample. There's some residual recognition that statistical variation needs to be averaged out, which is why N=10 rather than 1. But if you think of the enormous variation in either sex's sense of humor -- surely as richly varied as their attitudes towards politics or shampoo -- the assumption of sexual uniformity seems very strange.

Even stranger than the idea of impersonal neuro-humor is the notion that absolute measures of "triteness" and "coarseness" could be derived from the brains of a sample audience watching a movie or a TV show, thus eliminating the "buzz factor" in favor of "real quality".

Unless, of course, you take the view that all brains share the same underlying aesthetic dimensions -- which being facts about brains, and therefore biological, must be universal, like loudness or pitch -- and that everyone also shares the same basic propensities to map sensory experiences onto these neurological scales. It's only later that things get individual and messy, when the brain's responses are shown to the subjective homunculus in the Cartesian theater, who muddies the aesthetic waters with preferences and actions:

"Hmm, that's 7.3 out of 10 in neuro-triteness, and 8.7 out of 10 in neuro-coarseness, so..."

A) "yucch, turn it off immediately before I lose my lunch!"
B) "OK, one episode was interesting, but I don't think I'll make a habit of it."
C) "wow, let's buy the boxed DVD set of the last five seasons!"

On this view, you could determine the true aesthetic value of an experience, in effect, by zeroing in on what the brain is doing before the mind kicks in. This seems like a very odd picture to me, but I guess it's an example of the way that a lot of people (including some scientists) think about brain research. If so, this would help explain why even people who should know better find that irrelevant statements about the brain turn bad explanations into satisfactory ones.

Posted by Mark Liberman at 07:36 PM

Chinese cultural bureaucrat: Vietnamese culture is "superficial" -- for abandoning Chinese characters

An amusing post from almost a year ago at Pīnyīn News, "Vietnamese culture appears shallow without Chinese characters, says Chinese writer", translates part of a speech given in Guangzhou by Chén Jiàngōng (陈建功), the vice president of the Chinese Writers Association:

When I visited Vietnam I learned that the Vietnamese people once used Chinese characters. But because a French missionary invented a romanization method in order to spread Christianity, Vietnamese people gradually began not to use Chinese characters and instead used romanization for their language. In Vietnam, I discovered that their writers’ works all use romanization. Thus, the foundation for Vietnamese culture appears to be extremely superficial.

Well, it's good to be reminded that as ethnocentric as Europeans and Americans can be, they're still far behind the rest of the world in the elaboration of this quintessentially human characteristic.

And in evaluating the arguments about how the large number of homophones make a phonological writing system impractical in Japanese or Chinese, it's instructive to consider the historical experience of Vietnamese and Korean. That's not to say that Japan and China will follow the Vietnamese and Korean examples. But if they don't, it will be for historical/cultural reasons, not logical ones.

[The comments on the Pīnyīn News post suggest that the Vietnamese romanization was actually developed by Portuguese Jesuits -- but as I understand it, and as the wikipedia explains it, the key figure in the development of chữ quốc ngữ was Alexandre de Rhodes, who was in fact French -- even though his system was popularized via a Vietnamese-Portuguese-Latin dictionary. Those Europeans all look alike, anyhow.]

Posted by Mark Liberman at 05:52 PM

Back to Bentolila

In response to Mark Liberman's December 22 post on Alain Bentolila's position on rap music and his reference to my writing on the subject, I am in full agreement with Mark's dismissal of Bentolila's take on rap lyrics and ghetto speech as linguistically unsophisticated. However, I do see something in Bentolila's fear that the world view of rap lyrics discourages engagement with the universal in favor of the particular — but I come at the issue from a different perspective than he does.

Bentolila appears to suppose that the slang usage of a word like GRAVE in both positive and negative connotations suggests an impoverished vocabulary, in which words' usages are stretched beyond coherence because there is so little material to work with. He would likely apply the same "semantic bladder" analysis to Black English's famous use of BAD in the same way (a now antiquated term, but paralleled since with analogous usages such as of STUPID). However, Bentolila would be at a loss to come up with more than a few such cases — that is, there is no systematic use of words in this way.

For some words to switch-hit between the positive and the pejorative is normal in human language in general: MUZHIK in Russian means both "peasant" and "bloke, good ol' guy." This is not evidence of Russian going to the dogs — nor is it evidence of how magically creative Russian speakers are (a forced analysis of Black English's BAD back in the seventies that made my skin crawl even at 11). It's just how language works.

Then Bentolila worries that certain words are used especially frequently, as if this, again, suggests that the vocabulary is shrinking. The offending term most often referred to in English in this vein is FUCK, of course. However, in all languages there are words — such as discourse particles — that are used dozens of times a day by many speakers.

In Saramaccan Creole, the word NOO (pronounced "naw") means, roughly, "then." It is also used as a marker of new information. "NOO the man found out he didn't have any money left." "The red boat NOO, that's the one we need to use." "You, NOO, are his father." When a Saramaccan speaker talks about something in a narrative or explanatory vein, NOO seems, impressionistically, to be every bit as frequent as FUCK is in the speech of a streety American teen. And yet NOO is not "slang" — it's grammar. One could write a whole paper on it (and, as it happens, one is!).

One of the hardest lessons a linguist has to teach is that there is complexity and nuance in even the rudest, most unmonitored of speech. I just finished doing a radio show where a black speaker said "I almost had to get on up out of there" in reference to seeing a movie. A few days ago a relative of mine said "Then what am I doing sitting up here in your house?"

In neither case was anything "up" in the literal sense. This usage of UP is a pragmatic one — i.e. it conveys a speaker's emotional standpoint. To wit, this UP conveys a sense of intimacy upon the location referred to by the speaker. For example, the movie in question was Dreamgirls, and the connotation was that there were many black people in the audience and that their response was spontaneous and warm.

This usage of UP would stump any layman speaker who tried to explain it (I have tried this often) but it is in fact quite systematic -- even when used amidst ample utterances of FUCK and anti-authoritarian sentiment. It is just as much "grammar" as something few educated French people could explain such as what the prefix RE- contributes to the word REPOUSSER, which translates into English as "push" just as POUSSER itself does.

That Bentolila, a linguist, can hear nothing but deficit in the French of the banlieues suggests that there are differences in purview between what linguistics is in France as opposed to the United States.

However, Bentolila's worry that rap lyrics and street speech represent something other than universal views is not insane, in my view. Actually, rappers and their academic fans are of the view that even the ugliest rap lyrics DO express what at least OUGHT to be a universal message: that until the "playing field" is completely level, our job is to wait for a seismic revolution in how America operates rather than teaching people how to make the best of the less-than-perfect.

Yet this is, in terms of how people have made the best of themselves throughout history and how they are doing it worldwide today, a highly PARTICULAR world view. A street folk music that teaches people that there is something sophisticated and insightful about artfully phrased despair and nihilism is, at least, something plenty of people might have less than sunny feelings about.

If in France and the United States it is a more "universal" value to stress self-direction, persistence and ingenuity, then I would agree with Bentolila that all citizens would best be armed with at least a little "universal" to complement the "authenticity" of the street code. In that vein, I would almost rather read Bentolila's glib musings than those of American intellectuals waxing rhapsodic about how "progressive" rap lyrics are.

Nevertheless, however we feel about the tone of street speech and its use in music, to dismiss it as a tub of semantic bladders is inaccurate, and impossible for anyone who bothers to truly listen.

Posted by John McWhorter at 01:17 PM

The Verbing Man

Mark Liberman's recent triptych on denominal verbs reminds me of a bit of light verse I discovered while doing research in the Proquest Historical Newspapers archive — proof positive that the rampant verbing of nouns was already ripe for satirization 120 years ago. And it's especially appropriate for the holiday season.

The Verbing Man.

"Oh, yes I Christmased," says the man,
Who skips from verb to noun;
I dined and turkeyed à la mode,
And curry sauced in town.

I restauranted everywhere,
I whiskyed, beered and aled;
Cigared I on Havanas rare,
And on Regalias galed.

I New Yeared, too, on viands rich
And I champagned myself;
Or Tomed and Jerryed — can't tell which,
Expenditured my pelf.

I resolutioned on that day,
As spirits throbbed my head;
But when the pangs next panged away,
I just cocktailed instead.

—Texas Siftings.
[reprinted in the Los Angeles Times, Feb. 3, 1887, p. 9]

I hope that all Language Log readers thoroughly enjoy their Christmasing and New Yearing (and Hanukkah-ing, and Kwanzaa-ing, and Chrismukkah-ing, and Chrismahanukwanzakah-ing...).

Posted by Benjamin Zimmer at 10:47 AM

A participle too far?

At the fourth annual Language Log Christmas party, someone seems to have decanted an extra bottle or two of rum into the eggnog, and so we're still catching up on pre-Christmas news. One item that shouldn't pass unnoticed was the introduction of part-of-speech terminology into the national dialogue on Iraq.

You can read the whole story in Sheryl Gay Stolberg's NYT weblog post "A Choice of Tenses in War of Words" (12/18/2006). Tony Snow, the White House press secretary, was trying not to admit that Colin Powell's statement that we're losing contradicts President Bush's statement that we're winning. (It's hard out there for a press secretary.) Stolberg describes the outcome this way:

But as to what the president believes about the present tense question — winning or losing? – Mr. Snow stammered a bit, then suggested this was a matter better left to grammarians than press secretaries.

“It’s one of those things,’’ he said, “where you end up – it all ends up trying to – you’re trying to summarize a complex situation with a single word or gerund, or even a participle.’’

It's not surprising that Mr. Snow is a bit confused about the terminology here. The Cambridge Grammar of the English Language has a section (p. 82) with the heading "A distinction between gerund and present participle can't be sustained". Some highlights:

Historically the gerund and present participle of traditional grammar have different sources, but in Modern English the forms are identical. No verb shows any difference in form ..., not even be. [Thus] we reject an analysis that has gerund and present participle as different forms syncretised throughout the class of verbs. We have therefore just one inflectional form of the verb marked by the -ing suffix; we label it with the compound term 'gerund-participle' ..., as there is no reason to give priority to one or the other of the traditional terms. [...] This grammar also takes the view that even from the point of view of syntax (as opposed to inflection) the distinction between gerund and present participle is not viable, and we will therefore also not talk of gerund and present participle constructions [...].

Tony Snow is not the first right-of-center flack to take up this issue -- I covered William Safire's encounter with gerundology back in 2004 ("To pass into a certain condition, chiefly implying deterioration", 6/30/2004).

But to me -- I'm a phonetician by trade -- the most striking part of the December 18 briefing was not the syntax but the stammering. Tony Snow is a professional broadcaster, who makes his living by talking, and normally exhibits a high level of verbal facility. In other words, the man is stone glib. But the cognitive stress of his Dec. 18 briefing elicited some truly spectacular disfluencies.

Since official transcripts (sensibly) omit most stutters, false starts, and filled pauses, I've provided an unofficial transcript of a sample passage below -- along with a couple of audio clips where a disfluent passage is repeated so that you can hear it clearly.

Q: Can I just come back to Powell one more time? Just to be clear, one of the points of disagreement, we are losing, you disagree with that?
MR. SNOW: Again, the President has said before that we are winning.
It- it- look- what- pre- it- what- Colin Powell is saying, we're not winning, so therefore we must be losing,
and then he says, all is not lost.
So I'm just- I'm not gonna- what I am saying is
that we will win and we have to win,
and that's- that's the most important- that's the most im- hah?
Q: You're not disagreeing with him?
MR. SNOW: I'm just- -- I'm not playing the game anymore.
It's just- it's one of these things where - you end up-
y- it- it all ends up trying to be- (( b- )) i- y- you trying to summarize a complex situation with a single word
or gerund,
and uh-
or even a participle.

An audio clip of a larger portion of the exchange is here. Here's the official transcript (with video link that should start you out in the right place -- the White House transcript page gives a link to a video of the whole briefing).

I'm not trying to pick on Tony Snow here. At least not much. The fact is, everybody does this kind of thing, when the communicative terrain gets rough. And the linguistic phenomena of disfluency deserve (and have gotten) deeper study. We can learn a lot about the psychology of speech production in general, and perhaps also learn something about the beliefs and motivations of individual speakers on particular occasions. We still don't know enough, however, for automatic speech recognition to transcribe such disfluencies accurately, or even to add them accurately to a conventional edited transcript from which they've mostly been omitted. (This is a problem that I've worked on a bit, and plan to return to in future research.)

[Additional commentary by Jon Stewart is here.]

Posted by Mark Liberman at 09:06 AM

December 26, 2006

Shakespearing the reader's brain: A tragicomedy in three acts

Act I: Denominal verbs and their discontents.

As Eve V. Clark and Herbert H. Clark observed long ago ("When Nouns Surface as Verbs", Language 55(4) 767-811, 1979),

People readily create and understand denominal verbs they have never heard before, as in to porch a newspaper and to Houdini one's way out of a closet.

English speakers have been doing this sort of thing since they called themselves Angelfolc -- the verb love started life as Old English lufian, from the noun lufu, and that old denominal beat has kept right on going ever since. But in more recent times, this ancient form of linguistic creativity has started to drive some people up the wall. Earlier this year, Ben Zimmer documented an interesting case from the 2006 Winter Olympics ("Odium against 'podium'", 2/15/2006): "a horrible development"; "very distracting"; "Ugh."; "Arrrrgh. Grrrrr."; "Please don't say 'She can definitely podium' again".

The odium is not reserved for brand-new coinages. Geoff Nunberg recently noted that "fully 98 percent [of the AHD usage panel, which he chairs] disapprove of the use of dialogue as a verb, as in Critics have charged that the department was remiss in not trying to dialogue with representatives of the community." And Prof. Paul Brians agrees that alternatives are better:

“Dialogue” as a verb in sentences like “the Math Department will dialogue with the Dean about funding” is commonly used jargon in business and education settings, but abhorred by traditionalists. Say “have a dialogue” or “discuss” instead.

Paul and the AHD panel are giving you sound advice. But it's worth distinguishing between practical advice about life in the real world, and moral evaluation of the people who make that advice necessary. When I advise you to avoid the districts where you're likely to get mugged, I'm not implying that muggers are justified in plying their trade. Both the AHD panel members and Prof. Brians are aware that the use of dialogue as a verb was not invented by 20th-century bureaucrats. The OED gives

1607 SHAKES. Timon II. ii. 52 Var. How dost Foole? Ape. Dost Dialogue with thy shadow?
1741 RICHARDSON Pamela II. 45 Thus foolishly dialogued I with my Heart.
1817 COLERIDGE Biog. Lit. (1882) 286 Those puppet-heroines for whom the showman contrives to dialogue without any skill in ventriloquism.
1858 CARLYLE Fredk. Gt. I. IV. v. 426 Much semi-articulate questioning and dialoguing with Dame de Roucoulles.

We can add this from Alexander Pope (endnote for his translation of Verse 147 of the VIth book of the Iliad, 1714):

Some may think after all, that tho' we may justify Homer, we cannot excuse the Manners of his Time; it not being natural for Men with Swords in their Hands to dialogue together in cold Blood just before they engage. But not to alledge, that these very Manners yet remain in those Countries, which have not been corrupted by the Commerce of other Nations, (which is a great Sign of their being natural) what Reason can be offer'd that it is more natural to fall on at first Sight with Rage and Fierceness, than to speak to an Enemy before the Encounter? Thus far Monsieur Dacier, and St. Evremont asks humourously, if it might not be as proper in that Country for Men to harangue before they fought, as it is in England to make Speeches before they are hanged. [emphasis added]

And from Samuel Richardson's Clarissa (Vol. 1, Letter XVI): "Will he bear, do you think, to be thus dialogued with?"

But we're dealing with visceral reactions, not historical citations, and apparently there's something about making new verbs that grates on people in a way that other category shifts don't. A classic Bill Watterson Sunday strip illustrates this mildly transgressive bad-boy vibe:

And Paul Brians, again, understands that this is a district where you're likely to be mugged by the people he calls "conservatives":

“Access” is one of many nouns that’s been turned into a verb in recent years. Conservatives object to phrases like “you can access your account online.” Substitute “use,” “reach,” or “get access to” if you want to please them.

And he's also right that in the particular case of access, the noun-to-verb shift is a fairly recent one. The OED's earliest citation is:

1962 A. M. ANGEL in M. C. Yovits Large-Capacity Memory Techniques for Computing Systems 150 Through a system of binary-coded addresses notched into each card, a particular card may be accessed for read and write operations.

A Google books search suggests that antedating back to 1949 may be possible. But again, this isn't about antiquity, it's about antipathy. Or maybe it's about some other kind of impact: to be precise, the P600.

________________________________________________________________________________________________

Act II: The Shakespeared Brain

Philip Davis, an English professor at the University of Liverpool, believes that the "shapes that thoughts take" in literature "[have] a dramatic effect at deep levels". So, according to his article "The Shakespeared Brain" (The Reader #23, Fall 2006):

I took this hypothesis - about grammatical or linear shapes and their mapping onto shapes inside the brain - to a scientist, Professor Neil Roberts who heads MARIARC (the Magnetic Resonance and Image Analysis Research Centre) at the University of Liverpool. In particular I mentioned to him the linguistic phenomenon in Shakespeare which is known as ‘functional shift’ or ‘word class conversion’. It refers to the way that Shakespeare will often use one part of speech - a noun or an adjective, say - to serve as another, often a verb, shifting its grammatical nature with minimal alteration to its shape. Thus in Lear for example, Edgar comparing himself to the king: ‘He childed as I fathered’ (nouns shifted to verbs); in Troilus and Cressida, ‘Kingdomed Achilles in commotion rages’ (noun converted to adjective); Othello ‘To lip a wanton in a secure couch/And to suppose her chaste!’ (noun ‘lip’ to verb; adjective ‘wanton’ to noun). The effect is often electric I think, like a lightning-flash in the mind: for this is an economically compressed form of speech, as from an age when the language was at its most dynamically fluid and formatively mobile; an age in which a word could move quickly from one sense to another, in keeping with Shakespeare’s lightning-fast capacity for forging metaphor. It was a small example of sudden change of shape, of concomitant effect upon the brain. Could we make an experiment out of it?

Well, of course they could:

With the help of my colleague in English language Victorina Gonzalez-Diaz, as well as the scientists, I designed a set of stimuli – 40 examples of Shakespeare’s functional shift. [...] It is not Shakespeare taken neat; it is just based on Shakespeare, with water. But around each of those sentences of functional shift we also provided three counter-examples which were shown on screen to the experiment’s subjects in random order: all they had to do was press a button saying whether the sentence roughly made sense or not. Thus below A (‘accompany’) is a sentence which is conventionally grammatical, makes simple sense, and acts as a control; B (‘charcoal’) is grammatically odd, like a functional shift, but it makes no semantic sense in context; C (‘incubate’) is grammatically correct but still semantically does not make sense; D (‘companion’) is a Shakespearian functional shift from noun to verb, and is grammatically odd but does make sense:

A) I was not supposed to go there alone: you said you would accompany me.
B) I was not supposed to go there alone: you said you would charcoal me.
C) I was not supposed to go there alone: you said you would incubate me.
D) I was not supposed to go there alone: you said you would companion me.

They recorded electroencephalographic signals (EEG) from their subjects, time-registered with presentation of the stimuli, and looked for N400 and P600 event-related potentials (ERPs). The N400 component is a negative-going effect about 400 milliseconds after stimulus presentation, believed to result from "semantic anomaly" -- content that doesn't make sense (van Berkum et al. 1999). The P600 component is a positive-going effect about 600 milliseconds after stimulus presentation, believed to "[reflect] difficulty with syntactic integration processes" -- grammatical shapes that don't fit together (Kaan et al. 2000).

The results were exactly what the ERP literature of the past couple of decades would predict (quoting Davis' Reader article):

(A) With the simple control sentence (‘you said you would accompany me’), NO N400 or P600 effect because it is correct both semantically and syntactically.
(B) With ‘you said you would charcoal me’, BOTH N400 and P600 highs, because it violates both grammar and meaning.
(C) With ‘you said you would incubate me’, NO P600 (it makes grammatical sense) but HIGH N400 (it does not make semantic sense).
(D) With the Shakespearian ‘you said you would companion me’, HIGH P600 (because it feels like a grammatical anomaly) but NO N400 (the brain will tolerate it, almost straightaway, as making sense despite the grammatical difficulty). This is in marked contrast with B above.

Where are Davis and his colleagues going with this?

This is a small beginning. But it has some importance in the development of inter-disciplinary studies – the co-operation of arts and sciences in the study of the mind, the brain, and the neural inner processing of language felt as an experience of excitement, never fully explained or exhausted by subsequent explanation or conceptualization. It is that neural excitement that gets to me: those peaks of sudden pre-conscious understanding coming into consciousness itself; those possibilities of shaking ourselves up at deep, momentary levels of being.

This, then, is a chance to map something of what Shakespeare does to mind at the level of brain, to catch the flash of lightning that makes for thinking. For my guess, more broadly, remains this: that Shakespeare’s syntax, its shifts and movements, can lock into the existing pathways of the brain and actually move and change them – away from old and aging mental habits and easy long-established sequences.

I applaud sincerely. So far, though, the experiments just show us what we already knew: semantic anomalies produce N400 ERP effects; syntactic anomalies produce P600 ERP effects; and unfamiliar denominal verbs are syntactically anomalous but semantically coherent, so they produce P600s but not N400s.
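For readers who like to see a prediction spelled out, here is a minimal sketch of the standard account as summarized above: a semantic anomaly predicts an N400, a syntactic anomaly predicts a P600, and the two are independent. The `Condition` class and `predicted_components` function are names invented for this illustration; the four conditions and their classifications come from Davis's description of the stimuli, not from the experimenters' own analysis code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    """One of the four stimulus types (hypothetical structure for illustration)."""
    label: str
    verb: str
    syntactically_ok: bool
    semantically_ok: bool


def predicted_components(c):
    """Return the set of ERP effects that the standard account predicts."""
    effects = set()
    if not c.semantically_ok:
        effects.add("N400")  # semantic anomaly
    if not c.syntactically_ok:
        effects.add("P600")  # syntactic anomaly
    return effects


# Classifications as given in Davis's description of sentences A to D.
CONDITIONS = [
    Condition("A", "accompany", syntactically_ok=True, semantically_ok=True),
    Condition("B", "charcoal", syntactically_ok=False, semantically_ok=False),
    Condition("C", "incubate", syntactically_ok=True, semantically_ok=False),
    Condition("D", "companion", syntactically_ok=False, semantically_ok=True),
]

for c in CONDITIONS:
    print(c.label, c.verb, sorted(predicted_components(c)))
```

The point of the sketch is only that the reported pattern falls out of two independent yes/no properties of the stimulus, which is why the result is unsurprising.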

I'm impressed that Davis has been inspired to study Shakespeare's language with the concepts and tools of cognitive neuroscience. And I'm especially impressed that his presentation of the issues, the experimental design, and the experimental results is clear and careful.

But I'm also disappointed that Davis hasn't tried to analyze Shakespeare's language linguistically -- that is, in terms of careful reasoning about relations between form and meaning. He got as far as the concept of "'functional shift' or 'word class conversion'" -- noun into verb -- but that's the start of the analytic process, not the end. Back in 1979, Clark and Clark suggested a particular theory about how unfamiliar denominal verbs get their semantic coherence:

Our proposal is that their use is regulated by a convention: in using such a verb, the speaker means to denote the kind of state, event, or process that, he has good reason to believe, the listener can readily and uniquely compute on this occasion, on the basis of their mutual knowledge, in such a way that the parent noun (e.g. porch or Houdini) denotes one role in the state, event, or process, and the remaining surface arguments of the denominal verb denote others of its roles.

And Google Scholar knows about 115 later works citing Clark and Clark 1979. However, if Prof. Davis is like other literary scholars, the whole field of linguistics since 1950 or so is pretty much alien territory to him, and given the evidence of his interests and insights, that would be a shame.

Davis ends his article on a poetic note. Perhaps, he suggests,

Shakespeare’s art [is] no more and no less than the supreme example of a mobile, creative and adaptive human capacity, in deep relation between brain and language. It makes new combinations, creates new networks, with changed circuitry and added levels, layers and overlaps. And all the time it works like the cry of ‘action’ on a film-set, by sudden peaks of activity and excitement dramatically breaking through into consciousness. It makes for what William James said of mind in his Principles of Psychology, ‘a theatre of simultaneous possibilities’.

This is an attractive set of metaphors for a reader's response to literature. And MEG and fMRI experiments are underway, he tells us, so we can look forward to future results, some of which may indeed tell us something that we didn't already know about Shakespeare's writing or about his effect on us. Maybe Prof. Davis will be inspired by his neurolinguistic insights to venture into linguistic analysis itself. And along the way, maybe we'll learn why so many people are so annoyed by denominal verbs in contemporary English.

But experienced Language Log readers will know that there's another metaphorical film-set here, another theater of possibilities -- the arena of media reaction.

______________________________________________________________________________________________

Act III: O O O O that Shakespeherian Rag—

'Are you alive, or not? Is there nothing in your head?'
But
O O O O that Shakespeherian Rag—
It's so elegant
So intelligent

[T.S. Eliot, The Waste Land, lines 126-130]

The curtain opens, as convention dictates, on a press release: "Reading Shakespeare has dramatic effect on human brain", 12/18/2006. [There's a version with some pictures at "physorg.com".]

The first sentence is a real show-stopper:

Research at the University of Liverpool has found that Shakespearean language excites positive brain activity, adding further drama to the bard's plays and poetry.

What is this "positive brain activity"? The P600 event-related potential, which is routinely evoked by syntactic anomalies, whether dramatic and poetic or banal and prosaic. What's "positive" about it? The electrical polarity of the effect. Or rather, the direction of motion of the averaged ERP signal -- which as I understand the process, is the reverse of the polarity of the measured electric field, since "[t]raditionally, negative amplitudes are ... plotted upwards for EEG data".

The next few sentences of the press release maintain the dramatic intensity:

Shakespeare uses a linguistic technique known as functional shift that involves, for example using a noun to serve as a verb. Researchers found that this technique allows the brain to understand what a word means before it understands the function of the word within a sentence. This process causes a sudden peak in brain activity and forces the brain to work backwards in order to fully understand what Shakespeare is trying to say.

Of course, pretty much every English speaker uses this same technique -- though when usage experts notice, they tell us to stop. And the "technique" of verbing a noun doesn't speed up the understanding of meaning. On the contrary, it doubtless slows it down, relative to the processing of common words used in an ordinary way -- it just does this without triggering the N400 signature of semantic anomaly.

Also, it's misleading to say that "this process causes a sudden peak in brain activity". The "peaks" and "valleys" of ERP components are very small effects, about 1/4 the size of random variations in the EEG signal:

The brain response of a single event (i.e. the signal after the presentation of a single stimulus) is usually too weak to be detectable. The technique usually employed to cope with this problem is to average over many similar trials. The brain response following a certain stimulus in a certain task is assumed to be the same or at least very similar from trial to trial. That means the assumption is made that the brain response does not considerably change its timing or spatial distribution during the experiment. The data are therefore divided in time segments (or "bins") of a fixed length (e.g. one second) where the time point zero is defined as the onset of the stimulus, for example. These time segments can then be averaged together, either across all stimuli present in the study, or for sub-groups of stimuli that shall be compared to each other. Doing so, any random fluctuations will cancel each other out, since they might be positive in one segment, but negative in another. In contrast, any brain response time-locked to the presentation of the stimulus will add up constructively, and finally be visible in the average.

The noise level is typically about 20uV, and the signal of interest about 5uV. If the noise was completely random, to reduce the amplitude of the noise by a factor of n one would have to average across n*n samples. Averaging over 100 segments would therefore reduce the noise level to about 2uV, at least less than half the amplitude of the signal of interest. A further 100 segments (i.e. 200 segments altogether) would reduce the noise level to about 1.4uV.
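The arithmetic in that passage is easy to check with a small simulation. This is a rough sketch under simplifying assumptions that are mine, not the quoted source's: the noise is independent Gaussian noise with a standard deviation of 20uV, the time-locked response is a constant 5uV, and only a single time point is modeled. The function names are invented for the illustration.

```python
import math
import random
import statistics

NOISE_UV = 20.0   # single-trial noise level, from the quoted passage
SIGNAL_UV = 5.0   # amplitude of the time-locked response, from the quoted passage


def expected_noise(noise_uv, n_trials):
    """Residual noise after averaging n_trials segments of independent noise."""
    return noise_uv / math.sqrt(n_trials)


def simulate_average(n_trials, rng):
    """Average n_trials simulated samples: a fixed signal plus Gaussian noise."""
    samples = [SIGNAL_UV + rng.gauss(0.0, NOISE_UV) for _ in range(n_trials)]
    return statistics.fmean(samples)


def empirical_noise(n_trials, n_experiments=2000, seed=0):
    """Spread of the averaged value across many simulated experiments."""
    rng = random.Random(seed)
    averages = [simulate_average(n_trials, rng) for _ in range(n_experiments)]
    return statistics.stdev(averages)


# Columns: number of trials, predicted residual noise, simulated residual noise.
for n in (1, 100, 200):
    print(n, round(expected_noise(NOISE_UV, n), 2), round(empirical_noise(n), 2))
```

With 100 trials the predicted residual noise is 2uV, and with 200 it is about 1.4uV, matching the quoted figures. The 5uV "peak" only becomes visible because of this averaging, which is why talk of a "sudden peak in brain activity" on reading a single line is misleading.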

You can critique the rest of the press release for yourself -- we're moving on to the good part, the media uptake. Here are some hyperlinked headlines, with associated quotes:

Shakespeare used advanced brain theories:

Reading Shakespeare excites the brain and could help stave off old age forgetfulness, research has shown.

The Elizabethan playwright took advantage of theories of brain consciousness to wow the audience in their heads as well as on stage, claim scientists.

Researchers from the University of Liverpool found the unconventional structure and words of Shakespeare's plays and poetry surprises the brain, which produces a sudden burst of activity, or excitement.

They claim it could help keep the brain healthy and lively.

Bard boosts brain, researchers say:

British researchers using modern medical technology have demonstrated what generations of teachers have told generations of students: Shakespeare is good for you.

Reading parts of Shakespeare's plays causes the brain to become positively excited, researchers from the University of Liverpool said in a release Monday.

Tis nobler in the mind to read Shakespeare:

Reading Shakespeare excites the brain in a way that keeps it “fit”, researchers say.

A team from the University of Liverpool is investigating whether wrestling with the innovative use of language could help to prevent dementia. Monitoring participants with brain-imaging equipment, they found that certain lines from Shakespeare and other great writers such as Chaucer and Wordsworth caused the brain to spark with electrical activity because of the unusual words or sentence structure.

Bard's wordgames 'spark mind activity':

READING Shakespeare can excite your brain with stimulating linguistic techniques, according to a new study.

Some school children may claim that reading the Bard's numerous plays and poems is fairly tedious, but researchers from the University of Wales, Bangor, and the University of Liverpool argue that though pupils may not realise it, their brains are becoming excited.

Shakespeare surprises our brains, study:

With Shakespeare, it seems the wordplay's the thing that keeps us big on the bard.

Researchers at the University of Liverpool have found one of Shakespeare's favourite linguistic tricks - throwing odd words into otherwise normal sentences or using a noun as an unexpected verb - surprises the brain in a way that generates a sudden burst of mental activity that actually shows up on a brain scan.

This heightened brain energy, as reported today in the journal The Reader, may be one reason the bard's plays pack such a dramatic punch with audiences, the study suggests.

Shakespeare 'excites the brain':

Shakespeare's works are able to "surprise" the brain by using unexpected and exciting linguistic techniques, according to a new study. [...]

Professor Neil Roberts and Professor Philip Davis, together with Dr Guillaune Thierry from the University of Wales, Bangor, monitored brain responses in 20 people reading Shakespeare using a scanner called an electroencephalogram (EEG).

They found that a technique known as 'functional shift' – where, for example, a noun serves as a verb – allows the brain to understand what a word means before it understands the word's meaning in a sentence.

The Shakespeared Brain:

Literary experts and scientists have joined forces to investigate the effect Shakespearian syntax has on neural pathways. They have discovered that the brain reaches heightened levels of function thanks to the Bard’s unusually powerful use of words. Now more research is taking place to see if this increase in activity is acting as a work-out for the brain which could have lasting beneficial effects.

(Since it's surely a good thing for aging boomers to add daily doses of Shakespeare to their prophylactic intake of blueberries, red wine and fish oil, let's keep it quiet that they can get the same effect from reading about "dialoguing with stakeholders", or hearing about "the current favorites to podium at the Beijing Olympics" -- or for a mega-dose, from choral recitations of The Verbing Man.)

There seems to be something about the Christmas season that inspires the British media to follow a flack down the garden path in enthusing about the application of science to literary analysis. Last year we had the "common phrases used by [Agatha] Christie [which] acted as a trigger to raise levels of serotonin and endorphins, the chemical messengers in the brain that induce pleasure and satisfaction" ("The Agatha Christie Code: Stylometry, Serotonin and the Oscillation Overthruster", 12/26/2005; "The brave new world of computational neurolinguistics", 12/27/2005).

This year, it's that old Shakespearian P600.

Posted by Mark Liberman at 07:37 PM

More royal linguistic movie filth

This is an update to my previous post on the linguistic filth in The Queen. Lock up your children, and prevent them from reading what follows.

Steve Wiley has kindly reminded me of a further episode in the film (I remember it now that he has mentioned it). As explained at the entry for this film at the ScreenIt movie review site, under the heading "Disrespectful/Bad Attitude: Heavy", Prime Minister Tony Blair's wife Cherie is depicted as strongly anti-monarchist, and after she and her husband are dismissed from their brief first meeting with the Queen, Cherie very briefly characterizes the Queen's attitude as "Thank you very much, now fuck off."

Steve continues:

The MPAA's standard, since the inception of the PG-13 rating c.1984 has been to allow one f-word utterance, as long as it is used as a "pure expletive," as it were, rather than explicitly. See the film rating board's pharisaic guideline, followed by a vague and rarely, if ever, invoked exception clause:

A film's single use of one of the harsher sexually derived words, though only as an expletive, shall initially require the Rating Board to issue that film at least a PG-13 rating. More than one such expletive must lead the Rating Board to issue a film an R rating, as must even one of these words used in a sexual context. These films can be rated less severely, however, if by a special vote, the Rating Board feels that a lesser rating would more responsibly reflect the opinion of American parents.

I believe Kirby Dick's 2006 documentary on the subject, This Film Is Not Yet Rated, touches on this subject.

Now that Steve has pointed me to the ScreenIt site, I see that they are not only worried about what they present primly as "Thank you very much, now f*ck off"; they are also worried that children might imitate or be harmed by these other phrases from the script:

bugger it
screwed up
screw up
Where the hell is the flag?
freeloading, emotionally retarded nutters (They apparently don't know that last word; they give it as "natters".)
It's just daft.
bloody fool
bloody madness
old bat
oh, Christ
for God's sake
and two other uses of God

Well, I am not inclined to alter the judgment arrived at in the earlier post. There is something wrong with a culture in which large numbers of people apparently feel that children as old as twelve must be protected from hearing people say that something is screwed up, or from witnessing utterances like Where the hell is the flag? in a serious film drama (or even It's just daft, which makes me wonder whether native speakers of English are doing this phrase-watching). I certainly want children protected from harm. But I don't want them kept in a fantasy land of light and fluffy language where no one ever utters a harsh word. It will turn them into... well, emotionally retarded nutters, if you'll pardon the phrase.

Posted by Geoffrey K. Pullum at 03:01 PM

Linguistic filth at the movies

Last Sunday's Los Angeles Times gives capsule reviews of 53 recent movies (either opening, on release, or just out on video) that are assigned to MPAA categories (G, PG, PG-13, R, NC-17). "Language" is cited as one of the reasons for a non-G rating in no less than 38 of the 53. That's 72% (or 75% of the non-G-rated films). What the hell (oops; PG, language) is going on in a culture where for 72% of the movies currently on offer, parental guidance has to be supplied concerning the danger that children might hear a stray swearword or the name of a sex-specific body part?

I stress that it is not just movies for grownups that are being labeled as having language too strong for young ears. It is films for kids. The distribution of the different ratings is wildly skewed: the number of NC-17 films on offer is zero, and less than 4% are G. About half the films are rated R, a quarter are PG-13, and 20% are PG, the latter rating being basically the default for movies that are obviously intended for kids. But 73% of the PG films (8 out of 11) are certified as having language problems bad enough to call for parental oversight: cartoons like Flushed Away, comedies starring kids like Unaccompanied Minors, fantasy comedies like Night at the Museum, inspirational sports dramas like We Are Marshall, gentle romances like Sweet Land (all these are plain PG films: children can attend on their own, but parents are advised to give them guidance).
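For anyone who wants to check the percentages in the two paragraphs above, here is a quick sketch. The counts 53, 38, 11 and 8 are taken from the text. The figure of 2 G-rated films is an inference from "less than 4% are G" and from the 75% figure, not a number stated in the post, and the helper name `pct` is invented for this illustration.

```python
TOTAL_FILMS = 53         # capsule reviews counted
LANGUAGE_CITED = 38      # films with "language" among the rating reasons
PG_FILMS = 11            # films rated PG
PG_LANGUAGE_CITED = 8    # PG films with "language" cited
G_FILMS = 2              # assumption: inferred, not stated in the post


def pct(part, whole):
    """Percentage of whole represented by part, rounded to the nearest integer."""
    return round(100 * part / whole)


print("all films:", pct(LANGUAGE_CITED, TOTAL_FILMS))
print("non-G films:", pct(LANGUAGE_CITED, TOTAL_FILMS - G_FILMS))
print("PG films:", pct(PG_LANGUAGE_CITED, PG_FILMS))
```

Under the assumed count of 2 G-rated films, the three values round to the 72%, 75% and 73% quoted above.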

I'll tell you what I think. I think we have drifted away from sensible caution about exposing kids to obscene language, and wandered into irrational superstition about common words and phrases. We have become so anxious about word taboos we are behaving like a culture gone crazy. Our priorities are out of kilter.

I happened to view The Queen twice (by choice; it is extraordinary, and when Helen Mirren walks up to receive her Oscar for best actress I will be there cheering from the Language Log film critics' box). It is rated PG-13 for "brief strong language". I have scratched my head and wondered where in that graceful yet riveting film of family tension, administrative complexity, and governmental protocol there was an episode of language so foul that "parents are strongly cautioned to give guidance for attendance of children younger than 13."

I have come up with only one possibility as to what it must be. There is a scene in which Queen Elizabeth drives her Land Rover solo across a ford on the Balmoral estate and breaks the drive shaft on a rock, stranding her vehicle in shallow water. As she gets out to check under the vehicle, suspecting the worst, she quietly mutters "Bugger" to herself. For this, parents are strongly cautioned to consider keeping their twelve-year-olds home. We are going collectively insane.

In Casino Royale a naked man is tied to a chair with the bottom ripped out of it and is tortured by having a knotted rope slammed into his testicles again and again until he howls in agony. The film gets the same rating as The Queen: PG-13. Something is profoundly wrong with our beliefs about the evil powers of everyday language, and with the movie guidance that is being supplied to us.

Posted by Geoffrey K. Pullum at 12:40 AM

December 25, 2006

Holiday headline contest winner

We are pleased to announce the winner of the Language Log award for the best headline of the holiday season. The award goes to the Los Angeles Times for a headline in the Travel section on December 24, 2006, over a Travel Insider story by Jane Engle about how Canada's major airline has banned mammalian pets from passenger cabins as a precaution against medical problems caused by passengers' allergies to cat or dog dander. No linguistic prerequisites; only basic familiarity with Snow White and the Seven Dwarfs is assumed:

Air Canada believes Fluffy invites sneezy, dopey and docs

Congratulations to all involved.

Posted by Geoffrey K. Pullum at 07:07 PM

Merry... umm... Christmas, Will!

I think I heard a sociolinguistically rather neat piece of linguistic education take place on National Public Radio's Weekend Edition Sunday yesterday, Christmas Eve. I don't think it was scripted. Host Andrea Seabrook welcomed Will Shortz to the weekly word puzzle spot with a cheery "Happy Holidays", using what is increasingly thought to be the safe alternative to any politically risky Yuletide allusion. But Will responded in his calm voice with "Merry Christmas."

So then they announced the answer to the previous week's puzzle that young Bakovic got wrong (boy, was he getting the finger-pointing and whispering treatment around LLP this week!), and they did the on-air puzzle session, and Will announced the puzzle for next week, and it was time to say goodbye; and Andrea seemed to have been educated in the intervening minutes, because as she said goodbye she wished Will a "Merry Christmas"!

If she was initially avoiding mention of Christmas, I don't think she had it right. Amid all the appalling things that go on around the world in the name of religious intolerance these days, I really don't think we need to keep it on our worry list that people might be offended by "Merry Christmas" (or "Happy Christmas"). For heaven's sake (oops! can I say "heaven" without offending you?), if you're concerned about religious freedom all of a sudden, put a Jewish friend up for membership at your golf club. Or send a letter of outrage to a real fan of unAmerican religious intolerance like the repellent Virgil Goode.

A friendly conventionalized Christmas greeting shouldn't ruin someone's day if they happen not to be of a religion that celebrates it. Not even if, like my sister-in-law, they happen to belong to the Jehovah's Witnesses and thus aren't religiously permitted to celebrate Christmas at all. (Since Barbara and I know about Sharon's religion, we simply don't express any specifically Christmas wishes to her or give her presents on the day; but if people who don't know about her religion say "Merry Christmas" to her she doesn't fly into a rage or anything.)

From Language Log Plaza, where you can put up a Christmas tree in the ground-floor lobby, or a menorah, or a whole nativity scene with live farm animals, or any other decoration that pleases you, as long as you respect the fire code and clean up after the animals, I wish you a merry Christmas.

[Non-relevant endnote: By the way, when Will Shortz made his remark that troop / troops (young Eric's answer) was not going to cut it, Will said they were "essentially the same word". What he really wanted was the term lexeme. But he didn't have the terminology at his fingertips, and only his Language Log-reading listeners would have understood it anyway. Knowing what a lexeme is stands in relation to knowledge of linguistics roughly as knowing what inflation is stands in relation to knowledge of economics. But while general knowledge is taken to include basic economics, it is not taken to include basic linguistics.]

Update: People have written to me about the above remarks, of course. And they say very sensible things. One said:

I am Jewish. Merry Christmas doesn't offend me. People assuming that the whole world is Christian does. I understand that statistically you have a good chance of getting it right with "Merry Christmas" and that many see "Happy Holidays" as a lame politically correct alternative. And, I truly appreciate the heartfelt warm wishes intended by the greeting. However, I honestly believe that many folks wish me a merry Christmas without ever pausing to realize that I might not be Christian. That is why I dislike it when strangers go around wishing me -- and each other -- a merry Christmas. Please don't assume that everyone is celebrating the same holidays as you. For me, today (Christmas) is just another day.

Enjoy the holidays...

And another correspondent said:

In regards to your Dec. 25 Language Log post, I'd like to raise the idea that perhaps the "education" going on in Weekend Edition was not "It is always in good taste to wish people of unknown religion 'Merry Christmas.'" Reading your account of what happened, it occurs to me that Ms. Seabrook could have been educated about Mr. Shortz's religion, based on the latter's "Merry Christmas" comment, making her use of "Merry Christmas" at parting more surely appropriate.

As a non-Christian, though wishes of "Merry Christmas" do not ruin my day, they are subtly alienating. I know others who share my feelings. Some people are sensitive to this fact, and I see no reason to make fun of these people.

Sincerely...

I take these sentiments and other similar ones very seriously. But let me repeat, this is not about being Christian: my sister-in-law is a Christian; but she is of the Jehovah's Witness sect, which is forbidden to make any celebratory mention of Christmas or even accept gifts on that day. This is about a very common conventional greeting that derives from times centuries ago when it could be assumed that everyone was celebrating the same Christian holiday. Today this greeting neither implies that the utterer is a Christian nor presupposes that the person greeted is.

I do agree, of course, about the danger of sliding over from mockery of silly excesses of political correctness into chastisement of people who are only trying to be cautious and respectful in their dealings with others. I do not want to follow the media nasties (you know who they are) who seem so happy to complete that slide. The praiseworthy efforts that liberals make to show genuine sensibility to religious diversity are not to be parlayed into further evidence that liberals are traitors.

However, what I would like to see is expression of true sensibility to religious diversity, not irrational worrying over imaginary offense caused by unthinking repetition of Christmas greeting clichés. Let's be serious about challenging religious intolerance. Let's not divert our energies into establishing pointless linguistic taboos.

Posted by Geoffrey K. Pullum at 03:44 PM

Japanese literacy: back to the future again?

Victor Mair sent in an article by Julian Ryall from the South China Morning Post of 12/16/2006, "Japanese forgetting how to write traditional characters", which begins:

So many Japanese are forgetting how to write kanji characters that cultural experts believe the country may eventually scrap the use of Chinese pictograms in favour of the 46 simplified hiragana characters.

Software maker Kanken DS has released a title that enables people to test their knowledge of characters - but was surprised to find that 90 per cent of the 400 people aged between 35 and 40 who took part in a study were unable to recall all the correct number and positioning of strokes for the 1,945 characters that are taught in public schools.

(A version also ran in The Scotsman under the headline "Keyboard may end traditional Japanese way of writing".) Victor's comment:

It's bound to happen, both with KANJI (Japanese) and with HANZI (Chinese), as it already essentially has with HANJA (Korean). The rapidity of character attrition is going to intensify within the next 5-10 years, so swiftly that people -- depending on their outlook -- will be astonished, dismayed, or overjoyed.

Ryall's article attributes the problem to use of cell phone texting and computer keyboards -- but at least in the case of Japan, I wonder if things are really very different now from how they were in a mythical golden age of Japanese kanji-literacy.

According to J. Marshall Unger, Literacy and Script Reform in Occupation Japan, 1996, p. 34:

The first full-fledged nationwide attempt to measure literacy in Japan was the survey conducted in 1948 under the auspices of the Civil Information and Education Section ... [which] involved the testing of about 17,000 Japanese men and women between the ages of 15 and 64 throughout the country. According to Ishiguro Yoshimi, who chaired the survey's Central Planning and Analysis Committee, the survey was of unprecedented scope and rigor not only by Japanese standards but by world standards as well. Although the survey is sometimes cited as proof that the level of literacy of the majority of prewar Japanese was high, it clearly shows that earlier government claims were grossly inflated. It was found that the rate of illiteracy (monmōritsu 'complete inability to read or write') was indeed very low, but it was also concluded that only 6.2 percent of the population were literate in terms of the survey definition, which was liberal.... By today's standards, all the questions were very simple. The ability to write kanji from dictation (kanji no kakitori), which was identified as the single most important skill tested, was found to be "remarkable low" in ALL groups surveyed. Performance was closely correlated with levels of formal education... Finally, the claim that the average Japanese experienced trouble dealing with the media of mass communication, a claim long made by script reform advocates, was deemed proven.

Neustupný points out that a second, smaller survey conducted in 1955-6 by the Ministry of Education produced similar results.

The survey covered subjects aged 14 to 26 in two selected areas, Tokyo and Northern Japan. The percentage of total illiterates in the survey was less than 1% in each of the two areas. On the other hand, those who were considered to "possess no competence in the use of the written language" and were expected to experience serious problems, made up approximately 10% of the Tokyo sample and 15% of the North-East Japan sample. However, another 50% or 60%, respectively, were also judged to lack sufficient competence, and some of these subjects definitely could be classified as functional illiterates (1984, 199)

Unger also cites data from a survey "conducted between December 1945 and January 1946 by the Kanamojikai, which tested 1,452 male and female workers in fourteen factories in and around Tokyo" using "stimulus materials ... that had been published ... in the Japanese press between 7 and 20 December 1945. Subjects were asked to give the readings for 20 comparatively uncommon kanji and kanji compounds and to explain the meaning of two sentences in the text, one considered easier than the other." He gives this summary of the results:

The results of the 1945-6 and 1948 surveys doubtless were affected by the war's interruption of education -- but the 1955-6 survey did not show a very different picture. I haven't been able to find any reference to comparable surveys done since 1955-6. It seems hard to believe that no such surveys exist -- if you know of any, please tell me -- but the situation should perhaps be interpreted in the light of this comment by Unger:

That the conservative elements in the LDP who were concerned with these issues worked so hard from 1959 to 1986 to achieve a reversal of perceptions is perhaps the highest compliment that the rōmaji and kanagaki enthusiasts of the 1940s and 1950s were ever paid.

An interesting perspective on the future of the Japanese writing system comes from Christian Galan, "Learning to read and write in Japanese (kokugo and nihongo): a barrier to multilingualism?", International journal of the sociology of language, Issue 175-176, 2005. He begins with the premise that

Today, it has been fairly well established that assertions that there are no problems with teaching reading in Japanese schools, or evaluations of the literacy rate of the Japanese population at near 100%, are as much founded on "myth" as the supposed linguistic (or "racial") unity of Japan. In Japan, like anywhere else, there are problems with teaching reading in the schools, and there are various levels of literacy within Japanese society.

He provides a sort of flow-chart of Japanese reading instruction:

While kanji learning is a major part of Japanese children's education, most of it does not take place in the classroom:

...[T]he bulk of the work involved in learning the kanji (and thus learning to read) is relegated from fourth grade on, or even third grade in some schools, to work the children do on their own (Galan 2001: 31–37). The kanji, which up to that point were studied one by one during language classes, now become something the children study almost entirely "outside school". Although one hour a week is still devoted to the kanji, from this point on, school is really only the place where the results of the learning process are verified. ... (The parents, and especially the juku, play an extremely important role at this stage in the learning process).

Even before coming to (pre-)school, many if not most Japanese children have been taught hiragana by their mothers, and of course children of Japanese mothers will already know the language well. Galan juxtaposes the Japanese system of reading instruction with another dimension of Japanese life:

The Japanese statistics bureau’s projections, based on current birth rate figures, estimate the population of Japan in one century (2100) at just over half (±67,000,000) the population of today (±127,000,000). Whatever general solution the Japanese authorities decide on to resolve the impending shortage of workers (projections show 1 retired person for every 1.5 workers in 2050) (Sōmushō 2000), it is hard to see how they could avoid resorting to foreign immigration on some scale. This is a topic of great debate in Japan, and should the country ever implement an active immigration policy, it would certainly be highly controlled and regulated.

However, as Galan already pointed out,

The Japanese school system is currently set up to ‘‘teach reading’’ to Japanese children who are born of Japanese parents and raised in Japan, who speak Japanese from birth and live in an environment in which the only language spoken, heard and written is Japanese.

Despite this, the system has apparently never done a very good job of teaching the requisite list of kanji to the population as a whole. At least, it didn't succeed very well in the 30s, 40s and 50s, according to the surveys, and if the Kanken results are to be trusted, it's not succeeding very well now. It's hard to imagine a system of immigration, however controlled, that could overcome the educational problems created by insisting on the continued large-scale use of kanji.

In fact, there is a confluence here of practical educational problems with the more general problems of group identity that make the issue of immigration so contentious. Galan quotes Unger 1987 in support of an obvious generalization: ‘‘The Japanese attachment to kanji is intimately tied to the shared experience of mastering a complex body of knowledge that defines group membership.’’

Linguistic pluralism, assimilation, integration through language, internationalization . . . no matter how the linguistic situation in Japan evolves in the future, or how the country’s leaders try to make it evolve, these leaders will not, in our opinion, be able to avoid reassessing the issue of their writing system. We fully agree with Unger’s statement that ‘‘Japanese society may have turned its back on script reform for the time being, but the underlying issues have not gone away.’’

Perhaps it will be possible to finesse the script-reform aspects of this problem by allowing the baneful influence of cell-phone texting and other modern innovations to accomplish changes de facto, without ever agreeing that the current writing system is in any essential need of reform. Then the cultural problems of immigration and its impact on nihonjinron can be engaged without further loss of face.

[Experienced Language Log readers should immediately wonder whether Julian Ryall's description of the alleged results of an (unsourced) study, associated somehow with a company's efforts to sell video games that test the knowledge whose deficiency the study allegedly uncovered, might be misleading or even completely fabricated. We have no special reason to mistrust Mr. Ryall, though his use of the term "pictrograms", and his misconstrual of the name of the game "KanKen DS" for the name of the company selling it ("Rocket"), suggest a certain lack of background for writing about this topic. However, it's a good general rule never to believe what you read (at least about scientific topics construed broadly) in the traditional media, which lack any mechanism to enforce the elementary standards of accuracy and accountability that we take for granted in the blogosphere.

For example, it's possible that the study simply showed that 90% of subjects (from what sample?) missed at least one character in a test of (what subset of?) the 1,945 jōyō kanji. Depending on how such an experiment was conducted, it might not mean much. And then again, maybe there was no actual experiment at all, but just someone's guess about how such an experiment would probably come out if someone were to do it. We've seen plenty of stories of both kinds widely and prominently printed in the world's papers... see here, here, here for a few examples. Then again, the study that Ryall cited (or rather, alluded to) might well support the conclusions implicit in his article. As a result of the almost complete lack of journalistic standards with respect to such things, we just can't tell.

If you know anything more about this game, the "study" alluded to, or the current state of kanji knowledge among the Japanese, please let me know.]

[Update -- Matt at No-Sword has found the Rocket Co. press release, and confirms my suspicions:

What seems to be Rocket Company's press release about the survey does not mention any actual testing. The questions are more along the lines of "Do you think your kanji skills have weakened in the past few years?" and "Do you have less occasion to write kanji than you used to?" and "Do you think that the kids today, they don't learn kanji properly, the way you did when you were their age? If yes, do you also find that they should get off your lawn and/or put a sock in that damn rocks-and-rolls 'music', if you can even call it that?" (I may have embellished that last one.) [...]

So, let's not give this survey more credence than it deserves, which is, "As much as any other opinion poll conducted on behalf of organizations with directly related products and services to sell."

Matt concludes:

Of course, it makes perfect sense that as the need to actually write kanji diminishes, people's ability to write them will go down too. But down to zero? Within five years to a decade? That's either doom-saying, wishful thinking, or straight-up non-sense. Sure, they'll probably continue to get gradually rarer in written documents (you know -- priceless cultural artifacts like shopping lists and post-it notes on computer screens saying "12:30 Tanaka-san called")... but why would people stop using them in electronic documents when the UI itself is a willing scribe?
If you combine handwriting and electronic entry, people's ability to produce kanji one way or another is probably going through the roof -- and isn't that exactly the kind of thing humans invented computers for in the first place?

In this case, I think I'm inclined to agree with Matt more than with Victor. But if the projections about population decline are valid, then the questions about how to make the writing system work in the context of significant immigration may be genuine and serious. ]

Posted by Mark Liberman at 12:51 PM

December 24, 2006

ADS WOTY: Make your nominations

The American Dialect Society's annual "Word of the Year" selection is rapidly approaching. We've already had some WOTY announcements from dictionaries and other organizations (see here, here, and here for coverage), but the ADS event is the granddaddy of them all. Geoff Nunberg recently referred to the ADS selection of WOTY as "the linguistic Oscars," with all the others "merely the Golden Globes and People's Choice awards of lexicography."

The ADS selection process is not open to the public in the way that Merriam-Webster and Dictionary.com have run their online voting this year (both of which simply proved that fans of Stephen Colbert and truthiness will swamp any such competition). However, nominations are being accepted from the public. The announcement reads:

Your nominations are also welcome. Send them to wayne.glowka@gcsu.edu. Remember, the word of the Year is interpreted in its broader sense as a "vocabulary item" — not just single words but phrases can be nominated, too. Nominated terms do not have to be brand new, but they should be newly prominent or notable in the past year, usually by being a part of widespread discussion or importance.

The announcement also links to nomination lists from Wayne Glowka and Grant Barrett. Grant's list has some overlap with his Glossary for 2006, appearing in today's New York Times "Week in Review" section. (For more WOTY-ish discussion, check out the latest Open Source public radio show, featuring Grant Barrett, Geoff Nunberg, and New Oxford American Dictionary editor-in-chief Erin McKean.)

Posted by Benjamin Zimmer at 10:24 PM

Panel discussion

I was generally pleased with Andrew Newman's New York Times piece yesterday about the American Heritage Dictionary's usage panel, of which I bear the august title of chair (or as I like to put it, Chair). Apart from a few minor misquotations, there was only one point -- albeit an important one -- where the article might have left things unclear: what exactly is the usage panel for?

As the article explains, the panel is made up of "200 established writers, artists and thinkers":

Every year, panelists complete a questionnaire with a number of emerging and evolving linguistic issues. For instance, whether "domestic partners" is an acceptable term for same-sex couples (75 percent approved) or whether "factoid," as in "each issue of the magazine begins with a list of factoids," is acceptable (only 43 percent approved). . .

Their tallies are cited in more than 500 usage notes that accompany the dictionary’s definitions and are online at yourdictionary.com. [They're also available at Bartleby.com -- GN]

But can a panel whose vote is often close really be relied on to pick the season’s hottest intransitive verbs?

Not necessarily, said Erin McKean, editor in chief of United States dictionaries for Oxford University Press, which publishes The New Oxford American Dictionary. Ms. McKean pointed out that the panel is often nearly evenly divided.

"Where the usage panel gets less than helpful is when it is split, when it is 49 yea and 51 nay," Ms. McKean said. "Someone who had a bad cup of coffee that morning could have pushed it over to nay."

Such a split does not unsettle Barbara Wallraff, a panel member who writes the syndicated column "Word Court." "It doesn’t mean half of us are right and half are wrong," she said. "It means that educated opinion is divided and you won’t look like an idiot either way. And if you want to be more traditional, that will be pretty clear; if you want to be in the vanguard, that will be clear."

Let me explain why I think Barbara Wallraff got this point right, and Erin McKean got it dead wrong. Linguists and lexicographers who take an assiduously descriptivist approach to usage sometimes find the very idea of the panel uncongenial, as if it were an attempt to establish a kind of academy that would lay down the law on usage matters. And if you take the panel in that way, you'd find its members' opinions interesting, as Erin does, only when they speak with more-or-less a single voice, decisively ruling a usage in or out as "correct English."

That may in fact have been what the American Heritage company had in mind when it published the first edition of the dictionary back in 1969 as a reaction to the "permissive" Webster's Third. But over the last few decades -- really, since Houghton Mifflin acquired the dictionary in the early 80's -- we've thought of the panel simply as a source of valuable information about the linguistic attitudes of a selection of well-known writers, editors, linguists, and others who take a professional interest in language.

The idea -- and I would assume this is unexceptionable -- is that this is the sort of information that a dictionary user ought to be provided with, and that it can't always be deduced simply from the facts of usage. It may be, for example, that the majority of educated writers use enormity these days to mean simply "great magnitude," but a writer might also want to know that there are many people who still feel the word should be reserved for things of particular horror or monstrousness, or at least who restrict the word to that meaning in their own usage. (I'm in the latter group, for what it's worth.) And while of course you can simply make that point in a usage note by saying "some people insist that such-and-such word should only be used to mean such-and-such" or the like -- most dictionaries do that -- those reports don't give you any idea of how widespread or insistent the objections are. That's where the panel's votes can come in handy.

For example, it can be instructive to know that only 29 percent of the panelists have a problem with saying "more equal," whereas 59 percent still hold to the older use of enormity and fully 98 percent disapprove of the use of dialogue as a verb, as in Critics have charged that the department was remiss in not trying to dialogue with representatives of the community. Not that those reports are the only information a reader might want in the course of deciding whether to venture the usage in question -- in fact the dictionary's usage notes also provide information about how the item is actually used, the history of the objections, and the linguistic facts that bear on the problem. And often, the panel's votes shed light on the changing acceptability of a particular usage -- as the Times article observed, for example, prioritize was rejected by 97 percent of the panel when the item was first polled back in 1976, but was acceptable to almost half in a survey 20 years later. The idea, in short, is simply to give readers the resources they might want in order to make up their own mind about a controversial usage, and the panelists' opinions are one useful part of that. (For more background on the role of the panel, you can read my introductory essay to the dictionary here and can find a list of all the usage notes here.)

A couple of minor points. The Times's editorial process being what it is, there were a few errors and misquotations, which I mention not out of captiousness, but because this is, after all, the linguistic blog of record.

A sidebar to the piece, for example, gives three usage questions that were submitted to the panel and quotes several members about the usages in question. The first asks for the pronunciation of niche, and quotes me as saying "Neesh. What else do people say? Pronunciation is about being as good as your neighbors and not better." A reader might come away from that thinking that I was unaware that people said anything other than "neesh." Since I helped to write the ballot item in question, which asked about three variant pronunciations ("neesh," "nitch," and "neetch"), that would be a misapprehension.

A second item gave the question

Is this sentence acceptable? "Members of the League of Women Voters will be manning the registration desk."

The sidebar then quoted me as saying "I wrote that. I'd probably avoid it." That answer might seem puzzling or contradictory if you took the reference of that to be the sentence Members of the League of Women Voters will be manning the registration desk. As it happens, though, what I actually said was that I had written

Is this sentence acceptable? "Members of the League of Women Voters will be manning the registration desk."

Or more broadly still, I wrote both the ballot item on this issue and the 500-word usage note on the use of man that included it, which you can find here. (File under "pronominal reference, pitfalls of, iia. importance of context").

Finally, the article referred to some of the members of the panel as "Mr. Nunberg's choices." Actually, the panel members are selected by the editors of the dictionary, under the direction of Joe Pickett -- I have input, but don't make the decisions myself.

Posted by Geoff Nunberg at 12:31 PM

The ghost of Christmas past, and the entropy of (C)han(n)uk(k)a(h)

Merry Christmas to our readers! Some seasonally-appropriate reading from past editions:

"Same-sex Mrs. Santa: 'The semantics are confusing'" 11/27/2003
"'Twas the night before Christmas" 11/24/2003
"A 'Boxing Day Election' -- or not?" 12/5/2004
"Talking animals: Miracle or curse?" 12/24/2004
"Homo Hemingwayensis" 1/9/2005
"For linguists only" 2/4/2005
"Christmas trees and holiday trees" 12/2/2005
"Negation, over- and under-" 12/21/2005
"L(a)ying snow" 12/24/2005
"Zogby: Bill O'Reilly's bitches?" 12/22/2006

In other holiday news, a new survey by Language Log labs has found that Hanukkah is second only to Muammar al-Gaddafi in public spelling uncertainty.

We learned of this problem by data-mining the web. Ignoring case, here are some of the counts:

	hanukkah	hanukah	hannukah	hannukkah	hanukka	hanuka	hannuka	hannukka
Google	24,100,000	1,160,000	1,430,000	85,200	194,000	957,000	125,000	9,540
Yahoo	55,900,000	56,600,000	57,200,000	71,200	33,600,000	55,000,000	126,000	2,010
MSN	2,097,292	537,348	159,167	12,823	21,469	39,290	9,031	1,352

	chanukkah	chanukah	channukah	channukkah	chanukka	chanuka	channuka	channukka
Google	461,000	5,380,000	560,000	975	359,000	835,000	3,040	697
Yahoo	33,800,000	38,600,000	33,900,000	1,750	291,000	33,200,000	35,200	1,320
MSN	56,078	767,919	46,153	638	24,662	52,053	4,282	577

(Note that Yahoo is almost certainly doing some curious sort of "query expansion".)

The orthographic background of this problem is discussed in the Wikipedia article, from which I learned about Khanike, the "YIVO standard transliteration from the Yiddish and/or Ashkenazic pronunciation of the Hebrew", which has 10,300 Google hits; and also about Robert Siegel's entertaining and informative exploration of the issues on All Things Considered last year.

In our survey results, 31.2% of the American public claimed to know how to spell Hanukkah, while 63.4% said they had no clue, and 5.4% responded that "it's people like you who are ruining Christmas". When we asked those who claimed to know the spelling what it actually is, we got 11 different versions from the 15 people who actually made it through to the end of the word. A typical response from the others: "Hey, man, what is this, fifth grade?"

For those who care about such things, the entropy of the MSN distribution is almost exactly 2 bits, corresponding to the amount of uncertainty in four equally likely alternatives.
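For readers who want to check the arithmetic, here is a minimal sketch of the entropy calculation. It assumes the 16 MSN counts in the table above are treated as one probability distribution over spellings; the function name `entropy_bits` and the dictionary layout are illustrative choices, not anything from the original survey.

```python
import math

# MSN hit counts copied from the table above (16 spelling variants).
msn_counts = {
    "hanukkah": 2_097_292, "hanukah": 537_348, "hannukah": 159_167,
    "hannukkah": 12_823, "hanukka": 21_469, "hanuka": 39_290,
    "hannuka": 9_031, "hannukka": 1_352,
    "chanukkah": 56_078, "chanukah": 767_919, "channukah": 46_153,
    "channukkah": 638, "chanukka": 24_662, "chanuka": 52_053,
    "channuka": 4_282, "channukka": 577,
}

def entropy_bits(counts):
    """Shannon entropy, in bits, of the distribution implied by raw counts."""
    counts = list(counts)
    total = sum(counts)
    # H = -sum(p * log2(p)) over the outcomes with nonzero probability.
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

msn_entropy = entropy_bits(msn_counts.values())
uniform_four = entropy_bits([1, 1, 1, 1])  # four equally likely alternatives

print(f"MSN spelling entropy: {msn_entropy:.2f} bits")
print(f"Four equal alternatives: {uniform_four:.2f} bits")
```

The second call is the comparison case mentioned in the text: four equally likely alternatives carry exactly 2 bits of uncertainty.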

[Several readers have pointed out that it's strange that the only consistent part in the many common spellings of this word is the vowel sequence 'a u a', which is also the only part that isn't specified by the Hebrew orthography (heth nun vav kaf hey). Others have pointed out that this is completely expected, given the first letter-name itself has the common variants Ḥet, H̱et, Khet, Kheth, Chet, Cheth, Het, and Heth. And then there are those who have pointed to additional variants in which the vowels are also altered, like "Hanakah" (13,200 Google hits). Well, as Don Rumsfeld said about the looting of Baghdad, "Freedom's untidy".]

Posted by Mark Liberman at 08:22 AM

December 23, 2006

Originality, expertise and seriousness in action

In a striking demonstration of what Joseph Rago has called the "institutional culture that screens editorially for originality, expertise and seriousness", the Voice of America has now joined a long list of other traditional media organizations in publicizing Louann Brizendine's scientific proof that women talk a lot more than men do. A story by Ted Landphair, under the headline "Now There's Proof: Women are the Gabby Sex", ran on the VOA wire yesterday, Dec. 22, 2006.

In The Female Brain, Dr. Brizendine reports that the average woman utters 20,000 words each day. Men, two-thirds fewer: just 7,000. Of course, some men ... would argue that they also have plenty to say but cannot get a word in edgewise!

This is not just a stereotype, Mr. Landphair hastens to tell us -- it's science, based on "[Brizendine's] own study, and more than 1000 others she's examined".

Landphair's article is dated roughly five months after Dr. Brizendine's book was published, and three months after I called the 20,000/7,000 claim into question in the Boston Globe, and almost a month after her retraction of the claim was published in the Guardian, and about two weeks after her semi-retraction in the NYT Magazine. I won't mention any of the discussion in the blogosphere.

Just for the record, one more time:

Dr. Brizendine has never done any research on this topic, and none of the references cited in her book deal with any relevant research either.
There are many studies that compare how much talking men and women do -- they find small differences, often in the direction of more talk from men.
The fall-back position that "communication events" rather than words were counted does not appear to be based on any empirical research either. Published counts of gestures and facial expression produce essentially the same results that word counts do.

For details, if you want them, consult the links collected here.

I have to agree with Rago that "[p]eople ... like validation of what they already believe", so that traditional media tend to engage in "endless rehearsings of arguments put forward elsewhere" and have "a tendency to substitute ideology for cognition". Oh wait, that's weblogs. Never mind.

All the same, it's a shame that American taxpayers are footing the bill to distribute pseudoscientific urban legends around the world. I suppose it's not our fault, though, since there's no word in English for accountability. Oops, I got it wrong again, that's those other languages like French and Spanish and Hebrew and Japanese. That's blogospheric instantaneity for you -- just one mistaken cliché after another.

For another recent comment on gendered talk -- perhaps lacking Landphair's "originality, expertise and seriousness", but more to the point -- here's Chris Muir's Day by Day for 12/23/2006:

Posted by Mark Liberman at 06:15 PM

These troops, I tell ya

Boy, do I need to work harder on Will Shortz's Weekend Edition Sunday puzzles. I thought for sure that the answer to last week's puzzle was "troop~troops", given this post by Arnold Zwicky from earlier this month. But I was wrong, wrong, wrong: the answer Will had in mind, to be revealed on WESun tomorrow morning, was "these~theses".

I wrote to Will to ask him about the "troop~troops" possibility, and he wrote:

TROOP/TROOPS may work in a technical sense, but because the two words are basically the same, I don't think I'll accept this answer.

My apologies if I led any Language Log readers astray -- please write to me at Language Log Plaza for double your money back on your subscription.

Language Log reader George Pollard gets a free year-long subscription for pointing out how "Unix helps to solve linguistic puzzles":

$ grep "^t\w\w\w\ws$" /usr/share/dict/words
[...many words...]

...and I'm guessing these -> theses? :)
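For readers without a Unix word list handy, the same filter can be sketched in Python. This is only an illustration of the pattern in the grep command above (a "t", four word characters, then an "s"); the function name `six_letter_t_s` and the tiny sample list are made up for the example and stand in for /usr/share/dict/words.

```python
import re

# Same pattern as the grep command: "t", four word characters, then "s".
PATTERN = re.compile(r"^t\w\w\w\ws$")

def six_letter_t_s(words):
    """Return the words that match the six-letter t....s pattern."""
    return [w for w in words if PATTERN.match(w)]

# Hypothetical sample standing in for a real dictionary file.
sample = ["these", "theses", "troop", "troops", "tables", "tests", "thesis"]
matches = six_letter_t_s(sample)
print(matches)
```

As with the grep output, picking the puzzle answer out of the matches is still left to the solver.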

Personally, I like figuring the puzzles out on my own, but clearly that's not really working out for me ...

Posted by Eric Bakovic at 04:47 PM

Read on, imbeciles

Joseph Rago, who is "an assistant editorial features editor at The Wall Street Journal", takes over Joseph Conrad's observation that newspapers are "written by fools to be read by imbeciles" and transfers it to weblogs ("The Blog Mob", 12/20/2006). After a litany of complaints about the many faults of blogs, Rago sheds editorial tears over the "lost [journalistic] establishment" that "has over centuries accumulated a major institutional culture that screens editorially for originality, expertise and seriousness". Chris Muir commented in graphical form on the editorial screening part:

(A more charitable interpretation might have given Rago credit for omitting the period on purpose, as an emblem of his grief.)

Eugene Volokh commented in a more substantive vein:

[I]f you asked me whether I'd put more trust in (1) a randomly selected article from a randomly selected newspaper or (2) a randomly selected post on the same topic from a randomly selected blog, I'd probably choose the newspaper. I imagine that the average newspaper writer has somewhat more training in accurate writing, and feels somewhat more pressure to be accurate, than the average blogger.

But I don't read either randomly selected blogs or randomly selected newspapers, and neither does anyone else. And if you ask me whom I'd trust more on coverage of sentencing law and policy, Sentencing Law and Policy or the New York Times, I'd surely choose the blog, since it's written by one of the nation's foremost experts on sentencing law and policy. More broadly, if you ask me whom I'd trust more on news analysis (not so much raw news, but news analysis) related to topics that I'm interested in, I'd probably say bloggers rather than newspapers: On those topics I care about, I'm familiar with who the best bloggers are, and on balance those best bloggers tend to be more expert (and more aware of the danger that if they err, they'll be promptly contradicted) than reporters at even the best newspapers.

And isn't that the way we deal with most media? We love books not because the average book is great, but because we've found the best authors (from our perspective), and their work is great. Likewise, judging blogs by the "average blogger" or even by "most bloggers" makes as much sense as condemning books as boring because 99% of all books will surely bore you.

It's gratifying that Prof. Volokh uses Language Log as a positive example, in his opening paragraphs:

Are blogs bad? Or are they good? Well, are books bad, or are they good? How about newspapers? Conversations?

Some blogs are good, some are bad. A few provide very good reports of breaking specialty news (e.g., How Appealing). Some provide very good expert commentary on topics that few journalists know much about (e.g., Language Log). Some provide very good commentary by thoughtful people (e.g., Virginia Postrel's Dynamist), even outside relatively technical areas. Some provide high-quality selection services, pointing readers to interested sources they might otherwise have missed (e.g., InstaPundit and GeekPress). The overwhelming majority are of no interest to me or to most people — but that's true of books, too, and you don't see me ranting about how books are all tripe or all boring (even though most of them are).

Meanwhile there are some things in Mr. Rago's screed that made me wonder what blogs he reads, if any. For example:

Every conceivable belief is on the scene, but the collective prose, by and large, is homogeneous: A tone of careless informality prevails; posts oscillate between the uselessly brief and the uselessly logorrheic; complexity and complication are eschewed; the humor is cringe-making, with irony present only in its conspicuous absence; arguments are solipsistic; writers traffic more in pronouncement than persuasion . . .

Ironically, I was working on a self-critical post about writing things that are too complex, too didactic and (especially) too ironic, faults that I feel I share with most of the other bloggers that I read regularly.

In any case, I'm glad to see that Mr. Rago is cheerleading for "the technology of ink on paper" and its digital reflections. As we've often had occasion to remark, the traditional media have enormous promise as sources of information, but this promise will remain unfulfilled until journalists can find a way to impose some of the elementary standards of accuracy and accountability that we take for granted in the blogosphere.

Posted by Mark Liberman at 01:11 PM

Foreign Service material

Last Sunday, the NYT Week in Review section had a story (p. 4) on the Foreign Service exam, "a half-day of questions on geography, English usage, history, math, economics, culture and more." Yes, English usage is in it. And apparently will continue to be in the new streamlined exam now under development.

In a box headed "Until Now, You Were Foreign Service Material If ..." there are three sections illustrating what you needed to do to be Foreign Service material: take college-level courses in a long list of subjects; read texts from a list of more than 150 recommended publications, seven of which are given here; and be able to answer dozens of questions like the six provided. English usage turns up in the first two of these: the list of courses begins with "English composition/rhetoric"; and the list of texts includes, oh dear, The Elements of Style. There seems to be no escaping Strunk & White. So those who aspire to be Foreign Service officers should have learned to avoid the passive, the intensifier very, and beginning sentences with linking however, among other things.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 12:15 PM

December 22, 2006

Zogby: Bill O'Reilly's Bitches?

In yesterday's "Numbers Guy" column, Carl Bialik reported that a Zogby online survey found that nearly a third of Americans are offended when a store clerk wishes them "Happy Holidays" rather than "Merry Christmas." The story was duly picked up by Reuters and a number of papers, blogs, and web sites, particularly those who see retailers' use of the more ecumenical greeting as a sign of anti-Christian persecution by "secular crusaders who want to neuter Christmas" with the ultimate aim of driving Christianity from the public square.

But some eyebrows went up at the wording of the survey item:

Are you very offended, somewhat offended, or not at all offended when a store clerk wishes you a Happy Holiday instead of a Merry Christmas?

As Mark Blumenthal, the author of the estimable Mystery Pollster blog, pointed out to Bialik, "Before you even have heard the subject, before you're asked to process the information, you've been asked three times whether you are offended." Blumenthal added that results were stacked still more by the "asymmetric" format that provides two answers for people who are offended and only one for the people who aren't. (Compare: "Do you find George Bush to be incredibly arrogant, moderately arrogant, or not arrogant at all?") Indeed, it's hard to believe that a third of Americans would have answered yes if the question had been put in a genuinely neutral form like "Do you mind when a store clerk wishes you 'Happy Holidays'?"

Zogby spokesman Fritz Wenzel defended the item: "Zogby International stands by the methodology and results of its polling." But Wenzel and the Zogby people had to know perfectly well how the language of the query would skew the results -- the literature on these linguistic priming effects in polling is extensive, to put it mildly.

If there was any doubt of that, in fact, the Zogby people put it to rest when they included another, even more blatantly loaded item on the same survey: "Are you extremely bothered, somewhat bothered, or not at all bothered by stores that try to be politically correct by wishing customers a Happy Holiday out of fear of offending those who do not celebrate Christmas?" That could serve as a textbook example of a push poll question, whose point is to shape attitudes rather than record them -- if you equate any solicitude for the sensibilities of Jewish, Muslim, or nonbeliever customers with "political correctness," it's no wonder that 51 percent of respondents will answer with "extremely bothered" or "somewhat bothered."

So there's no question these items were intentionally designed to produce a result that would play in the media's familiar "war on Christmas" template and earn the press coverage that leads pollsters to do unsponsored surveys on items like this in the first place.

In fact the questions underscore the specious grammar of "offense" that the right has appropriated from the language of civil rights in order to cultivate a sense of grievance and victimization among its constituencies. You could hear that in the way many in the media contrasted the "happy holidays" result with another Zogby query that showed that 95 percent of respondents said they were not offended when they were greeted with a "Merry Christmas" from clerks. As Zogby put it in its December newsletter:

The greetings war heated up last year with many stores opting out of saying "Merry Christmas" for fear of offending customers, opting for the more generic "Happy Holiday." That fear may be unfounded, as our polling shows 95% of respondents say they are not at all offended when a clerk wishes them a "Merry Christmas" -- and only 1% of respondents said they are very offended at that greeting. It may be not saying "Merry Christmas" that is the real offender -- 32% say they are offended when a clerk says "Happy Holiday" and half (51%) say they are bothered by stores trying to be politically correct by using that greeting.

Or as the Christian Post headed its story about the poll: "More Americans Offended by 'Happy Holidays' than 'Merry Christmas.'" Other writers downplayed the objections of Jews, Muslims, and nonbelievers -- note the selective use of only in an article on the Zogby survey in the Baltimore Examiner:

According to a new report from Zogby International, a New York-based opinion research firm, 32 percent of shoppers are offended when a clerk wishes someone "Happy Holidays," while only 5 percent take offense to "Merry Christmas."
. . . Only 32 percent of those polled who identified themselves as Jewish said they took offense to being wished a “Merry Christmas,” with the same going for only 10 percent who identified themselves as non-Christian.

But the "offense" that some Christians report on not being greeted in a way that presupposes they're Christians isn't quite the same sense of injury that some non-Christians might feel when they hear a greeting that does. Or to put it another way, it's one thing to feel offended when a clerk doesn't automatically give pride of place to your religion and another to feel offended when he or she automatically gives pride of place to somebody else's. A generation ago, in fact, nobody would have used "offend" in the first situation at all, much less described the use of "happy holidays" as "insulting to Christian America," the way Bill O'Reilly does. But that was before the culture warriors learned to use the bogus parallelism of "offensiveness" to depict a business's effort to respectfully acknowledge the religious heterogeneity of its clientele as an instance of "discrimination" and "bias" against a Christian majority.

The "War on Christmas" crowd seems to be winning this battle -- over the past year or two, a number of chains, like Wal-Mart, have announced that they'll be returning to the "Merry Christmas" greeting. Still, things aren't about to return to an age of Frank Capra innocence -- for a lot of people, a salutation that used to be merely a cheerful (if sometimes casually insensitive) nod to the seasonal spirit has become a belligerent salvo in the culture wars. That's disturbing even to many thoughtful conservatives. As National Review's Jonah Goldberg noted last year:

Just as it is counterproductive for a secular liberal to take offense at a well-intentioned "Merry Christmas," it doesn't help if a conservative says "Merry Christmas" when he really means "Eat yuletide, you atheistic bastard!"

Indeed, non-Christians aren't the only ones who find the whole flap a little disturbing. As the conservative columnist Cal Thomas observed:

The effort by some cable TV hosts and ministers to force commercial establishments into wishing everyone a "Merry Christmas" might be more objectionable to the One who is the reason for the season than the "Happy Holidays" mantra required by some store managers.

Still, those theological scruples aren't likely to move either the culture warriors or the pollsters who find it expedient to pander to them. In which connection, I offer the first annual LanguageLog Holiday Reader Survey:

How disturbing do you find Zogby International's willingness to trash its reputation for disinterested and scientifically responsible survey design in order to score a few lines of cheap media coverage?

A. They disgust me.

B. Sluts!

C. I'm rather disappointed.

D. Hey, no problemo.

Operators are standing by. Have a happy whatever.

Added 12/23: In an email, Lal Zimman writes:

What's always really gotten to me about this entire debate is that even Christians generally celebrate more than one holiday around this time. Don't these folks want to have a happy New Year too? Or are we only allowed one happy holiday each year?

Actually, that underscores another odd feature of the Zogby survey question:

Are you very offended, somewhat offended, or not at all offended when a store clerk wishes you a Happy Holiday instead of a Merry Christmas?

"Happy Holiday"??? When was the last time you heard anybody say that greeting in the singular? But by ignoring the fact that the word "holidays" is invariably in the plural, the item precludes the conclusion that the greeting is intended to cover New Year's as well. You have to wonder: if Zogby will go to these lengths to load an item just to get some free ink, what wouldn't they do to accommodate a paying sponsor?

Posted by Geoff Nunberg at 10:39 PM

Turkmenbashi is dead

Saparmurat Niyazov, the self-obsessed nutball who ruled as President of Turkmenistan from 1991 to this week, and probably stole two or three billion dollars from its treasury, is dead, thank goodness. Journalists reporting on this development have been reporting correctly that the old fool renamed himself Turkmenbashi, meaning "Turkmen leader", but the way broadcasters are pronouncing it is nowhere near correct. The International Phonetic Alphabet (IPA) representation of how they say it would be [tɚkmɛnbæʃi]. But none of those vowels are correct.

Turkmen is a Turkic language with vowel harmony, which means that the vowels within a root are usually either all back vowels or all front vowels. The word that President Niyazov chose as the new name announcing his permanent status as the leader of all the ethnic Turkmen people is a compound formed from two stems, both obeying vowel harmony. It could be accurately spelled Türkmenbaşı in the modern Turkish alphabet. The IPA representation would be [tyrkmenbaʃɯ], or perhaps [tyrkmenbaʃɨ].

The word is a compound in which the first stem is pronounced [tyrkmen], where the [y] is the IPA symbol for the high front rounded vowel heard in French tu, and the second is pronounced [baʃɯ] or [baʃɨ], where the [ɯ] is the IPA symbol for the high back unrounded vowel that is found in Turkish and Thai and various other languages but not in any of the well-known Western European languages, and [ɨ] is similar but a bit more central (Russian has a sound rather similar to the latter).

Not at all an easy word for English tongues to tangle with, since it crucially involves both a high front rounded vowel in the first syllable and a high back or central unrounded vowel in the last, and English has no vowel reminiscent of either of these.

My suggestion would be that we cease to use the name Turkmenbashi. I propose that instead we refer to the late President by the more technically accurate sobriquet, "awful, corrupt, brutal, authoritarian, self-obsessed, little old madman who ruled Turkmenistan for fifteen years and renamed April after his mother and spent millions on a gold-plated statue of himself that revolved so it would always face the sun." It rolls more trippingly off the tongue, don't you think?

Posted by Geoffrey K. Pullum at 08:49 PM

Cultural specificity and universal values?

Alain Bentolila, a linguist at the University of Paris, wrote the recent "Rapport de Mission sur L'Enseignement de la Grammaire", which the French minister of education cited in announcing his program to increase grammar teaching. (See Heidi Harley's post "French Report: It's lucky Copernicus had grammar", 12/18/2006.) When I do a web search for Alain Bentolila, the second item is an interview from L'Express in October of 2002, which covers some interesting ground. Much of it deals with vocabulary rather than with syntax:

Certes, mon oreille souffre lorsqu'on rate un subjonctif, mais l'essentiel est ailleurs: aujourd'hui, un certain nombre de citoyens sont moins capables que les autres d'exprimer leurs pensées avec justesse: 10% des enfants qui entrent au cours préparatoire disposent de moins de 500 mots, au lieu de 1 200 en moyenne pour les autres. Cela a deux conséquences. La première est que leur pouvoir sur le monde s'en trouve limité. La seconde, c'est que cela les enferme dans un ghetto et favorise un communautarisme croissant. Il existe ainsi en France une véritable inégalité linguistique, qui se traduit par une grave inégalité sociale.

Of course, my ear suffers when someone muffs a subjunctive, but the essential problem is elsewhere: today, a certain number of citizens are less capable than others of expressing their thoughts accurately: 10% of children entering elementary school have the use of fewer than 500 words, instead of 1,200 on average for the others. This has two consequences. The first is that this limits their power over the world. The second is that this shuts them up in a ghetto and encourages a growing sense of ethnic identity. Thus in France there is a genuine linguistic inequality, which translates into a serious social inequality.

[For the decision to translate "communautarisme" as "ethnic identity", see this discussion, e.g. "Communautarisme means that people classify themselves according to some private attributes instead of feeling that they belong to a whole."]

This business of quantifying word usage and vocabulary size, for different groups, has come up a lot recently. I know that I'm not going to get anywhere suggesting that journalists should ask people to cite a source for numbers like these. Unfortunately, though, the numbers are nearly meaningless otherwise.

I don't mean that there aren't large individual, class and cultural differences in vocabulary size, and I don't mean to suggest that these differences don't matter. But the particular numbers that Bentolila cites in this interview are surprisingly low, and I wonder where they came from and how they were measured.

There's an excellent review of relevant literature available on line: Scott Baker, Deborah Simmons and Edward Kameenui, "Vocabulary Acquisition: Synthesis of the Research". Here's part of what they have to say about it:

In their review of vocabulary acquisition, Beck and McKeown (1991) noted that estimating vocabulary size was probably the oldest type of vocabulary research. Thus, during the 20th century, scores of studies have focused exclusively on estimating vocabulary size. Given the complexity of defining word knowledge (Baumann & Kameenui, 1991), it is not surprising that such estimates have varied considerably. For example, Graves (1986) reported that studies of vocabulary size conducted prior to 1960 resulted in estimates ranging from 2,500 to 26,000 words for typical first-grade students, and from about 19,000 to 200,000 words for university graduate students. These discrepancies were due to lack of specificity regarding (a) differences between words and word families (e.g., is a student who knows the meaning of run , ran , and running credited with knowing one, two, or three words?); (b) definitions of word knowledge (e.g., recognizing the meaning of a word in a multiple-choice question versus producing a definition for the word); and (c) the source used to represent English vocabulary (e.g., dictionaries versus word frequency lists) (Beck & McKeown, 1991).

As researchers began to specify more precisely the parameters of vocabulary knowledge, more accurate and consistent estimates of vocabulary size were generated. For example, Nagy and Anderson (1984) attempted to determine the number of printed words used in English materials in grades 3 through 9 by examining the textbooks, workbooks, novels, magazines, and encyclopedias used in the classroom. Their estimate of 88,533 word families is now widely used as the domain of words that students in grades 3 through 9 can be expected to know.

Beck and McKeown (1991) provided another estimate of the number of words students know by examining recent studies that used more defined criteria following the tradition established by Nagy and Anderson (1984). Through more precise measures, for example, estimates of the vocabulary size for 5- to 6-year-olds dropped from a range of between 2,500 to 26,000 words to between 2,500 to 5,000 words.

Beck and McKeown's estimates of vocabulary sizes for kids who are 5 or 6 years old should be roughly comparable to the estimates that Bentolila gives. But 3,750 (the middle of Beck and McKeown's range) is more than three times larger than the 1,200 average given by Bentolila. My guess is that Bentolila is talking about a technique that simply measured the number of different words used in a given time period (or word count) of transcribed speech -- but I don't know. These numbers are invoked to support important social policy choices, and it seems worthwhile to be careful to make it clear where the numbers come from, and what they mean. (After all, we've seen plenty of recent examples where people seem to invent striking numbers to bolster general conclusions about group differences.)
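To make my guess concrete, here is a minimal sketch of the kind of measure I have in mind: counting the number of different word forms in a fixed-size sample of transcribed speech. I don't know that Bentolila's numbers were obtained this way; the function name, the tokenizing pattern and the toy transcript are all my own assumptions, purely for illustration.

```python
import re

def count_word_types(transcript, max_tokens=None):
    """Return (tokens, types) for a transcript.

    A 'type' here is just a distinct lower-cased word form, so 'run',
    'ran' and 'running' count as three types, not one word family.
    max_tokens is a hypothetical cap on the sample size.
    """
    # Runs of letters, optionally joined by apostrophes (l'école, don't).
    tokens = re.findall(r"[^\W\d_]+(?:['’][^\W\d_]+)*", transcript.lower())
    if max_tokens is not None:
        tokens = tokens[:max_tokens]
    return len(tokens), len(set(tokens))

# Toy transcript, invented for illustration only.
sample = "the dog saw the cat and the cat saw the dog"
print(count_word_types(sample))                # (11, 5)
print(count_word_types(sample, max_tokens=5))  # (5, 4)
```

The point of the max_tokens argument is that a count of this kind depends heavily on how much speech you sample: the number of types keeps growing as the sample gets longer, so a figure like 500 or 1,200 means little unless we know the size of the sample and how "word" was defined.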

There is little question that large differences exist -- continuing the quote from Baker et al.:

Even as methodological improvements in vocabulary research have occurred, one unequivocal finding has remained: Students with poor vocabularies know alarmingly fewer words than students with rich vocabularies. For example, Beck and McKeown (1991) discussed a study conducted by Smith in 1941, who reported that high-achieving high school seniors knew four times as many words as their low-achieving peers. Smith also reported that high-achieving third graders had vocabularies that were about equal to those of low-achieving twelfth graders.

In 1982, Graves, Brunetti, and Slater (cited in Graves, 1986) reported a study on differences in the reading vocabularies of middle-class and disadvantaged first graders. In a domain of 5,044 words, disadvantaged first graders knew approximately 1,800 words whereas the middle-class students knew approximately 2,700 words. Using a larger domain of words (19,050), Graves and Slater (cited in Graves, 1986) reported that disadvantaged first graders knew about 2,900 words and middle-class first graders approximately 5,800 words.

However the differences are measured, the usual explanation for their cause has to do with differences in childhood experience and perhaps in child-rearing culture. See the discussion of Hart and Risley's work in this post for a sketch of current theories about this. Another relevant piece of research is Martha J. Farah et al. ("Childhood poverty: Specific associations with neurocognitive development", Brain Research 1110(1) 166-174, September 2006) -- discussed briefly here -- which found a large difference in language-related cognitive measures (an effect size of about 0.95 for vocabulary and sentence-understanding tests) between middle SES and low SES African-American girls between the ages of 10 and 13.
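For readers who want a reminder of what an "effect size" is: one common version is Cohen's d, the difference between two group means divided by the pooled standard deviation. The sketch below assumes that definition, which may not be exactly the one used in the paper, and the scores in it are invented for illustration; they are not data from Farah et al.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Difference in means, in units of the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance: the two sample variances weighted by degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) +
                  (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)

# Made-up test scores, not from any study.
print(cohens_d([2, 4, 6], [1, 3, 5]))  # 0.5
```

On this reading, an effect size of about 0.95 would mean that the two group averages are nearly one standard deviation apart, with the individual scores in the two groups still overlapping substantially.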

Group stereotypes sometimes also enter into this, as Bentolila observes:

Q: Il y aurait une forme de fierté, et même d'identité, à se proclamer inculte?
A: Exactement. L'échec devient un signe de reconnaissance du clan. Autre exemple: dans une classe de CP, dans une ZEP de Villeneuve-Saint-Georges [Val-de-Marne], une enseignante de 21 ans tentait désespérément de faire apprendre le mot «succulent». Un enfant s'est levé et a dit: «Ça, c'est un mot pour les filles.» A 6 ans, cet enfant vit déjà dans un monde coupé en deux, celui où le mot rare est un trésor et celui où il est ridicule.

Q: There would be a kind of pride, and even of identity, in declaring oneself uneducated?
A: Exactly. Failure becomes a sign of clan membership. Another example: in a CP class, in a ZEP of Villeneuve-Saint-Georges [Val-de-Marne], a 21-year-old teacher is trying desperately to teach the word "succulent". A child stands up and says "That's a girl's word". At the age of six, this child already lives in a world cut in two, one where a rare word is a treasure and another where it is ridiculous.

Leading up to this passage, Bentolila offered another anecdote:

Dans une étude récente en Seine-Saint-Denis, on a demandé à des collégiens ce que représentait pour eux la lecture. Plusieurs ont fait cette réponse surprenante: «La lecture, c'est pour les pédés!» Cela signifie que, pour eux, la lecture appartient à un monde efféminé, qui les exclut et qu'ils rejettent. Accepter le livre et la lecture serait passer dans le camp des autres, ce serait une trahison.

In a recent study in Seine-Saint-Denis, they asked schoolboys what reading meant to them. Several gave this surprising answer: "Reading is for faggots!" This means that, for them, reading belongs to an effeminate world, which excludes them and which they reject. To accept a book and to read it would be to cross into the others' camp, it would be treason.

This is reminiscent of the language in Leonard Sax's works about the feminization of education in the U.S., and the need to give schoolboys manlier books to read. I'm sympathetic with the complaint and the concern, but I wish the analysis depended less on evocative anecdotes and more on carefully controlled research.

For a start, it would be nice to have a developmental series of speech samples from large, demographically-balanced samples of children through elementary and secondary school. This would help us start to understand what the situation really is, and (if the collection was properly done) to distinguish between general linguistic impoverishment (to the extent that it exists) and imperfect knowledge of the standard language (which is surely widespread).

Bentolila addresses this question indirectly. First he claims that "les gamins de banlieue" simply lack linguistic resources entirely:

Q: Mais en quoi la pauvreté du vocabulaire favorise-t-elle le ghetto et le communautarisme?
A: Il y a une loi simple en linguistique: moins on a de mots à sa disposition, plus on les utilise et plus ils perdent en précision. On a alors tendance à compenser l'imprécision de son vocabulaire par la connivence avec ses interlocuteurs, à ne plus communiquer qu'avec un nombre de gens restreint. La pauvreté linguistique favorise le ghetto; le ghetto conforte la pauvreté linguistique. En ce sens, l'insécurité linguistique engendre une sorte d'autisme social. Quand les gamins de banlieue ne maîtrisent que 800 mots, alors que les autres enfants français en possèdent plus de 2 500, il y a un déséquilibre énorme. Tout est «cool», tout est «grave», tout est «niqué», et plus rien n'a de sens. Ces mots sont des baudruches sémantiques: ils ont gonflé au point de dire tout et son contraire. «C'est grave» peut signifier «c'est merveilleux» comme «c'est épouvantable».

Q: But in what way does a poor vocabulary encourage the ghetto and ethnic identity?
A: There is a simple law in linguistics: the fewer words one has at one's command, the more one uses them and the more they lose precision. You then have a tendency to compensate for imprecision of vocabulary by conniving with your interlocutors, no longer trying to communicate beyond a small circle of people. Linguistic poverty encourages the ghetto; the ghetto reinforces linguistic poverty. In this sense, linguistic insecurity creates a sort of social autism. When the banlieue kids only master 800 words, when other French children have more than 2,500, there is an enormous imbalance. Everything is "cool", everything is "heavy", everything is "fucked", and nothing has meaning anymore. These words are semantic bladders: they have inflated to the point of meaning everything and its opposite. "C'est grave" (= it's serious, it's heavy, etc.) can mean "it's marvellous" as well as "it's dreadful".

The interviewer raises the obvious objection, which is that these areas of "linguistic poverty" have been the source of much linguistic innovation:

Q: On vous dira que, dans les banlieues, on invente aussi des mots nouveaux qui sont, eux, très précis.
A: C'est de la démagogie! Ces néologismes sont spécifiques des banlieues et confortent le ghetto. L'effet est toujours centrifuge. Les enfants des milieux aisés vampirisent le vocabulaire des cités, mais ils disposent aussi du langage général qui leur permet d'affronter le monde. L'inverse n'est pas vrai. Arrêtons de nous ébahir devant ces groupes de rap et d'en faire de nouveaux Baudelaire! La spécificité culturelle ne justifie jamais que l'on renonce en son nom à des valeurs universelles.

Q: Some will say that in the banlieues they also invent new words, which are quite precise.
A: That's demagogy! Those neologisms are specific to the banlieues and reinforce the ghetto. The effect is completely centrifugal. Children from comfortable backgrounds steal the vocabulary of the cities, but they also control the standard language which allows them to engage the world at large. The inverse is not true. Let's stop getting giddy over rap groups and making them into new Baudelaires! Cultural specificity never justifies renouncing universal values.

John McWhorter has engaged a similar set of issues in his books "Losing the Race: Self-Sabotage in Black America" (2001) and "Winning the Race: Beyond the Crisis in Black America" (2005). However, although John is a linguist (in fact, a Language Log contributor), and he has often argued for the value of teaching the standard language, his emphasis has been on content rather than on vocabulary counts and grammatical analysis. For example, he has argued against rap music on the basis of the attitudes and actions it glorifies and encourages, not on the basis of its deviations from standard English.

And I won't put words in John's mouth, but I bet he agrees with me that it's odd to describe the vocabulary of standard French as embodying "universal values" while other vocabularies are "culturally specific". I mean, if you want universal values, you're talking about English, right?

Of course, being broad-minded here at Language Log, we're happy to allow the French to retain their cultural and linguistic specificity, even though their linguistic insecurity does create a sort of social autism, limiting their opportunities for international communication and forcing them to turn inwards and connive, in a lexically-impoverished idiom, with their narrowing circle of francophone interlocutors.

Posted by Mark Liberman at 04:51 PM

Another trip down Random Rd.

In my last adventure with the random number 17, I followed it back to Princeton mathematicians in the 1960s and speculated about where I got the idea. My senior thesis adviser Paul Benacerraf was one of three likely sources. Now Mark Kalderon has written from the Department of Philosophy at University College London to fix on Paul:

Paul Benacerraf was my advisor as well. Seventeen was indeed his favorite number and used it in many examples. He was desperately disappointed when his, then, young son reported that his favorite number was eight. (I tried to cheer him up by suggesting that it was a coded representation of seventeen.) Anyway, Paul attributed his obsession with seventeen to Hilary Putnam, his advisor, who had a pseudo-proof that seventeen was the most arbitrary number. I once saw a copy of the proof in Hilary's hand in Paul's office, but I cannot now remember how it went.

Hilary and Paul confirm most of this, though Paul's not so sure that the proof was written down. Paul continues:

But it was surely from him that I got it, although I learned later that it was a well-known fact: True, it was known hereabouts as the Feller Number. According to my story, he frequently said: "Take a number, any number, say, 17." Feller, by the way, lived on Random Road -- a further, if indirect, proof of the randomness of 17. As for Hilary's proof, the one I remember best is this [a proof by cases]: A completely random number can't [=should not] be too large. Say, </=20. After that it's a breeze. Working from below, it clearly can't be 1, nor can it be even, and hence neither 2, 4, 6, 8, 10, 12, 14, 16, 18, nor 20 [proof left to the reader]. 3 is for the Trinity and 5 is too important in base 10 notation; 7 and 11 are lucky, and 9 is a perfect square and hence hardly random. 13 is unlucky; 15 is a multiple of 5; and 19 is too close to 20. That leaves 17 as the only possible candidate. Q.e.d.

If you have a good memory and have been following things carefully, you'll recognize the Putnam proof as a more detailed version of the argument I gave in favor of 17 back in my first posting on the matter. Pretty clearly I got it from Paul, who got it from Hilary, and then things disappear in the mists of time past.
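For anyone who prefers their pseudo-proofs executable, the case analysis in Paul's version can be written as a sieve. This is only a sketch: the function name is mine, and the "reasons" are paraphrased from the quoted email (proofs still left to the reader).

```python
def putnam_sieve():
    """Eliminate every number from 1 to 20 for which the proof gives a reason."""
    reasons = {}
    for n in range(1, 21):  # a completely random number can't be too large
        if n == 1:
            reasons[n] = "clearly can't be 1"
        elif n % 2 == 0:
            reasons[n] = "even"
        elif n == 3:
            reasons[n] = "the Trinity"
        elif n == 5:
            reasons[n] = "too important in base 10 notation"
        elif n in (7, 11):
            reasons[n] = "lucky"
        elif n == 9:
            reasons[n] = "a perfect square"
        elif n == 13:
            reasons[n] = "unlucky"
        elif n == 15:
            reasons[n] = "a multiple of 5"
        elif n == 19:
            reasons[n] = "too close to 20"
    survivors = [n for n in range(1, 21) if n not in reasons]
    return survivors, reasons

survivors, reasons = putnam_sieve()
print(survivors)  # [17]
```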

Some notes and additions:

We had mathematicians Kelly and Spivak putting 17 together with yellow pigs in the 1960s, and then Lander creating a club devoted to the randomness of 17 in the 1970s. Now Lance Knobel recalls that the name of Lander's club was, yes, the Yellow Pigs. Cultural continuity!

Once you start looking for 17, you of course find it all over the place. Stalag 17. Band names, including (again) Stalag 17, Heaven 17 (in Burgess's A Clockwork Orange and now in real life), East 17. Alex Baumans suggests that 17 has a special resonance for bands -- though there's the Mile 21 Band, also Level 42, with other significant numbers in their names.

Meanwhile, other candidates vie with 17 for exemplary random-number status. Don Porges says:

I've always heard that 37 is the number that most commonly comes to mind when people are asked for a random number (between 1 and 100, maybe?) I was gratified a few weeks ago when Penn Jillette was on Stephen Colbert, and Colbert asked him "What number am I thinking of?" and Penn quite offhanded answered "37". See video here, 5:00 minutes in...

(Note: Colbert continues with "Close...it was 4".)

Obviously, it's time for someone to do that research.

You could try various ranges. Maybe 1-20 instead of 1-100. Here's Jonathan Ferro writing on the range 1-4:

One of my favorite bar tricks is prepared by writing the numbers "1 2 3 4" with a fat pen on one side of an index card, and the number "3" alone on the reverse. When I have an excuse to be looking through my wallet, or at some other lull in the conversation, I pull out the card and shove it under the nose of an unsuspecting neighbor with the instructions "Pick a number". I can flip over the card to reveal that I predicted his answer far more often than the 25% predicted by pure chance. Out of these four numbers, "3" appears to be "more random" than the others.

Well, 1 and 3 are the odd numbers, and 1 is very special indeed, so that leaves 3. Simple Putnamian reasoning.
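If someone does take up the research, the first question is how many hits it takes before "far more often than 25%" becomes convincing. Here is a minimal sketch, assuming each victim picks independently and that pure chance means a probability of 0.25 for "3"; the function name and the tallies in the usage line are hypothetical, not anything Jonathan reported.

```python
from math import comb

def prob_at_least(hits, trials, p=0.25):
    """Chance of getting `hits` or more correct predictions in `trials`
    attempts, if each attempt succeeds independently with probability p."""
    return sum(comb(trials, k) * p**k * (1 - p)**(trials - k)
               for k in range(hits, trials + 1))

# Hypothetical tally: the card was right 12 times out of 20 tries.
print(prob_at_least(12, 20))
```

The smaller this number, the harder it is to put the trick's success down to luck, and the better the case that "3" really is "more random" than its neighbors.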

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 02:25 PM

December 21, 2006

Dog whistles for linguists

I swear Herb and Jamaal are trying to send a message to linguists. But what?

Maybe they're just trying to catch the attention of linguists, and then the message is coming later. If that's the plan, they're succeeding with me. Here's today's cartoon:

Do you see what I mean? Isn't that bolding in the ultimate panel on the wrong word? I take it to indicate contrastive stress. But for the joke to work the way I think it's supposed to work, the bolding ought to be on the verb know. See, there's a whole lecture in this cartoon about how contrastive stress is introduced by a speaker to distinguish two parallel propositions with a single distinct element, in this case, the different predicates taking 'how much she spent' as a complement, trying to figure out versus know. This cartoon seems odd because the stress is indicated on a non-contrasting element, she, instead. (I don't think there's any other character who she could be referring to that would contrast with my wife). Depending on the direction of the lecture, you could talk about the phonetic properties associated with contrastive stress, or the semantics of the contrasting propositions, or their syntax (island effects appear, I believe). All blogworthy stuff, really, but seemingly based on nothing more than just an accident of inking.

But this is far from the first time H&J have produced this kind of 'accident'.

Just a few days ago, pretty much at the three-week point after the Brizendine kerfuffle, H&J published this:

I nearly posted under the title, 'Noooooooo!' but then thought really, the less said the better. But then this thing today...and see, I've been saving up H&J's for another post on 'whom' someday, because as far as I can tell they are the last bastion of whom in modern-day America, certainly on the funny pages. Check it out:

It's not just one character who says 'whom', so it's not supposed to just be a personality quirk, but rather a whole community of speakers for whom (!) straight-up, prepositionless accusative whom is still au courant. I was going to post about how it really leaps out as a marked/odd usage to me in the context of a cartoon, where the norm is for the dialogue to be represented in fairly colloquial style, lots of contractions and informal speech, even some &*$%&!!# now and then.

And then, also not long ago, this was the Sunday strip:

This one illustrates the nonce formation of a denominal location/locatum verb meaning 'to put/get into a sink' (like to corral, to box, etc.), a process that I've been professionally interested in. Moreover, this particular denominal verb is homophonous with an irregular English verb, sink, meaning 'to descend', past tense sank, participle sunk. The difference between the nonce denominal verb sink and the established irregular verb sink has been experimentally investigated by Steven Pinker, inter alia. Subjects reliably form the past tense of denominal sink as sinked, not as sank, illustrating the psychological reality of the invisible layer of nominal structure which verbalizes the noun in the denominal verb.

[sink]_V + past = sank
[[sink]_N]_V + past = sinked
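The same contrast can be put as a toy procedure, in which the stored irregular form is only visible to a bare verb root, and a verb built on a noun falls through to the regular rule. This is just a sketch of the idea, not a model anyone has proposed in this form; the function, the flag and the one-entry lexicon are my own illustrative assumptions.

```python
# Stored irregular past-tense forms, listed for verb roots only.
IRREGULAR_PAST = {"sink": "sank"}

def past_tense(stem, denominal=False):
    """Past tense of a verb stem.

    denominal=True marks a verb derived from a noun ([[sink]_N]_V); the
    nominal layer blocks access to the irregular form listed for the root.
    """
    if not denominal and stem in IRREGULAR_PAST:
        return IRREGULAR_PAST[stem]
    # Regular rule, with a deliberately crude spelling adjustment.
    return stem + ("d" if stem.endswith("e") else "ed")

print(past_tense("sink"))                  # sank
print(past_tense("sink", denominal=True))  # sinked
```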

So this cartoon is basically another whole lecture (or a whole nother lecture). And of course there was the cartoon from a few weeks ago presenting an opinion about the use of nigger. And a few months ago, there was this cartoon, which has so far occupied several precious hours of my conscious existence:

This one incorporates a version of a semi-famous example sentence as the punchline, illustrating the peculiar problem of parsing a negation inside a too/enough-construction. (For some technical discussion of the properties of these constructions, you could see this paper). I blogged about this cartoon on Heideas here; see the comments for a dramatic debate between readers who blinked at the cartoon (like me) and readers who didn't.

So anyway, I'm starting to think that H&J are trying to tell us something. Or maybe their creator is reading some linguistics on the side and he's just messing with me. I'll let you know if I ever figure it out.

Posted by Heidi Harley at 11:16 PM

17: The Princeton math connection

Mail on 17 as the exemplary random number takes me to mathematicians, especially those at Princeton. As it happens, I have an A.B. in mathematics from Princeton (before I was wooed away by the siren charms of linguistics), so it's entirely possible I picked up the idea there.

(One of the oddities of blogging is that you can post on some topic, get an assortment of responses, post a follow-up, and then get a fresh set of responses going off in some new direction. No doubt my habit of not getting around to posting follow-ups for months or even years -- hey, some topics are evergreen -- means that I'm getting a new audience each time.)

In any case, Douglas Davidson writes to say: "Mathematicians are particularly fond of 17 (going back to Gauss, at least) and are likely to use it as an example." He points me to the wonderfully silly Yellow Pigs page on 17, noting that Hampshire College math professor David C. Kelly gives, or at least used to give, an annual lecture on the number; to further Yellow Pig information on another site about 17s, where it is noted that mathematicians Kelly and Mike Spivak were graduate students together at Princeton in the 1960s and "reportedly got the yellow pig 17 idea in a bar"; and to the Wikipedia page on 17, where the Princeton connection is solidified:

17 is known as the Feller number, after the famous mathematician William Feller who taught at Princeton University for many years. Feller would say, when discussing an unsolved mathematical problem, that if it could be proved for the case n = 17 then it could be proved for all positive integers n. He would also say in lectures, "Let's try this for an arbitrary value of n, say n=17."

Then comes Lance Knobel, reporting that

Thirty years ago I roomed for a year at university with the best mathematician I've ever known. He and some fellow math whizzes, who had all been on the US team for the international math Olympiad, created a club, whose principal tenet was that 17 was the most random number.

What university? Princeton, of course. And the extraordinarily talented roommate? Eric Lander, biology professor at both MIT and Harvard Medical School, founding director of the Broad Institute, and many many other things. Bachelor's in math from Princeton in 1978.

When I was an undergraduate at Princeton (1958-62, omigod), I didn't take any courses from Feller, but there were three professors who might have introduced me to the fabled randomness of 17: Bob Gunning, Paul Benacerraf, and Ray Smullyan, all three of them people with delightful senses of humor (and all three of them wonderful teachers).

Bob Gunning taught the honors calculus course I took in my first semester, thereby leading me to expect, unrealistically, that the rest of the math faculty would be as extraordinary in the classroom (and outside of it) as he was. Here's a piece of a Princeton Alumni Weekly story about his receiving the distinguished teaching award from Princeton three years ago:

Colleagues and students alike praised his skills at explaining difficult mathematical concepts and his devotion to students. "He has a truly unusual ability for exposition of mathematics at any level," according to one colleague. "He has a clear insight into the subject matter and in the capacity of his students to absorb the material. His classes are superbly organized and his lectures have just the right mix of theory, examples and humor."

Gunning's humor was mentioned many times in his nomination letters for the award. "What sets [Professor] Gunning apart are the intangibles," wrote one student. "He is always smiling. He tells horrendously nerdy math jokes that never fail to make everyone laugh."

The philosopher (of mathematics, among other things) Paul Benacerraf was the adviser for my senior thesis in mathematics. A wryly funny man then as now. There's a Wikipedia page.

Mathematician, logician, philosopher, and magician Ray Smullyan is, of course, famous for the good humor of his books on recreational mathematics and logic. He too has a Wikipedia page.

Any one of these three might have planted the random-17 seed. Or I could have gotten it from another student. So hard to tell at this distance in time.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 08:46 PM

Bad lingo

The 12/15/06 page of Gawker ("daily Manhattan media news and gossip") is a rant (entitled "Bad Lingo") on clichés in media outlets and elsewhere. Some of these (like yo and oy) are just annoying overused expressions, but a few are clearly snowclones, some familiar here (Best. X. Ever. and X-y Goodness), some not yet blogged on here.

The inventory ends with The New Y, "The unkillable grandaddy of them all, a Protean monster capable of adapting to any topic, discussion, situation, or writer." Also the snowclone most discussed in e-mail to me.

(Thanks to Jim Lewis, Matthew Hutson, and Vishy Venugopalan, all of whom pointed me to the site.)

Among the new figures are Made My Y Bleed (where Y is a sensory organ, and the cause of the figurative bleeding is some undesirable experience) and X-gasm.

As for The New Y, since my last posting on it, the mail has been pouring in. Some of it I appended to the earlier posting. But here are some highlights from the weeks following.

Valerie Reed supplied a cascade of The New Y from the pop culture blog The Hater:

If you're Katie Couric, and who am I to say you're not, how would you get people to take you seriously as a newswoman after presiding over segments like "Dress For Your Body Type" and "Today Throws A Wedding" for years and years? Maybe you would tone down on the tanning. Or concentrate on honing your "serious" voice. But the most important thing you would do is find a historical precedent for showing pictures of a celebrity's baby on the news so people won't call your broadcast "infotainment." [YouTube segment from CBS news here]

See, uh, important babies have always been important! It's like Suri is the new Prince Charles (that full head of hair could almost be a crown). Which makes Katie Holmes the new Queen Elizabeth (sorry, Britain), and Tom Cruise the new Prince Phillip.

[picture of Vanity Fair cover here]

And I think this makes Vanity Fair the new paper of record.

Then, from Victor Steinbok, a sighting in PENNumbra, the Penn Law Review:

The same could be said, I think, about the latest Court-related mantra, "judicial independence." Indeed, when it comes to constitutional sloganeering, "judicial independence" might be the new "judicial activism."

And from Benita Bendon Campbell, "100 [years old] is the new 80" in the AARP Magazine.

Then one I found on my own:

One of the things that have changed in the last few years is the number of people saying that lots of things have changed in the last few years. There are more of them, and what they have spotted are trends. Many trends. In fact, Reinier Evans has taken to saying that "trends are the new trend." (Rob Walker, "Trend Wrap", NYT Magazine 12/10/06, p. 26)

Yes, trends are the new trend. That's right up there with "black is the new black", reported on earlier. Inspired by these, I thought to search for "old is the new new" and "new is the new old". Only a small number of the latter, but a huge number of the former, communicating something like 'an old thing is now back in fashion'. Here's a more emphatic version, from Dustin Staiger's Casual Fridays blog:

The Old Old is the New New

I've grown quite fond of my old-style hats. I have a fedora and a willis hat. When I wear these hats I get comments from people (and sometimes stares). Yet, these were the hats everyone used to wear. Now, it seems like a very new thing to do.

Of course trends come and go and come back again. That's nothing new. But it has made me think about how some of the recent trends in marketing are not new, but old. When business became modern, the old way became passé. In our postmodern world, old has become new...

As Gawker said, the thing looks unkillable.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 02:08 PM

Watch your grammar, comrade

Just in case you needed any further reason to hate grammarians, those bloodless stuffy old innumerate bores who don't know how to kiss, those grumpy old grammar cranks always nitpicking your whiches and thats, and lecturing seraphim on quantifier use:

It turns out that not only was [Stalin] an intellectual, he was a compulsive and professional editor who corrected any manuscript that crossed his desk for style and grammar as well as for ideology.

[In a review of Robert Service's Stalin: A Biography;
originally in The Moscow Times; reprinted in
The Globe and Mail (Toronto), May 5, 2005.]

Oh, dear. Mark up a few violations of arbitrary grammar rules; pick an arbitrary collection of dissidents to be sent to die in the forced labor camps; a good day's work... One more bad role model for professional grammarians. Will it be left entirely to Dan Brown (God bless him) to do something about giving specialists in language a better popular image than the average B-movie mad scientist?

[Thanks to Jila Ghomeshi for the reference.]

Posted by Geoffrey K. Pullum at 12:38 PM

The magic number 17

A while back I considered Christopher Buckley's picking 17 as the number of Eskimo words for snow:

Buckley has obviously pulled the number 17 out of his, um, hat. This number is what you're likely to come up with when you're asked to pick a random number: it's the smallest prime number without any special cultural significance. The numbers 2, 3, 5, 7, and 13 are clearly special; 11 is not quite so special, though it is the number of players on a football team (American or Association), and then you're up to 17.

That elicited a wry comment from John Cowan, plus several queries about whether research had actually been done on the question. I didn't know of any at the time, and I still don't, but I can report on some famous research on a related question.

Cowan got the paradox in my original posting -- I thought about whether I should point it out and eventually decided to just leave it out there -- and elucidated it poetically:

Thanks for explaining to us
readers of Language Log the special cultural
significance of the number 17.

Then came Mike Morrison and Natan Cliffer, asking about research. Cliffer had a special interest:

I lived in Random Hall (a dorm) during my undergrad years [at MIT], and that assertion is part of the mythology of the culture there...

Well, it wouldn't be terribly hard to do research on the question, though just asking your friends to name a random number wouldn't do; you'd need a carefully designed questionnaire, with suitable distractor items ("Name a bright color", that sort of thing). Whether such a study would come up with 17 as the favorite random number is an open question, but it's very likely it would show that people treat randomness as a gradable property of numbers: some numbers are more random than others, in the minds of ordinary people, even though such a view doesn't make sense mathematically.

This is the sort of result that Armstrong, Gleitman (of this parish), and Gleitman got when they investigated people's views of exemplars for various categories that are usually thought of as well-defined rather than gradable: odd number, even number, female, and plane geometry figure. The paper is:

Armstrong, Sharon Lee; Lila R. Gleitman; & Henry Gleitman. 1983. What some concepts might not be. Cognition 13.263-308. [Available on-line here.]

Their study also looked at exemplars of prototype categories that other people had studied: sport, vehicle, fruit, vegetable. Rather than directly asking people to supply exemplars, they replicated two experiments reported on by Eleanor Rosch in 1973 -- one in which people were asked to give a rating to exemplars on a 7-point scale, the other measuring response times in judging whether sentences like "An orange is a fruit" and "An orange is a vehicle" were true or false (the idea here is that less good exemplars of a category take more time to verify as members of that category than good examples do). The results for "odd number" etc. were comparable to those for "fruit" etc. (There was a third set of experiments not directly relevant here.)

AG&G also mention an earlier similar study by Eric Wanner on "prime number".

So if anyone decides to replicate Rosch's experiments with "random number", or just to look at the exemplars people supply for the category "random number", we should expect that they'll find gradability here too. But the final answer doesn't seem to be in yet on 17.

zwicky at-sign csli period stanford period edu

[More here and here.]

Posted by Arnold Zwicky at 12:33 PM

Doing what comes naturally

There've been several interesting posts recently at Pinyin News, including a note from Victor Mair about "Simplified Characters Inside and Outside of the People's Republic of China". A friend of Victor's from Taiwan, who "is vocally opposed to the simplification of characters, decrying the mainland communist bandits as destroyers of Chinese civilization", nevertheless in her own handwriting uses not only PRC-sanctioned simplifications, but also additional more-or-less nonce simplifications, "often seen ... in informal writing". According to Victor:

The same is true of Chinese writers the world over when they let their hair down and do what comes naturally. The simplification of Chinese characters has been going on for more than two thousand years (see, for example, the many simplified forms in the stele inscriptions of the Six Dynasties period and the profusion of simplified characters in the pinghua [“expository tales”] of the Song period).

I should not neglect to observe that there are also numerous unofficial simplified characters in widespread use on the mainland. For example 午 wǔ (“noon” – four strokes) is a common substitute for 舞 wǔ (“dance” – 14 strokes [!]), 江 jiāng (“[Yangtze] river” – six strokes) frequently replaces 疆 jiāng (“border” – nineteen strokes [!!]) in Xinjiang (the name of the Uyghur region in the far west), and so forth.

What does all of this boil down to? In a nutshell, people are not fools. They do not want to waste their lives writing a dozen or more strokes for a single syllable when they can convey the same amount of information in four or five strokes. I contend that the natural process of simplification – without artificial (e.g., heavy-handed government) intervention – inevitably results in the development of a syllabary or an alphabet. In fact, this is what happened with Japanese hiragana and katakana, as well as with the nüshu (“women’s script”) of southwestern Hunan. Absent strong government controls and/or elitist models, the same would happen with mainstream hanzi (“sinographs”) in China, and we even see a tendency toward greater emphasis on phoneticization and de-emphasis on semanticization in the official writing system of the PRC. For instance, 云 yún is used both for “cloud” and “say” (ironically, the graph for “cloud” on the oracle bones started out with the simple form, and the “rain” radical 雨 was only added about a thousand years later with the seal form of the graph), while fā (“emit, occur”) and fà (“hair”) share the same graph. This is not, of course, to mention the hundreds of so-called “letter words” (zimuci) that are creeping into Chinese dictionaries, nor the thousands of English words that are invading Chinese speech and writing.

This is an interesting reversal of (what I take to be) the standard story about the transition from logographic to alphabetic systems.

The standard view: systematic phonological awareness is hard to come by, so that development of phonologically-based orthography is a rare event.

Victor's view: logographic writing systems are so inconvenient, relative to phonologically-based systems, that only massive infusions of prestige and/or coercion can prevent a logographic system from turning over time into a phonological one.

I guess that both of these propositions might be true.

[More on the nature and history of Chinese-character simplification can be found here]

Some earlier LL posts on the forces that keep users of (even alphabetic) spelling systems from doing what comes naturally:

"Writing system reform" 6/4/2004
"More on spelling reform" 6/6/2004
"Delightful chaos" 6/18/2004
"State-ordered dyslexia" 8/7/2004
"More on spelling unreform" 8/7/2004
"Many new rules a little meaningfully" 8/7/2004
"Superfluity and uselessness" 8/8/2004
"Update on the Germanspellingreformoppositionmovement", 8/21/2004
"Unnecessarily unclear and ugly" 12/21/2005
"Spell simply and carry a big stick" 12/21/2005
"Pioneers of word rage" 3/5/2006
"Plain spelling" 11/3/2006
"Partial credit for 'pigeon English': not new in New Zealand" 11/10/2006
"Alarming decline in literacy among publicists and journalists" 11/12/2006
"Wanna: neither slang nor language murder" 11/14/2006

Posted by Mark Liberman at 12:17 PM

December 20, 2006

On all fours

Over at The Volokh Conspiracy, Orin Kerr has been all over "on all fours" ("The origin of 'On all fours'", 12/19/2006; "Running on all fours", 12/20/2006):

One of the legal profession's stranger expressions is that a case is "on all fours" with another case. It means that the former case raises the same facts and legal principles as the latter and is therefore highly relevant as a precedent. You might wonder, what's the origin of the phrase "on all fours"?

Orin cites Michael Quinion's World Wide Words, which offers this explanation:

In the eighteenth century, people started to use to run on all four as a figurative expression to describe some proposition or circumstance that was fair or equitable, well-founded, sturdily able to stand by itself. To be on all four or to stand on all four meant to be on a level with another, to present an exact analogy or comparison with something else (presumably the image is of two animals standing together, both on all four legs, hence in closely similar situations).

But Orin tracks the phrase back in American legal contexts as far as 1798, and discovers that the early uses are all of the form "run on all fours", not "stand on all fours", and suggests that

the context suggests that the visual image is more an animal running alongside the observer than two animals standing next to each other. If an animal is running on all four legs beside you, the thinking might be, it means that it remains close to you and goes where you go.

The OED (it's always a mistake not to check the OED) notes the 19th-century s-addition,

[formerly all four, sc. extremities. The -s was added prob. during the 19th century; not in Johnson 1808.]

invokes a metaphor of the form "not limping = fair or even, not lame", and gives an earlier citation, from a British legal context, which also involves running, and is applied to a comparison:

2. fig. to run on all fours, i.e. fairly, evenly, not to limp like a lame dog. to be, or stand, on all fours: to be even or on a level, to present an exact analogy or comparison (with).

1710 SIR J. ST. LEGER in Somers Tracts (1751) III. 248 Tho' the Comparison should not exactly run upon all four when examined.
1877 Daily Tel. 15 Mar., It must stand on all fours with that stipulation.
1883 Daily News 8 Feb. 3/7 The decision I have quoted is on all fours with this case.

Note that the use of lame with respect to arguments is much older, and surely related:

2. fig. a. Maimed, halting; imperfect or defective, unsatisfactory as wanting a part or parts. Said esp. of an argument, excuse, account, narrative, or the like. Phr. lame to the ground (cf. Antrim & Down Gloss. s.v. Lame ‘A stab of a bayonet which has lamed me to the ground.’).

c1374 CHAUCER Troylus II. Prol. 17 Disblameth my yf ony word be lame. For as myn auctor seyde so sey I.
1390 GOWER Conf. II. 218 The gold hath made his wittes lame.
1531 ELYOT Gov. I. xxv, That the knowlege and contemplation of Natures operations were lame and..imperfecte, if there followed none actuall experience.
1581 J. BELL Haddon's Answ. Osor. 164b, Let us yet helpe his lame Logicke as well as we may.
1604 SHAKES. Oth. II. i. 162 Oh most lame and impotent conclusion.
1634 CANNE Necess. Separation (1849) 287, I will not contend much with him about the proposition, which is lame to the ground.

Perhaps in a legal system in which so many arguments involve an analogy between the present case and some other case, the notion of "(un)sound comparison" needed its own special metaphor, and picked it up from this particular way of talking about an animal (perhaps a horse, which would be running under you, or ahead of you pulling your carriage) that can (or can't) run properly on all four legs.

Outside the legal context, an example of "going upon all four" as an expression for "running well" can be found in Laurence Sterne, Tristram Shandy, Vol. 3, Chap. XXIV (1760):

Tho' the shock my uncle Toby received the year after the demolition of Dunkirk, in his affair with widow Wadman, had fixed him in a resolution, never more to think of the sex,---or of aught which belonged to it;---yet corporal Trim had made no such bargain with himself. Indeed in my uncle Toby's case there was a strange and unaccountable concurrence of circumstances which insensibly drew him in, to lay siege to that fair and strong citadel. ---In Trim's case there was a concurrence of nothing in the world, but of him and Bridget in the kitchen;---though in truth, the love and veneration he bore his master was such, and so fond was he of imitating him in all he did, that had my uncle Toby employed his time and genius in tagging of points,---I am persuaded the honest corporal would laid down his arms, and followed his example with pleasure. When therefore my uncle Toby sat down before the mistress,---corporal Trim incontinently took ground before the maid.

Now, my dear friend Garrick, whom I have so much cause to esteem and honour, ---(why, or wherefore, 'tis no matter) ---can it escape your penetration,---I defy it,---that so many play-wrights, and opificers of chit chat have ever since been working upon Trim's and my uncle Toby's pattern. ---I care not what Aristotle, or Pacuvius, or Bossu, or Ricaboni say,--- (though I never read one of them)--- there is not a greater difference between a single-horse chair and madam Pompadour's vis a vis, than betwixt a single amour, and an amour thus nobly doubled, and going upon all four, prancing throughout a grand drama. ---Sir, a simple, single, silly affair of that kind,--- is quite lost in five acts,---but that is neither here or there. [emphasis added]

In this context, "going upon all four" is kind of like the mid-20th-century expression "running on all eight (cylinders)".

By the way, a vis-a-vis was

1. A light carriage for two persons sitting face-to-face. Obs. exc. Hist.

apparently drawn by a single animal, e.g.

1768 J. BYRON Narr. Patagonia (ed. 2) 230 The common vehicle here is a calash, or kind of vis-à-vis, drawn by one mule only.

So perhaps the metaphor was originally not so much "level" as "running properly, like a sound horse".

[Update -- but Tom Recht observes that Sterne may be talking about wheels, not feet:

I think there's simpler and more concrete reading of Laurence Sterne's "going upon all four" than the one you suggest in today's post (an idiom for "running well"). A single-horse chair has two wheels, a vis-a-vis has four; a single amour is like the first, a nobly doubled one like the second. So I don't think we need to assume any contemporary idiomatic meaning of the phrase at all: Sterne is just improvising a characteristically fanciful analogy between lovers and carriage wheels, rather than using an existing idiom.

The "prancing" part suggested legs rather than wheels to me, but Tom might well be right.]

[And Jim Lewis has yet another version of the metaphor to contribute:

Odd, because I heard a version of the phrase often, when I was doing grad work in philosophy: one problem was said to be "down on all fours" with another. It meant something like 'eye to eye', or 'fair' or 'on equal ground': I always pictured it as a version of being down on one's hands and knees (i.e., when talking to small child, or a dog), thereby leveling the playing field.
I don't know if this is directly relevant to "running on all fours", but it seems close, both in its derivation and its meaning. I'd be curious to know if you can find any examples of "down on all fours" in the relevant data which would indicate that they're connected, and that one gave part of its meaning to the other.

Mary Blockley adds:

An Australian-born Oxford don (b. circa 1920) used "on all fours" in a non-dynamic sense, for an argument that stood firm and level as a well-made table does on a level floor.

Horse's legs, table's legs, wheels, hands-and-knees -- there are lots of possible meanings for "all four(s)", and even more available metaphors, some of which modify or even reverse the sense of the expression. An earlier example of "upon all four" that seems to refer to hands-and-knees can be found in John Flavel's 1691 "Planelogia, a succinct and seasonable discourse of the occasions, causes, nature, rise, growth, and remedies of mental errors written some months since, and now made publick, both for the healing and prevention of the sins and calamities which have broken in this way upon the churches of Christ, to the great scandal of religion, hardening of the wicked, and obstruction of Reformation". Here "it runs upon all four" seems to mean something like "it crawls on hands and knees, and thus is slow and inadequate":

I will neither tire my Reader in a foolish chase of such weak and impertinent Arguments as he there produceth, nor yet wholly neglect them, lest he glory in them as unanswerable. And therefore to shew him the fate of the rest, I will only touch his first Argument, which being his Argumentum Palmarium, deservedly leads the Van to all the rest. And thus it runs upon all four. [...]

Your Major Proposition takes the Law in its large complex body, as appears by your 3d page. Your Minor Proposition, which you would confirm by Gal. 3. 12. takes the Law strictly and abstractly, as it is set disjunctly from, yea in opposition to Faith and the Promises, and so there are two sorts of Law in your Argument, and consequently your Argument is fallacious, as all its fellows be, and runs (as I told you before) upon all four.

In Thomas Hardy's The Woodlanders (1887) there's an example of the modern legal meaning outside of the legal context (Vol. III, Chapter VI):

After stating how extremely glad he was to hear that she was better, and able to get out of doors, he went on:

"This is a wearisome business, the solicitor we have come to see being out of town. I do not know when I shall get home. My great anxiety in this delay is still lest you should lose Giles Winterborne. I cannot rest at night for thinking that while our business is hanging fire he may become estranged, or in his shyness go away from the neighbourhood. I have set my heart upon seeing him your husband, if you ever have another. Do then, Grace, give him some temporary encouragement, even though it is over-early. For when I consider the past I do think God will forgive me and you for being a little forward. I have another reason for this, my dear. I feel myself going rapidly down hill, and late affairs have still further helped me that way. And until this thing is done I cannot rest in peace." [..] .

The paternal longing ran on all fours with her own desire; and yet in forwarding it yesterday she had been on the brink of giving offence. While craving to be a country girl again, just as her father requested; to put off the old Eve, the fastidious miss---or rather madam---completely, her first attempt had been beaten by the unexpected vitality of that fastidiousness. Her father on returning and seeing the trifling coolness of Giles would be sure to say that the same perversity which had led her to make difficulties about marrying Fitzpiers was now prompting her to blow hot and cold with poor Winterborne. [emphasis added]

And in Mrs. Humphry Ward's Robert Elsmere (1888) there's a similar example (Vol. II, Book IV, Chapter XXX):

By the time they parted Robert had arranged with his old enemy that he should become his surety with a rich cousin in Churton, who, always supposing there were no risk in the matter, and that benevolence ran on all fours with security of investment, was prepared to shield the credit of the family by the advance of a sufficient sum of money to rescue the ex-agent from his most pressing difficulties. He had also wrung from him the promise to see a specialist in London---Robert writing that evening to make the appointment. [emphasis added]

The meaning of "ran on all fours with" in these examples is clearly something like "matched" or "was congruent with"; but by 1887-8 the phrase seems simply to be a familiar idiom, without any plain indication of its metaphorical source.

In the end, it still seems most likely to me that the basic metaphor here deals with legs, probably of horses (or perhaps pairs of horses?), and has to do with "sound" vs. "lame" arguments or comparisons. But these last two examples certainly do "run on all fours with" Orin's image of "an animal running alongside the observer". ]

[Susan M. Harrelson suggests yet another interpretation:

I am a lawyer, and graduated from law school in 2002. While I was in school, I occasionally heard the phrase, "on all fours [with]." A far more common term was, "the four corners of the document." This was usually used in the sense of having to rely only on what was actually expressed in a contract, rather than going outside it to gather extrinsic evidence of its meaning.

I always thought that "on all fours" was related to "the four corners," and my mental picture is a congruent (in the geometric sense) figure, as in laying a rectangle on top of another and having them touch at all four corners. The image of two dogs squaring off at each other seems bizarre to me, as a way of describing congruence between one case and another. Same with the carriages.

I think the phrase refers to four corners, rather than to four-legged animals, or four-wheeled vehicles, since the concept being described is congruence, rather than stability.

A very creative contribution. Given the clear history of "on all four(s) with" as involving running rather than standing, Susan's interpretation is not a good theory of the origin of this phrase. However, as an account of its contemporary usage, it does make considerably more sense than talk of dogs, horses and tables. Unfortunately, lexicography is like accountancy: creativity and common sense are not encouraged as methods.

Just for completeness, here's what the OED has for "within the four corners of":

within the four corners of (a document): (emphatic for) within the limits or scope of its contents.

1874 MORLEY Compromise (1886) 37 The spirit of the Church is eternally entombed within the four corners of acts of parliament.

]

Posted by Mark Liberman at 04:12 PM

Truthiness wins another one

As boldly predicted here two weeks ago, the Stephen Colbert-ism truthiness followed up its resounding win as Merriam-Webster's 2006 Word of the Year with a similar victory in the Dictionary.com competition. Since both of these selections were made by tabulating votes in online surveys, this was an easy call. Fans of "The Colbert Report" have proven their mettle in stuffing online ballot boxes ever since they topped the voting to name a Hungarian bridge. In fact, the top seven entries in the Dictionary.com results are all Colbert coinages: following truthiness we find Lincolnish, Wikiality, it-getter, grinchitude, factinista, and superstantial. The Dictionary.com press release actually declares that eight of the top ten words derive from Colbert's show, but I'm pretty sure he can't claim credit for #8-#10: love, sex, and defenestrate. (Perhaps they thought Colbert had something to do with defenestrate, but that word clearly has its own zealous fan base.)

The folks behind Dictionary.com are none too happy about the Colbertization of their contest, as their press release hilariously makes clear:

"We were surprised by how many of our dictionary users are fans of the moderately popular fake political talk show, which is obviously reflected by the words they nominated," said Brian Kariger, CEO of Lexico Publishing Group. "In light of the evident voting irregularities, we are launching an investigation into our electronic vote counting procedures. Something needs to change before the next Word of the Year is chosen to preserve the dignity of this prestigious annual award."

Honestly, I don't think any number of investigations or procedure changes will subvert the power of the mighty Colbert Nation. Resistance is futile!

[Update: SFGate Culture Blog is having its own WOTY competition, with online visitors voting for one of five finalists: carbon-neutral (already selected as the New Oxford American Dictionary's WOTY), decider, hyphy, sectarian violence, and of course... truthiness. In the poll, truthiness is trouncing the other finalists with about 60% of the total votes. Really, why even bother with a poll?]

Posted by Benjamin Zimmer at 02:29 PM

On the track of the squean

Until yesterday, one item from the Zits list of activities unsuitable at a school dance had still not been tracked down anywhere: squeaning. Then Keith Handley found the squean on a fonts website, where a comic-book style font (called MarkerMan) is being distributed that

Includes 5 useful cartoon symbols, leaned from Mort Walker's Lexicon of Comicana and ABC Etcetera: The Life & Times of the Roman Alphabet by A. & N. Humez. From left to right of the bottom row above: the squean (which might float around a drunken character's head)...

The squean is an asterisk with an empty center.

("Leaned from" puzzles me. "Learned from"? "Loaned from"? "Gleaned from"? Or what?)

There's also the phosphene, for a character who's "seeing stars", and three substitutes for swearing -- the grawlix, the jarn, and the quimp -- which can be used on their own or combined with one another and with standard symbols like @#$%*. (The grawlix, a spiral, figures prominently in a "Mother Goose & Grimm" comic strip that Ben Zimmer posted about a while back: "Grimm just said the {grawlix}-word.")

The labels are presumably inventions of Walker's. The Amazon book description tells us:

Written as a satire on the comic devices cartoonists use, the book quickly became a textbook for art students. Walker researched cartoons around the world to collect this international set of cartoon symbols. The names he invented for them now appear in dictionaries.

[Addendum: Dick Margulis writes to say that phosphenes are the "stars" you see when you close your eyes and press against the lids, and the OED pretty much agrees with this. So this one isn't an invention.]

Handley speculates that the Zits cartoonists borrowed the comics term to fill out their list of teen activities. He also wonders if knurling (also on the list) might be used by cartoonists to refer to a kind of cross-hatching. I'm on the case.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 12:29 PM

Language as music

With Noam Chomsky in a cameo appearance in Zippyland...

But, but... I thought I read somewhere that language is only 7% of the message, while music is 38% and dance is 55%. Maybe I got it wrong. Or maybe it's the 7% that keeps us sane.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 11:38 AM

Chomsky and the Pope: separated at birth?

As Heidi Harley observes, Gilles de Robien, the French minister of culture, is saying things about grammar that strike many people as rather odd ("French report: it's lucky Copernicus had grammar", 12/18/2006). "Teaching grammar could stop mob violence". Yeah, sure. Dennis Baron imagines Tony Blair chiming in with a tribute to the power of grammar as a weapon, oops, excuse me, as a method in the Global Upholding of Shared Values as a Means to Discourage Unpleasant People: "'Give the enemy a good dose of grammar,' he told a BBC interviewer, 'and they’ll go right to sleep.'"

But in fact, the idea that there are consequential connections between grammar and ethics has been coming up quite a bit recently, even outside of France.

According to Pope Benedict XVI's message for the 2007 World Day of Peace:

Peace is an aspect of God's activity, made manifest both in the creation of an orderly and harmonious universe and also in the redemption of humanity that needs to be rescued from the disorder of sin. Creation and redemption thus provide a key that helps us begin to understand the meaning of our life on earth. My venerable predecessor, Pope John Paul II, addressing the General Assembly of the United Nations on 5 October 1995, stated that “we do not live in an irrational or meaningless world... there is a moral logic which is built into human life and which makes possible dialogue between individuals and peoples.” The transcendent “grammar”, that is to say the body of rules for individual action and the reciprocal relationships of persons in accordance with justice and solidarity, is inscribed on human consciences, in which the wise plan of God is reflected. As I recently had occasion to reaffirm: “we believe that at the beginning of everything is the eternal word, reason and not unreason(4).” Peace is thus also a task demanding of everyone a personal response consistent with God's plan. The criterion inspiring this response can only be respect for the “grammar” written on human hearts by the divine creator. [emphasis added]

Except for the part about God, this way of talking is strikingly similar to recent work by Marc Hauser, Moral Minds: How Nature Designed Our Universal Sense of Right and Wrong. As Marc explains in an American Scientist interview:

I argue that we are endowed with a moral faculty that delivers judgments of right and wrong based on unconsciously operative and inaccessible principles of action. The theory posits a universal moral grammar, built into the brains of all humans. The grammar is a set of principles that operate on the basis of the causes and consequences of action. Thus, in the same way that we are endowed with a language faculty that consists of a universal toolkit for building possible languages, we are also endowed with a moral faculty that consists of a universal toolkit for building possible moral systems.

By grammar I simply mean a set of principles or computations for generating judgments of right and wrong. These principles are unconscious and inaccessible. What I mean by unconscious is different from the Freudian unconscious. It is not only that we make moral judgments intuitively, and without consciously reflecting upon the principles, but that even if we tried to uncover those principles we wouldn't be able to, as they are tucked away in the mind's library of knowledge. Access comes from deep, scholarly investigation.

Hauser's work is framed as a Chomskian interpretation of (some aspects of) John Rawls' 1971 A Theory of Justice. When Marc says "we are endowed", he's thinking as an evolutionary psychologist, and he means the unspoken agent to be our evolved genome -- but he's echoing Jefferson's Deist language in the Declaration of Independence:

We hold these truths to be self-evident: that all men are created equal, that they are endowed, by their Creator, with certain unalienable Rights, that among these are Life, Liberty, and the Pursuit of Happiness.

As for the work by Rawls that inspired Hauser, Mark Johnson described it this way ("Moral Imagination: Implications of Cognitive Science for Ethics", p. 27):

Contemporary philosophy's obsession with language has led, as one might expect, to strong analogies between morality and the speaking of a language. One consequence of this linguistic emphasis has been the emergence of a metaphor of moral grammar as a way of articulating our traditional notion of practical reason as moral force. In A Theory of Justice John Rawls suggests that constructing a theory of justice (or, more generally, a theory of morality) is akin to constructing a grammar of a natural language.

Rawls argues that in constructing a grammar we seek principles that would account for our intuitive sense of grammaticality, keeping in mind that the principles so formulated might later require criticisms of some of our intuitions about what is grammatical. In moral theory, by analogy, we would search for principles that would generate our considered moral intuitions (i.e., our reflectively considered judgments about what acts are right or wrong in specific kinds of situations), keeping in mind that our confidence in these principles might later lead us to question some of our moral intuitions.

The metaphor of "morality as grammar" has ontological as well as methodological implications, and it's not only in the 20th century that philosophy was obsessed with language. According to the Stanford Encyclopedia's entry for Thomas of Erfurt,

The notion that a word, once it has been imposed to signify, carries with it all of its syntactical modes, or possible combinations with other words, had been around since the twelfth century. What the Modistae did was to posit the origins of the modi significandi [modes of signification] in terms of parallel theories of modi intelligendi (modes of understanding) and modi essendi (modes of being). The result was a curious amalgam of philosophy, grammar, and linguistics. Thomas of Erfurt's De modis significandi became the standard Modist textbook in the fourteenth century, though it has since enjoyed even greater fame later thanks to its misidentification as a work of Duns Scotus. The text appeared in early printed editions of Scotus's Opera Omnia, where it was read and commented upon by later figures such as Charles S. Peirce and Martin Heidegger, whose 1916 doctoral thesis, Die Kategorien- und Bedeutungslehre des Duns Scotus, should have been entitled, Die Kategorienlehre des Duns Scotus und die Bedeutungslehre des Thomas von Erfurt.

The intellectual influence of the modists (otherwise known as "speculative grammarians") was deep and persistent, with recurrences to the present day. This paragraph from the wikipedia entry on the modistae evokes some echoes that are wider and more consequential than odd statements by French culture ministers:

Opposing nominalism, they assumed that the analysis of the grammar of ordinary language was the key to metaphysics. For the modistae, Grammatical forms, the modi significandi of verbs, nouns, and adjectives, indicate deep ontological structure. Roger Bacon inspired the movement with his observation that all languages are built upon a common grammar, a shared foundation of ontically anchored linguistic structures: Grammar is substantially the same in all languages, even though it may undergo in them accidental variations.

However, the history of speculative grammar was not without its bumpy stretches, as charted by the glosses for dunce in the OED:

1. The personal name Duns used attrib. Duns man, a disciple or follower of Duns Scotus, a Scotist, a schoolman; hence, a subtle, sophistical reasoner. ... Obs.
2. A copy of the works of Duns Scotus; a textbook of scholastic theology or logic embodying his teaching; a comment or gloss by or after the manner of Scotus. Obs.
3. A disciple or adherent of Duns Scotus, a Duns man, a Scotist; a hair-splitting reasoner; a cavilling sophist. Obs. exc. Hist.
4. One whose study of books has left him dull and stupid, or imparted no liberal education; a dull pedant. Obs.
5. One who shows no capacity for learning; a dull-witted, stupid person; a dullard, blockhead.

This reputational transformation had taken place by the late 16th century:

1577-87 HOLINSHED Chron. Scot. 461/1 But now in our age it is growne to be a common prouerbe in derision, to call such a person as is senselesse or without learning a Duns, which is as much as a foole.

A long way down for the man that Gerard Manley Hopkins called "of reality the rarest-veinèd unraveller". But whether they were subtle reasoners or dunces, the speculative grammarians of the 14th century live on in the influential ideas of that unlikely pair of contemporary thinkers, Noam Chomsky and Pope Benedict XVI.

[As noted here before ("Chomsky testifies in Kansas", 5/6/2005), Chomsky is in fact not a fan of evolutionary psychology, at least as a theory of the genesis of what he has famously called the "language organ". He's a rationalist, not a nativist, who has speculated that the emergence of language might be "explained in terms of properties of physical mechanisms, now unknown", that would "reflect the operation of physical laws applying to a brain of a certain degree of complexity".]

Posted by Mark Liberman at 10:33 AM

Does Glaswegian really have five different filled pauses?

Robin Lickley writes:

In your blog of May 02 2005, you ask the question above. ["Um, em, uh, ah, aah, er, eh".]

Well, a student of mine has been measuring formants of FPs in Glaswegian speakers for her undergrad honours project.

The answer is that some Glaswegians (in the HCRC map task corpus), do seem to have 4 different FPs: orthographically eh, em, uh and um, with two very distinct vowels. She has also looked at the two English English speakers in the corpus, and 5 Canadians in the DCIEM Map Task corpus. We get different distributions of vowels - in some accents the FP vowel is [e]-like, some [a]-like, some [uh]-like and, so far in these data, not many are very schwa-like, where schwa is taken from long instances of 'the' (not theee).

I recently heard on the radio and kept a recording of another Scottish speaker with a [ai] diphthong in his FPs - [ai] and [aim]. Unfortunately, he has not responded to email.

We will no doubt be presenting these findings and more at DiSS 07 (www.disfluency.org) just before ICPhS, in a paper called, eh...

"Um not so schwa"

How many different filled pauses do you have? If you speak a dialect with some ums and uhs that are interesting either in quantity or in quality -- or you're around someone who does -- send me an audio clip.

Posted by Mark Liberman at 08:34 AM

December 19, 2006

"Those grammarians hate freedom"

Heidi Harley has posted about the French government's promotion of grammar teaching in the schools, with links to reports about the initiative. Among them is a link to Dennis Baron's Web of Language coverage. Baron's blog has a three-paragraph summary of the story. But go to the site for the rest of this page, which quotes various authorities -- British Minister of Schools Jim Knight, U.S. Secretary of Education Margaret Spellings, U.S. Secretary of Defense Bob Gates, George W. Bush, and Tony Blair -- on the controversial plan. The tone of this part of Baron's blog can be judged from two quotations from Spellings:

"Bring formal grammar back to the classroom," she predicted, "and you'll have angry mobs of teachers overturning cars in the parking lot and torching them."

[with reference to Noam Chomsky] "Those grammarians hate freedom," she concluded.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 02:35 PM

Does Julia Gillard know subjects from objects?

Australian politician Julia Gillard, deputy leader of the Labor Party, speaking about attitudes toward her and party leader Kevin Rudd, remarked:

We have got a long way to go to ensure that the Australian community knows Kevin and I, trusts Kevin and I and wants us to be the prime minister and deputy prime minister of this country.

Columnist Christopher Pearson commented bitchily in The Australian:

It shows she can't tell the difference between the subject and the object of a sentence; when to use I and me.

The issue of whether this alleged grammar error should have been picked up and criticized, or whether Pearson is just an uptight grammar fascist, will not be discussed here; the comments below this post on an Australian group politics blog discuss it thoroughly (the commenters also consider whether Language Log people are a bunch of ugly chain-whipping mofos who think they know better than everyone else etc. etc.). But I do want to make one epistemological remark: it's remarkable that Pearson thinks the quoted material above shows that Gillard "can't tell the difference between the subject and the object of a sentence" when in fact all the evidence supports exactly the opposite conclusion.

To defend his claim, Pearson needs to show Gillard saying things like *Me love you or *Do you love I? Cases of that sort would really show she can't tell when to use I and when to use me. But he can't do this.

Pearson must know (he surely should, since I was able to find out immediately) that Ms Gillard uses me as the form of the first person singular pronoun when it is a direct object, because she used both of the following sentences in her first speech to Parliament in 1998:

Australia has offered me opportunities that would have been beyond my parents' understanding...

I have only been able to take up those opportunities because of the excellent state education system which flourished in South Australia...

So that settles that: we have relevant evidence available in a very public source showing that of course she knows the difference between subject and object, and of course she uses the nominative form I for a pronoun subject and the accusative form me for a pronoun object, just like we all do.

Now, I think there may be many people who imagine that in a sentence like The Australian community knows Kevin and I we have an occurrence of the pronoun I showing up as an object. We certainly do not. We have the pronoun I showing up as the word following a coordinator in a phrase and I which is the second of two phrases making up the coordination Kevin and I. It is the coordination that is an object. Being a part of a phrase that serves as an object is not at all the same as being an object. Consider I resent the fact that he lied. The object of resent is a noun phrase, the fact that he lied. Inside it is a pronoun. But that pronoun (he) is a subject. It just happens to be inside an object.

So, clearing that possible confusion away, what does Pearson's quote actually teach us? Two things. First, that when a pronoun follows the coordinator and, Ms Gillard invariably uses the nominative (at least for the first person singular she does). Indeed, that first speech to Parliament confirms it again:

My father John and my mother Moira, who is watching from the gallery today, migrated to this country with my sister Alison and I as assisted passage migrants in 1966.

And the second thing we learn from the Pearson quote is that in the only instance of a pronoun that does not follow and, a first person plural pronoun, it is not the subject of a finite clause, and so Gillard chooses the accusative form: us. This confirms again that (like anyone who can say I love you and Do you love me? and get both of them right) she knows subjects from objects perfectly well.

One hundred percent of the evidence we have is accounted for by the following very simple generalization:

Ms Gillard uses nominative forms for
(a) pronouns that are subjects of tensed clauses, and
(b) pronouns that follow the coordinator and.
She uses accusative forms for objects, and everywhere else.
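The generalization is simple enough to write down as a tiny decision procedure. What follows is only a minimal sketch in Python, not a claim about how anyone's grammar is actually implemented: the names `PronounContext`, `FORMS` and `choose_form` are made up for illustration, and the two boolean features are assumed to have been worked out by hand for each example.

```python
from dataclasses import dataclass


@dataclass
class PronounContext:
    """Hypothetical, hand-annotated description of where a pronoun occurs."""
    subject_of_tensed_clause: bool  # condition (a) in the generalization
    follows_and: bool               # condition (b) in the generalization


# (nominative, accusative) pairs for the two pronouns that figure in the examples.
FORMS = {"1sg": ("I", "me"), "1pl": ("we", "us")}


def choose_form(pronoun: str, ctx: PronounContext) -> str:
    """Nominative under (a) or (b); accusative for objects and everywhere else."""
    nominative, accusative = FORMS[pronoun]
    if ctx.subject_of_tensed_clause or ctx.follows_and:
        return nominative
    return accusative


# The four data points discussed in this post, annotated by hand:
offered_me = choose_form("1sg", PronounContext(False, False))   # "offered me opportunities"
i_have_been = choose_form("1sg", PronounContext(True, False))   # "I have only been able..."
kevin_and_i = choose_form("1sg", PronounContext(False, True))   # "knows Kevin and I"
wants_us = choose_form("1pl", PronounContext(False, False))     # "wants us to be..."
```

Under those hand annotations the procedure picks me, I, I and us respectively, which are exactly the forms Ms Gillard used.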

Whether we want to regard it as correct or acceptable in Standard English to use the nominative after and is another matter, and much more difficult to adjudicate. We can say that it's very common; huge numbers of Standard English speakers do appear to follow that rule (see pages 9-10 of The Cambridge Grammar of the English Language for a discussion of this highly controversial point). Shakespeare apparently did (at least, he has one of his characters say between you and I in The Merchant of Venice). But whether people should be following this rule is off the agenda here — like whether Pearson is a stuck-up right-wing snob or whether Gillard is a jumped-up illiterate Welsh immigrant or whether Language Log writers are chain-swinging anti-correctness thugs. Here I'm just making a single point about the use of evidence.

I'm saying it is truly striking that Pearson can get away with saying in Australia's most serious national newspaper that the linguistic evidence reveals "she can't tell the difference between the subject and the object of a sentence; when to use I and me", when in fact the available evidence (including what he quotes) can be shown in a minute to refute that claim completely.

I'm saying I really find it interesting that, when it comes to grammar, people who write for newspapers and magazines feel no need to check their facts or analyses; they just present lofty pronouncements, as Pearson does, and everyone caves.

[I rewrote this a bit on the evening of December 19. Thanks to Linda Seebach and Rob Sears and Christopher Mackay for some feedback that suggested to me I should clarify what I was saying. They may still not agree with me.]

Posted by Geoffrey K. Pullum at 01:32 PM

Toddler eggcorning

Little kids are given to misanalyses (any number of children have been reported answering the command "Behave!" with "I AM being have") and mishearings ("Olive, the other reindeer" in the pestilential Rudolph song), and occasionally they branch out into eggcorning. As, apparently, in this report from Elizabeth Daingerfield Zwicky about her daughter Opal (2 years, 9 months):

Opal has a stuffed jaguar, which she says is "Jaggy the Jagwater". She's very consistent about "jagwater", which apparently makes sense to her in some way "jaguar" doesn't. She doesn't correct me when I say "jaguar" but she gets that patient look...

Oh yes, that look. I have my way and you have yours, and mine's better, but I'm not going to dispute it right now.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 11:27 AM

December 18, 2006

The Portuguese pidgin that never died?

For some reason, the idea that the world's creole languages all, or mostly, trace back to a 15th century Portuguese trade pidgin language seems to stick in people's minds. Today, a New York Times review of David Crystal's new HOW LANGUAGE WORKS reminds us of this tasty factoid, whose memory-friendly nature likely has something to do with its piggy-backing on our school-days familiarity with the Great Explorers, Vasco Da Gama, Magellan, and so on.

Too bad it isn't true.

There indeed was a Portuguese pidgin used on the west coast of Africa starting in the 1400s. It was used in contacts with Africans, and was later adopted by subsequent explorers of the coast such as the English, French, Dutch, Spanish and even Swedes. Varieties of the pidgin were used as far afield as India and today's Indonesia.

In the fifties and sixties, a few linguists proposed that the reason creoles like Jamaican "patois," Haitian Creole French, Papiamentu Creole Spanish, etc. have interestingly similar grammars is that slaves brought to those colonies had learned the Portuguese pidgin in Africa, and plugged words from the European language of their colony into the Portuguese pidgin's sentence structure.

So, in Haiti, the result would be a French creole, while over in Jamaica, the creole would be English -- but both would share a sentence structure inherited from the Portuguese pidgin. Creole language specialists call this the monogenesis hypothesis.

This idea was a nimble surmise based on the evidence available back in the day. But a great deal of linguistic and historical research has been done since then, and no working creolist has subscribed to the monogenesis hypothesis for over twenty years.

There is no evidence that an appreciable number of slaves spoke Portuguese pidgin -- it was used by African traders but would have been of little use to a typical villager inland. There is no creole that looks anything like a descendant of the pidgin in terms of sentence structure -- rather, the pidgin left behind a few words like SAVVY "to know" and PICANINNY "small" in creoles around the world, just like O.K. is sprinkled around the world from English.

Probably the most summary and conclusive statement that put the monogenesis hypothesis to rest was an article by Morris Goodman in the anthology PIDGIN AND CREOLE LANGUAGES edited by Glenn Gilbert in 1987.

There's a whole passel of ideas as to why creoles' grammars tend to be almost oddly similar. Derek Bickerton of the University of Hawaii has argued that children, deprived of coherent linguistic input by pidgin-speaking parents, spontaneously generated creoles as direct products of our innate mental configuration for Universal Grammar. Others have argued that the native languages spoken by slaves in most colonies were all quite similar themselves, such that the creoles the slaves created are equally similar. As always, the truth is likely a combination of all of the hypotheses bandied about.

That is, of the LIVING hypotheses. The monogenesis hypothesis does not, in the scientific sense, explain or predict anything about how creoles are currently known to pattern grammatically or geographically. It is, at this point, an archival matter.

That as awesome a scholar and writer as Crystal has included the Portuguese pidgin idea in his entry on creoles in HOW LANGUAGE WORKS is, however, not really his fault. There is a kind of ingrained custom in textbooks on pidgins and creoles to genuflectively list the monogenesis hypothesis along with the living ones, such that one pretty much has to be one of the small gang of working creole specialists to know that the monogenesis hypothesis today dwells with the likes of phlogiston and the four humours.

Posted by John McWhorter at 03:58 PM

French report: It's lucky Copernicus had grammar

Newsflash from the Times Educational Supplement courtesy of Dennis Baron's language news blog: Beginning next year, two hours per week of grammar will be taught in French schools from elementary grades through high school.

This is only good, of course, from the point of view of us here at the Language Log -- more actual grammatical education means fewer pained posts about how 'God' is not, in fact, a verb -- but some of the discussion surrounding this curricular revision is, frankly, pretty funny. The minister seems to think¹ it's about improving communicative effectiveness, which the TES report says he thinks can reduce the incidence of violence in riot-prone urban areas. [I haven't been able to find the original quote to this effect myself.]^†

[Photo caption: E. Orsenna, A. Bentolila, G. de Robien and D. Desmarchelier present the report on grammar - Photo CP]

The substance of the pedagogical plan in the report itself, authored by linguist Alain Bentolila of Université de Paris 5, involves recommendations about content for the curriculum at the various levels of instruction, with a great deal made of the 'organized progression' of grammatical concepts, some of it sensible enough (particularly the bit about how example sentences must be carefully chosen and presented so as to clarify rather than obscure a particular point under investigation). A lot is omitted, though -- phonology and morphology, for example, are not even mentioned. And in several places, in amongst the incredibly fanciful high-toned phraseology, bits of just plain silliness flap around:

Le verbe, catégorie reine de la grammaire, donnant à la langue son véritable pouvoir d'explication et d'argumentation. Le verbe qui ouvre les horizons du futur, qui fait resurgir les récits du passé. Comme le français fait bien les choses en nommant de la même façon le mot qui articule la phrase et l'outil linguistique qui articule notre pensée : verbe qui se conjugue, Logos qui impose au monde l'intelligence de l'homme.

"The verb, queen of grammatical categories, giving language its true powers of description and argumentation. The verb, which opens the horizons of the future, which reanimates the stories of the past. How fitting that French names in the same way the word that structures the sentence and the linguistic tool which structures our thoughts: verb which conjugates, Logos which imposes the intelligence of Man on the world."²

...and later:

Dans la phrase «Les maçons ont construit la maison.», l'action qui relie maçons à maison, c'est bien la construction. Si l'on parle de verbes transitifs c'est tout simplement parce que cette action «transite» des maçons vers la maison

"In the sentence The masons constructed the house, the action which relates the masons to the house is of course the construction itself. We speak of transitive verbs simply because the action 'transits' from the masons towards the house." *

The introduction of the report veers lyrically from rhetoric about how making generic universal assertions (like "Dogs bark") invokes a particular kind of personal responsibility, to the importance of grammatical instruction while learning to read, to the observation that a language without the combinatoric power of grammar would be "condemned to infinite multiplication of its vocabulary in a hopeless attempt to cover the immense diversity of perceived and imagined reality." The report opens with an elaborate just-so story about a collection of almost impossibly dense children and a teacher of infinite-resource-and-sagacity who gets them to observe that the change in position of a shadow over the course of a day indicates something about the movement of something or other. It culminates in the observation that without grammar, Copernicus couldn't have communicated his conclusion that the earth orbits the sun, since the words can equally express the notion that the sun orbits the earth.

As noted in this commentary by Sylvia Plane, throughout the report, there is a peculiar conflation of the actual, internalized grammar that is deployed by speakers in any linguistic act, and the study of that grammar, which is what the report is recommending. They imply that studying grammar will have beneficial effects in using one's internal grammar, which is what is supposed to motivate its study. The report claims that an explicit understanding of the mechanisms of grammar facilitates learning to read, communicative effectiveness, and literary study and appreciation. To justify this kind of assertion, the authors of the report should be citing pedagogical studies contrasting the reading test scores of children who had two hours of grammatical instruction per week with the scores of children who spent those two hours in practicing their reading and writing straight up. The report doesn't cite any such evidence, and as far as I know, no research has been done that would support this kind of claim. All the verbiage about the importance of grammatical instruction is subject to refutation, if its functional benefit to the student is presented as the main justification. (The argument is analogous to saying that to be a competent musician, several years of explicit study of music theory is essential -- it could help, perhaps, but it's clearly not essential.)

If you ask me, explicit grammatical instruction should indeed be part of the regular school curriculum, not because of any claim that it improves reading comprehension or any other language-use skill (though, of course, hooray if it does) but because it's an important, fascinating and accessible human science. Despite the report's insistence on the parallelism between the structure of a sentence and the structure of a thought, it never touches on the implications of grammar for cognitive science, nor mentions the revolution in the study of the mind that was precipitated precisely by Chomsky's recognition that human linguistic behavior could not be the result of strictly associative learning. In studying grammar, students can learn what it is to apply the scientific method directly to their own species, with the concomitant discovery that one can ask scientific questions about a whole host of areas which don't leap to the eye from the physics-chemistry-biology canon -- grammatical study can teach one how to be a scientist of humanity. Besides that, it's practical: the raw material for linguistic study is abundant and directly accessible to everyone in a classroom without special equipment. And the range of applications for linguistic expertise is expanding at an almost unbelievable rate. Besides the obvious ones -- law, language teaching, speech therapy, translation (machine or otherwise), any kind of speech technology, editing -- there are many information-age professions for which grammatical expertise is important. A young French citizen who would like to work for Microsoft or Google will find that a grounding in grammatical study will serve amazingly well.

Plus, of course, when such an educated Français(e) comes across an ambiguous sentence or unclear expression in their own or others' writing, they will be able to describe explicitly and sensibly why it is the way it is. One doesn't need grammatical training to spot ambiguity or infelicity; one doesn't need grammatical training to eliminate ambiguity or infelicity, but one does need linguistic training to talk sensibly about what was wrong and why the reformulated version fixed the problem.

¹Links that appear in green are to documents entirely in French.
²Suggestions for improving these translations are gratefully accepted -- hharley AT email DOT arizona DOT edu.

^† Update I: Geoffrey Nunberg sends in the following news story from Libération in which the minister's original quote appears: Des cours plus simples et plus ludiques, and also this piece from Les Echos, which has a similar theme (see the last paragraph). He also notes:

"In this connection, some years ago John Rae, the headmaster of the Westminster School, blamed the disruptions of the 1960's on the abandonment of grammar:

The overthrow of grammar coincided with the acceptance of the equivalent of creative writing in social behaviour. As nice points of grammar were mockingly dismissed as pedantic and irrelevant, so was punctiliousness in such matters as honesty, responsibility, property, gratitude, apology and so on. (Observer 7 Feb 1982)"

Who'd have thought that grammatical understanding could have such far-reaching consequences, of a kind normally only attributed to vigorous participation in team sports or Bible study?

* Update II: Chris Waigl wrote to help me out with this translation, which made it actually considerably more sensible and less flappy although still not entirely transparent. She also writes:

I used to teach English at French lower secondary state schools for two years, so I know that scene a little bit, and the lingo the curriculum and its authors employ.... As for explicit grammar teaching in French schools, the one thing that is immediately apparent to any observer who has not grown up in the French system is how much there is, and for how long. And verbal morphology plays a large role. One of the main explanations I've found for this emphasis on "conjugaison" is quite simply the spelling-system-induced gap between the morphology of written and spoken French. As a simple example, children will build their knowledge of the present tense of a verb like "manger" (eat) with three distinct forms -- they all sound alike, except for the first and second person plural. It must come as a bit of a shock to the 6 to 8 year olds when they realize that there are actually five different spellings involved (je mange, tu manges, ils mangent sound alike except before vowels, when there just _might_ be an audible ending).

And finally, Mark Etherton sends in the link to the web page of Erik Orsenna, novelist, language maven, and member of the Académie Française, who collaborated on the report. Mark reports that Orsenna's grammatical writings have a highly individual character; perhaps his influence may account for some of the more poetic turns of phrase.

Posted by Heidi Harley at 03:54 PM

Freaking

You can stop sending me messages about freaking and freak dancing. Freaking (usually called freak dancing, apparently) clearly belongs on the list of lewd behavior at a school dance, as a dozen or so correspondents have patiently explained to me. There was a NYT article yesterday about freak dancing (banned in Manlius, New York), and five years ago even Bill O'Reilly took notice of it; there are references to it all over the place. Pelvic thrusting is involved.

On the rest of the items in the Zits list, there's less agreement.

(Freak(ing) might be an echo of fuck(ing). It is, after all, a common avoidance substitute for the tabooed fuck(ing).)

To refresh your memory: I divided the list of 16 items into three sets: words (like bumping) I recognized as referring to behavior that would be considered lewd at a high school dance; words (like moshing) I recognized as referring to other sorts of activities that might be considered inappropriate there, because they are aggressive or dangerous; and words (like pronking and knurling) that I was just dubious about. Remember that the activities in question were presented in the cartoon as being specifically LEWD, not just inappropriate, so that even the items in group 2 are somewhat problematic. I was aware of freaking in a drug context, but had somehow missed the dirty-dancing sense (no one can be on top of everything in every part of every culture; we all miss things). Now it turns out that some of the other words have drug-related senses, according to Ben Lavender: rolling and mashing as well as freaking. I have no idea how widespread these terms are.

Several correspondents suggested that I should have consulted the Urban Dictionary site, where I would have found, for example, sexual definitions for mash: 'to have sex', 'to engage in sexual foreplay or heavy petting' (hat tip on this one to Elise Stickles). I've found the Urban Dictionary very frustrating to use. You can get no sense of how widespread a usage is; some entries are likely to be reports of items used by only a few friends on very specific occasions, and some might be sheer exuberant inventions. And the definitions (provided by ordinary people, not lexicographers) are often hard to interpret. In addition, they are from all over the English-speaking world; many are clearly British or Australian, but most of the time you can't be sure. Ben Judson pointed me to the Urban Dictionary entries for pronk and sledging, as well as mash and freak, and they are typically problematic.

All four versions of pronk seem to be inventions: 'a joke involving someone else's genitalia' (a play on prank); a style of music (portmanteau of prog and rock); 'a person who likes rock, punk and some pop' (another portmanteau); and the nickname of baseball player Travis Hafner (yet another portmanteau: "It stands for 'half project, half donkey.' ").

For sledging there's one definition that's clearly not American ('to protrude your jaw outwards whilst rolling your eyes back into your head', with a puzzling exemplar that doesn't contain the word but does contain the vocative lad) and two from Australian cricket slang.

Nothing relevant here, nor anything relevant for wallow, knurl, or squean. (Derry Earnshaw notes that the OED has an entry for squean, but from the early 17th century and in an irrelevant sense.)

Most of my correspondents, including some who are in fact teenagers, agreed with my breakdown of the list, except for freaking. But Laura Petelle reports that

Pronking and knurling are spastic dance moves. (Pronking I assume from the springbok antelope's similar movements, although may be just because it's a cool-sounding word; knurling I don't know the origin of but I think the move comes from street dancing post-breakdancing. Both are after my time.)

Mashing was a dance move when I was in high school but it was part of house dancing; I think it means something different now. I don't know wallowing, but I've heard my teenaged brother use sledging.

It's not clear to me why any of these moves should be banned at school dances.

In any case, the picture that emerges is that most of these words -- wallowing and squeaning are still unattested with relevant meanings -- have been used, by at least some teenagers on some occasions, to refer to dancing or sexual display or both, and some have been used to refer to drug use. But most of them are scarcely widespread, and it's unlikely that any single teenager has ever had the relevant uses for all 14 of the currently attested items. Petelle thinks that Zits is unusually accurate in its portrayal of teenagers. It is certainly sympathetic to them, but it often takes the viewpoint of baffled adults. In fact, we've pointed out before (here and here) that the cartoon sometimes offers stereotypes of teenage behavior. In the latest case, I still think that there's a lot of sheer invention going on, rather than keen observation of the actual adolescent world.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 01:25 PM

Perforating database search?

Not long after Google made patent search available, Lubos Motl posted a list of "nine patents [that] depend on string theory". I guess that the idea is to show that string theory is not only not "not even wrong", but is actually providing the theoretical foundations for practical invention. (Or more likely, Motl's post is just a joke.) I'll leave it to others to evaluate the "Method for ameliorating the aging process and the effects thereof utilizing electromagnetic energy" and the "Space vehicle propelled by the pressure of inflationary vacuum state", but I was fascinated to find several language-related patents in the list. An echo, decades later, of Ed Witten's undergraduate minor in linguistics at Brandeis?

Not so, at least for the patent that I personally found most interesting, namely US Pat. 6862586, issued Mar. 1, 2005 to a group of researchers at IBM, describing a method for "Searching databases that identifying group documents forming high-dimensional torus geometric K-means clustering, ranking, summarizing based on vector triplets."

Seriously, that's the title. Talk about your bag of words. I can only imagine that a few phrases have been transposed and/or a few words left out -- anyone care to speculate about what the title was supposed to be? Anyhow, the "high-dimensional" part seems vaguely promising, so let's read on.

This patent's first claim is:

1. A method of perforating a database search comprising:

searching a database using a query, said searching identifying a group of hyperlinked documents;

forming a high-dimensional torus geometric representation of said hyperlinked documents, wherein each hyperlinked document is represented by a vector triplet comprising a normalized word frequency, a normalized out-link frequency and a normalized in-link frequency;

clustering said result items into clusters based on said high-dimensional torus geometric representation;

ranking items within each cluster of said clusters based on said high-dimensional torus geometric representation;
[etc. etc.]
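
For readers who want to see what the claim's "vector triplet" amounts to, here is a minimal sketch, not the patent's actual method. It assumes each document is reduced to three counts (words matching the query, out-links, in-links), normalizes each count by the largest value in the result set, and then groups the triplets with plain Euclidean k-means. The torus geometry and the ranking step are left out entirely, and the names `normalize`, `triplets` and `kmeans` are invented for illustration.

```python
from math import dist

def normalize(values):
    """Scale non-negative counts into [0, 1] by dividing by the maximum."""
    top = max(values) or 1  # avoid dividing by zero when every count is 0
    return [v / top for v in values]

def triplets(word_counts, out_links, in_links):
    """Build one (word, out-link, in-link) triplet per document."""
    return list(zip(normalize(word_counts),
                    normalize(out_links),
                    normalize(in_links)))

def kmeans(points, k, iterations=20):
    """Plain k-means with naive seeding: the first k points are the initial centers."""
    centers = list(points[:k])
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    labels = [min(range(k), key=lambda i: dist(p, centers[i])) for p in points]
    return labels, centers

# Four hypothetical search results: two link-rich pages and two sparse ones.
labels, centers = kmeans(triplets([10, 2, 9, 1], [5, 0, 4, 1], [8, 1, 10, 0]), k=2)
```

With this toy input the two link-rich documents land in one cluster and the two sparse ones in the other.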

"Perforating a database search"? Could this be a radical new database operation for accessing the extra curled-up dimensions of (the meaning of linguistic) strings? No, after reading the rest of the document, I'm pretty sure that "perforating" is just a scribal error for "performing".

In fact, alas, this patent's only connection to string theory comes at the end, where "string theory" is used as one of several sample queries:

Results for one query are discussed below (e.g., the query "latent semantic indexing"). [...]

Similarly, in response to the query "string theory" the invention brings up "The Official String Theory home page" as S and in reponse to the query 'Information Retrieval" the invention brings up 'The SIGIR home page" as S.

What a disappointment. I was hoping to learn about the bulk of meaning beyond the brane of text, penetrated only by the force of insight (thus explaining why insight is so many orders of magnitude weaker than other interpretive forces).

But I'm left with a question. Aren't patent examiners supposed to, you know, like actually read the patents they approve, and determine that at least some of the text -- say, the title and the first claim -- makes some kind of sense as written? (I'm not objecting to the content of this patent, which at a quick read looks sensible and interesting once you get past the random errors in presentation.)

[Update -- Rob, a patent-lawyer, explains:

You are right that the examiner (and even the administrative folks who are looking at these applications during the process) is supposed to review things like the title to make sure they make sense and comply with the rules (for instance, the rules require that the title "must be as short and specific as possible" - unlikely in this case as the title is nearly as long as the Abstract). Some typos in final documents (perhaps "perforating" in this case) come from the scanning and printing processes. This particular application was filed before the advent of the on-line electronic file histories, so it is not possible to see what it looked like as filed without ordering the file history from the PTO. If there was actually an error in the papers as filed, then it is all the more remarkable because not only did an examiner look at them, but the board of patent appeals did as well. Presumably, someone in that three judge panel should have noticed that the title and claim 1 were nonsensical.

]

Posted by Mark Liberman at 10:01 AM

Cats and dogs

I've read Smolin, and I've read Woit, but Rina Piccolo's cartoon explained the arguments about string theory in a new way:

Posted by Mark Liberman at 08:18 AM

December 17, 2006

The traditional Chinese name "Beautiful Atlanta"

News reports on the naming of the new panda cub at the Atlanta zoo tell us that her name, Mei-Lan, means "beautiful Atlanta". You might wonder about that, especially when the article goes on to quote the director of the Chinese panda center as saying that the name is one bestowed on girls whose parents want them to step outside traditional women's roles. How on earth would the city of Atlanta have acquired such a role in traditional Chinese nomenclature?

Actually, Mei-Lan 美蘭 is a common, traditional, Chinese girl's name. It means "beautiful orchid". Here, for example, is a YouTube video of a young lady named 周美蘭 whose English name is "Jessica Chow", a finalist at a recent talent competition in Vancouver.

The article isn't entirely wrong, though. There is a connection to Atlanta. The usual Chinese spelling of Atlanta, using the characters for their sound, is 亞特蘭大 ("Asia", "special", "orchid", "big"), and in compounds it is the syllable /lan/ that is extracted to represent Atlanta, so the /lan/ of Mei-Lan can be seen as standing for Atlanta.

Posted by Bill Poser at 04:23 PM

More adolescent vocabulary

Popular treatments of teenage language usually assert simultaneously that adolescent vocabulary is desperately impoverished -- girls use like as almost every third word in their sentences, guys communicate entirely through exchanges of the word dude -- and that teenagers have "a language of their own", packed with vast numbers of vocabulary items that make their speech incomprehensible to outsiders. I'm more entertained by critiques of the second type, because they're at least based on some shreds of fact (though they still treat perceived features of teenage speech as deficiencies, in this case a failure to be clear, rather than mere differences). Today's Zits cartoon (drawn by Jerry Scott and Jim Borgman) includes a catalog of words for types of behavior that a high school judges to be unacceptable at a school dance, including a fair number that most readers will find uninterpretable in this context:

It's a long list of V-ing nominalizations. Some of them I recognize as naming activities that would count as lewd at a school dance: grinding, bumping, licking, booty dancing, fondling. Some name activities that a school might view as inappropriately aggressive or dangerous: moshing, shoving, rolling, kicking. Some of them I have interpretations for, but not ones that would be relevant to school dances: mashing, sledging, wallowing, freaking, pronking, knurling. One is totally new to me, and a Google search yields nothing useful: squeaning.

I know, you're saying that some of this is just made up, and some of it is teen slang imported from other contexts into this one, and no doubt you're right. I'm especially suspicious of "squeaning" and "whole- or half-body knurling". Possibly kids these days are inclined to leap and bound like antelopes while dancing -- I'm familiar with pogoing (which has its own wikipedia page!), after all -- and they call this, quite appropriately, "pronking", but I wonder, I wonder. I also wonder about mashing, sledging, wallowing, and freaking.

My scorecard: of the 16 words, 5 I recognize as referring to lewd behavior and 4 to other behavior inappropriate for a school dance, while I have doubts about the other 7 (almost half the list).

Of course, the point of the list is to provide a lot of stuff that readers will find incomprehensible in this context -- the message is that teens use lots of words we don't understand -- but this is done not by using actual teen slang, but in large part by listing words of the sort that teenagers MIGHT use. That way the list is mostly incomprehensible to EVERYONE who reads it, even those who know something about teen talk. The message is that the way teenagers talk is not only impenetrable, but hopelessly impenetrable.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 02:11 PM

Plurals

Attentive readers of Language Log should be able to quickly answer the challenge from Will Shortz on this week's Weekend Edition Sunday puzzle:

Think of a five-letter word starting with T. The word is plural. Add an S at the end, and you'll get a six-letter word that is also plural. What words are these?

Especially attentive readers should also be able to explain how misleading the challenge is. (I'll supply what I think the answer is after Thursday of next week, after the puzzle answer submission deadline.)

Posted by Eric Bakovic at 01:33 PM

From description of difference to discourse of deficiency

I just got a note from Tony McEnery in connection with our recent posts on the Vicky-Pollardization of British teen English:

Apologies for not replying earlier - your emails were swept up by my over-zealous spam filter (which has to be set to over-zealous to catch even a fair proportion of the spam I get). I clear out my spam on Sundays, hence the email now. The work I did for Tesco was, predictably, confidential. However, you are right to surmise that the work itself was widely misrepresented in the press. I wrote a study looking at difference and, predictably, the press translated that into a discourse of deficiency. If you would like me to comment a little more in confidence I would be happy to do so. Three things I would note, however, are:

1.) Tesco was not marketing a product on the back of the report - they wanted to make a charitable donation to schools and used the report to guide the form of the donation. I appreciate that there is publicity value for them in the donation, but nonetheless I am happy that they are making a £750,000 investment in schools.
2.) The original report had a very different slant - if you look at the Lancaster University version of the story you can get something closer to the spirit of the original report. Even there, however, I have had to object to the title of the story (I hope that will be changed soon). See it here
3.) I was dismayed that the Vicky Pollard angle on the story arose and took off, though in hindsight there seems a grinding inevitability to this. I spent the whole day Friday giving radio interviews (26 in all) setting the record straight. Given that I started at 6:40 in the morning and finished at four in the afternoon, it felt something like a penance! I have been promised copies of some of these interviews. If I get one I will rip it and send it to you if you like.

BTW - I thought your blog was both thoughtful and gracious given the information you had at your disposal.

I'm about to be late for a brunch appointment, so I'll limit my comments to the observation that I tried to follow the standard rule of thumb in cases of attributional abduction: "When it's not clear where a piece of media foolishness comes from, blame the journalists".

And I'll quote something I wrote about Glenn Wilson's experience with the "email lowers your IQ more than pot" story:

When a piece of scientific research comes to the attention of the media, those who know it best should make available a simple account of what the research is and what it means (or doesn't mean). If misinterpretations become rampant -- which is just another way of saying, if there's widespread media interest -- then it's in everyone's interest for the authors to address the misrepresentations directly. This clarifies things for the more sensible fractions of the public and the media. And it should also help reduce the "bonkers" factor, since even reporters often use web search before they start making phone calls and sending emails, and if they don't, you can still send them off to read your "what my research is and is not" page instead of repeating the same explanations again and again.

Posted by Mark Liberman at 09:06 AM

Solving the world's problems with linguistics

Over at The Austrian Economists, Frederic Sautet suggests a radically expanded role for linguists in global government and industry ("Is language a determinant of reform success?", 12/11/2006):

Last week Graham Scott gave a lecture on public sector management and governance at the Mercatus Center. Dr Scott was the Secretary of the New Zealand Treasury between 1986 and 1993, which was a very important position at the time of the NZ reform process. [...]

During his lecture Graham Scott remarked that the word “accountability” has no translation in many languages. For instance, it has no direct translation in French and Spanish. I presume it is the same with other Latin-based languages, such as Italian or Portuguese. While the word “responsibility” is Latin in its origin (and thus has equivalents in French and Spanish and other languages), it encompasses more than just accountability and, for that reason, is much less precise. In Scott’s view, the concept of accountability is at the core of the public management reforms in New Zealand. But its absence in many other languages may limit (and perhaps has already limited) the adoption of similar reforms elsewhere. Or it may lower the quality of their results. This would show the power of language in shaping institutions.

This is an unusually fine specimen of the "No word for X" fallacy.

Is there any language that is unable to express the concept of the obligation to explain or justify one's actions to someone else? I doubt it.

It's true that the English word accountable usefully connects two sets of concepts -- financial record-keeping and story-telling -- because account in the sense of "financial record" and account in the sense of "story or explanation" are derived from a medieval split in the meaning of words derived from Latin com- "together" + putāre "to reckon", whose derivatives came to mean both "to tell" and "to count".

But this split also exists in Spanish, where contar means both "to count" and "to tell", and where expressions like dar cuenta and rendir cuentas retain or recreate the ambiguity. The French spell compte and conte differently, these days, but they use compte in a generalized sense in lots of expressions: rendre compte for "report" or "explain", se rendre compte for "realize" or "recognize", devoir rendre compte pour for "need to answer for", tenir compte de "take into account", etc.

Does having a single word for "accountability", and using that word frequently, entail implementation of the concept in one's actions? Apparently not.

And if you use a short phrase instead of a single word to refer to a concept, how seriously are you handicapped in understanding or implementation? Well, we Americans have done a decent job in the area of computer science, despite this defect; and the French have not yet managed a successful challenge to Silicon Valley, although they can wield the single word informatique in planning their efforts. Does anyone really think that American efforts in this arena would be more effective if we adopted more widely the one-word neologism informatics?

But wait a minute. Here I am, being all negative about what is, after all, a suggestion that members of my profession ought to be given a central role in addressing the world's problems. Let's try this one again.

Taking it from the top... It's clearly our duty as linguists to help the speakers of Romance languages to share the anglophone world's bounty of accountable institutions, and we here at Language Log are ready to offer consultation services to any well-endowed NGOs who would like to engage this important problem. The first step, clearly, would be a fact-finding tour through Paris, Nice, Madrid, Barcelona, Rome, Lisbon, Rio de Janeiro, etc.

But this is only the tip of the linguistically-driven-reform iceberg. For example -- speaking of ice -- the concept of "global warming" is named by a short phrase in most of the world's major languages: réchauffement climatique, calentamiento global, riscaldamento globale, aquecimento global, globale Erwärmung, etc. If we follow the thinking of Graham Scott and many others, the key problem here is not climatological or technological or economic -- it's linguistic. So for a suitable fee, Language Log Industries LLC can provide a One World One Word Solution®. This will be a term for "global warming" that can be adopted as a single lexeme in all the world's languages. The result? Analytic consensus and an effective global action plan, outcomes otherwise impossible.

[Hat tip to Joel Thibault]

[Update -- Emmanuel Ruellan writes:

A native speaker of French, I heartily agree with your idea that the French language is by no means inferior to the English one.

As regards the lack of direct translation of “accountability” in French, the sentence “Ils ont des comptes à rendre à (leurs actionnaires | leurs électeurs | whoever one might be accountable to)” springs to my mind. It is not a direct translation, but the phrase "comptes à rendre" denotes the obligation to explain or justify one's actions, as you put it, and also suggests the idea of a ledger, therefore it is pretty close to the English term.

Some years ago, my father, whose job consisted in verifying public accounts at local and regional levels, met British counterparts so as to compare British and French methods. He gave me a quick account of their findings, and we were both amused by the constant use of the phrase "value for money" by the Brits. To us, it sounded more like shopping at Tesco, or Carrefour for that matter, than public accounting. As far as I remember, French public accountants do not have an equivalent phrase, but it does not prevent them from checking that the taxpayer's money is spent according to rules and regulations and that it is well spent.

When I was in "Terminale" (that would be the equivalent of the last class of high school in the US, I guess, and that of sixth form in the UK), our philosophy teacher explained to us that the language one speaks influences the way one thinks. He did not expand a lot on the subject, nor did he give an explanation. I was rather put off by what I saw as a sloppy discourse, and I found it satisfying, some years later, to discover that most linguists rejected the Sapir-Whorf hypothesis, at least in its strong version.

Shhh, the World Bank is on the line, discussing our project to bring accountability to... well, you know.

And David Fried writes:

The absence-of-a-word-for-accountability meme is constantly raised in Israel re Modern Hebrew. It was thoroughly dismantled by the truly excellent columnist Philologos in the Forward on October 13 ("Accountability").

Here's a brief quote from this indeed excellent piece:

I don’t know how many times I’ve heard it said that Hebrew has no word for “accountability,” the implication being that, if you don’t have a word for something, you can hardly be expected to have the thing itself. It’s one of those myths that speakers sometimes have about their own language, which, once established, are nearly impossible to get rid of.

Why a myth? Well, as someone who believes in being familiar with Jewish sources, Yakir Segev should know the little Mishnaic book called Pirkei Avot and commonly referred to as “Ethics of the Fathers” in English. Pirkei Avot is, for good reasons, one of the most studied and most beloved of all Jewish texts, so much so that it is regularly printed in the standard prayer book. And what is the very first sentence in it? In the somewhat dated but still serviceable translation of the British scholar R. Travers Herford, it is:

“Akavia ben Mehalalel said: — Keep in view three things and thou wilt not come into the power of sin. Know whence thou comest and whither thou goest and before whom thou art to give strict account. Whence thou comest, — from a fetid drop. Whither thou goest, — to the place of dust, worms and maggots: and before whom thou art to give strict account, — Before the king of the kings of kings, the Holy one blessed be He.”

“And before whom thou art to give strict account” could also, of course, be translated as, “And before whom you are accountable.” The Hebrew reads: Ve-lifnei mi ata atid li’ten din ve-h.eshbon -- “And before whom you are to give a din and a h.eshbon.” Din in classical Hebrew is a “judgment” or “legal process,” and h.eshbon is “account,” “bill,” or “arithmetical sum,” and the combination of the two, din-ve-h.eshbon, means a report in modern Hebrew, as in “book report” or “committee report,” while in its acronymic form of doh. it can also denote a traffic ticket.

Elsewhere in rabbinic Hebrew, to be accountable is shortened to li’ten [or la’tet] et ha-din, “to give a din.” And when converted into the noun “accountability,” it becomes matan din or matan ha-din. It’s a perfectly common expression.

I'm starting to get the idea that "no word for accountability" has become a sort of limited-circulation cliché, recirculated among civil servants and NGO professionals world-wide, in the category of raw materials for speechifying:

For example, this report about consultation of Asian Development Bank officials in Washington, D.C., quotes Robert Salamon, director of the ADB's office of external relations:

One little linguistic sidelight, Salamon mentioned that in most of the countries they have visited there is no comparable word for "accountability."

And this DEN discussion list post notes:

When Homer Sarasohn got to Japan he found there was no equivalent Japanese word for accountability. He told me this in the video I made of him several years ago.

And from a "workshop on the accountability and governance of NGOs":

Cultural implications are important in accountability. For example, there is no word for accountability in Portuguese! Both language and imagery must speak to as many cultures as possible.

And from the Soros organization, this note:

Another piece in the counter-intuitive puzzle is the missing word for "accountability" in the Russian language.

And onward: "The researcher did not find a direct equivalent Bemba (local dialect) word for accountability." "In Italy, they don’t have a word for ‘accountability.’" "The Chinese language does not seem to have an equivalent word for accountability (Minxin. Pei 1999)." Etc.

It seems that this lexical deficit is perceived as a problem worldwide. The secret of America's success? Not its geography, not its people, not its natural resources or its political system -- no, it's this unique lexical invention. You might think that the problem could be solved by simple lexical borrowing (as words like "coffee", "BBQ" and "blog" have been borrowed over the centuries), but the world's NGOs are clearly unable to arrange this without expert guidance. This is a major marketing opportunity for Language Log LLC's soon-to-be-released AccountiLex product line -- I can see the IPO taking shape on the horizon already!]

Posted by Mark Liberman at 07:55 AM

December 16, 2006

Flashy frequency finder

Here's a quick footnote to the fatuous folderol about British teens' vocabulary deployment currently being laid waste by Arnold, Mark and Geoff.¹ Squires, over at Polyglot Conspiracy, recently linked to a funky little online app called WordCount which shows you in order of frequency and in a graphically enhanced format the 86,800 words that occur more than once in the British National Corpus.

The description of the app at the site says:

WordCount^TM is an artistic experiment in the way we use language. It presents the 86,800 most frequently used English words, ranked in order of commonness. Each word is scaled to reflect its frequency relative to the words that precede and follow it, giving a visual barometer of relevance. The larger the word, the more we use it. The smaller the word, the more uncommon it is.

It's supposed to be an intuitive interface, and certainly it's easy to scroll through, and easy to type in a word and find its relative frequency. The interest of the graphic scaling wears off quickly, though, since the long-tail distribution of the English vocabulary means that after the first few words, the size difference between any two words in the window at the same time is minimal to nonexistent. It's not useful for anyone actually interested in doing anything with the numbers, who would be much better off working with some other data source, but for just a quick look (like the one the teen-word-use journalists didn't bother to take), it can be fun. At the moment most use of the site seems to have to do with 'conspiracies', funny or suggestive conglomerations of words in the same frequency range, which doesn't strike me as being of any intrinsic linguistic interest.

But one could imagine applications that would be more substantial and would still have some kind of 'kewl!' factor for the casual clicker-by. For example, how about allowing users to identify their vocabulary recognition rate in various frequency ranges? The site could scroll to a random sequence of ten words in some medium frequency range, and allow readers to identify which words they recognize; then another ten words higher or lower in the frequency count, and so on. Then the site could report to the reader the frequencies at which their average recognition rate was, e.g., 10 in 10, 7 in 10, 5 in 10, 3 in 10... and compare the user's recognition rate to the average of other users on the site. Or something like that.

Anyway, the main thing is, it's a user-friendly visual presentation of the point that the senior Loggers have been making about the relative percentages of any corpus taken up by the most frequent words in the English lexicon. They're all 'function' words -- you can hardly make a grammatical English sentence without one or more of them -- and they have essentially nothing to tell us about the vocabulary range of any English speaker. If you don't use a goodly sampling of these words in nearly every sentence, you're almost certainly not a native speaker of English.

Update: Daniel Ezra Johnson writes to let us know that the scaling on WordCount adjusts the height of a word in proportion to its frequency. As a consequence, the overall area occupied by a scaled word is proportional to the square of its frequency, likely resulting in a rather different set of perceived comparisons than intended. A word with 4 occurrences in 1,000,000 would be twice as high as a word with 2 occurrences in 1,000,000 (4:2), but that makes it four times as big, area-wise, as a 2:1,000,000 word of the same length (16:4). The bigger the difference in the numbers, of course, the worse it gets. It's a little trickier even than that, since the words are of different lengths, so a long high-frequency word is really going to look a lot bigger than a short low-frequency word.
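For anyone who wants to see the squaring effect in numbers, here is a minimal sketch. It assumes (my assumption, not anything documented about WordCount) that each letter's width is a fixed fraction of the font height; the function names and the 0.6 aspect ratio are purely illustrative.

```python
import math

def rendered_area(freq, n_chars, aspect=0.6):
    """Area of a word whose font HEIGHT is proportional to its frequency.

    `aspect` (glyph width / glyph height) is a hypothetical constant.
    """
    height = freq                       # height scales linearly with frequency
    width = n_chars * aspect * height   # glyph width grows with height too
    return height * width               # so area grows with freq squared

def height_for_proportional_area(freq):
    """One possible alternative: scale height by sqrt(freq), so that area
    (rather than height) is proportional to frequency."""
    return math.sqrt(freq)

# Two five-letter words, at 4 and at 2 occurrences per million.
ratio = rendered_area(4, 5) / rendered_area(2, 5)
print(ratio)
```

Doubling the frequency quadruples the area for words of equal length, which is the 16:4 comparison described above.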

¹ It's a bit dicey going into the Senior Writers' Lounge these days; they're all looking a little wild around the eyes. This on top of the battle with the Brizendine hydra has almost been too much. Some major news source had better publish a report on some well-documented, novel and interesting language research soon, or the fog of despair that occasionally threatens at the Plaza may congeal decisively, with something of a negative effect on posting rates, I'm afraid.

Posted by Heidi Harley at 10:14 PM

The spread of bogus numbers in the meme pool

There's a sort of Darwinian effect of the media spotlight -- survival of the most sound-bitable. And some of the most rapidly reproducing sound-bites are quantified statements like "teenagers use just 20 words for a third of what they say", or "email lowers IQ by ten points", or "Eskimos have dozens (or hundreds, or thousands) of words for snow", or "women use three times more words than men do".

That's why PR types, journalists and public intellectuals create and spread a vast array of meaningless, mis-interpreted or just plain made up numbers. Sexy numbers prosper in the struggle for memetic hegemony. And unless there's some process to enforce honest signaling by penalizing over-interpretation, exaggeration, or flat-out lying, bogus numbers will dominate the meme pool, because for every truthful number, there's always a nearby bogus number that's sexier.

This afternoon, when I turned on the radio, I happened to hear a few seconds of an interview with Louann Brizendine ("Comparing Mars and Venus in Neuroscience"). The bit that I heard was:

Q: Now, I saw you quoted in the New York Times, speaking of pregnancy, that the female brain shrinks about eight percent during pregnancy? And doesn't return back to its normal size until about six months after delivery?

A: Yes, Debbie, that's a surprising study that uh has found eight percent shrinkage, even after you account for any increased water weight. And scientists don't know really why that happens, except that the female brain is doing all kinds of rewiring during that period, to get the mom ready to do maternal behavior. And also remember, the fetus is more like a parasite, and ((that)) it gets fed whatever it wants, and lots and lots of lipids and special fats exist inside the brain cells, and some scientists speculate that the fetus is sort of snacking on the mother's brain.

I noticed this same factoid in the NYT magazine interview ("He Thought, She Thought") that we discussed here earlier, and I wondered whether it was true. In Brizendine's book "The Female Brain", the chapter "The Mommy Brain" discusses a brain shrinkage of unspecified size (p. 100):

Between six months and the end of pregnancy, fMRI [sic] brain scans have shown that a pregnant woman's brain is actually shrinking. This may be because some parts of her brain get larger as others get smaller -- a state that gradually returns to normal by six months after giving birth.

[There's a typo here, by the way -- fMRI stands for "functional magnetic resonance imaging", and it's a technique for measuring local changes in cerebral blood flow as a consequence of different sorts of brain activity, like looking at funny vs. unfunny cartoons. Measurements of brain size would use plain old structural MRI, with no f-for-functional involved.]

Anyhow, Dr. Brizendine has clearly learned the lesson about quantitative sound bites. The "mommy brain" shrinks a bit: ...bore-ring... The "mommy brain" shrinks 8%: wow!

Now, 8% brain shrinkage isn't nearly as sexy as the business about men thinking of sex every 52 seconds, or women using three times as many words as men and talking twice as fast. But still, it's something.

And given previous experience, it occurred to me to wonder about the number. So I checked, back when I read the NYT article. As you'll see below, the number is somewhat bogus. But it's only inflated by 86%, which is a pretty small exaggeration compared to the tall tale about sexual thoughts every 52 seconds, which appears to be inflated by 23,736%.

So I decided not to write about it, since the content of the claim has nothing to do with speech and language. And if I tried to document every bogus statistic in the mass media, I'd never have any time for anything else. Just writing about Dr. Louann Brizendine's statements about sex differences in communication is starting to make me feel like the circus clown that follows the elephant around the ring with a shovel. However, after the "20 words for a third of what they say" business this afternoon, Geoff Pullum suggested by email that we should think more broadly about the public rhetoric of science. So here's some more raw material. Not as raw as some, but still.

Dr. Brizendine's endnotes refer the business about brain shrinkage in pregnancy to Oatridge, A., et al. "Change in brain size during and after pregnancy: Study in Healthy Women and Women with Preeclampsia", Am J Neuroradiol 23(1): 19-26, 2002. That article, I'm happy to say, gives a convenient table of the values that they measured, which I was able to copy as html directly from the online version:

Table 3: Absolute brain volumes (cm³) before, during, and after pregnancy

Subjects Before Pregnancy 15 Weeks’ Gestation 20 Weeks’ Gestation 25 Weeks’ Gestation 30 Weeks’ Gestation 35 Weeks’ Gestation Before Delivery (Term) 6 Weeks after Delivery 24 Weeks after Delivery 40 Weeks after Delivery 52 Weeks after Delivery

Healthy group

1 1437.4 1494.5 1494.7

2 1089.2 1109.7 1122.0

3 1201.7 1245.4 1267.4 1252.5 1260.2

4 1371.8 1383.6 1415.1 1420.2 1415.8

5 1207.9 1237.7 1246.2 1253.6 1244.0

6 1197.4 1195.5 1184.0 1187.3 1172.2 1163.0 1150.7 1180.1 1205.7

7 1277.5 1265.9 1255.7 1238.0 1221.9 1268.8 1290.5

8 1288.9 1247.1 1233.5 1241.5 1208.7 1248.3 1269.9

9 965.5 953.9 946.2 975.2

Preeclamptic group

1 942.7 981.7 981.9 977.3 975.5

2^* 1036.2 1070.3 1080.6 1043.1

3 999.9 1027.5 1037.1

4 1274.3 1303.9 1312.0 1315.2

5 1183.4 1238.7

There were two women in the study, numbers 6 and 8, who were measured both before pregnancy and at term. Over that period, their brains shrank 4.06% and 6.6% respectively, for an average of 5.3%. There were eight women in the normal group whose brains were measured at term and 24 weeks (i.e. six months) after delivery. Their brains increased in size during that time by 4.0%, 3.0%, 5.5%, 3.2%, 3.2%, 4.8%, 5.6% and 5.1% respectively, for an average of 4.3%, with a 95-percent confidence interval of 3.4% to 5.2%.
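The term-to-24-week figures can be rechecked from Table 3 with a few lines of Python. This is a sketch under stated assumptions: the flattened table does not show which column each value belongs to, so the (term, 24 weeks) pairs below are my reading of the rows for healthy subjects 1 through 8, and the t critical value for 7 degrees of freedom is hard-coded rather than looked up from a statistics library.

```python
import math
import statistics

# (volume at term, volume 24 weeks after delivery), in cm^3, healthy subjects 1-8.
# The column assignment for sparse rows is an assumption about Table 3.
term_and_24wk = {
    1: (1437.4, 1494.7),
    2: (1089.2, 1122.0),
    3: (1201.7, 1267.4),
    4: (1371.8, 1415.1),
    5: (1207.9, 1246.2),
    6: (1150.7, 1205.7),
    7: (1221.9, 1290.5),
    8: (1208.7, 1269.9),
}

def pct_change(start, end):
    """Percentage change from start to end, relative to start."""
    return 100.0 * (end - start) / start

changes = [pct_change(term, later) for term, later in term_and_24wk.values()]
mean = statistics.mean(changes)
sem = statistics.stdev(changes) / math.sqrt(len(changes))  # standard error of the mean

T_CRIT_DF7 = 2.365  # two-sided 95% critical value of Student's t, 7 degrees of freedom
ci = (mean - T_CRIT_DF7 * sem, mean + T_CRIT_DF7 * sem)

print([round(c, 1) for c in changes])
print(round(mean, 1), round(ci[0], 1), round(ci[1], 1))
```

On this reading of the table, the mean increase rounds to 4.3% and the interval to 3.4% through 5.2%, well short of 8%.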

How did this effect get roughly doubled to 8%? The only thing that I can think of is that the maximum value on the vertical axis of the paper's Figure 3A -- which is a graphical presentation of the same data as in Table 3 above -- is 8%:

I'm sure that this was an honest (though careless) mistake of memory. A dishonest author might have tried to inflate the effect still further -- would you believe 10%? 15%? Listen, it's a little-known fact that a woman's cerebral cortex completely disappears during pregnancy! No, there really are some constraints on dishonest assertions in this arena. Even a journalist wouldn't believe that. Well, some journalists would, I guess -- the stuff in Leonard Sax's book about sex differences in sight and hearing is just about that far out. But they'd get teased about it later. I hope.

[I should mention in passing that there's nothing in the Oatridge et al. study about "allow[ing] for increased water weight", as far as I can see. They just measured brain volume in MRI scans. The values in the table above are their raw measurements, not adjusted for water weight or anything else.]

Posted by Mark Liberman at 08:08 PM

Only 20 words for a third of what they say: a replication

I report here a small experiment I conducted to follow up on Mark Liberman's discussion (and Arnold Zwicky's earlier mention) of the shocking news that Britain's teenagers use a mere 20 words for a third of everything they say. (Scientists like to make sure that results can be replicated.) I took the entire text of the actual BBC article reporting this news of verbal poverty (see it at this web page), computed the top 20 most frequent words in it, and worked out what percentage of the total it was. The answer is between 36 and 40 percent. (The difference depends on how much you collapse different word forms together into lexemes. Collapsing genitives and plurals with non-genitive singulars makes hardly any difference to the results, but treating is, are, was, and were as different words rather than as representatives of the verb be lowers the figure slightly. If you do the collapsing, the top 20 words make up over 39.5% of the text. If you don't, the top 20 account for just over 36%.)

So this is the situation. This staggeringly stupid news report states that Britain's teenagers are "held back by poor verbal skills" because the evidence shows that the top 20 words in their speech account for 33% of all the words they use — the implication being that they aren't using enough words, they're just repeating a few words like "yeah" and "no" and "but" and "like". But in the staggeringly stupid article itself, the top 20 words account for substantially more than that. So Britain's science writers (at least at the BBC) are even more verbally retarded. Hello? Is there anyone out there who thinks Mark is exaggerating when he says BBC science reporters are writing junk that brings science into disrepute?

In case you want to see the results I got (which you can easily check for yourself), here they are (with the lexeme collapsing done). There are 402 words in the text (if you replace hyphens by spaces), and this table shows the numbers of occurrences for the top 20 in frequency:

25	the
16	forms of the verb be
13	of
10	and
10	in
10	to
9	forms of the noun *word*
8	a
7	but
6	as
6	forms of the pronoun it
5	forms of the pronoun he
5	no
5	forms of the verb *say*
5	speech
4	by
4	forms of the noun *school*
4	that
4	which
4	with

These words account for 25 + 16 + 13 + 10 + 10 + 10 + 9 + 8 + 7 + 6 + 6 + 5 + 5 + 5 + 5 + 4 + 4 + 4 + 4 + 4 = 160 occurrences, and 160/402 = 39.8%.

The reason it seems to me sensible to collapse down to lexemes is that it would be absurd to say that a teenager wasn't using the word "parallelism" if the record showed that he regularly used the word "parallelisms", or to insist of someone who used "emancipator's" that she didn't have the word "emancipator" in her vocabulary. However, even if you insist on going with raw word forms with not even the singulars and plurals collapsed, my count shows the percentage only going down to 36%, which is still higher than the teenagers' alleged 33%.
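Anyone who wants to repeat the count on another text could start from something like the sketch below. It is not the procedure used for the table above, just one plausible way to do it: the tokenizer (lower-casing, hyphens replaced by spaces, runs of letters and apostrophes) and the tiny `lexeme_of` mapping are illustrative assumptions.

```python
import re
from collections import Counter

def top_k_share(text, k=20, lexeme_of=None):
    """Fraction of word tokens accounted for by the k most frequent words.

    lexeme_of: optional dict mapping word forms to a lexeme label,
    e.g. {"is": "be", "was": "be"}; unmapped forms count as themselves.
    """
    tokens = re.findall(r"[a-z']+", text.lower().replace("-", " "))
    if lexeme_of:
        tokens = [lexeme_of.get(t, t) for t in tokens]
    if not tokens:
        return 0.0
    top = Counter(tokens).most_common(k)
    return sum(n for _, n in top) / len(tokens)

# Sanity check on the arithmetic in the table above.
table_counts = [25, 16, 13, 10, 10, 10, 9, 8, 7, 6,
                6, 5, 5, 5, 5, 4, 4, 4, 4, 4]
print(sum(table_counts), round(100 * sum(table_counts) / 402, 1))

# Toy illustration of how collapsing word forms changes the share.
sample = "the cat is on the mat and the dog was on the rug"
print(top_k_share(sample, k=3))
print(top_k_share(sample, k=3, lexeme_of={"is": "be", "was": "be"}))
```

Collapsing forms can only keep the top-k share the same or raise it, since merged forms pool their counts; that is the direction of the difference between the 36% and 39.8% figures.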

Posted by Geoffrey K. Pullum at 03:00 PM

Britain's scientists risk becoming hypocritical laughing-stocks, research suggests

Back in April and May of 2005, there was a flurry of preposterous stories about how using cell phones and email lowers your IQ more than smoking marijuana does. You can read all about it here. The basic ingredients were:

a company with something to sell, which
hired a reputable scientist to do a (private, unreleased) study designed to publicize its products, and then
distributed misleading and partly false press releases, exaggerating the results of this research, which
lazy, credulous or opportunistic journalists vied with one another to publish in ever more sensationalist and misleading forms.

We seem to be going down that primrose path again. As often, the BBC is leading the way -- "UK's Vicky Pollards 'left behind'", 12/12/2006:

Britain's teenagers risk becoming a nation of "Vicky Pollards" held back by poor verbal skills, research suggests.

And like the Little Britain character the top 20 words used, including yeah, no, but and like, account for around a third of all words, the study says.

If, like me, you're a bit fuzzy about just who this Vicky Pollard person is, I can recommend the Best of Vicky Pollard I and Best of Vicky Pollard II on YouTube. It's curious. The Vicky character -- a broad satire of the accent, dress and manners of British lumpen-teen females -- is portrayed as hyper-verbal. One of the basic Vicky bits is her jabbering rapidly on automatic pilot, saying far more than she should. Yet the BBC sees her as someone who is unable to communicate due to an inadequate word stock, not someone who over-communicates with socially inappropriate content, accent, word choice and sentence structure. This is another piece of evidence that journalists these days are incapable of elementary observation and common-sense description, at least when it comes to speech and language.

Now, we're told, "research suggests" that the stereotype of low-verbal Vicky is correct. I'm not sure what's really going on here, since the primary source is the BBC science section, which has become a consistently unreliable source of information. The article attributes the research in question to Tony McEnery, who is a fine computational linguist. It quotes or paraphrases him saying a number of things that don't really make sense as written, like this:

His analysis of a database of teenage speech suggested teenagers had a vocabulary of just over 12,600 words compared with the nearly 21,400 words that the average person aged 25 to 34 uses.

It's essentially impossible to estimate someone's total vocabulary accurately from a sample of their speech or writing -- certainly not with a precision like "just over 12,600 words" or "nearly 21,400 words". And in any case, numbers like 12,600 and 21,400 are way too small to represent the vocabulary of contemporary English speakers -- credible studies published long ago yield estimates for receptive vocabularies in the range of 40,000 "word families" for typical high-school graduates (corresponding to several times that number of distinct word forms). So I'll guess that what Tony did was to measure the number of different orthographic words (or word forms?) used in a given amount of text by two different groups.

It's well known that the rate of vocabulary display varies with age, socio-economic status, formality, and so on. And it's also well known that rate of vocabulary display is poorly correlated -- sometimes negatively correlated -- with communicative effectiveness. But without knowing what the databases were, and how they were collected, and what kind of analysis was done on them, it's hard to know what the cited disproportion in "vocabulary size" really means, and in particular whether it's a new property of today's British teens, or the same old story about vocabulary display as a function of age and class and context.

In any case, the factoid that makes the biggest impact is the assertion that "the top 20 words used ... account for around a third of all words".

Thus "Are iPods shrinking the British vocabulary?", Ars Technica, 12/15/2006, says:

McEnery found that one-third of most teenage speech was made up of only 20 common words like "yeah," "no," and "but." This is problematic for teenagers seeking jobs in the corporate world, where at least some level of professionalism is required when communicating with others.

And Sarah O'Grady, "The teenagers who just can't speak proper", Daily Express, 12/13/2006:

The most frequent 20 words they speak account for a third of all words used in their conversations, a university study found. And the 10 most popular words are yeah, no, like and but.

[Um, that's 4 words -- where are the other 6? I don't expect journalists to be mathematically literate, but you'd think they could count to ten.]

The Daily Record tells us ("Vicky-speak warning"):

TEENAGERS need lessons in how to speak properly because so many sound like Little Britain's Vicky Pollard, experts say.
A study by Lancaster University linguistics specialist Professor Tony McEnery found teenagers rely on a limited vocabulary, as schools fail to teach verbal communication skills.

And Ruki Sayid, "IT'S LIKE, YEAH, WHAT, YOU KNOW ..AND THAT: 20 words in third of teen talk", Daily Mirror, 12/13/2006:

TEENAGERS use just 20 words for a third of everything they say, research reveals.

And the best, I think, is the satirical take at Anorak:

So it is time for the immigrant to learn how to speaka da Ingleesh. And the good news is that it is well easy, innit.

The Mail sees the work of linguists at Lancaster University. And it notes that while the over-25s use 21,391 words in daily conversation, the teenagers use just 12,682.

This seems impressive, until you realise that no less than 11,216 of those teen words are for chips. The teenage vocabulary, to which any immigrant should aspire, is pared down to 20 key words.

Now, I'm sure that Britain's teens would benefit from additional vocabulary instruction. But (as Arnold Zwicky pointed out a couple of days ago), the assertion that they "use just 20 words for a third of everything they say" is a spectacularly lousy argument for this conclusion.

Here's why. The Zipf's-law distribution of words, whether in speech or in writing, whether produced by teens or the elderly or anyone in between, means that the commonest few words will account for a substantial fraction of the total number of word-uses. And in modern English, the fraction accounted for by the commonest 20 orthographical word-forms is in the range of 25-40%, with the 33% claimed for the British teens being towards the low side of the observed range.

For example, in the Switchboard corpus -- about 3 million words of conversational English collected from mostly middle-aged Americans in 1990-91 -- the top 20 words account for 38% of all word-uses. In the Brown corpus, about a million words of all sorts of English texts collected in 1960, the top 20 words account for 32.5% of all word-uses. In a collection of around 120 million words from the Wall Street Journal in the years around 1990, the commonest 20 words account for 27.5% of all word-uses.
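To see why figures in this range are unsurprising, consider an idealized sketch: suppose word frequency were exactly proportional to 1/rank (Zipf's law with exponent 1) over a fixed vocabulary. Real corpora only approximate this, and both the exponent and the vocabulary sizes below are assumptions chosen for illustration. Under that idealization the share of the top k words is a ratio of harmonic sums, H(k) / H(V).

```python
def harmonic(n, s=1.0):
    """Generalized harmonic number: sum of 1 / r**s for r = 1..n."""
    return sum(1.0 / r ** s for r in range(1, n + 1))

def zipf_top_share(k, vocab_size, s=1.0):
    """Share of all tokens taken by the k commonest words when the
    frequency of the word at rank r is proportional to 1 / r**s."""
    return harmonic(k, s) / harmonic(vocab_size, s)

for vocab_size in (10_000, 100_000):
    print(vocab_size, round(100 * zipf_top_share(20, vocab_size), 1))
```

With vocabularies of ten thousand and a hundred thousand distinct words, the idealized top-20 share comes out at roughly 37% and 30%, bracketing the teenagers' alleged 33%.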

And in Tony McEnery's autobiographical sketch, the commonest 20 words account for 426 of 1190 word tokens, or 35.8% . . .

In fact, Tony used 521 distinct words in composing his 1190-word "Abstract of a bad autobiography"; and it only takes the 16 commonest ones to account for a third of what he wrote. News flash: "COMPUTATIONAL LINGUIST uses just 16 words for a third of everything he says." Does this mean that Tony is in even more dire need of vocabulary improvement than Britain's teens are?

I doubt it. In comparison, the first chapter of Huckleberry Finn amounts to 1435 words, of which 439 are distinct -- so that Tony displayed his vocabulary at a substantially faster rate than Huck did. And Huck's commonest 20 words account for 587 of his first 1435 word-uses, or 40.9%. So Tony beats Huck, by a substantial margin, on both of the measures cited in the BBC story. (And just the 12 commonest words account for a third of Huck's first chapter: and, I, the, a, was, to, it, she, me, that, in, and all.) We'll leave it for history to decide whose autobiography is communicatively more effective.
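Both measures in this comparison are easy to compute for any tokenized text. The sketch below is a hedged illustration, not the script behind the numbers above; it assumes the text has already been split into a list of word tokens, and the function names are mine.

```python
from collections import Counter

def words_needed_for_share(tokens, share=1 / 3):
    """Smallest number of commonest words whose counts together reach
    the given share of all tokens (0 for an empty text)."""
    counts = sorted(Counter(tokens).values(), reverse=True)
    target = share * len(tokens)
    running = 0
    for k, n in enumerate(counts, start=1):
        running += n
        if running >= target:
            return k
    return 0

def distinct_per_token(tokens):
    """Rate of vocabulary display: distinct words divided by total words."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

# The rates reported above: 521 distinct words in 1190, and 439 in 1435.
print(round(521 / 1190, 3), round(439 / 1435, 3))

# Toy text: two words used twice each, six used once.
toy = ["a", "a", "b", "b", "c", "d", "e", "f", "g", "h"]
print(words_needed_for_share(toy), distinct_per_token(toy))
```

Note that both measures depend heavily on text length, so comparisons are only meaningful between samples of similar size.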

The BBC article ends this way:

"When things are funny it is because they ring true with people," said Prof McEnery who conducted the research for retailer Tesco. [...]

Tesco, which commissioned the report, said it was responding by launching a scheme which allows all UK comprehensive schools to interact and communicate with other schools around the country using its internet phone technology.

So once more, we seem to have an unpublished study commissioned by a company that is using it to sell something, and is publicizing it using a striking but meaningless -- or actively misleading -- quantitative assertion. "Reading email and answering a cell phone reduces IQ by 10 points, compared to four for a joint"; "Teenagers use just 20 words for a third of what they say".

OK, right. So let's see, we'll improve the vocabulary of British teens by wiring them up for easier internet cell-phone access. And we'll also make sure they get plenty of cannabis. No, wait, um, I'm confused. Too much science journalism, do you see; research shows that it eliminates logical thought in favor of knee-jerk associations between press releases and popular culture. I wish I could give it up, but the bastards have got me hooked.

In particular, the main source of information here is the BBC, and only a fool would trust what the BBC prints about scientific topics. I wrote to Tony McEnery on December 12, shortly after the BBC article came out. I haven't heard from him yet, but when I do -- especially if he's able to give me some documentation of the cited research -- I'll update this commentary.

[See here for Tony's response... Indeed, as I suspected, the media reports substantially distorted what he had to say.]

[Anatol Stefanowitsch writes:

Your remarks on the BBC's claims about the poor verbal skills of British teenagers and Geoff Pullum's replication of the original study using the BBC article itself would not be complete, I feel, without taking a look at the undisputed master of the English language, the great Bard himself.

Surely William Shakespeare's verbal skills must have exceeded those of "UK's Vicky Pollards"?

Alas, no. At least not by much. The Comedy of Errors, for example, consists of 16,298 word-form tokens (here and below I use the files provided by Project Gutenberg with the header material removed). The top twenty words account for 5,578 tokens, i.e. 34.2 per cent. What is more, the Bard's creativity seems to have been overestimated by scholars of literature: the top twenty are very bland words, such as of, I, and, the, to and you!

OK, but the Comedy of Errors is, well, a comedy. Perhaps the great tragedies will show more clearly the linguistic genius of the greatest poet of the English language?

Only by a very narrow margin. Hamlet consists of 32,040 word-form tokens. The top twenty account for 9,937 tokens, i.e. 31%.

I look forward to the reappraisal of the entire history of British literature that is sure to follow these discoveries -- one that measures the literary worth of authors by the degree to which their writing deviates from Zipf's laws.

Actually, all of these texts probably obey a form of Zipf's law about equally well -- the difference would be the parameters of the word-frequency distribution, not the basic type of distribution. As Cosma Shalizi is fond of reminding people, things that look like a power-law (Zipfian) distribution are often really log-normal; but it seems that for words, power-law distributions really are more predictive than log-normal distributions. In any case, we would be comparing Zipfian parameters, not deviations from Zipf's predictions.

By the way -- I started to compile a corpus of Vicky Pollard transcripts, since her speech (what I can understand of it) seems quite lexically inventive, and in fact is likely to compare favorably with the BBC (and perhaps with Shakespeare) in that respect. The problem is, there's about a third of what she says that I can't figure out. So if someone more familiar with her dialect will provide some transcriptions, I'll gladly do the statistics.]

Posted by Mark Liberman at 07:22 AM

Intonation contours and polonium poisoning

As Language Log has repeatedly pointed out, journalists simply don't have the vocabulary to talk about linguistic phenomena, but they just go ahead and write about them anyway, ignoring the need for the requisite concepts and terminology. The New Yorker this week (December 18, 2006; 60-69) has a feature article about an Australian social scientist, David Kilcullen, who has studied counterinsurgency warfare. He speaks, as many younger people do these days, in a way that often uses rising intonational contour on declarative clauses, a habit that often suggests to people that the speaker is seeking confirmation that the hearer has understood what he is saying. The writer of the article, George Packer, wants to describe this style of speech (which has been discussed carefully and critically a number of times on Language Log under the heading uptalk), but simply doesn't have the (very simple) technical vocabulary; so what we get is this claim about Kilcullen:

He has a talent for making everything sound like common sense by turning disturbing explanations into brisk, cheerful questions: "America is very, very good at big, short conventional wars? It's not very good at small, long wars? But it's even worse at big, long wars? And that's what we've got."

But Kilcullen is not using questions here. "Question" is a semantic term. Questions are defined by the property of having corresponding sets of statements expressing their sets of possible answers. For Are you cold? there are two statements in the answer set (they could be expressed as Yes, I'm cold and No, I'm not cold; don't confuse answers with responses, by the way: your response might be "As if you cared!", but that would not be an answer). For Who was your favorite Beatle? the answer set includes at least four fully appropriate statements (more when you count silly answers; it is no part of the semantics of questions to rule out silly or misguided answers, like "Eric Clapton", even though they cannot be correct). Notice, by the way, that questions do not all have rising intonation: many of them have falling intonation (Who was your favorite Beatle? normally would).

The sentences with question marks on the end in the above quote are not questions at all. They are statements. The question marks on the ends are supposed to indicate to us (by a reasonable compromise convention) that Kilcullen made them using the rising "uptalk" intonation. It is well known to be common among Australian speakers. But that doesn't mean he makes things "sound like common sense by turning ... explanations into ... questions." Is America good at big, short conventional wars? would be a question, and it would not sound like an explanation of anything.

Is it so hard to grasp this? I don't think so. It seems less complex than basic polonium chemistry, for example. Yet it seems to me that if George Packer were writing about poisoning by polonium-210, he would have headed for a reference source. He wouldn't just have blundered about. Nor would he have used 19th-century terms (as people so often do with language), talking about Litvinenko having been fed a noxious earth, or a poisonous herb. He would have found out what polonium is and what is known about it and what the right terminology is: it's an element, a metal; it's extremely rare in the terrestrial environment; the atomic number is 84; it's radioactive, and it decays by alpha particle emission; it's 250 billion times as poisonous (weight for weight) as hydrocyanic acid; and so on. He would have checked this because it is important and chemists will call you on it and you don't want to get stuff wrong.

But when the topic is language, nobody looks anything up or calls anyone. Calling a declarative clause with rising intonation a "question" is like calling polonium "thallium". Yet Packer feels he doesn't need to spend an extra minute or two to get the vocabulary right. When you're talking about language, it seems, you don't need to do that. No one will challenge anything. Except perhaps on Language Log, and that's just the bloggers in their pajamas.

Posted by Geoffrey K. Pullum at 12:03 AM

December 15, 2006

"The Way Forward": A New Platitude Swims Into Our Ken

Shortly after the election, I wrote an op-ed in the Los Angeles Times suggesting that the Democrats' victories signaled the final unraveling of the administration's predilection for encapsulating its policies and positions in snappy catchphrases.

The breakdown was most recently evident in October, when the White House announced that its "stay the course" slogan was inoperative ("We've never been stay the course," Bush said a bit rashly, providing the news shows with an irresistible setup for the series of clips that showed him saying exactly the opposite.) But then in retrospect, you could write the whole history of the administration's efforts at impression management as a string of linguistic miscues and slogan recalls.

Just six months after the 9/11 attacks, for example, Bush's insistence that his administration was focused on getting Osama bin Laden "dead or alive" had morphed into "I don't know where he is. . . . I truly am not that concerned about him." That was followed by the singularly ill-advised "Mission Accomplished." (With the wisdom of hindsight, the White House must wish that it had gone with something more noncommittal, like "Way to Go.") And so on; as I put this in the earlier piece:

"Axis of Evil," "War on Terror," "cakewalk," "Freedom is untidy," "Bring 'em on," "When they stand up, we'll stand down" -- the more pithily memorable the catchphrases were, the more they came back to haunt the administration when their disconnect from reality grew too obvious to ignore.

Now, only slightly daunted, the administration is back with "a new way forward," appropriated from the title of the Iraq Study Group report. Bush has been using the phrase incessantly, echoed by Condi Rice and Tony Blair (it came up 13 times in Bush and Blair's press conference last week).

You can understand the appeal of the phrase. After all, "forward" sits cheek by jowl with "progress" and implies a contrast with "retreat," which is how Bush has taken to pronouncing "cut and run" these days. But as slogans go, it's a pretty wan and unconvincing sequel to "mission accomplished" and "stay the course": it sounds like a tagline an ad agency would come up with for a railroad trying to emerge from Chapter 11. And its very vagueness underscores what most people have already concluded, that Bush really has no idea where he's heading. Indeed, the phrase is already pregnant with the ironies it's sure to evoke later on. Because, let's face it, the last time this administration had a clear idea which direction "forward" lay was the day before the troops entered Baghdad.

Posted by Geoff Nunberg at 07:34 PM

Nugetre

One small footnote to Geoff Pullum's fond remembrance of Ahmet Ertegun. As Geoff mentioned, Ertegun was also a songwriter, receiving credits for such hit songs as "Chains Of Love" and "Sweet Sixteen" by Big Joe Turner and "Mess Around" by Ray Charles. (One of the high points of the movie Ray is the scene in which Ertegun, played by Curtis Armstrong, presents "Mess Around" for Ray's approval.) But oftentimes his songwriting credits didn't read "Ertegun" — instead, he used the mysterious pen name "Nugetre." It's surely one of the great ananyms of all time.

Posted by Benjamin Zimmer at 09:30 AM

More on Pinochet

I got an extraordinary number of comments on my post on the Chilean Spanish pronunciation of Pinochet. (For me, anyway; the senior writers here at Language Log seem to get at least this many comments on a daily basis.) I've been responding to most of the comments in the comments area itself, but I think it's worth compiling some of it and following up on the original post with some more details. If you're interested, those details can be found over on phonoloblog.

Posted by Eric Bakovic at 01:42 AM

Ahmet Ertegun

I just learned the sad news that Ahmet Ertegun is dead. Is there a linguistic angle, one that could make it a Language Log topic? Oddly, there is. Ertegun was the son of the legal counselor to the man who gave Turkish its excellent writing system. His father, later a distinguished diplomat, started out giving legal advice to Kemal Atatürk, who founded modern Turkey. On November 1, 1928, Atatürk ditched the Arabic system that the Ottomans had made do with, and introduced overnight by edict a much better one based on the Latin alphabet. General literacy went up from 20% to over 90%.

However, this indirect family connection to orthography reform is not the reason I shed a quiet tear in Ertegun's memory.

What was important about Ahmet Ertegun, I think, was the way his life reminded us that language, culture, race, and religion are distinct parameters of humanity, and can be transcended: they enrich us, they do not trap us or divide us. Ertegun was not an African American, but he fell in love with African American music in the 1940s (just as I did later, far away, as a white boy at a high school in England), and he founded Atlantic Records to immortalize it in recordings. He was a Turkish-born, Turkish-speaking Muslim who worked for years alongside American Jews like business partner Herb Abramson and producer Jerry Wexler, producing records with non-Muslim non-Jewish singers and musicians. He understood the music he recorded, and valued it, and loved it from when he first heard it at the age of 9. His record company (started in 1947 on a $10,000 loan from the family dentist) made the careers of many African American jazz, R&B, and rock musicians.

And white musicians too: Ahmet Ertegun didn't think good rhythm and blues was restricted to one race (even though it was called "race music" in the trade weeklies when he was starting his business). It's a cultural thing, not a racial thing. Ertegun recorded the legendary genius Ray Charles, but he recorded Bobby Darin too. (I guess I was never really a Bobby Darin fan, but any unbiased judge would have to say that some of Darin's early records rock with the best of them. My favorite of his performances on Atlantic: the little-known B-side Bullmoose, a classic piece of black-style piano-pounding rhythm and blues.)

At 21, Ertegun was in graduate school at Georgetown University, studying medieval philosophy — heck, that's just about as arcane as theoretical linguistics — but he was spending hours each day in a rhythm and blues record shop in a black district of Washington DC, and hours at the Howard Theater or various jazz and blues clubs each night. In the end he decided not to become an academic but to devote his life to the music he loved, and went into the record industry.

I went the other way. After five years playing soul music in groups such as the Ram Jam Band in the late 1960s, I decided that being on the road as a musician was a tedious way to make a living, and went back to higher education, and into academia. But my life was forever enriched by the music that Ahmet Ertegun discovered and promoted and recorded and wrote (he has a songwriting credit on Chains Of Love and other songs) and even sang (he's one of the backup singers on Joe Turner's original Shake, Rattle and Roll). On October 29 he was backstage at the Beacon Theater in Manhattan for a Stones concert for Bill Clinton's 60th birthday (where else to be on such a night) when he had a fall and suffered a brain injury from which he didn't recover. But the music he and Atlantic Records brought into my life will be with me forever.

Posted by Geoffrey K. Pullum at 12:15 AM

December 14, 2006

Eggcorn alarm from 2004

While searching for a study reported in BBC News about teenagers and the paucity of their vocabulary, I came across the National Literacy Trust site, which has a database of media items on language-related topics -- many of them, alas, old acquaintances that we have already savaged in Language Log. Among them was a somewhat fevered story from the Telegraph of 6/8/04 about the "mass dyslexia" induced by... eggcorns.

What set me searching was this very brief piece on BBC News (with a quiz you can take):

Teenagers use just 20 words for a third of their speech, according to a new study by Lancaster University.

And their vocabulary is peppered with slang that can seem - to those old enough to vote and to drink legally - incomprehensible.

Test yourself on the slang used by the teens in the study.

You don't need to write us about this story. We're on it, and we'll report on it when we have some actual details.

It did occur to me that this could merely be a report on the frequency of the most frequent words in English, in general. If you look at the Brown Corpus word frequencies and add up the corpus percentages for the top 20 words (listed below), they account for 31% of the words in the corpus. But that would be ridiculous, and it wouldn't distinguish teenagers from the rest of us, so what would be the point?

The Brown top 20: the, of, and, to, a, in, that, is, was, he, for, it, with, as, his, on, be, at, by, I
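The arithmetic behind that 31% figure is just the cumulative share of the most frequent word types. Here is a minimal sketch in Python, assuming a crude regex tokenizer and a toy sample text rather than the actual Brown Corpus; the function names `tokenize` and `top_k_share` are made up for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of letters and apostrophes.

    This is a crude stand-in for real corpus tokenization.
    """
    return re.findall(r"[a-z']+", text.lower())


def top_k_share(tokens, k=20):
    """Return the fraction of all tokens covered by the k most frequent types."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    covered = sum(n for _, n in counts.most_common(k))
    return covered / len(tokens)


if __name__ == "__main__":
    # Toy text only; a real check would read in a full corpus.
    sample = (
        "The cat sat on the mat and the dog sat by the door. "
        "It was the best of times, it was the worst of times."
    )
    tokens = tokenize(sample)
    for k in (1, 3, 20):
        print(f"top {k:2d} types cover {top_k_share(tokens, k):.0%} of tokens")
```

Run over a sample of teenagers' speech and a comparable sample of adults' speech, the same function would show whether the "20 words for a third of their speech" figure distinguishes the two groups at all.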

While I was thrashing about on the net, I came across the National Literacy Trust and this piece from 2004, which I'll divide into two pieces:

Howlers of modern English usage

As part of compiling the 11th edition of the Concise Oxford Dictionary, researchers discovered an increasing confusion over simple words and phrases. One in five believes we should 'tow' the line instead of 'toe' the line. A further 10% 'pour' over a book when we should 'pore' over it. The alarming increase in 'mass dyslexia' was picked up by the dictionary's 100 researchers worldwide on the look out for new words. As they searched the Oxford English Corpus, a database of 400 million written words, they discovered that while spelling remained reasonably strong, more and more writers were mixing up like-sounding words and phrases. 'They are not so much spelling the words wrongly, as using the wrong words,' Angus Stevenson, the dictionary's co-editor said.

Yes, they're eggcorns, famous ones, and only 8 months before these errors got their name, right here on Language Log.

I'm suspicious about the things-are-getting-worse tone of the piece, though: "increasing confusion", "alarming increase", "more and more writers". Did the COD researchers actually have comparative data showing that more people were writing "tow the line" and "pour over a book" when the COD11 data were collected than they did, say, twenty years before? (It's not particularly hard to find such things from fifty years ago, though I have no idea of their frequencies then.) Or were they merely alarmed at the 20% figure for toe >> tow and the 10% figure for pore >> pour? I wouldn't be at all surprised if the percentages for these items were rising, since (like other frequent eggcorns) they make a lot of sense to people and so would be inclined to spread -- but is there any evidence on the question?

Now, the rest of the story:

His team believes that the chief explanation was the use of the computer spell check, which does not spot errors of meaning. He also thought that the explosion of the numbers of people writing, mainly due to the internet, meant that more errors were bound to creep in. Whether such mistakes will, in time, spill over into more formal types of writing is yet to be seen. The question is: does it matter if in a generation's time people are writing about 'pouring over magazines' or 'towing the line'?

The findings come after a campaign, backed by Bill Cosby, the American actor, began to stop British children from speaking patois in class. A south London school is piloting a scheme to ban slang, often based on the creole spoken in the West Indies, because it is thought to contribute to the educational failure of black pupils.

(Telegraph, 8 June 2004)

The explanation in terms of spell checking programs isn't entirely clear to me. The assumption seems to be that people used to learn to spell the hard way, by endless drill and correction by teachers and editors, until they got it right, but now they rely on spell checkers to fix things for them, so that errors like toe >> tow no longer get caught. This isn't entirely implausible, but is it true? Did students used to get a lot of correction on such items that would lead them to change their practice? Do they not get that now?

The question of whether, ceteris paribus, spelling is getting worse is not an easy one to answer. There are a number of corrections that have to be made before things are roughly equal -- to take into account the opening up of higher education to a much wider segment of the population after World War II, for example, and to adjust for the much wider appearance of informal writing in public in recent years, on the net in particular. The materials that are being compared have to be similar in character.

There's certainly a widespread PERCEPTION that things are sliding downhill. But that doesn't make it so.

In any case, as the Telegraph piece asks, does it really matter if toe >> tow and pore >> pour go most of the way to completion? The expressive resources of the language will not have narrowed, and people will still understand one another (so long as they're behaving cooperatively). Things change in a tiny way. That's no Sign of the Apocalypse, just business as usual.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 08:24 PM

I've heard some ideas that would lead to defeat

Under what conditions is it true that you have heard some ideas that would lead to defeat? Yesterday, on December 13, an AP reporter asked President Bush: "You've been gathering advice, as you said, from leaders here and from leaders in Iraq. As you've gone through that extensive process, have you heard any new ideas at all, anything that would change your thinking?" And in the opening lines of his response Bush said something that I strongly suspect is a flat-out lie:

I've heard some ideas that would lead to defeat, and I reject those ideas — ideas such as leaving before the job is done; ideas such as not helping this government take the necessary and hard steps to be able to do its job. [White House transcript here]

The President is making statements here (if he has understood the question) about linguistic acts having taken place: about things people have actually said to him in recent meetings in response to a request for new ideas. Picture a situation that would make his claim true: Bush asks some general or legislator what should be done, and the answer is, "Mr President, I think leaving before the job is done would be the best course"; or someone says, "Well, sir, in my opinion it is time to move to a policy of not helping this government take the necessary and hard steps to be able to do its job." Who has said these things? Who has said anything even remotely similar to them under extreme stretching of the notion of paraphrase?

Nobody in the public sphere has been saying anything of this sort. The report of the Iraq Study Group does not say such things. Not even the most extreme of Bush's critics — The Nation, or the "Talk of the Town" pieces by Hendrik Hertzberg in The New Yorker — say anything even vaguely reminiscent of these things. I'll admit I was wrong immediately if anyone can produce evidence of someone in some meeting with Bush proposing that we leave "before the job is done", or declining to help either the US or the Iraqi government "take necessary and hard steps", but I don't think anyone can. I think Bush just made this stuff up as a way of reiterating some things he likes to insist he will not do. He simply made the decision to pretend that people have been saying this (this cartoon hits the nail on the head) — to lie to us about what he has been hearing behind closed doors.

He has a right to say what he does not plan to do. But he does not have the right to put those claims in imaginary people's mouths before rebutting them. That is not just a figure of speech. It's lying. And it is not the first time he has told a lie of exactly the same sort. He did the same thing in January 2003, and his advisor Condoleezza Rice did a very similar thing right afterward, as I pointed out in the first Language Log post I ever wrote. Do I really have to point out that lies about linguistic behavior, concerning what people have said, are just as untruthful as lies about anything else?

[Thanks to Heidi Harley for pinning up the cartoon in the Senior Writers' Lounge.]

Posted by Geoffrey K. Pullum at 07:13 PM

The sad task of headline writers

Geoff Pullum presents yet another wonderful ambiguous headline: "Leahy wants FBI to help corrupt Iraqi police force". We'll never be at a loss for such examples, given the nature of headlines.

So, as Geoff said, we don't really need to invent ambiguous sentences, like the celebrated We saw her duck, which he attributes to Jerry Sadock and me; you can find them all over. Just to keep the record straight, Jerry and I didn't invent that one; we took it, with attribution.

Second things first. We saw her duck comes from Zwicky & Sadock, "Ambiguity tests and how to fail them", in a 1972 Syntax and Semantics volume edited by John Kimball. In which we say that we got the example from Dennis Stampe (not linguist David Stampe, but his brother, philosopher Dennis Stampe).

By the way, sharp-eyed readers will have noted that there are more than two readings for this sentence, involving the present tense of the verb saw 'use a saw on' and/or the noun duck 'kind of cloth'.

The larger point is that it's almost impossible to write headlines without occasionally producing examples that are laughably ambiguous (or that induce garden-pathing). Two reasons for this: headlines lack many indicators of structure (and hence interpretation), which have been suppressed for brevity; and they come at the top of the story, where they have no linguistic context (and, often, you need a lot of cultural context as well). I once spent some time writing newspaper headlines, and it was a hell of a task (though interestingly challenging).

Of course, we here at LLP, LLP, are as prone as anyone to enjoy the headlines that run awry -- ambiguous, garden-pathic, or just puzzling headlines. Here are a few from my collection that I don't think I've posted about here. (Sources on request.)

Truck leads police to molest suspect

City ducks vote on transportation fee

Horse attacks trigger debate

Warnings on river, lake fish jump

Associate of Black savages report,
as media feeds on tycoon's fall

DNA leads police
to rape suspect

Mammoth Remains Unearthed

China Cabinet Orders a Drive Against Inflation

Reading X-Rays In Asbestos Suits Enriched Doctor

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 06:22 PM

Native languages of United Nations Secretaries-General

A new Secretary-General of the United Nations was sworn in today. The native languages spoken by the holders of this post up to the present day have been:

Norwegian	(Trygve Lie, Norway, 1946-1952)
Swedish	(Dag Hammarskjöld, Sweden, 1953-1961)
Burmese	(U Thant, Burma, 1961-1971)
German	(Kurt Waldheim, Austria, 1972-1981)
Spanish	(Javier Pérez de Cuéllar, Peru, 1982-1991)
Egyptian Arabic	(Boutros Boutros-Ghali, Egypt, 1992-1996)
Akan	(Kofi Annan, Ghana, 1997-2006)
Korean	(Ban Ki-Moon, Republic of Korea, 2007--)

Amidst all the talk of English as a global language that is wiping all other languages out, and notwithstanding the obvious fact that every Secretary-General has to give speeches in English, and granting that Kofi Annan's English is of near-native quality, it might be worth noting that we have here a top position in a US-headquartered organization that no native speaker of English has ever held, not even for a day.

Update: Of course, there are some further generalizations that put this fact in context, as several people have written to me to point out. One is that in the Cold War days, it was understood that the Secretary-General had to be from a country not aligned with either the Soviet Union or the USA and Britain, and that didn't really leave an English-speaking (or Russian-speaking) country for a Secretary-General to be from. And the other is that no national language of one of the permanent members of the UN Security Council is on the list above.

Posted by Geoffrey K. Pullum at 05:13 PM

Help corrupt Iraqi police

When illustrating structural ambiguity, linguists usually reach for the classic example Arnold Zwicky and Jerry Sadock gave: We saw her duck. Nice and short. The ambiguity of the syntactic structure stems from the interaction of at least two facts: (1) duck has a double life, as both a noun ("member of a bird species in the Anatidae family") and a verb ("quickly lower one's head as if to hide or avoid a missile"), and (2) her is both the genitive form and the accusative form of the 3rd person singular feminine pronoun. But it is hardly necessary to invent examples as long as CNN goes on publishing headlines like the one a correspondent just pointed out to me (thanks, Kelly): Leahy wants FBI to help corrupt Iraqi police force. Here the ambiguity arises from corrupt being both an adjective and a verb. There's a long tradition of ambiguous headlines of this sort in the newspaper business, of course, and we linguists are grateful for all of them. The most famous example is probably the one that became the title of the 1980 book Squad Helps Dog Bite Victim, and Other Flubs from the Nation's Press.

Posted by Geoffrey K. Pullum at 02:33 PM

Flacks and hacks and Hitchens

In the January 2007 Vanity Fair, Christopher Hitchens has taken up the fashion for biologistic accounts of sex differences ("Why women aren't funny"):

The chief task in life that a man has to perform is that of impressing the opposite sex, and Mother Nature ... equips many fellows with very little armament for the struggle. An average man has just one, outside chance: he had better be able to make the lady laugh. [...]

Women have no corresponding need to appeal to men in this way. They already appeal to men, if you catch my drift. Indeed, we now have all the joy of a scientific study, which illuminates the difference. ... To annex for a moment the fall-about language of the report as it was summarized in Biotech Week:

[...] "Women appeared to have less expectation of a reward, which in this case was the punch line of the cartoon," said the report's author, Dr. Allan Reiss. "So when they got to the joke's punch line, they were more pleased about it." The report also found that "women were quicker at identifying material they considered unfunny."

Slower to get it, more pleased when they do, and swift to locate the unfunny—for this we need the Stanford University School of Medicine?

The Biotech Week article that Hitchens quotes is essentially a reprint of a Stanford University press release, "Gender differences are a laughing matter, Stanford brain study shows". And Hitchens' response to the article is classic: he sees the study as a wasteful confirmation of the obvious, telling us what everyone already knows about men and women.

The only trouble is, somewhere along the way from the researchers to the flacks to Hitchens, the message has gone wrong. Those things that Hitchens is so sure are obvious to everyone? Well, let's take a quick look.

The study in question is Eiman Azim, Dean Mobbs, Booil Jo, Vinod Menon, and Allan L. Reiss, "Sex differences in brain activation elicited by humor", PNAS 102 16496-16501, 2005.

Hitchens: "[women are] slower to get it, more pleased when they do, and swift to locate the unfunny".
Azim et al. 2005: "We found no between-sex differences in the number of stimuli found funny [t(17.531) = -0.029, P < 0.977], the subjective degree of funniness [t(17) = 0.895, P < 0.383], or the response time (RT) to funny [t(17.99) = 0.20, P < 0.944] or unfunny [t(16.22) = -0.769, P < 0.453] stimuli."

How in the world could Christopher Hitchens, who is a smart person, have made such a dumb mistake?

The short answer is that instead of reading the article, he trusted flacks and journalists. And then he added a bit of his own stereotype-driven misinterpretation of the flackery and hackery.

He's not the only one. The rest of the media response to this work -- which came out a year ago, it's not exactly a news flash at this point -- was the all-too-familiar steaming pile of falsehood and irrelevance. Some pieces, like Hitchens', asserted things about the study that are directly falsified by its conclusions. Others asserted things that the study didn't deal with at all: thus Anne Casselbaum wrote in Discover Magazine, under the headline "Women Don't Understand (why Adam Sandler is funny)":

Stanford University humor researcher Allan Reiss has a reassuring insight for all the men whose girlfriends and wives roll their eyes at Adam Sandler movies: Women really do enjoy a good laugh as much as you do; they are just wired to focus on different aspects of humor.

In the study Reiss et al. did, the women found the same cartoons funny to the same degree as the men. But the study wasn't in any way designed to find the (no doubt genuine) differences among its subjects' various senses of humor, and in particular it shed no light whatsoever on the distribution of Adam Sandler's appeal by sex.

The Stanford study did find some sex differences in its sample, which I'll write about another time. These differences are interesting in themselves, though hard to interpret, and their connection to the PR (and to Hitchens' description of the PR) is fascinating. For now, though, I want to focus on two curious and completely characteristic facts about this study and its media uptake.

The first thing is that journalists, as a group, misdescribed the study's findings. Big surprise. It's pretty clear that none of those whose reports I've scanned actually read the paper with a critical eye, relying instead on the press release and on other journalists' stories. Stanford's PR department -- and Dr. Reiss himself -- appear to deserve a share of the blame for the resulting misrepresentations. But it's strange that a science-oriented publication like Discover would employ a writer who doesn't bother to look past the surface. And it's striking that Hitchens, who would never take a politician's press release at face value, is so completely uninterested in the facts of the science that he chooses to cite. This reinforces my conclusion that in today's public discourse, science is treated not as a search for the truth, but as a source of edifying fables.

The second fact is more subtle and more pervasive. It's not just a fact about PR departments and journalists, it's embedded in the way that most psychologists and nearly all neuroscientists think. Read these two selections from the Azim et al. study, and think about them for a minute:

From the Materials and Methods section: "We scanned 20 healthy subjects (mean age, 22 years ± 1.9; 10 females)."

From the Conclusions section: "Males and females share an extensive humor-response strategy as indicated by recruitment of similar brain regions: both activate the temporal-occipital junction and temporal pole, structures implicated in semantic knowledge and juxtaposition, and the inferior frontal gyrus, likely to be involved in language processing. Females, however, activate the left prefrontal cortex more than males .... Females also exhibit greater activation of mesolimbic regions, including the nucleus accumbens ... "

So the subjects in this study were 10 males and 10 females, average age 22, recruited at Stanford Medical School. Presumably they were medical students, or grad students, or pre-med students. Anyhow, they were all between 20 and 24 years old, and they were all at Stanford.

But the paper's conclusions aren't about how Stanford med students' brains work. Instead, the conclusions are about what "males and females share" and what "females ... activate ... more than males" and so on.

This is the way that the Stanford researchers and the Stanford publicists talk about their findings, both in the research paper and in the press releases. And this spin is adopted, entirely uncritically, not only by Hitchens but also by all the news reports that I've seen on this work.

For example, Alok Jha, "Why females laugh longer at punchlines", The Guardian 11/8/2005, whose lede was:

Women find the punchlines of jokes more satisfying than men do, according to a study by scientists. They also use more of their brains to appreciate humour in the first place.

And Randolph E. Schmid, "Women May Enjoy Humor More, If It's Funny", The Associated Press, Monday, November 7, 2005, 6:39 PM (reprinted in the Washington Post), who began his article by writing

The difference between the sexes has long been a rich source of humor. Now it turns out, humor is one of the differences.

and who wrote throughout about "how the male and female brains react to humor", "how men and women process humor", and so on.

Would anyone accept a characterization of Americans' political or religious opinions, or their product preferences, based on a sample of 10 first-year Stanford medical students? Would a newspaper try to predict a national election from such a sample? Would a network executive rely solely on such a sample in estimating the response to a new comedy show? You'd have to be unusually stupid or gullible to believe predictions about the American population at large -- much less the human species at large -- that are based on ten 20-somethings enrolled at one first-rank American medical school at some point in 2003 or 2004.
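To put a rough number on the problem: even if the ten subjects of each sex had been a simple random sample of the whole population (which they were not), sampling error alone would be large. Here is a back-of-the-envelope sketch, using the textbook normal approximation for the 95% margin of error of a proportion. The function name and the sample sizes other than 10 are illustrative assumptions, not anything taken from the study, and the approximation itself is shaky at n = 10.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated from n
    independent draws (normal approximation; p=0.5 is the worst case)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return z * math.sqrt(p * (1.0 - p) / n)


if __name__ == "__main__":
    # Compare a sample of ten with the sizes a pollster might use.
    for n in (10, 100, 1000):
        print(f"n = {n:4d}: about +/- {margin_of_error(n):.1%}")
```

This speaks only to sample size. It says nothing about the larger problem, which is that the subjects all came from one narrow slice of the population.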

So why are Dr. Reiss and his colleagues willing to treat such a sample as accurately characterizing the nature of the brain responses to humor of human females and human males, taken as a whole? And why do science correspondents throughout the media, and a savvy political journalist like Hitchens, accept this extrapolation as truth, without a hint of skepticism?

There's an implicit assumption here that from the point of view of humor, a brain is a brain -- or rather, a male brain is a male brain, and a female brain is a female brain. Age, education, personality, cultural background, occupation -- none of that matters, and so none of that needs to be controlled for. We neuroscientists don't need no demographically balanced samples, we're measuring brains. Determining men and women's responses to humor is treated like determining the melting points of bismuth and antimony -- all you need to do is to measure a pure enough sample. There's some residual recognition that statistical variation needs to be averaged out, which is why N=10 rather than 1. But if you think of the enormous variation in either sex's sense of humor -- surely as richly varied as their attitudes towards politics or shampoo -- the assumption of sexual uniformity seems very strange.

The funny thing is, the very same authors, in the very same issue of PNAS, disprove this assumption -- based on a different experiment with the very same cartoons. The article is Dean Mobbs, Cindy C. Hagan, Eiman Azim, Vinod Menon, and Allan L. Reiss, "Personality predicts activity in reward and emotional regions associated with humor", PNAS 102 16502-16506, 2005. From the abstract:

Our analysis showed extroversion to positively correlate with humor-driven blood oxygenation level-dependent signal in discrete regions of the right orbital frontal cortex, ventrolateral prefrontal cortex, and bilateral temporal cortices. Introversion correlated with increased activation in several regions, most prominently the bilateral amygdala. Although neuroticism did not positively correlate with any whole-brain activation, emotional stability (i.e., the inverse of neuroticism) correlated with increased activation in the mesocortical-mesolimbic reward circuitry encompassing the right orbital frontal cortex, caudate, and nucleus accumbens. Our findings tie together existing neurobiological studies of humor appreciation and are compatible with the notion that personality style plays a fundamental role in the neurobiological systems subserving humor appreciation.

There's a striking amount of overlap between the brain areas whose responses to this particular set of cartoons varied with personality measures, and those whose responses varied with sex. That could be because sex and personality measures co-vary, either in general (as they surely do) or specifically in Stanford medical students who volunteer for fMRI studies (where the distribution of personality by sex might well be an atypical one). There may also be other factors, having nothing specifically to do with the interaction of humor, sex and personality, which affect the response to these cartoons so as to create what seem to be effects of sex and personality on the neurological processing of jokes.
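The first possibility, that an apparent sex effect is really a personality effect in disguise, is easy to illustrate with made-up numbers. In the sketch below every value is invented: the toy "activation" score depends only on a hypothetical extroversion score, yet because extroversion happens to be unevenly distributed across the two groups of subjects, the group means differ by sex.

```python
from statistics import mean

# Hypothetical subjects: (sex, extroversion score). All numbers are invented.
SUBJECTS = [
    ("F", 7), ("F", 8), ("F", 6), ("F", 9), ("F", 5),
    ("M", 4), ("M", 5), ("M", 3), ("M", 6), ("M", 7),
]


def activation(extroversion):
    """Toy response that depends on personality only; sex plays no role."""
    return 0.5 * extroversion


def mean_activation_by(key_index):
    """Average the toy activation within groups defined by one column
    (0 = sex, 1 = extroversion score)."""
    groups = {}
    for subject in SUBJECTS:
        groups.setdefault(subject[key_index], []).append(activation(subject[1]))
    return {key: mean(values) for key, values in groups.items()}


if __name__ == "__main__":
    # Grouped by sex, the means differ even though sex never enters activation().
    print("by sex:", mean_activation_by(0))
    # Grouped by extroversion, subjects with equal scores respond identically,
    # whatever their sex.
    print("by extroversion:", mean_activation_by(1))
```

Nothing here shows that this is what happened in the Stanford data; it shows only that a sex difference in group means is what one would see if it had.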

Perhaps we'll see future studies showing characteristically different brain responses to cartoons as a function of socioeconomic status, educational level, age, political orientation, and ethnic background -- treated in each case as if some fundamental physical constant were being measured. If that happens, you can bet that whatever the measurements are, the news stories will pitch the results in terms of the relevant inventory of social stereotypes.

[I should add that in cases like this, neuroscientists are motivated to over-generalize by self-interest as well as by their discipline's cultural blindness to social factors. A story about the neurology of Stanford medical students' reaction to cartoons is unlikely to get any media play, but a story about the neurological basis of sex differences in humor is a different animal altogether. And science journalists share a similar sort of motivation, beyond their profession's tradition of assembly-line reworking of press releases: their editors will (I'm guessing) prominently display a story about the neuroscience of sex differences in humor appreciation, but bury or not run at all a story about marginal sex-linked differences in the brains of a few medical students watching cartoons.]

Posted by Mark Liberman at 07:25 AM

December 13, 2006

One way to get a word in the dictionary

On his Comedy Central show last night (video here), Stephen Colbert triumphantly announced that "truthiness" has been selected by Merriam-Webster as their 2006 Word of the Year. The very same "wordanistas" he derided when he introduced "truthiness" on the first "Colbert Report" in October 2005 were now honoring his contribution to the lexicon. Colbert conveniently neglected to mention that the selection was made not by Merriam-Webster lexicographers but rather online voters visiting m-w.com (clearly including many legions of Colbert fanatics). So he took the WOTY distinction to mean that the wordanistas had changed their tune, finally recognizing "truthiness" as a real word. That allowed him to register mock outrage when he discovered that "truthiness" is not actually in the latest edition of Merriam-Webster's Collegiate Dictionary. "Apparently the definazis over at Webster's don't know the meaning of the word 'word,'" Colbert bellowed. But he offered his own guerrilla tactic for rectifying this oversight.

Colbert directed his loyal fans to go to ColbertNation.com and download an image of page 1344 of the Collegiate Dictionary revised to make room for "truthiness." (Something had to go: the entry for "try" was removed from the page. "Sorry, 'try,' maybe you should have tried a little harder," Colbert taunted.) Then he instructed viewers how to paste the "corrected" page into their copies of the dictionary, adding, "Now your reference collection accurately reflects my impact on the American language."

I don't know if those Merriam-Webster definazis really will be including "truthiness" in future online and print versions of the Collegiate Dictionary, but I think they'll have to insert an entry of some sort since it was their bright idea to open this year's WOTY selection to online voters. Such is the price of democratic lexicography. But of course, as was noted here on Language Log back in Oct. 2005 (and further discussed in Jan. 2006), "truthiness" has been in the Oxford English Dictionary for quite some time now — the original fascicle with that entry was published around 1915. And predating the OED by a few decades, the Century Dictionary published an entry for "truthiness" around 1890:

Of course, neither the Century nor the OED has the Colbert-esque sense of "truth from the gut" unfettered by facts. We'll have to leave that to 21st-century wordanistas to record for posterity.

Posted by Benjamin Zimmer at 01:34 PM

Should the "owners" of a language be permitted to forbid its use to criticize them?

In a post commenting on the Mapuche lawsuit against Microsoft ("Language as property?", 11/24/2006), I asked:

Here's a question: if the use of a language has to be licensed by the tribal elders, can they withhold this permission from someone who wants to criticize them, or to say something else that they don't approve of?

At the end of a thoughtful and interesting post at Transient Languages and Cultures ("Sovereignty over languages and land", 11/25/2006), Jane Simpson responded:

I'm guessing he's thinking of a group withholding permission from an outsider to use their language to criticise them. In the Australian Indigenous societies I know, people have the unquestioned right to speak the languages accepted as their parents' languages. So "tribal elders" aren't on about licensing kids to speak their own language. But outsiders? Well, I can't see why Indigenous communities couldn't have that right. Just as copyright laws allow a map-maker or a publisher to refuse a critic permission to republish a map. Or trespass laws allow me to prevent a critic from coming onto my land, let alone erecting a billboard on it criticising me (however justifiably).

No, I was thinking of a dictator or a junta (or a democratic government) banishing critics -- declaring them to be no longer members of the "group" -- and then forbidding them to use their language for any further criticism or protest.

For example, could Vladimir Putin (or a government ministry subservient to him) have forbidden Alexander Litvinenko from writing accusatory articles in Russian, on the grounds that he was no longer a proper Russian (due to having emigrated, being a traitor, or whatever)? Could the Russian government now forbid Litvinenko's widow from using the Russian language on similar grounds?

Or if you'd like to remove the issue of group membership -- could the Russian government forbid the BBC from broadcasting or printing unsanctioned articles in Russian?

Come on, some may say, we're not talking about big languages like Russian (118 million speakers), we're talking about little languages like Mapudungun (300,000 speakers). Well, where's the cut-off? What about a medium-sized language like Belarusan (6.7 million) or Rwanda (6.5 million)? Which side of the line is Tigré on (800,000 speakers in Eritrea)?

Assume (contrary to the odds, luckily) that WIPO were to propose that languages should be treated as folkloric information subject to sui generis intellectual property rights, and that the world's governments were to ratify this proposal and integrate it into their legal codes. If you think that such laws wouldn't be used in all sorts of nefarious ways, you've got a very different perception of the uses of power, law and government in today's world than I do.

[Here's another example: Should the official custodians of the various languages of Afghanistan -- whoever they might turn out to be -- be empowered to forbid RAWA from distributing literature in those languages, or broadcasting or distributing audio recordings in those languages? Suppose the literature is printed, shipped and passed out with the assistance of various NGOs in Afghanistan, or the broadcasts are sent out on shortwave stations run by other outsiders. Isn't that very close, in legal terms, to the situation of Mapudungun native speakers working with Microsoft to localize software?]

[Update -- Keith Handley writes:

Make the analysis easier by picking a language with something closer to an owner: Can the Star Trek people prevent an outsider from criticizing them in Klingon? Or can they prevent Microsoft from releasing a Klingon localizations of software?

This is an interesting case, but a somewhat different one. It's partly different because there is a single corporate entity that might in some sense be said to "own" Klingon. And if Klingon were some sort of service mark, or something like that, this might be a question that has a real answer, more or less, in current international IPR law.

But whatever the answer, there's another difference that (in my opinion) is more important. There are no monolingual Klingon speakers; there are not even any people who are more comfortable in Klingon than in any other language. So the freedom-of-speech argument has a very different force than it does in the case of a real, living language, where the power to prevent use is a very significant political power. ]

Posted by Mark Liberman at 11:32 AM

December 12, 2006

Army strong reanalyzed

The news that one can pull down $200 million of public money for devising a two-word slogan for a branch of the armed services really caught my attention. ($200 million would be in the general region of two thousand times higher than the average gross annual income of a top linguistics professor or a senior writer at Language Log. And we can devise a dozen good two-word slogans in a three-minute break at the water cooler. Much of the $200m must be the budget for TV ads.) But anyway, having given my full attention to the phrase, I see that my esteemed colleague Roger Shuy has clearly misanalyzed the phrase. "Interestingly, the adjective comes after the noun it modifies," he says, and compares it to noun phrases like money aplenty. No, I don't think so.

The clue is in the "deep male voice saying, ‘There's strong, and then there's Army strong’," which Roger mentions as occurring in the TV recruiting ad: army strong is supposed to be a special subtype of strong. The phrase is actually an adjective phrase, with strong as the head, and Army functions as a modifier. This is an unusual construction, but not unprecedented. One could compare it with stone cold, meaning "cold in the way that stone is cold." There's being strong, and then there's being strong in the way that the Army is strong. I'm pretty sure that's right; and since modifiers in adjective phrases generally precede the head, we do not have any unusual word order in Army strong. We just have the rather unusual circumstance of a phrase in which a noun modifies an adjective. This is in fact the construction that I mentioned in an earlier post which I devoted to the construction Box spaghetti straight.

It has also been pointed out to me that use of this construction in advertising is not exactly original, and certainly not $200m original; for example, the slogan "There's clean. And then there's Chem-Dry clean" has been around for a while (not sure how long).

Posted by Geoffrey K. Pullum at 07:24 PM

Another Brizendinism

Mark Liberman has posted on Deborah Solomon's interview with Louann Brizendine in Sunday's NYT Magazine (p. 22), focusing on her conversion of her earlier claim about differences in words spoken per day between women and men to a claim about "communication events" per day. There's a lot more to comment on in this piece, but I was especially struck by another Brizendinism, another remarkable statistic, in the piece:

If women have superior verbal skills, why have they been subservient to men in almost all societies? Because of pregnancy. Before birth control, in the 1700s and 1800s, middle-class women were pregnant between 17 and 22 times in their lifetimes. All those eons upon eons, while Socrates and all these guys were sitting around thinking up solutions to problems, women were feeding hungry mouths and wiping smelly behinds.

So much for the complex story of relations between women and men throughout history. What I'm going to focus on is the claim about pregnancy rates: between 17 and 22 pregnancies per lifetime? Where does she GET these statistics?

Two side issues... First, it's not entirely clear how "in the 1700s and 1800s" is to be understood in relation to "before birth control": is she focusing on this time period (the 1700s and 1800s) as a period before birth control (most likely), or as the period when birth control became common (which would be suggested by the reference to Socrates, who was definitely well before the 18th century)? Second, why the restriction to middle-class women? My guess is that she has some source that addresses middle-class women (presumably in cultures where "middle-class" makes sense as a social category) in the 18th and 19th centuries.

Now, Brizendine has a source problem. She is not herself a scientist -- she does no research of her own (in the Solomon interview, she maintains that she does no clinical research because she objects to placebos, as being cruel) -- but a clinician (she has clients/patients, not subjects), and she gets all her data from what she reads. As Mark has observed, again and again, she relies heavily on pop literature rather than the scientific literature for her statistics. So we are entitled to wonder where the 17-22 pregnancies-per-lifetime figure comes from. It certainly seems very high indeed.

My guess is that this figure is an estimate of how many pregnancies a woman would have during 35 to 40 years of fertility (and sexual activity) if absolutely no steps were taken to limit pregnancies and she herself survived all those childbirths. Neither of these assumptions is realistic; women have always used various means to limit pregnancies, and death in childbirth has been common until fairly recently.

I've mentioned my Swiss great-grandmother who had 14 children (some born dead). Those 14 pregnancies were spread over a 33-year period, so that there was an average of 2.35 years between pregnancies. This happens to be the spacing for a woman with 40 fertile years and 17 pregnancies. Assuming fewer fertile years and more than 17 pregnancies gives smaller spacings, down to 1.59 years for 22 pregnancies in 35 fertile years.
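The spacing figures above are simple divisions, and a minimal sketch makes them easy to check. The function name `average_spacing` is just an illustrative label, and the calculation assumes (as the paragraph does) that spacing means fertile years divided by number of pregnancies.

```python
def average_spacing(fertile_years: float, pregnancies: int) -> float:
    """Average number of years per pregnancy across a span of fertile years."""
    return fertile_years / pregnancies


# The three cases mentioned in the text: (years, pregnancies).
cases = [(33, 14), (40, 17), (35, 22)]

for years, count in cases:
    spacing = average_spacing(years, count)
    print(f"{count} pregnancies in {years} years: {spacing:.2f} years apart")
```

Rounded to two places, 33/14 comes out slightly above 40/17, so the two spacings agree only approximately, which is all the comparison needs.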

Brizendine's statistics strike me as about as believable as the following datum, which came to me in spam this morning:

Hello chap

I don't care why your member is so small, but 71% of women do. They are pretty sure that bigger Johnson will make their desire stronger.

But if anyone has any idea where Brizendine's statistics came from, or if anyone can cite some actual research on the number of pregnancies over a woman's lifetime before the 20th century, send me mail.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 02:29 PM

Relevance

In the segment of NPR's "Morning Edition" that lists notable events of the day, today we were told:

Don't forget to spay or neuter: "Price is Right" host Bob Barker is 82 today.

There are two reasons to think that the first part must be intended to be relevant to the second part. However, unless you know more about Bob Barker than that he's the host of "The Price is Right", you'll be at a loss as to how.

First reason to think it's relevant: the two parts are juxtaposed, and the first has nothing obvious to do with the day. So, unless the folks at NPR have become unhinged, the first must have some relevance to the second.

And then there's the prosody, indicated by the colon in my transcription above: the first part ends with a suspension rather than a fall, conveying that the second part is a continuation of the first.

And relevance there is: for years, Bob Barker has ended the game show with an appeal to viewers to spay or neuter their pets. (In 1994 he set up a foundation to fund grants for spay/neuter clinics.) I'm not a game show fan, so I didn't know about Barker's attachment to the cause and was just baffled by the "Morning Edition" announcement.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 12:31 PM

Slogan strong

It's hard to disagree with the idea that the US Army really needs a new slogan. For some mysterious reason it's become difficult for the military to recruit new soldiers these days. So the Army is spending 200 million dollars a year for a new one. The result is a real zinger -- "Army Strong." Maybe you've already heard it. The television ad produced by the New York advertising firm, McCann and Erickson, has a deep male voice saying, "There's strong, and then there's Army strong."

Interestingly, the adjective comes after the noun it modifies. As Huddleston and Pullum point out on pages 560-561 of their Cambridge Grammar of the English Language (here), only "a handful of adjectives are restricted to postpositive function," such as "flowers galore," "the city proper," and "restaurants aplenty," and they all are special in certain ways (read -- better yet buy -- this important book to find out how). But it seems that those who create slogans, like poets and lyricists, can do whatever they want with language, including putting adjective modifiers after nouns. I can recall a country music song sung by Don Williams, "Some Broken Hearts Never Mend," in which the first line is "Coffee black, cigarettes, start this day like all the rest."

Apparently the Army's previous but somewhat unfathomable slogan, "Army of One," just didn't communicate whatever it was that it was trying to communicate and its "Be All You Can Be" seems to have worn out whatever effectiveness it may have had. So the brief "Army Strong" is the current winner. As slogans go, it seems to be laudatory in comparison with the uncomfortable feeling of isolation created by "Army of One" (who wants to fight a war alone anyway?) and by the vague hope offered by "Be All You Can Be" (note the conditional "can" as opposed to the more positive "will" here).

I don't know if the Army has tried to copyright "Army Strong" but if it has, it could run into some problems. For one thing, the US Patent and Trademark Office regulations don't often allow copyright registration for short phrases and slogans. And typically the slogan has to be used in the same manner as the mark. That is, the slogan has to be used to identify the source of the goods or services, as opposed to being merely informational, generic, or laudatory, those characteristics that would make it difficult for consumers to distinguish the product or services from others. For example, after an electric shaver manufacturer came up with the slogan, "Proudly made in the USA," the patent office wouldn't register it because the slogan didn't identify the product. The same result for Carvel's slogan, "America's Freshest Ice Cream," which didn't distinguish the product's source from that of other ice creams. Most slogans that pass the registration tests can be clearly identified with their product or service to the extent that when consumers hear or see the slogan, they can identify it with the source. It's possible that the army is safe on this one because "Army" is 50% of the entire slogan -- unless consumers confuse it with the Salvation Army or the army of some other country.

"Army Strong" is clearly brief and laudatory and whether it identifies the source of the organization may or may not be up for grabs. On the other hand, Nike's famous slogan, "Just Do It," and Kentucky Fried Chicken's "Finger Lickin' Good," neither of which identify the source of their products, developed what trademark law calls "secondary meaning," protecting these slogans from use by others. But this usually takes time and heaps of money. Secondary meaning arises when consumers eventually come to identify a trademark or slogan with their products or services. In the now famous case of McDonald's v. Quality Inns International (here), it was the secondary meaning resulting from millions of dollars in promotion and advertising that enabled McDonald's to protect its Mc- prefix to this day. Maybe spending 200 million dollars will do this for the army.

Startlingly, by creating and promoting a two-word slogan, McCann and Erickson collect a hundred million dollars per word (I know, a lot of this goes into advertising). But it might be nice if linguists could make that kind of money.

Update: Several readers point out that they don't see the slogan as a noun with a following adjective modifier. Using (rather good) examples, such as "Ford Tough," "dog tired," and "butt ugly," they didn't think the word order strange. Maybe they're right. But it still sounds odd to me. And maybe that's what the slogan is supposed to do -- which might make it a good one.

Posted by Roger Shuy at 12:31 PM

Not exactly a retraction

The latest "Wait Wait... Don't Tell Me!" show on NPR began with an update from host Peter Sagal:

We got a lot of mail from scientists objecting to our story last week about a new book on the female brain. The author of that book [Louann Brizendine, for those of you who haven't been following the story here on Language Log] says that women talk three times as much as men do. And further, the author says that men think about sex once every 52 seconds. Our correspondents wrote in to tell us it's all nonsense, there's absolutely no good scientific evidence for any of that. We apologize. We should have known, of course, that that couldn't be true: imagine, a man going a whole 51 seconds without thinking about sex!

And then, on with the show.

Sagal says "we apologize", but manages to spin the story in a way that's likely to reinforce stereotypes about male/female differences.

First, Sagal says that the mail came from scientists, though I'm sure that a fair number of complainants were just ordinary well-informed people (of the sort who read Language Log, for instance). Framing things this way sets things up as a dispute over "scientific truth", and that's unfortunate, because people in general tend to be suspicious of pronouncements from scientists on matters of social concern; scientists are widely viewed as having some kind of narrow "special interest" that prejudices their research on socially relevant questions.

This framing continues with the unpacking of what "it's all nonsense" means: "there's absolutely no good scientific evidence for any of that." I'm certain that many of the complainants went further than that, saying that for the first claim (on verbosity) there is evidence AGAINST it, and that the second (on sexual thoughts) is grossly exaggerated. I also suspect that most of the correspondents just said "evidence" rather than "scientific evidence".

Saying merely that there's no evidence for these claims leaves open the possibility that the claims could be true, just not yet proven. And since the claims are restatements, with exact numbers, from someone presented as an authority, of widely held folk beliefs about male/female differences, a verdict of "not proven" will probably be taken as license for those folk beliefs. The apology in no way threatens those beliefs.

Then there's "no good scientific evidence": what is "scientific" doing in there? There's an implicit contrast between scientific evidence and some other kind of evidence. What other kind could there be?

The evidence of our experience. We know what we see. And what a lot of people see is gabby women and sex-obsessed men. Scientists might not have SCIENTIFIC evidence, but ordinary people have plenty of evidence that (to their minds) confirms the stereotypes about women and men. Stereotypes are, in a sense, "social facts" and so are very hard to confront. (The phrasing that stereotypes are social facts is not original with me, by the way, though I've said this for years in classes. Unfortunately, I don't know where I got it from. And to complicate things further, some writers explicitly contrast "social facts", in the sense of '(scientific) facts about social life', with stereotypes, which are systems of belief about social life.)

Then, of course, Sagal ends with the punch line, suggesting that male/female differences might actually be greater than Brizendine claims. We all know that men think about sex ALL THE TIME, after all.

Not really a retraction, I'm afraid.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 11:51 AM

Who will write Billy Bymun's story?

For a long time, my spam mailbox has been crowded with messages to one Billy Bymun, sometimes addressed just as Billy (but we always know which Billy is intended). BB, as I like to think of him, seems to be in the market for new and refinanced mortgages, student loans, and elegant gifts at good prices. As far as I can tell, BB has not been offered erection drugs, penis enlargement, Russian women, Nigerian fund transfers, stock tips, or even bargain software. From the e-mail evidence, he's focused on housing and education, needs no spurs to his sex life, and avoids risky investments. A solid sort of fellow, though with a taste for high-end watches.

But there are no Google webhits at all for "Billy Bymun" (there are people named "Billy Bynum", though). He is a man of utter mystery, despite the fact that millions of us see his name every day. I've heard rumors that BB is the son of Betty Crocker, which would be delicious, but we just don't know.

Who will write Billy Bymun's story? Could J. D. Salinger be persuaded to explore the history and psyche of this famous but extraordinarily private man?

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 10:19 AM

Let's call the whole thing off

While we were hanging around the water cooler at Language Log Plaza earlier today, Geoff Pullum told me he'd been listening to this story on NPR's Morning Edition this morning, about former Chilean dictator Augusto Pinochet's death. In the middle of Steve Inskeep's interview with Nathan Crooks (editor of The Santiago Times), Inskeep digresses to note that he and Crooks have been pronouncing Pinochet's name differently, and asks Crooks about it.

(Click here if you'd like to hear this entire short digression; links in the transcript below are to the individual pronunciations, which I represent in what I hope are orthographically-transparent ways.)

Inskeep: Mr. Crooks, I feel obliged to try to get the man's name right here, in death. We've been saying "Pinochet" here in the United States; you're there, in Santiago, Chile, and saying "Pinoshay". How did he say it?

Crooks: You know, I hear it both ways. In Chile, most people will say "Pinoshay", but in English I hear "Pinoshet".

Note that there are two issues here: the pronunciation of orthographic "ch" as "sh", and pronunciation of the final "t". Crooks seems to think Inskeep was only asking about the second of these issues. Geoff, for his part, was more interested in the first issue. Because I'm the resident expert on Spanish dialect pronunciation, Geoff asked me about the pronunciation of this French-derived name in Spanish (as well as the name of the current president of Chile, Michelle Bachelet).

[T]he American in Chile was saying "pinoshay" [as it would be, more or less, in French]. But no dialect of Spanish has "sh", does it? Spanish speakers would be saying "che" or "chet", but not "shay" or "shet".

As it turns out, there are some varieties of Spanish with a "sh"-like sound, and at least one of these is spoken in Chile. The issue about the final "t" also turns out to be complex and interesting.

According to José Ignacio Hualde, in his recent book The Sounds of Spanish (p. 152, emphasis added):

Another important dialectal phenomenon is the deaffrication of /ʧ/; that is, the loss of the occlusive element of the affricate, resulting in the fricative [ʃ] (as in English sheep): muchacho [muʃáʃo]. This lenitive development has been attested in a number of separate areas including parts of Andalusia, Northern Mexico (Sonora and Chihuahua), Panama and parts of Chile.

Let me explain what Hualde's saying here. In most varieties of Spanish (including the standard), orthographic "ch" pretty consistently represents what's called a voiceless post-alveolar affricate, the IPA symbol for which is [ʧ]. (Orthographic "ch" in English also generally represents [ʧ], but there are exceptions like character, Bach, and the like.) Some speakers in parts of Chile and elsewhere pronounce (at least some instances of) orthographic "ch" as a voiceless post-alveolar fricative, the IPA symbol for which is [ʃ] and which in English orthography is generally represented as "sh". (See Note 1 below.) The primary phonetic difference between the two sounds is that [ʧ] begins with complete closure in the mouth while [ʃ] does not; this is the "occlusive element" Hualde refers to.

So, in sum: the "sh" part of the pronunciation of "Pinochet" is not unexpected for at least some Chileans, for reasons other than the fact that "Pinochet" is originally a French name.

What about the final "t", then -- in Geoff's representations, is it "che"/"shay" or "chet"/"shet"? (See Note 2 below.) Again, there is a constellation of interesting facts about Spanish generally that lead me to expect either pronunciation as possible, in Chile and elsewhere in the Spanish-speaking world. These facts can be grouped into two inter-related generalizations, one which I call the final consonant generalization (1) and the other which I call the final stress generalization (2).

(1) Final consonant generalization.
- There are very, very few words in Spanish that end in a (pronounced) "t". (In fact, I find it hard to think of any other than obvious borrowings.) This is part of a more general fact about Spanish, that there are relatively few words that end in (pronounced) consonants other than "s", "z", "n", "l", "r", and "d"; these consonants are all pronounced with the tongue-tip, as is "t", but for various reasons these consonants are relatively common word-finally and "t" is not.
(2) Final stress generalization.
- There are relatively few Spanish words with more than one syllable that end in a stressed vowel. (Caveat: some verb forms are consistently stressed on the final vowel, and words consisting of only one syllable have no choice of where to be stressed.) Some examples are menú, Perú, tabú, café, and of course, olé.
- There are even fewer Spanish words (again, other than verbs) that are longer than two syllables and end in a stressed vowel. The place name Panamá is the only example that comes to mind. (There are also some other less well-known place names of this type.)
- With some well-defined exceptions, words that end in consonants are nearly always stressed on the final syllable. One set of exceptions is the set of regular plural nouns; when the plural suffix "s" is added to a vowel-final noun, the stress is on the same vowel as in the singular: póstre 'dessert' ~ póstres 'desserts'. (See Note 3 below.)

Back to Pinochet: as in French, this word is stressed on the final syllable in all Spanish (and English) pronunciations that I've heard (Pi-no-CHET). Given this immutable fact, a Spanish speaker is stuck with the following uncomfortable (but subconscious) choice: pronounce the final "t" and have the word be another exception to the final consonant generalization in (1) above, or don't pronounce the final "t" and have the word be another exception to the final stress generalization in (2).

So this leaves us with all four options as viable, any one or more of which may be how a Chilean speaker might pronounce "Pinochet" or "Bachelet": (a) with "ch" and with "t", (b) with "ch" and without "t", (c) with "sh" and with "t", (d) with "sh" and without "t". I'm inclined to say there's probably something to Crooks' incomplete-seeming response to Inskeep's question: maybe most if not all of the Chileans he talks to use "sh" pretty consistently, and maybe they only pronounce the final "t" when they're being careful (or, attempting what they think is the right English pronunciation). So, it's option (d) in normal speech, and option (c) in more careful speech. (For what it's worth: to my ear, this clip of people chanting Pinochet's name sounds like option (b), but then again this is a lot of people and a fairly noisy recording.)
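The trade-off behind these four options can be sketched in a few lines of code. This is only a rough illustration under stated assumptions: the spellings "pinochet", "pinoche", "pinoshet", and "pinoshe" are made-up stand-ins for the four pronunciations, the function name `exceptions` is hypothetical, and the checks look only at the final letter, which is a crude proxy for the two generalizations rather than a real model of Spanish phonology.

```python
# Consonants the post lists as relatively common word-finally in Spanish.
COMMON_FINAL_CONSONANTS = set("sznlrd")
VOWELS = set("aeiouáéíóú")


def exceptions(form: str, syllables: int, final_stress: bool) -> list[str]:
    """Return which of the two generalizations a pronounced form runs against."""
    found = []
    last = form[-1].lower()
    # (1): a pronounced final consonant outside the common set (such as "t").
    if last not in VOWELS and last not in COMMON_FINAL_CONSONANTS:
        found.append("(1) final consonant")
    # (2): a word of more than one syllable ending in a stressed vowel.
    if last in VOWELS and final_stress and syllables > 1:
        found.append("(2) final stress")
    return found


# Options (a) through (d): "ch" or "sh", with or without the final "t".
for form in ["pinochet", "pinoche", "pinoshet", "pinoshe"]:
    print(form, exceptions(form, syllables=3, final_stress=True))
```

Under these assumptions each of the four forms trips exactly one generalization: the forms with "t" run against (1), and the forms without it run against (2). The "ch" versus "sh" choice plays no part in either check.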

I can't resist noting a couple of other linguistically-interesting details about the NPR piece:

- The very first sentence of the piece is Inskeep saying: "The dictator (who) once ruled Chile is dead." The "who" is in parentheses because I can't tell for sure if Inskeep said it. Either way, it's not unexpected. On the one hand, some varieties of English allow dropping of "who" in this type of context. On the other hand, this word is completely unstressed, which in English often results in other predictable reductions. The relevant reductions here are (a) dropping of the initial [h]-sound and (b) substantial shortening of the vowel [u], which is already very similar to the following [w]-sound that begins the word "once".
- At about the one-minute-thirty mark, Crooks says: "Yes, there are a large number of people here that still support Pinochet." Note how deliberate Crooks is with his delivery of this sentence, and yet he manages to (apparently, anyway) mismatch plural "are" with singular "a large number of people" (in the "grammatical" as opposed to "semantic" senses of these terms, as clarified by Arnold Zwicky a few days ago). Not so long ago I discussed a related pair of examples.


Notes

1. As far as I know, it's not the case that all examples of orthographic "ch" are pronounced like "sh". Most notably, word-initial orthographic "ch" (as in Chile) seems to always be pronounced "ch", and so the "sh" sound is limited to non-initial "ch" (as in Pinochet). My mother once asked me why her Chilean friend seems to "switch" the two sounds; for example, her friend says muchacho as if it were mushasho, but while speaking English she says cherry wine for sherry wine. The key is that the "sh" in sherry is word-initial.
2. We can safely ignore the final "e" vs. "ay" distinction that Geoff brings up in his representations. In Spanish, orthographic "e" represents a monophthong [e], which is roughly somewhere between the vowels of "bed" (monophthong [ɛ]) and "bay" (diphthong [eɪ]) in English. Words can't end in vowels like [ɛ] in English, though, and so what would be Spanish [e] word-finally corresponds more closely to [eɪ] in English.
3. Note that the orthographic conventions of Spanish explicitly take these well-defined exceptions into account; both postre and postres are written without an acute accent to mark the stressed vowel because the stress on these words is considered to be "regular".
4. There is an exception-to-the-exception that proves the rule behind exceptions to (2). (Are you following me?) There are virtually no words in Spanish with stress on a syllable that is more than three syllables from the end of the word; let's call this the three-syllable stress generalization (3). The only exceptions to (3) are verb forms with a couple or more pronouns attached to the end (such as muéstraselo 'show it to him/her'). Now take a noun like régimen 'diet' -- already an exception to (2) in that it ends in a consonant but is not stressed finally -- and add the plural suffix (which in this case is "es" rather than "s" because the noun ends in a consonant). The result is regímenes, not *régimenes; in other words, the plural can have a different stress than the singular, but only to avoid an exception to (3), not (2).

And in case you're still reading ...

The only other variety of Spanish with a "sh"-like sound that I am aware of is spoken in Uruguay. Some speakers of Uruguayan Spanish (in and around Montevideo, if I'm not mistaken) use a "sh"-like sound for orthographic "y"/"ll", both word-initially and between vowels (but not word-finally, where you sometimes find orthographic "y" but not "ll"). This differs strikingly from the somewhat better-known neighboring Argentinian varieties, where the same orthographic elements are pronounced with (something similar to) a voiced postalveolar fricative (IPA [ʒ]; often represented orthographically as "zh" for the benefit of English speakers).
Geoff suggested "Ole, ole ole olet; Pinoche, Pinochet" as the title for this post. I had to go with my gut and choose the one you see before you. Feel free to comment on which you think was the better choice (and why), if you like.

Posted by Eric Bakovic at 12:39 AM

December 11, 2006

Library Thing

LibraryThing, whose Unsuggester Geoff Nunberg mentioned, is an interesting idea. I learned about it from Steve of Language Hat. Claire of Anggarrgoon also uses it.

I've been using it to catalog my own books. It is handy since you can enter just the ISBN or title or author and it will find anything that matches in one of a large selection of sources. It is intended in part to make book-owning a social activity. You can find out what other people have (there's an option for making your catalogue private if you wish), and you can find out how many other people own any particular title. I know that my tastes are somewhat idiosyncratic, but I was still a bit surprised at how few people have some of my favorites.

There are currently 116,319 members with a total of 7,995,319 books comprising 1,486,839 distinct works. I can understand that not a lot of people will own モンゴル語四週間 (Mongolian in Four Weeks, in Japanese), but I am disappointed that I am the only owner of Morris Halle's The Sound Pattern of Russian and Ferdinand de Saussure's Mémoire sur le système primitif des voyelles dans les langues indo-européennes. Shouldn't more people have Regna Darnell's Edward Sapir: Linguist, Anthropologist, Humanist? Not one other person has Josep Nadal and Modest Prats' Història de la llengua catalana! And isn't it shocking that only one other person owns Brent Berlin's Ethnobiological Classification? Only eleven other people have a copy of the Hōjōki (方丈記), whose opening paragraph is, in my opinion, quite possibly the most beautiful passage ever written. (I have two editions, plus an English translation.)

7,245 people have The Da Vinci Code. Go figure.

Posted by Bill Poser at 11:08 PM

GNU/Linux in Kurdish

The good news is that a Kurdish localization of the Ubuntu distribution of GNU/Linux has been released. The bad news is that the Turkish government can't seem to get its head around the idea that Kurds have the right to use their own language, even though it has theoretically repealed the laws that forbade the use of Kurdish.

Here is a screenshot showing the preferences menu in Kurdish.

Here are the distribution's home page and an article about it in the Kurdish Wikipedia. (I don't know what it says as, regrettably, I don't yet understand Kurdish.)

The release was announced in an article in the Turkish newspaper Milliyet on November 21st, as well as on the web site of the city of Diyarbekir. The Milliyet and the Radikal now report that Abdullah Demirbaş, the Mayor of the town of Sur, who participated in the launch ceremony, is under investigation by the Diyarbekir public prosecutor. (Here is a report in English for those whose Turkish is rusty.)

Posted by Bill Poser at 07:37 PM

If you loved The Chomsky Reader, you'll hate The Devil Wears Prada

From John Holbo at The Valve and John Emerson, some discussions of the UnSuggester at LibraryThing, which "analyzes the seven million books LibraryThing members have recorded as owned or read, and comes back with books least likely to share a library with the book you suggest." If you own or have read Great Expectations, for example, the site unsuggests Howard Rheingold's Smart Mobs; for Quine's Word and Object it unsuggests Little Women; and for Richard Rorty's Contingency, Irony, and Solidarity it unsuggests The Lion, the Witch, and the Wardrobe. Owners of Steve Pinker's The Language Instinct are unrecommended, among other things, The Lake House and Vogue Knitting on the Go. And readers of my own Going Nucular are discounseled from The Nanny Diaries, The Seven Habits of Highly Effective People, and St. Augustine's Confessions. Heck, I could have told them that.

Posted by Geoff Nunberg at 07:19 PM

Coordination of unlikes

Geoff Pullum's recent discussion of "grave and deteriorating" in the Iraq Study Group report returns us to the topic of the coordination of unlikes, which I last blogged about in a long posting about "failures of parallelism". Geoff argues with some care that "grave" in this example is an adjective, while "deteriorating" is a verb form; "grave and deteriorating" is therefore a violation of the Category Likeness condition (requiring that conjuncts be of like category) that many people assume rigidly governs coordination. Things aren't as simple as we'd like them to be.

For a while now, I've been collecting examples of coordinations that violate Category Likeness. Here's a sampling of types that you can find in English.

I'll start with the ISG example:

(1) AdjP + Ving: The situation in Iraq is grave and deteriorating.

Here the coordination is in the predicate complement of BE; it's a coordination of a predicate adjective and a progressive verb form. Two famous examples from the syntax/semantics literature similarly have coordination in a predicate complement of BE:

(2) NP + Ving: The temperature is ninety (degrees) and rising.

(3) NP + AdjP: He is a Republican and proud of it.

(Example (2), with a predicate NP conjoined with a progressive verb form, became famous through the work of Barbara Partee; example (3), with a predicate NP conjoined with a predicate AdjP, in the HPSG literature.)

From (3), we might speculate that what allows coordination of unlikes there is at least in part semantic: the NP and AdjP there both denote properties that are predicated of the subject. Now in English, PPs can also be used this way, and here's a coordination of predicate AdjP and PP:

(4) AdjP + PP: ... her colleague Steven Chillrud, who was both afraid of heights and on vacation ... (New Yorker Talk of the Town piece, 8/28/06, p. 22)

On the basis of (4), we can concoct PP versions of (1) and (2): first, with the PP in place of the progressive:

(5) AdjP + PP: The situation in Iraq is grave and in decline. [invented]

(6) NP + PP: The temperature is ninety (degrees) and on the rise. [invented]

and then with the PP in place of the AdjP/NP:

(7) PP + Ving: The situation in Iraq is in crisis and deteriorating. [invented]

(8) PP + Ving: The temperature is at a record high and rising. [invented]

There's a lot more to be said about these predicate examples, but I'll pass on to some other types.

One fairly common type conjoins two different kinds of purpose expressions: infinitival VPs and PPs with the preposition for:

(9) VPinf + PP: These [recommendations] include the proposals to enlist the help of Iraq's neighbors and for bolder peacemaking in Palestine. (leader in the Economist, 12/9/06, p. 11)

(10) PP + VPinf: ... fighting for prisoners' rights and to change the system. (Mary Ambrose, announcing her "Your Call" radio program on KALW, 6/7/06)

(11) PP + VPinf: Her only visits to the hospital had been for a variety of broken bones and to deliver her two children. ("Diagnosis" column in NYT Magazine, 4/25/05, p. 36)

An earlier posting on astounding coordinations had an example of this type:

(12) PP + VPinf: ... designed for closeness, comfort, and to clean itself automatically (Remington shaver commercial, heard 21/21/04)

A related type conjoins an infinitival VP of purpose with an adverbial subordinate clause of purpose:

(13) VPinf + so-Clause: The railroad magnate and future founder of Stanford University expanded it [the Leland Stanford Mansion in Sacramento] to 19,000 square feet to accommodate his growing family and so he could use the mansion for receptions and other official duties. (AP story printed in the Palo Alto Daily News, 7/4/05, p. 6)

Other sorts of adverbials of unlike category can be conjoined: an adverbial subordinate clause of reason (with because) plus a participial absolute expressing reason:

(14) because-Clause + AbsoluteClause: Because I had to stay overnight and this being New England, the only place to stay was a bed-and-breakfast. (Sarah Vowell, Assassination Vacation, p. 1)

or a temporal PP with a temporal participial adjunct:

(15) PP + VPing: After a tender love affair with the wife of an innkeeper, and having renamed himself for a short while with the eccentric pseudonym of 'Lesbonico Pegasio', he [Lorenzo Da Ponte] appears again in Vienna as 'poet' to the Burg theatre, and the favourite of Emperor Joseph II. (John Mortimer, Where There's a Will, pp. 10-11)

A final type has two kinds of verb complements in coordination: a PP (in both my recent examples, with the preposition about) and a that-clause:

(16) PP + that-Clause: Writing in Annals of Internal Medicine, the researchers reported that doctors often did not know about the results or even that a test had been ordered ... (NYT Science Times, 7/19/05, p. D6)

(17) PP + that-Clause: "Meantime," Carswell piped up, "I don't need to remind you about making that apology, or that you're still on suspension?" (Ian Rankin, A Question of Blood, p. 303)

An earlier posting on astounding coordinations cited a similar example in which the PP was in:

(18) PP + that-Clause: Kirk Arnott, assistant managing editor [of the Columbus Dispatch], is the language cop or watchdog of the Dispatch. He believes in informal and conversational language, and that his paper should be as conversational as possible, to be accessible and clear to readers. (MacNeil & Cran, Do You Speak American?, p. 61)

There are four large groupings in this inventory: the predicate examples, the purpose complement examples, various coordinations of adverbials, and verb complements. Overall, what we should conclude from these cases is that similarity in function and meaning can at least sometimes trump differences in syntactic category. But, yes, there are a lot of details to work out.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 01:51 PM

A note from Marrije Schaake

In response to the previous post, Marrije Schaake writes:

Thanks for pointing once again to the Brizendine book. Her 'statistic' of women using 20.000 words to men using 7.000 words a day had also made it into a comic strip in Holland called Sigmund (www.sigmund.nl, yes, he's an analyst. I'm afraid I can't link to the right page, since they are on a rotating schedule). On December 6th this strip ran in De Volkskrant. The lady says: scientists have determined that men use 7.000 words a day and women 20.000. Sigmund: 'Well!' Lady: Well done! Only 6.999 to go!

It sort of annoyed me, since this 'science' is what people will remember, as you say in your last paragraph. Sorry to say I didn't call the cartoonist on it. No wait, I'm going to send him a link to your article :-)

Posted by Mark Liberman at 11:49 AM

Sex differences in "communication events" per day?

Louann Brizendine's book The Female Brain, published last August, featured a number of striking quantitative assertions about sex differences in communication. The jacket blurb claimed "A woman uses about 20,000 words per day while a man uses about 7,000", while the text (p. 14) gave the same numbers in the other order: "Men use about seven thousand words per day. Women use about twenty thousand." Dr. Brizendine gives a set of references in her end-notes, but none of them support those numbers. In fact, no study of any sort has ever measured any numbers at all like these, as far as I've been able to find.

What are the facts about sex and talkativeness? There's an enormous amount of individual variation, and each individual talks more or less depending on mood and context. Against this background of variation, many studies have measured how much women talk, on average, compared to how much men talk, on average. The differences that they find between men and women as groups have always been small compared to the differences among men as individuals or among women as individuals. And more often than not, these small group differences actually show men talking a bit more than women do. For additional details, see the links at the end of this post.

Earlier this month, it seemed that Dr. Brizendine had been persuaded to withdraw the word-count numbers (Stephen Moss, "Do women really talk more?" The Guardian, 11/27/2006):

When I reach Brizendine, just as she is crossing the Golden Gate bridge, she tells me that she has accepted the criticism of the numbers quoted in the book - on both volume of words and rate of speech - and will be deleting them from future editions. Nor will they appear in the UK edition, to be published by Bantam in April. "I understand Mark Liberman's point and I am grateful to him," she says. "He felt I was passing on data that was not nailed down, and thus perpetuating a myth, so it will be taken out in future editions." She admits language is not her specialism, and she had been reliant on the advice of others.

But another interview, published last weekend, offers a re-interpretation rather than a retraction (Deborah Solomon, "He thought, she thought", NYT Magazine 12/10/2006):

Q: Your book cites a study claiming that women use about 20,000 words a day, while men use about 7,000.

A: The real phraseology of that should have been that a woman has many more communication events a day — gestures, words, raising of your eyebrows.

Now, given the relatively slow pace of the magazine business, it's quite possible that Ms. Solomon interviewed Dr. Brizendine before Mr. Moss did. Thus the NYT magazine Q&A may not reflect her current position. But the point is worth following up in any case.

An executive summary of the conclusions: the claim about "communication events" seems to have essentially the same status as the claim about word counts. I can't find any studies that yield numbers at all like those in the re-interpreted claim, which would be something like "women use about 20,000 communication events a day, while men use about 7,000". The studies I've been able to find that count something like "events" of nonverbal communication yield the same sorts of results as studies that count words: there's a lot of individual variation; individuals vary a lot depending on context and mood; male and female averages are not consistent with a large sex difference in counts of overall "communication events". (Though there can be large sex differences in the case of particular nonverbal signals: in a study discussed below, women laughed about 46% more than men, while men produced 523% more "chin thrusts" than women did.)

Given the references cited in the end-notes to p. 14 of Brizendine's book, it seems that she got her "communication event" numbers from one of the many books by Allan Pease and his various co-authors. For example, Allan and Barbara Pease, Why Men Don't Listen & Women Can't Read Maps, pp. 80-81:

A woman can effortlessly speak an average of 6,000-8,000 words a day. She uses an additional 2,000-3,000 vocal sounds to communicate, as well as 8,000-10,000 facial expressions, head movements, and other body language signals. This gives her a daily average of more than 20,000 communications. That explains why the British Medical Association recently reported that women are four times more likely to suffer from jaw problems.

"Once I didn't talk to my wife
for six months," said the comedian.
"I didn't want to interrupt."

Contrast a woman's daily "chatter" to that of a man. He utters just 2,000-4,000 words and 1,000-2,000 vocal sounds, and makes a mere 2,000-3,000 body language signals. His daily average adds up to around 7,000 communication "words" -- just over a third the output of a woman.

They give no citation for any research backing up these numbers. I've spent a fair amount of time searching fruitlessly for published studies that might support such assertions. If you know of any, please tell me. [The work that Brizendine actually cites is Pease, A. and A. Garner (1997) Talk Language: How to use conversation for profit and pleasure. I don't have a copy of this work, but I'm assuming that it offers the same numbers, and the same lack of documentation, as the one that I've quoted from -- and in both cases, it seems that the numbers represent the type of pseudo-scientific bible story that has become regrettably common in popular works on psychology, as well as in the system of folk beliefs that Arnold Zwicky calls "bizlore". (Update 12/22/2006 -- I've bought a copy of Talk Language, and in fact it contains no information whatever about any counts of words, eyebrow movements or any other "communication events".)]
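For readers who want to check the arithmetic in the quoted passage, here is a minimal sketch that simply adds up the ranges the Peases give. The function name total_range is my own, and the figures are copied from the quotation, not from any measured data.

```python
def total_range(*ranges):
    """Sum a series of (low, high) ranges into one (low, high) range."""
    low = sum(r[0] for r in ranges)
    high = sum(r[1] for r in ranges)
    return low, high

# Ranges as quoted: words, vocal sounds, body-language signals.
woman = total_range((6000, 8000), (2000, 3000), (8000, 10000))
man = total_range((2000, 4000), (1000, 2000), (2000, 3000))

for label, (low, high) in (("woman", woman), ("man", man)):
    midpoint = (low + high) / 2
    print(f"{label}: {low}-{high} per day, midpoint {midpoint:.0f}")
```

On these quoted ranges the sums come to 16,000-21,000 for the woman and 5,000-9,000 for the man. That says nothing about whether the component numbers have any empirical basis.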

As you can read in the links at the end of this post, I'm quite confident that the word-count part of this assertion (6,000-8,000 words a day for an average woman, 2,000-4,000 words a day for an average man) is not consistent with the numbers that have really been measured in many published studies. What about the "additional ... vocal sounds" and the "facial expressions, head movements, and other body language signals"?

One study that will give us an idea of how this is likely to work out is John F. Dovidio, Clifford E. Brown, Karen Heltman, Steve L. Ellyson, Caroline F. Keating, "Power displays between women and men in discussions of gender-linked tasks: A multichannel study", Journal of Personality and Social Psychology, 55(4), 580-587, 1988.

Here's their description of the study:

In preliminary testing at the beginning of the term, 88 introductory psychology students rated their familiarity (0 = no familiarity, 10 = a great deal of familiarity) with the materials, steps, and potential problems of 14 activities (e.g., washing and waxing a car, writing a research paper) that varied in their association with masculine and feminine gender roles. On the basis of those anonymous ratings, we selected three tasks as discussion topics: automotive oil changing, for which men showed greater familiarity than did women (Ms = 7.0 vs. 2.4, p < .001); pattern sewing, for which women showed greater familiarity than did men (Ms = 6.4 vs. 1.4, p < .001); and vegetable gardening, for which men and women indicated equal familiarity (Ms = 6.5). We drew the 24 men and 24 women who participated in our study from this pool of students. Each mixed-sex dyad discussed the masculine topic (oil changing), the feminine topic (sewing), and the non-gender-linked topic (gardening). [...]

We randomly selected 24 male and 24 female undergraduates from a pool of 50 male and 38 female students in an introductory psychology class at a midwestern liberal arts college. We randomly paired the subjects, who were not previously consociated, in mixed-sex dyads. [...]

Each dyad discussed all three topics. "The order of the three discussion tasks (oil changing, sewing, and gardening) was counterbalanced, and one male and one female experimenter ran two dyads in each order."

The (three-minute-long) conversations were videotaped, and coded as follows:

Two coders recorded the verbal and nonverbal behaviors from the videotapes. The verbal measures were the number of speech initiations by each participant (Rosa & Mazur, 1979) and the percent of the total interaction time that each subject spoke (Berger et al., 1985). The nonverbal measures were (a) looking while speaking, the percent of time that the subject looked at his or her partner while the subject spoke (Dovidio & Ellyson, 1985); (b) looking while listening, the percent of time the subject looked at his or her partner while listening to the partner speak (Dovidio & Ellyson, 1985); (c) rate of gesturing, the number of expressive hand movements (not in contact with one's own body) that occurred per second while speaking (Dittman, 1972; Henley, 1977); (d) frequency of chin thrusts (Camras, 1980; Henley, 1977); (e) frequency of smiling (Henley, 1977); (f) frequency of self-touching, hand movements in contact with part of one's own body; and (g) frequency of laughing (Henley, 1977; Waxer, 1977).

The results?

There was an obvious, and interesting, effect of topic. How people communicate does depend on the interaction between who they are and what they're communicating about! But averaging over topics so as to focus on the sex differences, we find (if I've done the arithmetic correctly):

	Male	Female
Time speaking	40%	28%
Speech initiations	14.0	12.9
Looking while speaking	34%	30%
Looking while listening	44%	59%
Rate of gesturing	0.09	0.05
Frequency of chin thrusts	1.62	0.26
Frequency of smiling	10.6	13.6
Frequency of self-touching	6.1	6.5
Frequency of laughing	4.1	6.0

So the guys did more of the talking, as is often the case -- 43% more, this time, which is a bigger difference than one usually sees. What about non-verbal signals? Well, the guys did 80% more gesturing, and produced 523% more chin thrusts (6.23 times as many). The gals did 28% more smiling, 7% more self-touching, and 46% more laughing. Dovidio et al. didn't count eyebrow motions, it's true. But there's certainly no support here for the view that women produce about three times more "communication events" on average than men do.
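The percentage comparisons can be recomputed directly from the table. This is a minimal sketch: the numbers are the topic-averaged values from the table above, and the helper names percent_more and compare are my own. Note that "X% more" means (ratio - 1) x 100, so a ratio of 6.23 corresponds to 523% more.

```python
# Topic-averaged values copied from the table above, as (male, female).
measures = {
    "Time speaking (%)": (40.0, 28.0),
    "Speech initiations": (14.0, 12.9),
    "Looking while speaking (%)": (34.0, 30.0),
    "Looking while listening (%)": (44.0, 59.0),
    "Rate of gesturing (per sec)": (0.09, 0.05),
    "Chin thrusts": (1.62, 0.26),
    "Smiling": (10.6, 13.6),
    "Self-touching": (6.1, 6.5),
    "Laughing": (4.1, 6.0),
}

def percent_more(larger, smaller):
    """Excess of one mean over the other, as a percentage of the smaller."""
    return (larger / smaller - 1.0) * 100.0

def compare(male, female):
    """Return which group has the higher mean and by what percentage."""
    if male >= female:
        return "male", percent_more(male, female)
    return "female", percent_more(female, male)

for name, (male, female) in measures.items():
    who, pct = compare(male, female)
    print(f"{name:30s} {who:6s} +{pct:.0f}%")
```

Nothing here goes beyond the table; it only makes the ratio-versus-percentage bookkeeping explicit.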

[In more detail: Suppose we insist, rather against common sense, on an overall count of "communication events" in this data. We assume (without any basis) that each hand-gesture, each word, each laugh, etc. is a single "communication event". Then the average male spoke for 40% of three minutes, which is 1.2 minutes; since this calculation eliminates all pauses, let's assume 200 wpm (it was probably more than this), yielding 240 words; the average female totaled .28*3 = 0.84 minutes, which yields 168 words. I'm not sure how to count the gaze percentages, so let's leave them out for a first approximation. The "rate of gesturing" is measured per second, so we get .09*180 = 16.2 for the males, and .05*180 = 9 for the females. The other measures are all (I think) counts for the whole conversation. So for the males, we get 240+16.2+1.62+10.6+6.1+4.1 = 278.62 "communication events". For the females, we get 203.36 "communication events". These sums are a preposterous farrago of category errors -- it's like adding up cars, bicycles, shoes and socks as transportation modalities -- but some such sums seem to be required by the Pease/Brizendine theory, and I don't see any way to get them to come out in a way that's consistent with the females-are-three-times-more-communicative theory.]
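The back-of-the-envelope sum above can be written out as a short sketch. It bakes in the same admittedly baseless assumptions as the text: 200 words per minute of speaking time, one "event" per word, gesture, smile and so on, and gaze percentages left out. The function and constant names are my own.

```python
CONVERSATION_SECONDS = 3 * 60  # each conversation lasted three minutes
WORDS_PER_MINUTE = 200         # assumed speaking rate, as in the text

def communication_events(share_of_time, gesture_rate, counts):
    """Add words, gestures and per-conversation counts into one total."""
    minutes_speaking = share_of_time * CONVERSATION_SECONDS / 60
    words = minutes_speaking * WORDS_PER_MINUTE
    gestures = gesture_rate * CONVERSATION_SECONDS  # rate is per second
    return words + gestures + sum(counts)

# counts: chin thrusts, smiles, self-touches, laughs (from the table)
male = communication_events(0.40, 0.09, [1.62, 10.6, 6.1, 4.1])
female = communication_events(0.28, 0.05, [0.26, 13.6, 6.5, 6.0])

print(f"male:   {male:.2f}")
print(f"female: {female:.2f}")
print(f"female/male ratio: {female / male:.2f}")
```

Changing WORDS_PER_MINUTE rescales both word counts by the same factor, so under these assumptions no choice of speaking rate brings the female total close to three times the male one.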

I'm hoping that the Guardian interview really was later than the NYT magazine interview. I'm afraid, though, that whether or not Dr. Brizendine retracts the word-count claims or just re-interprets them as "communication-event" claims, what the world will remember is the "scientific proof" that women are three times talkier.

Other posts on Louann Brizendine's The Female Brain:

Neuroscience in the service of sexual stereotypes (8/6/2006)
Sex-linked lexical budgets (8/6/2006)
Sex and speaking rate (8/7/2006)
Yet another sex-n-wordcount sighting (8/14/2006)
The main job of the girl brain (9/2/2006)
The superior cunning of women (9/2/2006)
The laconic rapist in the womb (9/4/2006)
Open-access sex stereotypes (9/10/2006)
David Brooks, neuroendocrinologist (9/17/2006)
Sex on the brain (Boston Globe, 9/24/2006)
Gabby guys: the effect size (9/25/2006)
"Every 52 seconds": wrong by 23,736 percent? (10/13/2006)
Two new reviews of Brizendine (10/30/2006)
Word counts (11/28/2006)

More on the spread of these ideas in the media:

Regression to the mean in British journalism (11/28/2006)
Censorship at the Daily Mail (11/29/2006)
Contagious misinformation (12/1/2006)
Femail again (12/2/2006)
~~Bible~~ Science stories (12/2/2006)
Fabricated but true? (12/3/2006)

Posted by Mark Liberman at 09:57 AM

Nigger, nigger, on the wall

It was recently pointed out by Heidi Harley that discussion of the use of the word nigger has reached the comics page. The word featured hugely in comedian Michael Richards' on-stage career meltdown at the Laugh Factory on November 17 (captured in video on someone's camcorder phone). Jesse Jackson now wants "to prohibit that word in public usage as hate language." I don't think nigger should be banned at all, in any sense. Not by anyone who thinks it's important for us all to have a clear view of who we are and what we're like. If you want to make sure you know what you look like, don't take down the mirrors. (Count Dracula did smash one mirror in a fury, early in Bram Stoker's novel; but that was because Jonathan Harker had just noticed Dracula's tell-tale lack of a reflection. As you'll see if you read on, that lashing out at the source of the evidence only underlines the aptness of my metaphor.)

Others have already opposed a word ban, of course. "Blaming a word for its users is an ill-conceived approach with no designated goal," says Kaffie Sledge at the Ledger-Enquirer in Columbus, Georgia: "If we're going to ban words, where will we start? And when will we stop?" One answer would be that in general we're not in a very good position to ban any words. It can't be done by an act of Congress, that's for sure, because of the First Amendment: "Congress shall make no law ... abridging the freedom of speech, or of the press." It is true that the FCC manages to exercise some control over the designated "obscene" words that can be used in broadcasts over the public territory of the airwaves, but to think there might one day be a law against saying nigger on the stage or printing it in newspapers or on Language Log is quixotic. I doubt that nigger is going to go away by statute.

Jesse Jackson's campaign seems in fact, from what I've read, to be centered on getting showbusiness people to agree on a voluntary basis that they will not use it in their performances. But that isn't a ban; it's an agreement. It's not analogous to a law against littering the beach; it's more like a team of volunteers deciding to clear litter off the beach. However, the Laugh Factory is reported to have banned the word nigger since November 17, and to have already fined black comedian Damon Wayans $320 for using it during his act after being warned. (That, I guess, is constitutional; it's like fining the caterers at your private party if they litter your patio.)

Personally, I don't think any ban, or even a voluntary agreement among showbiz folks to suppress the word, is the right way to go.

The Michael Richards event provides its own lesson, for everyone. (See it if you have a strong stomach.) It is truly painful to watch: Richards is so bad up there, so unfunny, so unequipped to find ripostes that might stem the talking and heckling from a disruptive group of latecomers. He is dying in front of that audience, and he knows it. His offensiveness is so inept, his ineptness so offensive, as he paces back and forth, looking sideways into the wings as if for rescue but never directly at the black hecklers he is trying to confront. He walks about hollering things like "Fifty years ago, we'd have you upside down with a fucking fork up your ass!" (Nervous newspapers are leaving out fucking and printing that last word as "---", as if those words were the problem!) And he shouts "nigger" at his main heckler, over and over and over again.

But look at Richards now: his career in ruins, his name a byword for impotent white rage and pathetic public collapse. He used the word in anger, and he's pretty much being kept at bay with garlic and crucifixes in showbusiness right now. Remember that.

Ban the word? I say let it stay right with us. (In everyday life it will stay anyway. Any kind of explicit attempt at suppression would only enhance its insultingness and taboo value.) We all know the word well; we know where it lives. We know it's a deeply offensive insult left over from slavery and its hundred-year aftermath of discrimination. And there are other things most of us know about it. Use it in seriousness from a stage or podium and your career on stages and at podiums is over, that's clear. Address it to a black man in the street and you are extremely likely to get beaten up. You might recover from the beating, but what will be harder to get over is that from then on you are almost certain to be judged despicable by most ordinary people who heard you.

If you're black and young, then in some very informal contexts you may be able to get away with using it to friends as an edgy but basically affectionate in-group term, connoting familiarity and solidarity (making this use more widespread is what Jackson thinks black rappers and comedians should cut back on; I say Damon Wayans is the expert on how to be funny and black, and he should decide what he's going to do in his act).

Mind how you go: it's a loaded, dangerous weapon of a word, especially now. Don't try it with people over fifty, for example. In fact, no matter who you are (and dictionaries should provide a brief warning along these lines), if you're at all in doubt, don't touch it with a ten-foot pole. But if you think you know what you're doing, by all means use it where and if you dare.

I want you to, because there are things I need to know about you. Whether you refer to African Americans as niggers is relevant to whether you and I are ever going to have lunch together or be drinking buddies, for example. I don't want to know you have been cowed by some ban or convention; I want to know how you think it is appropriate to talk. Knowing how Michael Richards used the word nigger is highly relevant to my decisions about whether I will ever put my money down to see his act in a comedy club. Useful information.

Use the word as you think apt; it will reflect things about you very informatively, like a mirror on the wall. Ultimately your linguistic choices are up to you. That's exactly as it should be.

Posted by Geoffrey K. Pullum at 12:48 AM

December 10, 2006

Save that comma!

While we were celebrating the awarding of the Presidential Medal of Freedom to William Safire, Eric Bakovic, searching for a suitable photo, discovered that back in June 2005, Bill received the Guardian of Zion Award. In the story on this award in Voices magazine, we find that Bill is apparently the author of "the dictionary":

Safire is the author of 14 books on grammar and usage and the author of four novels, the dictionary, The New Language of Politics and an anthology of great speeches, Lend Me Your Ears.

It took me a moment to see what had gone awry here: the problem is the punctuation.

It helped that I happened to know that The New Language of Politics is in fact a dictionary; its subtitle is A Dictionary of Catchwords, Slogans and Political Usage. So the writer's intention surely was that "The New Language of Politics" should be in apposition to "the dictionary"; there are three conjuncts, not four (and from now on, I'll italicize all book titles just to make things visually clearer, except when directly quoting from the Voices article):

four novels

the dictionary = The New Language of Politics

an anthology of great speeches = Lend Me Your Ears

Now there are two ways to punctuate appositive proper names: either with no punctuation at all —

(1) the dictionary The New Language of Politics

or with commas flanking the appositive —

(2) the dictionary, The New Language of Politics,

(the second comma is suppressed at the end of a sentence, as in "an anthology of great speeches, Lend Me Your Ears.", in the quotation above).

Advice varies on when to use commas and when not — no, Lynne Truss doesn't take up cases this complicated — but many writers tend to use commas for longer proper names, omitting them for shorter ones. Apparently the writer of the original text opted for commas, possibly because "The New Language of Politics" is pretty long. But only the first turned up in the text as printed above. What happened?

Two possibilities. First, that the writer made the very common error of leaving out the second comma in matched pairs; if so, an editor should have supplied it. Second, and I think more likely, that the writer put the second comma in, but that it was removed by an editor searching for (and destroying) "serial commas", commas preceding "and" at the end of a string of coordinated expressions. The Voices house style seems to be thoroughly anti-serial; elsewhere in the Safire article, we find:

columnist Charles Krauthammer, A.M Rosenthal, Herman Wouk, Sir Martin Gilbert, producer Arthur Cohn and Elie Wiesel

the judgments of political leaders, religious leaders, social arbiters and even media pundits

using the movie screen, the television screen, the cellphone screen and the Internet

brain science, immunology and arts education

Serial vs. anti-serial is one of those absurd religious disputes that concern the minutest points of practice but consume astonishing amounts of the energy and time of practitioners. (Lynne Truss, sensibly, refuses to take sides.) I'm a serial guy myself, but I'm exposed to material punctuated according to both schemes, with the result that I doubt that I'd note inconsistent use of commas within a text, unless I was specifically examining its punctuation style. I suspect that most people (other than copy editors and such) are like me; at any given point in our reading, we have to be prepared to cope with either scheme. Which means, paradoxically, that all the effort invested in enforcing one scheme or the other consistently is wasted on ordinary readers; we're not going to notice inconsistency.

This point turns out to be important to the punctuation of the Voices sentence we started with. Suppose the writer had opted for the comma-ed appositive in (2). That would give us:

(3) ... the author of four novels, the dictionary, The New Language of Politics, and an anthology of great speeches, Lend Me Your Ears.

This is still likely to be read wrong, as a coordination of four, rather than three, things. To divine the writer's intentions, you'd have to appreciate that the Voices text is consistently anti-serial. Which means that you'd have to have noticed that the one preceding coordination (the one going from Krauthammer to Wiesel) had no serial comma. This is way too subtle for ordinary readers.

The easy solution would be to use the comma-less appositive in (1); the anti-serial version is:

(4) ... the author of four novels, the dictionary The New Language of Politics and an anthology of great speeches, Lend Me Your Ears.

and the (to my mind better) serial version is:

(5) ... the author of four novels, the dictionary The New Language of Politics, and an anthology of great speeches, Lend Me Your Ears.

As Geoff Pullum pointed out in a chat around the water cooler here at Language Log Plaza, part of the problem with (3) is that it uses commas in two different functions. Geoff Nunberg then observed that you can use semicolons to fix this:

(6) ... the author of four novels; the dictionary, The New Language of Politics; and an anthology of great speeches, Lend Me Your Ears.

The semicolon can be your friend. (Notice that that last semicolon is obligatory, whatever your position on serialism vs. anti-serialism.) [Addendum 12/11: John Cowan tells me that there are committed anti-serialists who reduce the final serial semicolon (when it's being used as a "super-comma") to a comma. That is, these people allow a comma after the penultimate conjunct, but only as a replacement for the even stronger punctuation mark, semicolon.]
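For readers who like to see such things mechanically, here is a rough Python sketch of how versions (3) through (6) fall out of one fixed structure plus a few punctuation choices. The names `Conjunct` and `render` are my own inventions for illustration, not anything from a style guide or library, and the rules cover only the cases discussed above (coordinations joined by "and", with at most one appositive per conjunct).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Conjunct:
    """One coordinated item, optionally followed by an appositive."""
    head: str
    appositive: Optional[str] = None
    commas: bool = True  # flank the appositive with commas, as in (2)


def render(conjuncts: List[Conjunct], serial: bool, semicolons: bool = False) -> str:
    """Punctuate a coordination (hypothetical helper, illustration only)."""
    sep = ";" if semicolons else ","
    last = len(conjuncts) - 1
    out = ""
    for i, c in enumerate(conjuncts):
        flanked = c.appositive is not None and c.commas
        if c.appositive is None:
            out += c.head
        elif c.commas:
            out += f"{c.head}, {c.appositive}"
        else:
            out += f"{c.head} {c.appositive}"
        if i == last:
            out += "."  # the closing appositive comma is suppressed here
        elif i == last - 1:
            # A mark before "and" appears for the serial style, for the
            # semicolon scheme, or to close a comma-flanked appositive.
            if flanked or (len(conjuncts) > 2 and (serial or semicolons)):
                out += sep
            out += " and "
        else:
            out += sep + " "
    return out


novels = Conjunct("four novels")
anthology = Conjunct("an anthology of great speeches", "Lend Me Your Ears")
title = "The New Language of Politics"

three_flanked = [novels, Conjunct("the dictionary", title), anthology]
three_bare = [novels, Conjunct("the dictionary", title, commas=False), anthology]
four_way = [novels, Conjunct("the dictionary"), Conjunct(title), anthology]

print(render(three_flanked, serial=False))                   # version (3)
print(render(three_bare, serial=False))                      # version (4)
print(render(three_bare, serial=True))                       # version (5)
print(render(three_flanked, serial=False, semicolons=True))  # version (6)

# The ambiguity: three conjuncts with a flanked appositive look exactly
# like four conjuncts punctuated serially.
assert render(three_flanked, serial=False) == render(four_way, serial=True)
```

Rendering `four_way` with `serial=False` gives the coordination exactly as Voices printed it, which is why the dropped comma turns Safire into the author of "the dictionary".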

A final surprising fact: the address that Safire gave when he accepted the Guardian of Zion Award was titled, according to Voices, "Jerusalem, Job, and Justice" — WITH A SERIAL COMMA, despite the fact that Safire is otherwise an anti-serialist: recall that the subtitle of his political dictionary is A Dictionary of Catchwords, Slogans and Political Usage, and that he has written for many years for the New York Times, which is consistently anti-serial (though they don't go so far as to change book titles and subtitles that have serial commas in them). I think his punctuation choice is a good one, even for a confirmed antiserialist; "Jerusalem, Job and Justice" invites a reading in which the comma introduces an appositive, "Jerusalem: Job and Justice".

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 12:37 PM

Spanish on the Senate floor: the great non-debate

A few days ago I posted about the new "Stop Martinez" website, set up by the lobbying group English First to oppose President Bush's choice of Sen. Mel Martinez as the next chairman of the Republican National Committee. The English Firsters feel that Martinez is unfit for such a post because of his positions on immigration and "official English" legislation. As a supposedly damning piece of evidence, the "Stop Martinez" site gives this bullet point:

On February 3, 2005, Martinez used Spanish on the Senate floor in his first speech, although there are some doubts about the translation.

The "doubts about the translation" link improbably leads to a Language Log post by Mark Liberman that had very little to do with Martinez's speech. Rather, Mark discussed an automated translation of a wire story about the speech that appeared on the site for El Sol de Zacatecas, a Mexican newspaper. I pointed out that this post was utterly irrelevant to whatever beef the English Firsters have with Martinez. Now Jim Boulet, executive director of English First, has responded to my post on "Martinez Watch," a blog affiliated with the "Stop Martinez" site. I appreciate Mr. Boulet taking the time to explain the rationale for linking to Mark's post, but I have to admit his explanation leaves me more baffled than ever.

Let me address Boulet's response paragraph by paragraph (his comments in red):

The points that the link intended to demonstrate were two:
First, debate over accurate translation of a speech or document of any length is inevitable and, accordingly, the United States would do well to continue as a single-language nation for official purposes. Based on both his voting record and his Senate Spanish speech, Senator Martinez disagrees.

This entirely misses the point of the linked post, which, once again, was about an odd computer-generated translation of a news article provided by a Mexican newspaper's website. There was no "debate over accurate translation" regarding the Spanish portion of Martinez's speech on the Senate floor. Mark Liberman's post was more about the current limits of machine translation (also discussed here and here) than anything having to do with Sen. Martinez or his use of Spanish. It strikes me as deeply disingenuous to shoehorn an academic discussion about MT into an argument about official English. Yes, any translation from one language to another can be critiqued for accuracy, especially a computer-generated one like the article on the El Sol de Zacatecas site. It takes an enormous leap of logic to then conclude that the U.S. should "continue as a single-language nation." (By the way, the U.S. has never been a "single-language nation.")

Second, the Martinez Spanish speech was supposed to attract Hispanic support for the Hispanic nominee he was endorsing. Instead, his remarks sparked debates among some Spanish speakers about translation accuracy.

As far as I can tell, this is entirely untrue. There were no such "debates among some Spanish speakers about translation accuracy" stemming from the Martinez speech. Mark came across the automated Spanish-to-English translation on the site for El Sol de Zacatecas from a news search and then posted about the obvious shortcomings in relying on translation tools like Babel Fish and Google Translate, which offer up such laughers as "the official transcription published east Thursday" for the Spanish article's phrase la transcripción oficial publicada este jueves. Again, that had absolutely nothing to do with any difficulties in translating Martinez's speech. Indeed, it's hard to imagine there being any such "debates" about accurately translating the speech (at least by humans), since Martinez's Spanish-language remarks consisted of a few unremarkable sentences in support of Alberto Gonzales, then up for confirmation for Attorney General. What's more, Martinez supplied his own translation of the Spanish remarks for the Congressional Record (see below).

Given the documented differences between Castilian Spanish and the Spanish usage of Puerto Ricans, Mexicans and Cuban Americans, complaints about Spanish word usage are inevitable. A political speech suffers when it drives people to dictionaries rather than to action.

This is also pulled out of nowhere. I don't know of anyone who was driven to a dictionary by Sen. Martinez's speech, or anyone who complained about dialectal variations or "Spanish word usage." Martinez's remarks would have been easily comprehensible to any (human) Spanish speaker. What was less than comprehensible (once again!) was an article about the speech that was fed into an automated Spanish-to-English translator and posted to a website. The "documented differences" among different Spanish-speaking communities is another red herring. As if English is somehow spared from "complaints about word usage" due to differences in spoken dialects!

President Bush's Spanish fluency has even been questioned: "Spanish wire service EFE reported last year that Bush speaks Spanish "poorly." Perhaps all of America's politicians would do better to stick to English.

Hoo boy. If President Bush's Spanish-language ineptitude is supposed to be some sort of linguistic benchmark, then yes, by all means, let's ban the public use of Spanish. (Some might argue that by the same token Bush's [dis]fluency in his native language would dictate a similar ban on English.) Fortunately, there are many American politicians, even native-born gringos, who are capable of speaking Spanish far more proficiently than the President. His brother Jeb (whose wife is Mexican) would be one example.

Another rather surprising example is Sen. Jim Inhofe (R-Okla.), who we can assume is fully acceptable to the English Firsters since he sponsored the amendment to the Senate immigration bill last May declaring English the "national language" of the United States. It turns out Mel Martinez wasn't the first U.S. senator to use Spanish in a floor speech, despite press reports claiming so at the time. Inhofe himself has done it on more than one occasion. In 2003, he spoke in Spanish at least twice, on Feb. 26 and Nov. 12, both times in support of the controversial nominee for the D.C. Circuit Court of Appeals, Miguel Estrada. In the two speeches, which are quite similar in content, Inhofe recalled speaking to a group from San Luis Potosi, Mexico, a sister city of Tulsa, where Inhofe was once mayor. His Nov. 12 speech (PDF) concluded with words resembling those that Martinez would later give in support of Gonzales:

Muchos Hispános estan escuchando ahora ..... y yo quiero decir.
Por descrácia, hay personas en el senádo que no quieren escuchar a ni una palabra de la verdad.
Yo invito a la communidad hispána para llama a sus senadores para insistir en los derechos de Miguel Estrada y en la confirmación de juéces de los Estados Unidos.

The translation entered by Inhofe into the Congressional Record reads:

Many Hispanic Americans are listening right now ..... and I want to say:
Disgracefully, there are people in the Senate that don't want to listen to even one word of the truth.
I invite the Hispanic community to call their senators to insist on the rights of Miguel Estrada and on the confirmation of the judges of the United States.

Inhofe's speech of Feb. 26, 2003 appeared in the Congressional Record (PDF) without an English translation for the Spanish comments. But a transcript of this speech can be found on Inhofe's own website, so he doesn't seem too embarrassed by his use of Spanish on the Senate floor. His words came back to haunt him in May, though, when his "national language" amendment was debated by the Senate, and opponents such as Sen. Dick Durbin questioned whether Inhofe's Spanish-language remarks could legally be printed in the Congressional Record under the proposed amendment. Inhofe, who acknowledged having made "probably five speeches on the floor in Spanish," denied that the legislation would have any such effect.

Finally, as promised, here is the Spanish-language section of Martinez's floor speech of Feb. 2, 2005 (text, PDF):

Y a los Hispano-Americanos a lo largo y ancho de esta gran nacion: tanto a nuestros niños, como a nuestros estudiantes de Derecho y los padres y abuelos que han venido a America a crear una vida mejor para ellos y sus familias, hoy les tengo un mensaje:
El Juez Gonzales es uno de nosotros. El representa todos nuestros sueños y esperanzas para nuestros hijos. Debemos reconocer la importancia de este momento--sobre todo para nuestra juventud. No podemos permitir que la politiquería nos quite este momento que nos enorgullece a todos. Apoyemos a Alberto Gonzales.

And here is the translation entered into the Congressional Record:

And to Hispanic Americans throughout our Nation:
From our schoolchildren, to law students, to parents and grandparents who came to America to create a better life for themselves and their families in the United States, I have this message for you today: Judge Gonzales is one of us. He represents all of our hopes and dreams for our children and for all of us as Hispanic Americans. Let us acknowledge the importance of this moment, especially for our young people. We cannot allow petty politicking to deny this moment that fills all with such pride. Let us all support Alberto Gonzales.

That's it. Ninety-seven words in all, in a 1,234-word speech, or less than 8 percent of Martinez's total spoken output. No "debate over accurate translation," no "complaints over Spanish word usage," just some boilerplate directed at the Latino constituency. Just like the boilerplate that Inhofe, leader of the Senate fight to make English the country's official (or at least "national") language, used himself on several occasions. So what's the fuss? It's nothing more than a trumped-up charge in a trumped-up debate, all in the service of an alarmist brand of linguistic isolationism. How about we spend our energies debating substantive political issues rather than a fabricated one?

[Update: I will concede one point to Mr. Boulet. It's true that any cross-linguistic rendering can be critiqued for accuracy, especially among the discerning readers of Language Log. Here is Matthew Stuckwisch's take on the translation of Martinez's speech that appeared in the Congressional Record:

Whilst Martínez says "esta gran nación" (this great N/nation), the Congressional record states "our Nation". Perhaps to add parallelism in the English the simple term "niños" (children) was changed to "schoolchildren". The remaining structure of the next sentence has been rather distorted, substantially changing the meaning IMO. He then says "Debemos reconocer" (We must recognise), an indicative statement, whereas the translation uses the imperative "Let us acknowledge". Oddly "sobre todo" (above all) was changed to "especially", even though above all is a common expression with the same meaning in English. The English translation quite curiously added the adjective "petty" (non-existent in the Spanish) to "politicking". The Spanish "nos quite" (take from us) is, in my opinion, far more active than "deny", as I think it portrays politicking in a worse light. While none of these make a huge difference in the overall meaning of his speech, in this day and age of taking two or three word quotes and changing them into huge political issues, the translation entered could be extremely improved:
"And to the Hispanic-Americans throughout our Nation: equally to our children, our law students, the parents and grandparents who came to America to make a better life for themselves and their families, I have a message today: Judge Gonzales is one of us. He represents all of our hopes and dreams for our children. We must recognise the importance of this moment — above all for our youth. We cannot allow politicking to take this moment from us that makes all of us so proud. Let us support Alberto Gonzales."
Also, there's an interesting usage note for the Spanish hyphen. Whilst in English it is generally known as a joiner, in Spanish it's generally considered a divider. That is if you're talking about a Hispanic v American conflict, you would say "un conflict hispano-americano", to show that the two terms are being contrasted. C.f. a person who considers themselves both Hispanic and American, "una persona hispanoamericana", which of course, actually ends up meaning all of Latin America as well in the Spanish. As far as I know, the best way in Spanish to say Hispanics from (US of) America is "hispanos estadounidenses". Actually, that speech could probably be a good case example of some of the issues involved in translation.

And here is Alexander Jabbari's assessment of the speeches and their transcriptions:

After reading your recent Language Log post, I thought you might be interested to hear that the transcriptions of both Jim Inhofe's and Mel Martinez's Spanish speeches are riddled with orthographical mistakes, and furthermore that Inhofe's use of Spanish in his Nov. 12 speech is not very good and sounds like it was translated directly from English by someone who is not a native speaker of Spanish, whereas Martinez's Spanish is perfect. I'm sure that's not surprising, as Martinez speaks Spanish as a first language and Inhofe evidently doesn't, but what was mostly surprising to me was the transcription. The transcription of Inhofe's speech contains spelling errors ("descrácia" instead of desgracia), misuse of diacritics ("senádo" instead of "senado," along with many other words containing diacritics where they don't belong), and an error in capitalization ("Hispános" instead of hispanos), not to mention the translation glosses "hispanos" as "Hispanic Americans," though the term refers to Hispanics of any country. Although Martinez's Spanish is impeccable, the transcription of his Spanish is not. It also contains a few capitalization errors (such as "estudiantes de Derecho" instead of estudiantes de derecho) and contains no diacritics whatsoever (except "ñ"), though many words in his speech should be written with diacritics (such as nación, which is written as "nacion" in the transcription).

And Bill Poser points out that "llama" in the Congressional Record transcription of Inhofe's speech should read "llamar".]

Posted by Benjamin Zimmer at 10:04 AM

From My Office Wall

The pressures of end-of-term chaos -- term paper drafts, exam construction, urgent meetings with desperate students, and a depressingly large bunch of committee meetings -- have reduced me to a staring-at-the-wall state, and on the wall in question are two of my favorite quotations from linguists' writings, so I thought I'd put them here for the possible edification of those who might not have encountered them before. The first is from p. viii of Stephen R. Anderson's book A-Morphous Morphology (Cambridge University Press, 1992):

Linguistics will become a science when linguists begin standing on one another's shoulders instead of on one another's toes.

And the second is from the late great Jim McCawley's review, in Linguistics 18:911-930, of Frederick J. Newmeyer's book Linguistic Theory in America (New York: Academic Press, 1980):

Newmeyer's attitude here...resembles the traditional Christian attitude toward sex: the pleasure of gathering data is proper only within the confines of holy theory construction and when not carried to excess; recreational data-gathering is an abomination.

Not to sound extreme or anything like that, but I think Anderson's words ought to be tattooed onto the brow of every linguist. I'm thinking here of two subspecies in particular: those who think that the best way to make yourself look smart is to make somebody else look stupid, and those who think that people of a different theoretical persuasion are a threat to one's own intellectual development.

As for McCawley's comment, I take no stand on whether his charge against Fritz Newmeyer is justified, but it will resonate with those linguists who, like me, can get excited about the tiniest new fact that is recorded in their field notes or uncovered in their reading ("They have a root that refers solely to the drumming of a ruffed grouse? Cool!") -- especially those of us who have met with condescending sneers from linguists who find such bits of language BO-ring and want to make sure we know that. (And no, I am not defending genuine bores who insist on telling all their colleagues about all the exciting new facts they've discovered about their favorite language.)

Posted by Sally Thomason at 09:14 AM

William Safire wins Presidential Medal

The White House recently announced that William Safire will be one of ten recipients of the Presidential Medal of Freedom, to be awarded in a ceremony on December 15. But only a select few have known the truth behind this event -- until now.

Not long ago, Safire was nominated for one of the coveted Language Log Awards. In particular, he had the inside track for Best Language Maven. This caused an immense scandal. It was bad enough when a Language Log post praised one of Safire's columns ("To pass into a certain condition, chiefly implying deterioration", 6/30/2004) — whole linguistics departments cancelled their subscriptions en masse, and only a last-minute intervention by Grant Barrett averted a vote of censure at the 2005 annual meeting of the American Dialect Society.

When news of Safire's "Loggy" nomination leaked out, it was ten times worse.

Mr. Verb exclaimed: "Look, it's like this, see: Words, they have, like, meanings!" And he went off muttering, "Award? award?!? AWARD??!!??"

Language Hat just put his head in his hands and moaned: "Mindbogglingly stupid."

And those were the moderates.

Still, we went forward undeterred. But as the celebrities were gathering for the awards ceremony at Language Log Plaza, a klaxon sounded. The Provisional Wing of the LSA had threatened an attack.

Now, it's been rumored for years that the Provos have been stockpiling phonologically active materials, highly enriched in nuclear accents, perhaps even enough to build a primitive phonemic bomb. Experts doubt that this is possible, but the threat is too dire to ignore. I mean, it'd be the Tower of Babel all over again. So we evacuated the building and snuck Bill out through the basement steam tunnels.

Ashamed of my profession's inability to recognize the true value of Safire's contributions, I contacted the White House. W owes us one (at least one), and I'm happy to say that after a small amount of push-back ("Whaddya mean, put Michael Brown off to next year? Safire writes good stuff about language, sure, but it's not near as good as what Brownie did after Katrina!"), W did indeed put Bill on this year's list for a Presidential Medal of Freedom.

It's not a Loggy; but he'll be able to get through the awards ceremony without any catcalls about nominative suffixes.

Posted by Mark Liberman at 07:42 AM

Punctuation

Yesterday, as the sherry decanter was set out on the sideboard in the Senior Writers' Lounge at Language Log Plaza, Arnold Zwicky brought up the topic of appositives and the use of commas to set them off. Or not. William Safire came into it, somehow. I'm a bit vague on the details, frankly. That's because I got distracted when someone (Poser, was it?) suggested that we should design a notation for English as if it were a computer language, and wrote a sample on the blackboard in colored chalks:

(I packed ((a cookbook)((the novel)=(Dracula)) and (a good dictionary))) where = is the infixed apposition operator

I began to imagine what might have happened if some Enlightenment sage -- Leibnitz? Descartes? -- had come up with this same idea, won over the European intelligentsia, and with their help persuaded the crowned heads of Europe to enforce a logical syntax of punctuation on the writing systems of their time, and therefore of ours. And then I thought of this cartoon:

It's true, as J.S. Mill said, that "[t]he structure of every sentence is a lesson in logic." But there's also William Blake's warning: "They became what they beheld."

Posted by Mark Liberman at 07:37 AM

Rowbottom

In response to my post about Rinehart, Dick Margulis has reminded me that Penn developed a similar tradition at about the same time: the Rowbottom:

The cry of "Yea, Rowbottom!" served as the rallying call for mass student disturbances and even full-scale riots at the University of Pennsylvania during much of the twentieth century. The term was first used in 1910, starting simply as a student's attempt to summon Joseph Tintsman Rowbottom, a 1913 graduate of Penn's School of Engineering, but before the year was out, "Yea Rowbottom!" had become an invitation for mass mayhem.

I neglected the mass mayhem angle in writing about Rinehart, but according to David Winter's article in the Journal of Personality, "[b]y the 1930s, the cry of "Rinehart!" often signaled the beginning of a college riot".

The traditional Rowbottom riots continued at Penn long after the Bowl Fights ended, but like many other aspects of our universities' intellectual life, they vanished in the turmoil of the 1960s.

Posted by Mark Liberman at 07:34 AM

December 09, 2006

Linguistic Turmoil in the Northwest Territories

It hasn't attracted much attention farther south, but there is linguistic turmoil in the Northwest Territories. The problem is French, which along with Cree, Dene Suline (Chipewyan), Dogrib, English, Gwich'in, Inuktitut, and Slave, is an official language. (You can see samples of all eight official languages here.)

The problem is that English and French are the two national languages of Canada but that they have very different status in the Northwest Territories. As in most of Canada outside of Québec and New Brunswick, not very many people speak French in the NWT. The territorial government estimates that approximately 900 of the 41,861 people in the NWT have French as their first language. L'Aquilon, the French-language newspaper of the Northwest Territories, refers to 1,100 francophones. So francophones are about 2% of the population.
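For anyone who wants to check the arithmetic, here is a small Python sketch using the two estimates just cited. The helper name `share` is mine, purely for illustration.

```python
def share(speakers: int, population: int) -> float:
    """Return speakers as a fraction of the total population."""
    return speakers / population


NWT_POPULATION = 41_861  # territorial government figure cited above

# The two francophone estimates cited above.
estimates = {
    "territorial government": 900,
    "L'Aquilon": 1_100,
}

for source, count in estimates.items():
    print(f"{source}: {share(count, NWT_POPULATION):.1%}")
```

The government's figure works out to a little over 2 percent and the newspaper's to a little over 2.5 percent, so "about 2%" is a fair round number on either count.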

On the other hand, there are significant numbers of speakers of native languages: 185 of Cree, 2,600 of Dogrib, 700 of Gwich'in, 790 of Inuktitut, 2,200 of Slave, and 3,000 of Dene Suline. As a result, not only does French play second fiddle to English as it does in most of Canada, but from the local point of view, the priority of French is below that of the native languages as well. The Northwest Territories is unusually supportive of its native languages. There is an Official Languages Act (versions in a variety of formats are available here) and a territorial Languages Commissioner to see to its implementation, whose activities are described in its Annual Report of the Office of the Languages Commissioner - 2001.

In 2001 the Fédération Franco-Ténoise*, the organization of francophones in the Northwest Territories, filed a lawsuit claiming that their linguistic rights were being violated. The territorial government not surprisingly responded that French speakers should not expect too much since there are so few of them.

The francophones won. The decision of the Supreme Court of the Northwest Territories is reported here in English and here in French. Here is the actual ruling.

The court issued the following orders to the territorial government:

Within one year, it must draft a comprehensive implementation plan for providing French-language services in all government institutions, especially those that offer service to the public.
The plan must provide for audits of services and the creation of bilingual positions in government, especially in service points to the public.
There must be systematic recruitment of francophone personnel in the health area, including physicians, nurses, technicians and pharmacists.
Public notices published in English newspapers must also be published in the local French newspaper, l'Aquilon, or an equivalent.
Within six months, Hansard must also be published in French.

These things are easier said than done. The territorial government says that it has great difficulty hiring skilled personnel such as nurses even without the added requirement of ability to speak French. It was not able to arrange for the translation of the Hansard, the record of debates in the territorial legislature, into French, within six months, so publication has been suspended for fear that continuing to publish it only in English would violate the court's order.

This is a difficult situation. While I sympathize with the desire of francophones to maintain their language outside of Québec, it is clearly the aboriginal languages that deserve pride of place, both because they are native and because, unlike French, they are endangered.

* Even native French speakers probably don't recognize the adjective Ténois (sometimes spelled TéNois) in the name of the plaintiff organization. It is a neologism used only in Canadian French, derived from Territoires du Nord-Ouest, and means "of or pertaining to the Northwest Territories".

Posted by Bill Poser at 06:41 PM

Linguistic cartoon update update

Seems like lots of funnies are addressing the taboo-language question these days. Out of the Gene Pool has a character trying to coin a new swear word:

What the poor critterblob doesn't know is that even if it took off, it'd never be offensive; to really be a taboo word, a lexeme has to start off associated with some highly charged subject matter, for example, orifices, religion, or race (your mileage may vary depending on your cultural context).

Candorville and Herb 'n Jamaal have recently drawn about the n-word, stimulated by Michael Richards' public deployment of it (three weeks ago being today in cartoon-time, with the publishing lag):

This Candorville (there's been a series) highlights the complicated question of in-group vs. out-group uses of slurs, previously discussed on the Log; see, e.g. the second-to-last para of az's post here.

Update: Toon-savvy reader ststones writes:

I have a much earlier example from a "Tom the Dancing Bug" anthology published in 1992. Max is a precocious diapered baby and Doug is a "generic cartoon animal". They are playing some kind of board game.

Doug: There! I advance to blue ... and I win again!
Max: Oh, you ... you ... DOIK!
Max: Oh my God! ... Doug! I'm so sorry!
Doug: What! What's a doik?
Max: Nothing! I just made the word up! I was mad!
Doug: But what does it MEAN?
Max: If you must know I made it up as a horrible epithet!
Doug: Oh!
Max: I'm so ashamed! I intended it as a slur that carries with it every horrible prejudice anyone harbors against you and your kind!
Doug: My kind?
Max: Yeah, that's what I was THINKING when I said it, but it's not a real word ... just a random syllable I spat out!
Max: Anyway, Doug, I'm sorry.
Doug: That's okay, Max. It's not like it was a real word!
(Reaction panel. Then Doug walking toward a worm hole.)
Doug: So why does it make me feel so darn UNCOMFORTABLE?
Worm: Watch your step, you fat DOIK!

"Doik" reappears in several subsequent strips.

Posted by Heidi Harley at 05:26 PM

Linguistic cartoon update

Frazz for 12/9/2006:

I'm always glad to see phonetics featured in the funny papers. But Frazz should have said "Espresso has only one plosive!" Or maybe better, "There is no velar plosive in espresso!" Because the [p] is a plosive, after all. And the comics should be the last bastion of truthfulness in this ever-truthier world.

Dilbert for 12/8/2006:

Taboo words -- which Scott Adams has been thinking about recently -- are a common topic for analysis here, and some of our posts have been picked up in the comments on the Dilbert blog. [The origins of "frack" on Battlestar Galactica are discussed at length in the Dilbert Blog entry, but Erin O'Connor suggests linking to the Wikipedia entry as well.]

Cathy for 12/8/2006:

A nice example of "the him" used as a perfectly grammatical noun phrase.

Posted by Mark Liberman at 03:16 PM

The literary life, back then

Every once in a while, I come across something that makes it clear how much times have changed. Most recently, a footnote in P. N. Furbank's review of Victoria Glendinning's biography of Leonard Woolf (New York Review of Books, 12/21/06, p. 44). Glendinning quotes a letter to Lytton Strachey, from Woolf's first year in the Ceylon civil service, in which he describes a fantasy life that includes "reading Voltaire on the immense verandah". Here comes the footnote:

He had brought with him to Ceylon a seventy-volume edition of the works of Voltaire.

Unimaginable today, I think, even for book nuts like me.

[Mark Liberman points out that these days you could bring it in your pocket, on a thumb drive. But he understands that I was talking about traveling with those old-fashioned objects made from wood pulp -- book books, rather than e-books.]

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 01:41 PM

Farewell Conrad Burns

The farewell speech by Senator Conrad Burns caught my attention, largely because he has served in that capacity in my adopted state, Montana, for the past 18 years. I've only lived here for ten of those years, but these were enough to make me very happy that he was recently defeated by Democrat Jon Tester.

Burns was not what one would call clever with language. He was once asked how he could stand to live in Washington DC "with all those niggers." Burns replied, "It's a heck of a challenge." In 1999 he publicly referred to Arabs as "ragheads." He also claimed that one or more of the 9/11 hijackers came into the US from Canada, riling the Canadian government with a charge that was later proved untrue.

Then there is his association with the Jack Abramoff scandal (including a $150,000 campaign gift), leading the Citizens for Responsibility and Ethics in Washington (CREW) to list him among the 20 most corrupt members of Congress (for more on this, see here). The last straw for Montana voters may have been Burns' sharp criticism of a group of firefighters that he ran into at the Billings airport when they were on their way home, after successfully containing a 92,000-acre forest fire near Billings. A reporter present wrote that the senator told them they had done a "piss-poor" job and that they hadn't done "a God damned thing" and just "sat around." Uh-oh. Most Montanans are deeply grateful to the firefighters who come from all over the country to contain our many summer fires. And Burns lost any hope he may have had for the taxicab driver vote when he said that the "faceless enemy" of terrorists "drive taxicabs in the daytime and kill at night."

I guess I wouldn't expect Burns to mention any of these language goofs in his farewell address, much less to apologize for them. Instead he lists his many accomplishments in the Senate, then says:

But we have now opportunities now and they've opened up and I'm proud to say that it was me and my office that led the way on most of those changes. It is said it is not bragging if you done it. I was fortunate enough to attract a staff that shared the same vision of change and change we did ... I know what it is like to be in the minority. And you know what it is like to be in the majority. One of the great statements said, "The majority is more funner." I've enjoyed my work with some of the best men and women in the Senate who represented both sides of the aisle, from different regions of our country and diverse cultures of our country. I will miss them, but we have welded some friendships that will last forever. The same can be said of nations, which we've traveled and met national and international leaders.

Now isn't that a great speech? As it is said, him and his office really done it. More parsing could be made but maybe that's enough for now. Farewell, Senator Burns.

Posted by Roger Shuy at 01:41 PM

Rinehart

Here's a bit more about the historical background of Thomas Pynchon's Against the Day. Well, it turns out to be quite a lot more, I'm afraid -- once you start pulling at a loose thread on the internet, you can unravel a shocking quantity of historical fabric before your second cup of coffee gets cold. Anyhow, on page 156, just before the crimson=worm paragraph discussed earlier, there's a passage that allows us to date the event, more or less, and also excites a few linguistic resonances:

It was a less than intimate tête-à-tête. Alumni of both persuasions were milling everywhere in and out of the lobby, gesturing carelessly with foaming beer steins, sporting hats, spats, and ulsterettes vividly dyed in varying densities of the rival school hues. Every five minutes a page came briskly through, calling, "Mr. Rinehart! Call for Mr. Rinehart! Oh, Mr. Rinehart!"

"Popular fellow, this Rinehart," Kit remarked.

"A Harvard pleasantry from a few years back," explained Scarsdale Vibe, "which shows no sign of abating. Uttered in repetition, like this, it's exhausting enough, but chorused by a hundred male voices on a summer's evening, with Harvard Yard for an echo chamber -- well . . . on the Tibetan prayer-wheel principle, repeat it enough and at some point something unspecified but miraculous will come to pass. Harvard in a nutshell, if you really want to know."

"They teach Quaternions there instead of Vector Analysis," Kit helpfully put in.

We'll come back to the Quaternions vs. Vectorists issue another time (though if you're impatient, you can read this). An article in the Harvard Alumni Magazine ("I Love My Vincent Baby", September-October 2002) explains the "Rinehart" part:

'Rinehart' is a Harvard rallying cry that goes back to the turn of the century. Its eponym was one James B.G. Rinehart '00, who was often hailed by a classmate beneath his window. On a warm June night in 1900, the classmate's cry of 'Oh, R-i-i-i-n-e-HART!' was spontaneously taken up by hundreds of inmates of the Harvard Yard, and in after years reverberations were reported from sites as far off as Cairo. In recent years, the tradition has all but died. Rinehart himself died in 1952.

But the Rinehart story turns out to be quite a bit more interesting. Over the decades, the story underwent a curious series of re-interpretations. David Winter ("Gordon Allport and the Legend of 'Rinehart'", Journal of Personality 64(1), 1996) describes the process.

The first stage:

John Bryce Gordon Rinehart (1875-1952) was a member of the Harvard Class of 1900, having graduated from Waynesburg College in Pennsylvania 2 years earlier. He was intelligent (going on to graduate with honors) and affable, with many friends. He took most of his courses in history, government, and economics and on the side tutored other students in these subjects. He lived on the fifth floor (no elevator) of Harvard's Grays Hall. His friends and tutees, in order to save themselves an unnecessary climb up four flights of stairs, were in the habit of first calling out his name in order to see whether he was in.

On June 11, 1900, Rinehart was out of town, but left his windows open because of the heat. Some friends came by and called his name. Thinking he was in but hadn't answered because he was studying, they called again and again. (One person who claimed to have known Rinehart well between 1908 and 1923 described him as a "grind" who "often studied late at night.") Suddenly, in imitation, students in other dormitories took up the cry, "Oh Rinehart!" The following night this same call was repeated across the campus. A tradition had been born.

The second stage:

Within a few years of the events of June 1900, a legend developed around the "Rinehart!" cry. A 1950 survey of 74 Harvard alumni asking how the cry started (Feeney, 1950) showed that 72% of graduates from the classes of 1900 through 1909 gave some version of the true story stated above. Among graduates of the classes of 1910 and later, however, only 5% knew the truth and fully 76% replied with something along the lines of the following, which we shall call the "core legend".

The "core legend" of the second folkloric stage, from "Feeney, 1950, quoting a member of the Harvard Class of 1916":

An undergraduate of the Nineties or earlier, who roomed in the Yard and had few friends, used occasionally to stand below the windows of his empty room and call his own name, Rinehart, in hopes of adducing the attractive odor of popularity and friendship. He was probably caught at it, and the cry was taken up derisively from other windows.

A variant of the second stage -- which apparently never had much folkloric impact -- was invented by the American psychologist Gordon Allport, who entered Harvard as an undergraduate in 1915. In the fall of 1917, he

... entered a contest sponsored by the YMCA magazine North American Student for the best account of a "most highly treasured" college tradition. His essay, "Harvard's Best Tradition," won the first prize of $20 .... It was a retelling of Harvard's "Rinehart" legend.

But Allport's version was subtly altered. The Rinehart call began as jocular annoyance with a popular student; legend transformed it into derision of a lonely poseur; Allport transmuted derision into sympathetic acceptance. Here's how his prize article started:

Many years ago, in one of the venerable ivy-covered dormitories in the college yard, there lived a very lonely freshman. This freshman, like most of his kind, was full of ambition and aspiration, and had come to college to win himself a creditable place in the fraternity of Harvard men. He craved popularity, but had little talent in the art of becoming a social leader. He couldn't play football; couldn't dance; was neither good nor bad in his studies; had no pronounced vices or signal virtues which might appeal to one crowd or another. He was indeed a typical awkward youth of seventeen.

Many an afternoon he sat by the window listening to his classmates call their favorites to come and join a happy crowd bound for a Saturday hike to Fresh Pond or for a theater party in town. Eagerly he waited, hoping that some one would call: "Rinehart, O, Rinehart, do you want to go?" He often rehearsed his response to this coveted invitation, visualizing carefully his entree into the society of his classmates. But the call never came.

One day, out of sheer desperation and loneliness, he went down in front of the dormitory and, just to see how it would sound, called his own name vigorously, "Rinehart, O, Rinehart, come on down!" How glorious it sounded! If only. . . .

The rest of the story is not hard to imagine. One of the boy's observant classmates—there were a few such—witnessed this scene, and associating with it the remembrance of certain timid advances and wistful looks, divined the secret.

The sympathies of the crowd were easily aroused by the story, and special effort was made to fraternize with the lonely boy. Ever after it was the custom upon passing his window to call to him to come and join in all excursions.

As time passed, the Rinehart legend accumulated additional layers of event and invention:

Over the years, several incidents attributing almost magical efficacy to the "Rinehart!" cry accumulated around the core legend. Morrison (1936), for example, repeated the following story:

A Harvard graduate, pestered by touts in the courtyard of Shepheard's Hotel, Cairo, called "Oh, Rhinehart!" [sic] and was presently answered in the same kind from four or five windows, whose occupants then helped him to disperse the beggars.

Kahn (1969) added the story of "a Harvard man in Africa who was about to be kidnapped by some Arabs, screamed 'Rinehart!' and was rescued because there happened to be another Harvard man nearby in the French Foreign Legion".

The "Rinehart!" cry also penetrated American popular culture. John Barrymore mentioned it in the 1939 movie The Great Man Votes, and the song "Harvard Blues," written by Harvard graduate George Frazier and recorded by Count Basie, includes the line, "Rinehart, Rinehart, I am a most indifferent guy . . ."

In Harvard Blues, it's hard to tell whether Rinehart is intended to be the protagonist or the referent of a ritual vocative:

I wear Brooks clothes and white shoes all the time
I wear Brooks clothes and white shoes all the time
Get three "Cs," a "D" and think checks from home sublime

I don't keep no dogs or women in my room
I don't keep no dogs or women in my room
But I'll love my Vincent Baby, until the day of doom

Rinehart, Rinehart, I'm a most indiff'rent guy
Rinehart, Rinehart, I'm a most indiff'rent guy
But I love my Vincent Baby, and that's no Harvard lie

Institute and Porky are my clubs
Institute and Porky are my clubs
And I think that girls at Radcliffe all are dubs

Went to Groton and got a big broad A
Went to Groton and got a big broad A
Now at Harvard and follow an indiff'rent way

Do my drinking down in the cool Ritz Bar
Do my drinking down in the cool Ritz Bar
Dad is Racquet and Chilton is my ma.

("Vincent", by the way, is here a reference to an exclusive women's club, not to an antique brand of motorcycle. Such clubs -- Institute, Porky = Porcellian, Racquet and Chilton are other references in the song -- were central to the anthropology of the American upper classes in the 1930s. And the "big broad A" is a reference to the class-tinged trap-bath split, with a punning echo of the earlier line about "three Cs and a D".)

Here's the story of how Count Basie came to record this song in 1941, according to the Harvard magazine article -- he was charmed by the second stage of the Rinehart legend:

In his 1984 biography of Frazier, Another Man's Poison, Charles Fountain reports that in 1941 Frazier regaled his friend Basie with a sadsack version of the legend: "Rinehart was a friendless young Harvard who tried to present the illusion that he was in truth a popular sort by standing under his dormitory window and hailing himself," wrote Fountain. "Every other November, on the eve or the morning of the Harvard-Yale game, part of the atmosphere in the lobby of the Taft Hotel in New Haven was the faithful and incessant paging of Mr. Rinehart—'Call for Mr. Rinehart! Call for Mr. Rinehart!'— with never a Mr. Rinehart to answer."

All this amused Basie, and Frazier was emboldened to write his lyric. Basie and arranger Tab Smith wrote a tune for it. John Hammond, a Columbia Records impresario, recorded the blues, perhaps as a favor to Frazier. The song was popular and became a regular part of the shows Basie gave on college campuses. It enjoyed some critical acclaim, the New York Times's jazz critic calling it one of the "greatest of all blues lyrics." That went too far, Hammond told Fountain years later. "It wasn't very good."

The last stage of the Rinehart legend: oblivion. According to David Winter's article:

The custom seems to have died out after World War II and especially after the campus turmoil of the 1960s. (An informal poll of recent Harvard graduates by the present author turned up no one who had heard of either the cry or the legend.)

In the fictional world of Against the Day, I'm not sure whether the original Rinehart event is supposed to have taken place in 1900, as it did in reality -- in which case Vibe's phrase "a Harvard pleasantry from a few years back" would put the conversation in 1903 or so; or in the 1890s, as it did in some versions of the stage 2 legend -- in which case, the conversation might have taken place much closer to the Chicago World's Fair of 1893, where the book begins.

[Ironically, Allport later suggested (Gordon Allport and Leo Postman, "An Analysis of Rumor", Public Opinion Quarterly, 10(4) 501-517, 1946) a "basic law of rumor" according to which the strength of a rumor is proportional to the importance of the topic multiplied by the ambiguity of the evidence: R ≈ i × a.]

Posted by Mark Liberman at 12:02 PM

December 08, 2006

Plural, mass, collective

Nathan Bierma's "On Language" column in the Chicago Tribune, 11/29/06, fields a query about recent uses of the word troop:

Q. Recently, the media's use of the word "troop" has left me confused. I always thought that the word was a plural, like "bunch" is. However, glaring headlines have begun declaring facts such as "65 troops killed in Iraq."

-- Julie Stone, Darien

Bierma cagily disregards the misuse of the technical grammatical term plural and follows the letter writer's intent:

A. "Troop" has always meant more than one, from its origins in the French "troupe" -- a word that we still use in English for a group of actors...

The informal use of "troop" to mean "one soldier" may have been popularized in the Vietnam era.

Let's step back here and straighten out the concepts involved. The core of the problem is that English has several ways to "mean more than one".

Step 1: SG and PL. Many English nouns come in two versions, with notably different syntax; for the lexical item STUDENT, these are the two inflectional forms student and students, usually called "singular" and "plural", respectively. (Note that I'm using small caps to refer to lexical items and italics to refer to the various forms a lexical item takes in sentences.) I'm going to reject the standard labels, because they encourage you to think that the grammatical categories are semantically defined -- with a singular word used to refer to one thing and a plural to more than one -- while the fact is that the connection between grammatical categories and meaning is much more indirect. What I'll do instead is use the labels SG and PL, which are helpfully suggestive but also evidently novel.

A very small sampling of the complexities in the connection between SG/PL and meaning:

- One use of the quantity determiner MANY requires a SG head noun: many a, as in "Many a student has suffered from stress" 'Many students have suffered from stress';

- Indefinite SG noun phrases can be used for universal reference, that is, reference to everything in some class: "A good student has no trouble with my exams" 'Good students have no trouble with my exams';

- So can definite SG noun phrases: "The lion is ferocious by nature" 'Lions are ferocious by nature';

- One or two requires a PL head noun, even though it explicitly allows for the possibility that only one thing is referred to: "One or two students have complained" 'One student or two has complained';

- Some uses of SG and PL words don't denote things at all, but predicate properties or statuses of something: "Kim is a student" and "Kim and Terry are students".

(We'll see still more complexities below.)

Now, of course, a great many SG noun phrases do indeed refer to individuals, and a great many PL noun phrases do refer to more than one individual, so that the traditional labels "singular" and "plural" aren't bad. But they are misleading.

We do need to distinguish SG and PL, because words with these properties have different syntax, in a number of ways, just two of which I'll list here:

- SG nouns take the determiners THIS and THAT ("this/that student"), PL nouns the determiners THESE and THOSE ("these/those students");

- SG and PL have different verb agreement patterns: "The student was/*were delighted" vs. "The students were/*was delighted".

Step 2: C and M. Most English nouns belong to one or the other of two grammatical categories, usually labeled "count" and "mass"; as with "singular" and "plural", I'm going to shift to less obviously semantic labels, C and M, respectively. The lexical item BUSH is C, while SHRUBBERY is (for most speakers, Monty Python notwithstanding) M.

C nouns have both SG and PL forms ("the bush", "the bushes"), M nouns have only SG forms ("the shrubbery/*shrubberies"). SG C words and (SG) M words share some syntax by virtue of their both being SG, but these sets of words also differ extensively in the determiners they can occur with: for example, SG C words allow A, EACH, and ONE ("a/each/one bush/*shrubbery"), while (SG) M words allow ALL and MUCH ("all/much shrubbery/*bush").
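The determiner facts from Steps 1 and 2 can be sketched as a small feature check. This is only an illustration in Python, not a grammar of English: the names `NounForm`, `DETERMINER_REQUIREMENTS`, and `allows` are invented here, and the table covers just the determiners whose requirements are stated outright in the examples above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NounForm:
    """One inflectional form of a lexical item (hypothetical representation)."""
    form: str    # e.g. "bush", "bushes", "shrubbery"
    number: str  # "SG" or "PL"
    kind: str    # "C" or "M"


# Requirements read off the examples in the text; the table itself is an
# illustrative assumption, not an exhaustive description of English.
DETERMINER_REQUIREMENTS = {
    "this": lambda n: n.number == "SG",
    "that": lambda n: n.number == "SG",
    "these": lambda n: n.number == "PL",
    "those": lambda n: n.number == "PL",
    "a": lambda n: n.number == "SG" and n.kind == "C",
    "each": lambda n: n.number == "SG" and n.kind == "C",
    "one": lambda n: n.number == "SG" and n.kind == "C",
    "much": lambda n: n.kind == "M",
}


def allows(determiner: str, noun: NounForm) -> bool:
    """True if the determiner can combine with this noun form."""
    return DETERMINER_REQUIREMENTS[determiner](noun)


BUSH = NounForm("bush", "SG", "C")
BUSHES = NounForm("bushes", "PL", "C")
SHRUBBERY = NounForm("shrubbery", "SG", "M")

for det in ("this", "these", "a", "much"):
    for noun in (BUSH, BUSHES, SHRUBBERY):
        mark = "" if allows(det, noun) else "*"
        print(f"{mark}{det} {noun.form}")
```

Note that "this shrubbery" comes out fine while "a shrubbery" is starred: the first determiner cares only about SG, the second about SG and C together.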

(Though lexical items mostly come "off the shelf" with a classification as either C or M, English has a number of ways of converting M nouns to C nouns with a related meaning -- for instance, M WINE, in "Much wine is too alcoholic these days", to C WINE 'type/variety of wine', in "Many wines are too alcoholic these days" -- or vice versa -- for instance, C CHICKEN 'species of bird', as in "This chicken eats too much", to M CHICKEN 'chicken meat', as in "The potstickers contain both chicken and pork".)

The syntax here is more intricate than you might have thought. Particular constructions can require:

- a SG C word (I'll refer to such words with the label I, meant to suggest "individual");

- a (SG) M word, M for short;

- a PL (C) word, PL for short;

- a SG word, either C or M;

- a word that is either M or PL (I'll refer to such words with the label E, meant to suggest "extended").

This last, E, type is a surprise to most people, but it's very important to the workings of English syntax. Here are three contexts in which M and PL words function together:

- The determiners A LOT OF and LOTS OF in combination with bare nouns: "A lot of / Lots of shrubbery/bushes/*bush will burn easily";

- The postmodifiers GALORE and APLENTY: "There should be shrubbery/bushes/*bush galore/aplenty in the desert";

- The determiner ALL in combination with bare nouns: "All shrubbery/bushes/*bush in the desert will be fragrant".
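One way to picture the five requirement types is as sets over three basic labels (I, M, PL). The Python sketch below is a hedged illustration: the function and table names are made up for this example, and the context list includes only the three E contexts just given.

```python
def basic_label(number: str, kind: str) -> str:
    """Map a noun form's features to one of the basic labels I, M, PL."""
    if kind == "M":
        return "M"  # M nouns have only SG forms
    return "I" if number == "SG" else "PL"


# Each requirement type is the set of basic labels it accepts.
REQUIREMENT_TYPES = {
    "I": {"I"},
    "M": {"M"},
    "PL": {"PL"},
    "SG": {"I", "M"},   # any SG word, C or M
    "E": {"M", "PL"},   # "extended": M or PL
}

# Contexts from the text, all of which select E (assumed keys, for illustration).
CONTEXT_REQUIRES = {
    "a lot of": "E",
    "lots of": "E",
    "galore": "E",
    "aplenty": "E",
    "all + bare noun": "E",
}


def fits(context: str, number: str, kind: str) -> bool:
    """True if a noun form with these features can occur in the context."""
    required = CONTEXT_REQUIRES[context]
    return basic_label(number, kind) in REQUIREMENT_TYPES[required]


print(fits("a lot of", "SG", "M"))   # shrubbery
print(fits("a lot of", "PL", "C"))   # bushes
print(fits("a lot of", "SG", "C"))   # *bush
```

Treating SG and E as unions of the basic labels keeps the table small, and it makes the surprise visible: E groups M with PL, cutting across the SG/PL line.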

There's a lot more, but this will do for our purposes. On to semantics.

So far we have one type of noun word that frequently (indeed, usually) refers to "more than one": PL words. Looking back at Julie Stone's letter to Nathan Bierma, we see that the word troop is certainly not PL (nor is bunch); it's an I word, and the lexical items TROOP and BUNCH are C nouns. Well, actually, there are two lexical items TROOP, both of them C, with somewhat different meanings: 'a group of soldiers or other military personnel' (a subtype of C noun that I'll put off discussing until the next section) and 'a soldier or other serviceperson'. The history of the second lexical item seems to have been that for some time it was used only in the PL, in the form troops, but eventually was extended to all the uses of C nouns, including as an I word. (More on this below.)

Putting troops aside for the moment, we've now entered the world of M nouns, where there are fresh possibilities for reference to "more than one". Some M nouns are unproblematic here: M nouns like WATER, WINE, and COFFEE, which denote substances that are not naturally divisible; and names of substance types like GOLD ("It's made of gold") and MAHOGANY ("Mahogany is expensive these days"). We then pass to M nouns converted from C nouns but denoting types rather than individuals (ROSE in "Some kind of rose was growing on the hillside").

Then things get sticky. A great many M nouns denote collectivities of things, but small things, especially small things whose individual identities are not usually important to us: CORN, RICE, BARLEY, CHAFF, CONFETTI, etc. Some of these contrast minimally with C nouns of similar denotations, like BEAN, PEA, LENTIL. In any case, it would be easy to think of barley in "The barley was almost cooked" as "meaning more than one" in much the same way as lentils in "The lentils were almost cooked" does -- and in fact, every so often someone misidentifies little-thing M nouns as "plural".

The temptation to confound M and PL -- recall that they share a fair amount of their syntax -- is even stronger when the contributing bits are no longer particularly little, as with the M noun MAIL 'cards and letters'.

So far these are well-known and much-discussed facts. Now we get to something I think I discovered, some years back: a class of cases where a M noun clearly denotes more than one easily separable individual. Suppose I take you to my herb garden, where you can see here and there some tarragon plants, plus a long row of basil plants. I want to tell you that I'm growing a few tarragon plants and many basil plants, but I want to do this as compactly as possible (Omit Needless Words!), using the lexical items TARRAGON 'tarragon plant' and BASIL 'basil plant'. What do I say? That I'm growing a little tarragon and much (or a lot of) basil -- not a few tarragons and many (or a lot of) basils (unless I mean to refer to different varieties of these herbs). These lexical items are M, despite the nature of their referents. On the other hand, if I show you a heap on which potato vines sprawl, all jumbled up with one another, I'll tell you I'm growing potatoes there, not potato.

There's a pattern here: many plant names inherit their C/M classification from the nouns that denote the principal product (in our culture) of the plant in question. TARRAGON 'tarragon plant' is M because TARRAGON 'culinary herb' is M (and the name of the culinary herb is M because it's used in little bits whose individual identities are not usually important to us). Similarly BASIL 'basil plant'. I leave POTATO 'potato plant' as an exercise for the reader.
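The inheritance pattern can be put as a lookup: the plant noun takes whatever C/M class the product noun has. The Python sketch below is a toy under stated assumptions. The table `PRODUCT_NOUNS` and the function `growing` are invented for illustration, and the POTATO entry (product noun treated as C) is my assumption, chosen to match "I'm growing potatoes there" above. The pattern is a tendency, not a rule, so a real lexicon would need exceptions.

```python
# Product noun -> (C/M class of the product noun, PL form if it has one).
# Entries are illustrative assumptions, not a lexicon.
PRODUCT_NOUNS = {
    "tarragon": ("M", None),
    "basil": ("M", None),
    "potato": ("C", "potatoes"),
}


def plant_noun_class(name: str) -> str:
    """Default guess: the plant noun inherits the product noun's class."""
    kind, _plural = PRODUCT_NOUNS[name]
    return kind


def growing(name: str) -> str:
    """Build a sentence with A LOT OF, which accepts M or PL words."""
    kind, plural = PRODUCT_NOUNS[name]
    noun = name if kind == "M" else plural
    return f"I'm growing a lot of {noun}"


for herb in ("tarragon", "basil", "potato"):
    print(plant_noun_class(herb), "->", growing(herb))
```

A LOT OF is used in the sentence frame because it takes either M or PL, so the same frame works whichever way the plant noun is classified.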

This is a complex system -- there's a whole lot more -- which does involve an association between C/M classification and (culturally salient) characteristics of the referents, but also includes additional principles that compete with, and often override, these "natural" associations (plus a certain amount of idiosyncrasy). In any case, we end up with some M nouns that "mean more than one".

Step 3: COLL and ~COLL. Still another way in which a noun can "mean more than one" can be seen in the C noun GROUP. This lexical item has perfectly ordinary SG and PL forms, group and groups, with unremarkable meanings. But the lexical item itself denotes a collectivity, in the sense that its referent has individuals as members or parts. This is the sense in which the letter-writer saw "troop" (and "bunch") as "plural".

The standard technical term here is "collective" (vs. "non-collective") noun; as usual, I'll use suggestive but non-standard labels: COLL and ~COLL.

A digression on further terminological confusions: I've complained here about Bill Walsh's -- and, following him, Bill Safire's -- use of "collective nouns" to refer to mass nouns. On an earlier occasion (his column of 12/10/00, p. 68, on the Word of the Year for 2000, chad) Safire relays a use of "plural" to refer to mass nouns:

...according to Peter Graham, now university librarian at Syracuse, who served early in his career as a key-punch operator: "We had what we called a chad box underneath the key punch. We resisted calling it 'confetti' because the small bits of paper, when they caught on your clothes, would not dislodge." Graham notes that the noun was then construed as plural, on the analogy of chaff, but today's ballot counters are referring to chads, construing the word chad as singular.

CONFETTI and CHAFF are, of course, M nouns, period, and CHAD is a M noun for some people ("The chad was scattered on the floor"), a C noun for others ("The chads were scattered on the floor") -- and some people have both usages.

So far, we have "plural" used for collective (Bierma's correspondent, who doesn't pretend to be an authority), "collective" used for mass (Walsh and Safire), and "plural" used for mass (Safire and his librarian informant). It seems that even those who set themselves up to be authorities on language and its use don't really know about mass nouns, and that "plural" is always available to refer to a word that "means more than one" in one way or another.

Let's return to COLL nouns. The facts here are mind-bogglingly complex, and there's a lot of variation, but there's one aspect of the system that would be inclined to lead people to think of COLL nouns as somehow "plural".

Background fact: COLL nouns frequently occur with following PPs consisting of the preposition of plus an object NP that denotes the kinds of things or stuff in the collectivity, the "contents" of the collectivity. So, with SG COLL nouns, we get things like "a group of students" (GROUP takes PL object NPs) and "a variety of information/facts" (VARIETY takes E -- M or PL -- object NPs). Now we have expressions with a head noun and a contents NP that can differ in grammatical number: SG for the first, PL for the second.

There are two ways of thinking about these expressions: either the head is the head and that's that, in which case these expressions are, as wholes, SG and take SG verb agreement ("A group of students is at the door", "A variety of sizes is available"); or the nature of the contents is what's important in the context, in which case these expressions are, as wholes, PL and take PL agreement ("A group/variety of students have been complaining"). For many head nouns, both usages are standard.

What's important is that we now have, in the second usage, occurrences of SG COLL head nouns that take PL verb agreement -- a fact that makes these COLL nouns "look plural" (though they clearly are not PL, since they have the determiners of an I noun). The way to look at this second usage is to think of the head noun as "transparent to" SG/PL and C/M, with the whole expression inheriting these properties from the contents NP:

A variety of information is/*are available: (SG) M
A variety of facts are known: PL (C)
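The two ways of treating these expressions amount to a choice about where agreement features come from. Here is a minimal Python sketch of that choice; the function names and the `transparent` flag are hypothetical, and the contents NP is reduced to a single label, "M" or "PL".

```python
def agreement(head_number: str, contents: str, transparent: bool) -> str:
    """Return "SG" or "PL" verb agreement for 'a HEAD of CONTENTS'.

    head_number: number of the head noun ("SG" or "PL")
    contents:    label of the contents NP ("M" or "PL")
    transparent: if True, the whole expression inherits from the contents NP
    """
    if not transparent:
        return head_number  # the head is the head and that's that
    return "PL" if contents == "PL" else "SG"  # M words are SG


def form_of_be(agr: str) -> str:
    """Pick the present-tense form of BE for the given agreement."""
    return "is" if agr == "SG" else "are"


examples = [
    ("A variety of information", "M", "available"),
    ("A variety of facts", "PL", "known"),
]
for subject, contents, predicate in examples:
    verb = form_of_be(agreement("SG", contents, transparent=True))
    print(f"{subject} {verb} {predicate}")
```

With `transparent=False` the same two subjects would both take "is", which is the first usage described above.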

Another way to think about the second usage is that it's partway along to a reanalysis of the head noun as a determiner of quantity. Some nouns -- DEAL in A GREAT/GOOD DEAL OF -- went down this road long ago, others -- LOT in A LOT OF and LOTS OF -- in the past couple of centuries, and still others -- BUNCH in colloquial A BUNCH OF ("A whole bunch of shrubbery was growing by the door", "A whole bunch of bushes were growing by the door") -- more recently. These determiners are generally transparent.

Troops over the years. At some point TROOP was a plain old COLL noun; "We had (many) troops in the field" was entirely parallel to "We had (many) brigades/battalions/companies in the field". But, as Bierma points out in his Tribune column, the PL troops will convey something like 'a lot of military personnel', and the way is open to a reinterpretation of troops as a ~COLL PL meaning 'military personnel'. There are then two lexical items TROOP, the old COLL one and the innovative ~COLL one, the latter occurring only in the PL; this is the state described in some dictionaries, for instance AHD4.

The path from this state to the current one, where the ~COLL noun has been extended to the full privileges of a C noun, perhaps went via uses with smaller and more exact quantity expressions ("We had thousands of troops in the field", "We had 4,000 troops in the field", "We had 273 troops in the field", "Four troops were killed yesterday") over the years, until it appears as a SG referring to a single serviceperson. (Someone should investigate this history.) I'd thought this was a fairly recent development, but in contexts within the military it goes back at least to the Korean War (as recent discussions on the ADS-L showed). What might be fairly recent is its regular use in news reports and the like. As Bierma notes:

Whatever the source, the new use of "troop" made possible a recent headline in the satirical newspaper, The Onion, under a picture of a single soldier: "Kuwait deploys troop."

And whatever the source, ~COLL troop is a useful thing to have. The alternatives have various defects: soldier properly applies only to the Army (the Navy, Marines, and Air Force regularly object to having the word used with reference to them); serviceman is sex-marked; serviceperson is an awkward multi-syllabic substitute; (military) personnel is (like police) a PL-only word (yes, there are all sorts of exotica in the world of SG/PL); and so on. So troop is a good solution. Now we just have to get used to it.

[Addendum 12/9: A military informant reports the frequent use of servicemember, at least in administrative contexts, and a Google search confirms that it occurs in such contexts with some frequency, and also occasionally in news reports from military sources: "A U.S. servicemember was wounded Feb. 24 when a vehicle..." (DefenseLINK News).]

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 11:19 PM

Grave and deteriorating

"The situation in Iraq is grave and deteriorating." The opening words of the report of the Iraq Study Group have been echoing around the world since Wednesday (grave and deteriorating already gets over half a million hits, and the number is rising; there are well over 3,000 hits on Google News alone). We all look upon that statement with great seriousness and a sinking heart. But the syntactician, qua syntactician, also thinks, aha! A very clear case, in carefully considered natural prose, of a coordination in which the coordinates belong to entirely different syntactic categories.

Grave is an adjective. Not only is very grave grammatical (and modification with very is really only possible for adjectives and adverbs, but not verbs), but the word is inflectable (not all gradable adjectives are, even when their meanings would appear to make that a possibility), so we get the comparative and superlative forms graver and gravest. We can also say It seems grave and It became grave, too (these constructions being pretty good tests for adjectivehood). And so on.

But deteriorating is definitely not an adjective. It is the gerund-participle form of the verb deteriorate. I do not say this just because of its -ing ending. There are adjectives that end in -ing. Charming is one. And it has a gradable meaning, so very charming is grammatical. (It happens not to be inflectable, though, so we find more charming and most charming rather than *charminger or *charmingest in comparative and superlative constructions.) But deteriorating fails in every way to exhibit adjective properties. We do not find (for example) *very deteriorating, or *It seems deteriorating, or *It became deteriorating.

What we learn from this (and many more arguments that could be given) is that the simplistic view of coordination is wrong. The simplistic view would say that coordination is a matter of using a coordinator (a word such as and) to link into one grammatical unit a series of adjectives, or a series of verbs, or a series of noun phrases, or a series of clauses...

The real truth, as usual, is quite a bit more complicated. Though not quite as complicated as the situation in Iraq.

Posted by Geoffrey K. Pullum at 08:20 PM

Another year of truthiness

Merriam-Webster has announced its Word of the Year, and it's our old friend truthiness. We've been tracking the word's progress ever since Stephen Colbert introduced it on the first episode of his Comedy Central show back in October 2005. The word got a big boost when the American Dialect Society selected it as the Word of the Year for 2005, and now the ADS is looking prescient for having jumped on the truthiness train a year before Merriam-Webster got to it.

Here is Colbert's reaction, as reported by the AP:

"Though I'm no fan of reference books and their fact-based agendas, I am a fan of anyone who chooses to honor me," he said in a statement e-mailed to The Associated Press. "And what an honor. Truthiness now joins the lexicographical pantheon with words like 'squash,' 'merry,' 'crumpet,' 'the,' 'xylophone,' 'circuitous,' 'others,' and others."

In past years, Merriam-Webster selected its WOTY by calculating which word got the most look-ups on m-w.com, but this year it was strictly a popularity contest, with visitors to the site asked to submit votes for their favorite choice. Merriam-Webster reports that truthiness beat out such contenders as google, decider, war, and insurgent, winning by "an overwhelming 5 to 1 majority vote." One suspects some stuffing of the ballot boxes going on here, especially considering that Colbert fans easily scored a win for their hero in an online contest to name a Hungarian bridge, trouncing the early leader Chuck Norris. (The Hungarian government ended up disqualifying Colbert, on the grounds that he's still alive.) As far as I know, Colbert didn't make an on-air appeal to win this contest, as he had with the Hungarian bridge campaign. The Colbert Nation and like-minded crusaders for all that is truthy won this one on their own. And I wouldn't be surprised if truthiness triumphs in a similar WOTY contest over on Dictionary.com.

As for WOTY selections that are not chosen by the vox populi, the New Oxford American Dictionary decided on carbon neutral, reflecting new efforts by individuals and companies alike to combat global warming by decreasing or offsetting the emission of greenhouse gases. (Full disclosure: I've recently joined the Oxford team and was involved in the selection process.) Meanwhile, Webster's New World College Dictionary went with Crackberry. The American Dialect Society will round out the WOTY season, making its choice at the group's annual meeting in Anaheim on Jan. 5. But last year's winner, all-conquering truthiness, might be a tough act to follow.

Posted by Benjamin Zimmer at 05:17 PM

Crimson worm rhapsody

Between you and me, I keep finding that just when I'm beginning once again to be prepared to trust etymologists, they go and do something that makes me want to forget about it again. I can definitely believe that worm and crimson have a common source: there's the crucial [r] and [m] as the main consonant sounds in the root, and we know that crimson pigment was made from worms so there's a meaning connection, and I'm just beginning to develop some trust... and then they tell me that rhapsody is also a derivative of the same Indo-European root (really, it does say that under wer-², and also under wed-², in the American Heritage Dictionary appendix on Indo-European roots — Mark didn't just slip that one in to see if you were paying attention this morning).

Rhapsody. Yeah, right. Worm is related to rhapsody because of the [r], and rhapsody is related to saxophone because of the [s], and saxophone is related to onyx because of the [o] (and hey, there's an x too!), and onyx is related to asininity because of the [n]... Suddenly trust has evaporated and it all seems suspicious again, and I'm remembering the satirical remark of Mark Twain's that the name Middletown is derived from Moses (by loss of -iddletown and addition of -oses, don't you see). I don't do etymology. For me (and I suspect many linguists) it is the topic that first made me think I might be interested in linguistic study, when I was a kid, but then I found it was entirely different things about language that really made linguistics exciting for me. I never went back to word origins, and I doubt I ever will (far too much specialized knowledge of ancient languages is called for) — even though it is perhaps the thing linguists are most likely to be asked about by members of the general public.

[Update: I have started getting emails from philologically very competent people (thanks in particular to J. S. Bangs) who normally begin by asking me if I am serious (the answer is no: language is much too fascinating to be entirely serious about) and then, as if losing faith, start very solemnly trying to convince me that rhapsody really is related to worm.

This is not something that I really doubt. In brief, the -od- bit is a stem meaning "song" (and of course we have the Greek-derived word ode from that), and the rhap- part (in which the -h- doesn't matter; it's just a Greek thing, like the connecting -s) is from a root meaning something to do with folding or bending or sewing or stitching or wrapping that may actually be related to the English word wrap (in which the w- doesn't matter; it's just a Germanic thing), so all that remains is to find a way to tell a plausible story about how rhapsody originally meant "sewn or stitched song" (don't ask me to explain this; I've already told you more than I know)... I don't want to mislead you: etymology for Indo-European is highly developed, and they're not telling lies. I'm not saying that I really truly think they are making stuff up. I know these things they say are true. That is not my problem. My problem is that I sometimes find I cannot make myself feel like I think they are true. You see the difference?

Just to increase my discomfiture, John Anderson (thanks a lot, John) has insisted on reminding me of the following facts about the staggeringly widespread Indo-European language family, which more than a millennium ago had already established itself from Iceland in the north-west to Ceylon (now Sri Lanka) in the south-east, and was once found as far east as Chinese Turkestan: the English word head is actually related in ancestry not only to capital but also to chapter; precocious is historically connectable to apricot; the word hound is related to cynic; and weird is related to rhombus (and worm and rhapsody and stalwart and vertebra and wrath and wrong and wrestle and briar, of course). I repeat, I know perfectly well that these things are true. I just can't make it feel like they are.]

Posted by Geoffrey K. Pullum at 12:12 PM

Advertisement for a roboticist

[Here's a note that I just sent to the "penguists" mailing list, which goes to people interested in speech and language research at Penn.]

It's a cold day, and you might not think you're interested in robotics. But I'd like to suggest that you head over to 3401 Walnut St. at 12:00 today to hear Dan Koditschek talk about "Programming Work".

His abstract:

Despite decades of steadily accelerating computational power and recent advances in sensor technology, we have not yet achieved robots that can perform work according to our wishes. One challenge arises from the disappointing power density of available actuators: there are severe limitations to the rate at which work can be performed. But animals, suffering similar constraints, manage to go about their work quite effectively, exchanging energy with their environments to perform numerous and varying tasks in a goal directed manner. The fundamental problem of robotics remains how to specify and execute programs that encode the goal directed exchange of energy (e.g., forces over motions) within a physical environment.

I will review a two decade agenda of robotics research seeking useful symbols and effective compositional methods that connect them into programs for the specification and execution of physical work. Our effort is strongly inspired by biology and I will mention some of the ways in which we have tried to identify and borrow principles underlying the use animals make of their bodies in accomplishing their work. These efforts have, in turn, suggested new hypotheses about how animals’ bodies make use of their nervous systems in so doing. Over time, animal nervous systems have given rise to animal brains and I hope this talk may stimulate an exchange of views leading to new hypotheses about how the brain works and the body thinks.

Make a few substitutions ("communicate" for "perform work", for example) and you could define an analogous set of issues in the study of speech, language and communication. We have the advantage -- and perhaps also the disadvantage -- of some well-founded ideas about how to identify and manipulate the "useful symbols ... for the specification and execution" of communication. But those of us who are interested in speech, language and communication also could use some new ideas about how the brain works and the body thinks, and in return we can contribute plenty of ideas and problems, old and new.

Posted by Mark Liberman at 10:17 AM

wextract

It's sometimes useful to be able to extract a bunch of little audio files from one big audio file in .wav format. You know the time boundaries of the pieces that you want, based on a time-aligned transcription or other annotation. You don't want to do all the extractions by hand using an interactive waveform editor, and you'd like a light-weight, scriptable program that can do it for you. After looking around for a while, I couldn't find any suitable free software solution to this problem. (I was especially disappointed that sox doesn't have the ability to extract a piece of a file.) I used to deal with this problem by using sox to make a copy of the file with no header, and then using a generic little seek-read-write program to extract the desired chunk of bytes (whose size and location depend on the sampling frequency, sample type and number of channels), and then using sox to put a .wav header back on. But it's annoying to have to produce the headerless copy -- especially if you're dealing with a 600-MB file. And if you use the wrong sampling frequency or channel count, everything else comes out wrong too. So I decided to make a generic little seek-read-write program that understands simple .wav headers. If you need such a thing too, here's wextract.c. If you don't want or need it, ignore this post.
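For readers who would rather see the seek-read-write idea than read about it, here is a minimal sketch in Python. It is not wextract.c, and it makes no claim about how that program is organized; it leans on the standard-library `wave` module to parse the header, and it assumes a plain uncompressed PCM file. The function names `byte_span` and `extract_segment` are made up for this illustration.

```python
import wave


def byte_span(start_sec, end_sec, rate, channels, sample_width):
    """Return (byte offset, byte length) of a time span in headerless PCM data.

    One frame holds one sample per channel, so both numbers depend on the
    sampling frequency, the sample size in bytes, and the channel count.
    """
    bytes_per_frame = channels * sample_width
    first = int(round(start_sec * rate))
    last = int(round(end_sec * rate))
    return first * bytes_per_frame, (last - first) * bytes_per_frame


def extract_segment(src, dst, start_sec, end_sec):
    """Copy the audio between start_sec and end_sec from src into a new .wav.

    src and dst may be file names or open binary file objects.
    Returns the number of frames written.
    """
    with wave.open(src, "rb") as reader:
        rate = reader.getframerate()
        total = reader.getnframes()
        # Clamp the requested span to what the file actually contains.
        first = max(0, min(total, int(round(start_sec * rate))))
        last = max(first, min(total, int(round(end_sec * rate))))
        reader.setpos(first)                      # seek
        frames = reader.readframes(last - first)  # read
        with wave.open(dst, "wb") as writer:      # write, with a fresh header
            writer.setnchannels(reader.getnchannels())
            writer.setsampwidth(reader.getsampwidth())
            writer.setframerate(rate)
            writer.writeframes(frames)
    return last - first
```

Because the header is read rather than guessed, the sampling frequency and channel count cannot be mistyped, which is the failure that made the headerless approach so annoying. A call would look like `extract_segment("big.wav", "piece01.wav", 12.5, 14.0)`, where both file names are placeholders.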

[Update -- boy, is my face red! Keenan Pepper points out that among its umpteen options, sox actually does have a function "trim" which does exactly what I wanted. In my defense, I can only say that in addition to mis-reading the documentation, I asked a couple of other sox users, who also failed to find this function... Anyhow, I learned how .wav headers work, said he lamely.]

[Meanwhile, Bill Poser was seized by a fit of hackery, and modified wextract to use libsndfile, which I had considered and rejected as too complicated. He even packed it up using GNU autoconf and all. If you want to see how to use libsndfile, or to use it for some other purpose, the tarball is here. Warning: you'll have to install libsndfile first -- and you'll also have to execute ldconfig as root, since the libsndfile "make install" command unaccountably fails to do that. If none of that makes sense to you, consider yourself lucky and move on.]

Posted by Mark Liberman at 09:15 AM

Crimson = worm?

I've been slowly making my way through Pynchon's Against the Day. It's been slow because I've been busy, and also, well, it's just slow, at least so far. However, a few days ago I got to page 156, where Kit Traverse meets his would-be benefactor Scarsdale Vibe in New Haven on the weekend of the Yale-Harvard game at some point shortly after 1900:

Pre-game passions were running high. Venerable professors of Linguistics who had never so much as picked up a football had been earnestly reminding their classes that, by way of the ancient Sanskrit krimi and the later Arabic qirmiz, both names for the insect from which the color was once derived, "crimson" is cognate with "worm". Young men in striped mufflers knitted by sweethearts who had dutifully included rows of flask-size pockets ran clanking to and fro, getting a head start on the alcoholic merriment sure to prevail in the stands.

The business about "crimson" and "worm" seems to be sort of true, like a lot of things in this book. The American Heritage Dictionary's etymology for worm:

Middle English, from Old English wurm, variant of wyrm. See wer-² in Appendix I.

The reference to the Appendix takes us to:

Conventional base of various Indo-European roots; to turn, bend.
Derivatives include stalwart, weird, vertebra, wrath, wrong, wrestle, briar, rhapsody, and worm.

The AHD's entry for crimson gives the etymology as:

Middle English cremesin, from Old Spanish cremesín, Old Italian cremesino or Medieval Latin cremesīnus, all from Arabic qirmizy, from qirmiz, kermes insect. See kermes.

And kermes, in turn, is from

French kermès, short for alkermès, from Arabic al-qirmiz, the kermes, probably from Sanskrit kṛmi-ja-, (red dye) produced by worms. See k^wṛmi- in Appendix I.

And when we go to Appendix I (the list of Indo-European roots) we find:

Worm. Rhyme word to *wṛmi-, worm (see wer-²). carmine, crimson, kermes, from Arabic qirmiz, kermes, borrowed from Sanskrit compound kṛmi-ja-, “(red dye) produced by worms” (-ja-, produced; see genə-), from kṛmi-, worm.

Wikipedia has a more elaborate discussion, including some relevant chemistry and biology.

Posted by Mark Liberman at 09:13 AM

Maybe Jacques Lecoq did it

In a couple of Language Log posts back in March of 2005 ("Gibberish by any other name", "Fo did it"), I concluded (with considerable help from Ray Girvan and Stefano Taschini) that the word grammelot is probably a modern invention. The "rambling nonsense-speech" that this term describes may have been a feature of the Commedia dell'Arte 500 years ago, but the term itself seems to be a recent one. I suggested that perhaps "the term was invented in the 1960s by Dario Fo and Franca Rame to describe their own linguistic experiments", and in the course of the discussion, I quoted from a paper by Adrienne Ward, which in turn quoted from a book by Antonio Scuderi. Professor Scuderi has recently written to correct me.

Professor Scuderi's note, reproduced with permission:

The other day, in a moment of egoistic self-indulgence, investigating books and articles where my work has been cited, I came across my name on your blog, Language Log, in an installment entitled "Fo Did It" (2005). Apparently without opening a book, the author concludes that the Italian Nobel Playwright, Dario Fo, invented the term grammelot, which refers to an aural performance technique. He quotes a long passage by someone else who quotes me, and then concludes, "So either Ward and Scuderi have misread Fo, or this word has someone (sic) survived for half a millennium in the theatrical demimonde, without leaving detectable traces in the literary and linguistic history of Italy, France and England" (2005:3).

Okay, it's just a blog, no responsible peer reviewing required, etc. But still... If the author had cracked open the two books of mine that are mentioned, he would have found that I cite where the term is listed in a major Italian dictionary (Zingarelli 1995:797), and a survey of attempts to trace the origins of the word. In my essay, "Updating Antiquity," in Dario Fo: Stage, Text and Tradition, I explain that according to John Rudlin (Commedia dell'Arte: An Actor's Handbook, 1994:59-60), Fo most likely learned the technique from Jacques Lecoq, who "definitely" learned it from Jean Dasté, who had used it with the Copias troupe, which had called it grummelot. I explain that the etymology is uncertain and provide several hypotheses. None suggest that Fo invented the term, and in fact, Fo himself makes it clear that it did not originate with him: "termine di origine francese, coniato dai comici dell'Arte e maccheronizzato dai veneti che dicevano 'gramelotto'."

Wrestling with this enigmatic term in a scholarly endeavor was not easy. In any instance, of course, it is disappointing to see the results of such research bandied about in an off-hand irresponsible manner. In the present case, the invention of Fo inventing the term has reached Wikipedia by way of Language Log, so that myth is not just lost in a blog, but, alas, presented in a forum that some will trust.

Well, actually, there's a sense in which blogs really are peer reviewed, as Professor Scuderi has just demonstrated. I've appended his note to my original post, with a response to the effect that there is still no evidence that the term grammelot or any of its variants was used 500 years ago, or even before the 1960s. And I suggested that he correct the Wikipedia entry, which he promised to do.

Anyone who reads this blog knows that I'm in favor of careful scrutiny of ideas, free debate, and scrupulous correction of mistakes. We always make corrections in a way that is at least as prominent as the original error, and doesn't hide the fact that a mistake took place. I'll leave for another time a discussion of the future evolution of the peer-reviewing process in scientific and scholarly publication, and instead just point to the contrast between peer-reviewed journals, weblogs and other methods for honestly attempting to present the truth on one hand, and the current practices of some esteemed media organizations.

And let me remind all readers that in case of less than full satisfaction, the Language Log Customer Relations Department stands ready to refund double your subscription price.

[As several people have pointed out to me, my first version of this post substituted "1995" for "2005". Time flies, but not that fast.]

[Professor Scuderi responds:

I really don't want to waste any more time on this, especially in this format. I don't appreciate being dragged into your discussion for all to see, cited as a secondary source, as if I am claiming that I know where the term comes from, which I never did. I do know that much of the commedia dell'arte was passed down to forms of low-brow theater, which derive from it, whole or in part, such as mime, music hall, vaudeville, circus clowning and so forth, in an oral tradition. This means that maybe, just maybe, you can't find some of the technical terms of these performers listed in print anywhere, until after they were in use for some time. Some may even have been lost with no record. The technique of imitating the sounds, intonations, and cadences of a given language, while inserting a key word that the audience can understand, was known to some American vaudevillians, for example, as "double talk." (To see it done, just watch some old Sid Caesar sketches.) I'd just like to say that if Copias, for example, or Lecoq invented the term then "Fo Didn't Do It."

Fair enough. Many terminological histories are possible -- in this case, such evidence as there is suggests that the term "grammelot" dates from the second half of the 20th century. As for the dragging part: Adrienne Ward quoted Antonio Scuderi, and I quoted Ward quoting Scuderi, and Scuderi wrote with a correction, which I promptly posted, and then an objection, which I've just posted. If that be dragging, make the most of it.]

[Trevor, who doesn't have to be dragged into anything, writes:

It sounds rather like a Romance version of gramatol, which turns up in Speke Parott by the satirist John Skelton ca 1500 (nodypollys and gramatolys of smalle intellygens) meaning windbag (Greg Walker, John Skelton and the Politics of the 1520s) or smatterer (Thomas Wright, Dictionary of Obsolete and Provincial English). Skelton was a player in the Grammarians' War and Tudor England despised the learned, so I guess you could hypothesise something along the lines grammatologist(s) -> gramatolys. This is more or less what Steve McCaffery suggests in (p261) Prior to Meaning: The Protosemantic and Poetics, although I've never actually come up against the word grammatologist in Tudor literature & haven't got corpus access to check.

I'm sure ;-) you've read Jacques's Of grammatoly.

The OED glosses gramatol as "A smatterer", giving the single citation

a1529 SKELTON Sp. Parrot 319 Nodypollys and gramatolys of smalle intellygens.

but has so far missed the chance to include grammatoly in its Jacquian sense.

The Skelton citation is the only hit in the LION database. More of the passage, which is worth quoting:

316 And thowe sum dysdayne yow, and sey how ye prate,
317 And howe your poemys arre barayne of polyshed eloquens,
318 There is none that your name woll abbrogate
319 Then nodypollys and gramatolys of smalle intellygens;
320 To rude ys there reason to reche to your sentence:
321 Suche malyncoly mastyvys and mangye curre dogges
322 Ar mete for a swyneherde to hunte after hogges.

Neither grammatologist nor gramatologist nor any of their obvious typographical variants occurs in LION or in EEBO, alas.]

Posted by Mark Liberman at 06:46 AM

Vocabulary size and penis length

Luis Martínez-Fernández, writing in the Chronicle of Higher Education, claims that "a person whose native language is not English can adopt the English language as a means of communication for a variety of reasons," and that one of them is "the need to use a more precise language with a richer vocabulary. (English has about 900,000 words, while French, for example, has fewer than 100,000.)"

Where do people get this stuff? It rather looks as if Martínez-Fernández may have swallowed the self-promoting Paul Payack's specious claim that the number of words in English is creeping up toward one million. But what about the support for the claim that poor old French can only muster a hundred grand? (I know I once claimed French is a miserable and inadequate language. But I was only kidding.)

It scarcely matters what number you give in contexts like this. The sort of people who are prepared to believe that you get greater richness and precision when you have more available words will believe anything, so you can feed them any numbers you like. Not long ago a significant number of totally clueless journalists heard that a gigaword corpus had been collected and ran away with the notion that all the words in it were different, so they trumpeted that English had a billion words. (Confusing a corpus with a dictionary is roughly comparable to confusing the set of all cars now driving on American roads with the set of distinct car models available in the catalogs of US manufacturers.)
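The corpus/dictionary confusion is the difference between counting word tokens (every occurrence) and word types (distinct forms). A toy sketch in Python makes the gap visible; the function name and the deliberately crude tokenizer (lowercase runs of letters and apostrophes) are assumptions made for illustration, not anyone's official method for counting the words of English.

```python
import re


def token_and_type_counts(text):
    """Return (number of word tokens, number of distinct word types)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return len(tokens), len(set(tokens))


sample = "the cat saw the dog and the dog saw the cat"
print(token_and_type_counts(sample))  # (11, 5)
```

Eleven running words, five different ones. A gigaword corpus has a billion tokens in the first sense, and nothing like a billion types in the second.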

Why does the number of lexical entries in the dictionary matter to people, as opposed to the number of fax machines, or the number of lost socks? Teresa Cunningham, who pointed me to the Martínez-Fernández article, lives in Europe, where she has plenty of experience in talking to people in languages other than English, and she remarks: "I have never had anybody turn to me and ask to ‘borrow’ an English word so they can express their thoughts more precisely while speaking another language." Quite so.

Precision, richness, and eloquence don't spring from dictionary page count. They're a function not of how well you've been endowed by lexicographical history but of how well you use what you've got. People don't seem to understand that vocabulary-size counting is to language as penis-length measurement is to sexiness.

Posted by Geoffrey K. Pullum at 01:05 AM

Enough Latvian to get by?

I've been watching my way through Alfred Hitchcock's oeuvre for a writing assignment lately (don't ask), and I recently caught an intriguingly archaic way of depicting bilingualism.

In Foreign Correspondent, from 1940, Joel McCrea is in Amsterdam trying to tell local policemen about a criminal scene he has just witnessed. The policemen do not speak English. This we can let squeak by in terms of plausibility, as knowledge of English wasn't nearly as widespread among Europeans as it is today.

The cute part, however, is when they enlist a little girl to translate. When she announces "We always use English in school," she sounds more like an American doing the Muppets' Swedish chef, and then, when she speaks Dutch to the officers, she is plainly an American girl reciting the sentences phonetically, with an accent that is pure Cleveland.

No movie director who wanted to be taken seriously could get away with that today. We would expect, almost demand, an actual Dutch-speaking girl in the part. If necessary, she would be taught her English lines phonetically.

Yet this slapdash treatment of foreign languages and bilingualism was typical in American movies until the seventies. Earlier in Foreign Correspondent, well-travelled peace activist Laraine Day meets a Latvian ambassador at a reception. For one, the ambassador speaks neither English, French, nor German — I see. But then, never fear — Day's character turns out to know "enough Latvian to get by"!

For the record, the character is not depicted as having had any particular reason to get her feet wet in Latvian. It's apparently just that, well, you know — she's travelled a lot over there, and so she just knows a little of all of "those languages," even Latvian, and one imagines, Occitan, Sorbian and Friulian when they come in handy.

This sort of thing was par for the course in the pop culture in general back then. In I Love Lucy, Lucy is constantly depicted as speaking barely a single word of Spanish, despite it being the native language of the man she has been married to for over ten years.

One might wonder -- how is it that people as sane and intelligent as we are let implausible depictions of language use like this pass so casually?

I think one reason was that in 1940, sound film was less than fifteen years old, and the feeling that sound movies were on some level filmed plays had not completely dissipated (for example, by 1940, the term "photoplay" was only recently obsolete).

Thus in Foreign Correspondent, as in so many movies of the period, during scenes set outdoors it is painfully obvious from the way people's voices echo that they are on an indoor set, which plainly has a painted backdrop. There is little effort to disguise that it is a filmed performance — the painstakingly "realistic" quality of modern film would have seemed peculiar to filmmakers then, and in fact producers like Samuel Goldwyn resisted realism overtly ("People don't pay to see their kitchen," he once said).

In an environment like that, it's easy to see why there wasn't felt to be a strong need to dot every i and cross every t in terms of foreign languages.

Another part of it, I think, is that America today has more interest in foreign cultures than America did then. Partly because of the realities of racial segregation until the late 1960s, partly because of how few immigrants were allowed into the U.S. between the mid-twenties and the late sixties, and partly because of the multiculturalism explosion after that time, American pop culture is now given to a more realistic approach to foreign languages than it was back in the days of baseball and apple pie.

Thus Gary Oldman spoke Romanian when playing Dracula. Spanglish gives us a grinding depiction of a Mexican woman's efforts to learn English, including demanding from her daughter how to render "Just try it on" complete with the pragmatic shading this entails, rather than having her turn up speaking and comprehending fluently after twenty minutes.

A recent episode of Studio 60 on the Sunset Strip had a Chinese teenager translating for her father, and it is unimaginable that they would have cast a Chinese-American faking a Chinese waiter accent with the whites and chanting an American undergraduate's Chinese to her father full of tone mistakes. In New York, the stage musical The Light in the Piazza, set in Italy, has characters speaking untranslated (and carefully coached) Italian so much that it annoys some audience members.

Today, the old-time approach to foreign languages is mostly permitted in comedy. I recall an episode of Saturday Night Live in the nineties in which Melanie Hutsell had one line in a sketch set in witch-burning Salem, as an aggrieved Dutch person. Like the little girl in Foreign Correspondent, she did the line in a silly Swedish chef accent, including pronouncing "Dutch" as "dootch." But Hutsell even seemed to be in on how absurd her delivery was, breaking character and laughing a bit at herself as she finished the line.

And only a bizarre and comedic film like Borat can go back to the old days and have Sacha Baron Cohen speaking a mishmash of Hebrew sprinkled with some Russian here and Polish there and pass it off as Kazakh.

Posted by John McWhorter at 12:41 AM

December 07, 2006

Happy birthday, Noam

Happy birthday, Noam. (Noam Chomsky, unquestionably the most famous member of the linguistics profession, turned 78 today.)

And by the way, here is a very little-known fact about wishing people a happy birthday. The song "Happy Birthday To You" was written in 1893 by two sisters, Mildred J. Hill and Patty Smith Hill, who were schoolteachers in Louisville, Kentucky. They died unmarried and childless, but left behind a foundation, the Hill Foundation, which had a share of the royalties. The song is still earning two million dollars of royalties a year, for a company that bought the rights; see this page for some history. After Patty Hill passed away in 1946, some of the Hill Foundation's money went to their nephew Archibald A. Hill. And he was a distinguished linguist, formerly of the University of Texas. When he died, he left some money to the Linguistic Society of America. Thus some of the money earned by "Happy Birthday" ended up making the LSA significantly more prosperous. I have been told that it was primarily the Hill bequest that made it possible for the LSA to purchase its present office suite in Washington, DC. Thanks from all linguists, Arch. And thanks to Mildred and Patty as well.

Posted by Geoffrey K. Pullum at 06:51 PM

Apocalypto: Raising linguistic hackles

After months and months of anticipation, Mel Gibson's Mayan epic Apocalypto is finally upon us. Originally the buzz surrounding the film was mostly about Gibson's choice to shoot the entire film in Mexico with local actors speaking Yucatec Maya. Now, of course, observers are more interested in speculating if the film will be dead-on-arrival at the box office thanks to Mel's notorious anti-Semitic rant and DUI arrest last July. But linguistic issues are still getting some attention in the Apocalypto coverage, for instance in this Associated Press article describing the mixture of excitement and ambivalence among the Yucatec Maya community about a major Hollywood movie filmed in their indigenous language. But what about that foreboding Greek title?

You might have seen Gibson himself explaining in the TV commercials for the movie that "Apocalypto means 'a new beginning,'" an assertion he was making to reporters more than a year ago. Back then, Languagehat ridiculed Gibson's gloss, pointing out that the Greek word apokalypto (ἀποκαλύπτω) is actually a verb meaning 'uncover; disclose, reveal'. I speculated (in Update #2 here) that the 'new beginning' interpretation might have something to do with New Age mysticism, adherents of which take great stock in the notion that the Mayan calendar — and therefore the world — is supposed to end on December 21, 2012. As Gibson intones in the commercials, "Unfortunately, to have a new beginning, something else has to end." ("Like your career," one wag responded.)

So is Apocalypto really about the 2012 theory? The writer of the AP article investigated this possibility:

Mauricio Amuy, a non-Maya actor who participated in the filming of Apocalypto, says the production staff discussed the theory on the set.
"We know the Bible talks about prophecies, and that the Mayas spoke of a change of energy on Dec. 22, 2012, and it (the movie) is somewhat focused on that," Amuy said. "People should perhaps take that theory and reflect, and not do these things that are destroying humanity."

Fortunately, it seems that not everyone is buying Gibson's New Age-y mistranslation of the title. Chicago Tribune movie critic Michael Phillips takes exception in his review of the film:

Gibson and company chose to translate the Greek word "apocalypto" as "new beginning," which has raised linguistic hackles, since the word is a verb meaning "uncover" or "reveal." The director has said he considers his apocalyptically scary-sounding title to be "a universal word. In order for something to begin, something has to end. ... But it's not a big doomsday picture or anything like that." Right. No more so than "Passion," anyway.

Nice to see the griping of linguabloggers get noticed, even if the owners of those raised hackles are left unspecified. I'm guessing Phillips mined Languagehat for that one, since a Google search for <apocalypto greek> returns last year's LH post (and my response) as the second result, right after the Wikipedia page for the movie. Now if we could only get reporters to stop referring to Yucatec Maya as an ancient and obscure language...

[Update: John Lawler saw the movie last week at a preview at the University of Michigan — Touchstone Pictures invited UM language faculty and students, so the Linguistics Club made an event out of it. You can read his pre- and post-Apocalypto reactions on his blog. His takeaway message: "Read the Popol Vuh and skip the movie."

And for a withering critique of the film with more on the Mayan point of view, Simon Musgrave recommends "Mad Mel and the Maya" by Earl Shorris in the latest Nation.]

Posted by Benjamin Zimmer at 04:47 PM

Most Recess Appointments are Unconstitutional

John Bolton's resignation as US Ambassador to the United Nations in the face of near certain failure of the Senate to confirm him has yet again brought to public attention the power of the President to make recess appointments. The received view is that the President may make an appointment without Senate confirmation whenever a position is vacant and the Senate is not in session. Such an appointment expires at the end of the next session of the Senate. I happened to be reading the US Constitution and something caught my eye. The power to make recess appointments derives from Article II, Section 2, Clause 3:

The President shall have Power to fill up all Vacancies that may happen during the Recess of the Senate, by granting Commissions which shall expire at the End of their next Session.

As I read it, this permits the President to make recess appointments only when a position falls vacant while the Senate is in recess, not when a position falls vacant while the Senate is in session and the vacancy persists into a recess. This is because the verb "happen" has only a punctual reading. A vacancy "happens" when the previous occupant of the position dies, resigns, or is removed from office, or when a new position is created. It does not continue to happen. Although I am a native speaker, I am not a specialist in English, so I consulted my colleague Geoff Pullum, co-author of the Cambridge Grammar of the English Language, who reports that he agrees with me. On this interpretation, most recent recess appointments, including those of John Bolton and of judges William Pryor and Charles Pickering, were ultra vires.

A bit of research using Google turned up a recent paper by Michael B. Rappaport of the University of San Diego School of Law entitled The Original Meaning of the Recess Appointments Clause, which makes the same point. Rappaport also argues that the clause is only intended to apply during intersession recesses, not during shorter recesses. If the plain meaning of the clause renders most recess appointments unconstitutional, why has this illegal practice been permitted to continue for so long and, apparently, without much controversy?

Addendum: Reader Steve Carlson points to another recent discussion of this issue, a paper entitled Recess Appointments of Article III Judges: Three Constitutional Questions by Edward A. Hartnett of Seton Hall Law School.

Posted by Bill Poser at 03:56 PM

America's Least Wanted?

This is a reflection on Mark Liberman's post, "An Early New Year's Resolution," some advice to linguists when they deal with the media.

Okay, I admit it. I was the linguist whose contribution was cited (15 seconds' worth) on the November 18, 2006 television program, America's Most Wanted. No, it didn't mention my name (thank God). As is so often the case, the media didn't use what I told them the way I would have wanted. No, it wasn't all that interesting but there seems to be a lesson here. Mark warned himself to never mention words for snow to any journalist at any time. That's one solution. There seems to be little reason to believe that our reports come out the way we think they should.

The tape-recorded telephone call that America's Most Wanted sent me for analysis was only about 30 seconds long. From this I was asked to discover whatever I could about the unknown speaker's regional and social dialect. From the beginning I warned the program about how tentative any such analysis would have to be. In that brief sample I found no clues that could place the speaker in the South, New England, the Inland North, the West, or the Southwest. Then, not mentioning most of my qualifiers "could," "may," and "perhaps," the program made it seem pretty certain that the caller was from a region somewhere along the North Midland/South Midland dialect areas of the Midwest. Very briefly a map of this area flashed on the screen and the information was treated with much more authority than I tried to give it. I couldn't give them much of a diagnosis. But then, there wasn't much to diagnose.

It seems that more and more frequently the media are asking linguists to contribute to stories, programs and articles. This can be a good thing. A handful of reporters do it very well but many others take a quick look at what we give them, cull out only that which suits their preconceived notions, sometimes get it wrong, and often ignore what we believe to be the really important stuff we've told them. I suppose journalists believe they have a right to do this and I suppose we can only be embarrassed when any damage is done. But it gets pretty discouraging when, as happened to me a few years ago, a prominent magazine reported that I said I could tell when someone was lying when what I actually told the reporter was exactly the opposite of this.

So why do linguists keep on getting sucked into cooperating with the media when we have a pretty good reason to think that what we say won't come out the way we thought it should? I suppose some harbor the fond hope of getting 15 seconds of media fame. Wrong. Or, as in my experience with America's Most Wanted, a friend who I respect referred the reporter to me and I felt some kind of obligation to that friend. Wrong again. Or we want the facts to be accurate and to see our field become better known and appreciated. A much better motivation. Whatever the reason, we agree to a brief interview or provide a written statement that is far too often misunderstood, mutilated, or inadequately summarized.

But let those in the media who distort or cut corners beware. Language Log is on the alert for questionable representations of our linguistic contributions, as evidenced by the way Mark revealed how the BBC got wrong what a famous British phonetician told a reporter about the report that cows moo with a regional accent (September 3, 2006). And Mark's post, "Silly Season in the BBC Science Section" (August 26, 2006) doesn't seem to be limited to science reporting.

So dear media friends, be vigilant. As my cleaning lady once commented during the Nixon impeachment hearings, "What's done in the dark will come out in the light."

Posted by Roger Shuy at 01:58 PM

"Mel Martinez is Spanish for Harriet Miers"

Andrew Leonard at Salon reports that the linguistic nationalists at English First are in an uproar over President Bush's selection of Sen. Mel Martinez (R-Fla.) to take over the chairmanship of the Republican National Committee. The group disagrees with Martinez's position on immigration reform, but what really gets their goat is his stance on language issues. "Mel Martinez is Spanish for Harriet Miers," they mockingly proclaim on their new website "Stop Martinez." They go on to assert that the Cuban-born senator is "wrong on English" because he didn't vote for the Inhofe amendment last May declaring English the national language of the United States. (He didn't vote against it; he just didn't vote. They neglect to say whether he abstained or had some other reason for missing the vote.)

Worse than that, Martinez had the gall to speak in Spanish on the Senate floor. The relevant bullet point on "Stop Martinez" reads:

On February 3, 2005, Martinez used Spanish on the Senate floor in his first speech, although there are some doubts about the translation.

I was shocked to discover that the hyperlink ostensibly explaining the "doubts about the translation" takes you to a Language Log post, specifically Mark Liberman's entry of Feb. 6, 2005, "Never pronouncing East Thursday?" In the post, Mark discusses what appears to be a computer-generated Spanish-to-English translation of a story about Martinez's speech that showed up on the website for the Mexican newspaper El Sol de Zacatecas. Yes, Mark expressed doubts about that particular translation (since it had glaring errors like translating este jueves as "east Thursday"). But why in the world would the English Firsters commandeer a post about faulty MT as some sort of implicit critique of Martinez and his speech? ("Doubts about the translation" makes it sound like there was something sinister going on in the Spanish text that was omitted from the Senate's official English rendering.) Did they not actually read the post, or did they figure criticizing a translation of the speech — or rather a translation of an article about the speech — was tantamount to criticizing Martinez himself? Either way, I think I speak on behalf of the entire Language Log family when I say: leave us the heck out of it.

[Update, 12/8/06: Jim Boulet, executive director of English First and maintainer of the "Stop Martinez" site, responds here.]

[Update, 12/10/06: My response to Boulet's post is here.]

Posted by Benjamin Zimmer at 07:49 AM

Inherent ambiguity as an adaptive advantage

I tend to be suspicious of new theories advanced by people who want to sell me things. There's a good reason why "federal law ... states that intramural scientists conducting research with human subjects—for example, investigators and research team members involved in patient selection, the informed consent process, and clinical management of a trial—should not be allowed to have any financial interest in or relationship with any company whose interests could be affected by their research or clinical trial..."

I'm especially suspicious when the new theories are described only in vague terms, and no evidence is offered other than even vaguer mentions of large numbers of satisfied customers, and references to authoritative places where unspecified research might have been done.

Priscilla Dunstan's theory about the secret language of babies, documented on Oprah ("Amazing Medical Breakthroughs: The Secret Language of Babies Video") and available to parents and other interested parties in a 2-DVD set for $59.95, starts out with all of these handicaps. That doesn't mean that it's wrong. Time will tell. But as I'll explain below, I think that even if it's nonsense, it's probably beneficent nonsense. Unless you think that using false or exaggerated claims to sell things is intrinsically wrong -- and in that case, where would civilization be?

The first new aspect of Dunstan's idea, as far as I can tell, is that it's important to distinguish between a "pre-cry" -- which seems to be a sort of crying, but one occurring early in the time course of a crying bout -- and the "hysterical cry", which is said to follow if the "pre-cry" is not attended to. In addition, there's a specific lexicon of five "words" that are claimed to be used in different types of pre-cry, which Ms. Dunstan describes as follows (this version of the list is taken from Oprah's web site):

Neh="I'm hungry"
Owh="I'm sleepy"
Heh="I'm experiencing discomfort"
Eair="I have lower gas"
Eh="I need to burp"

Following yesterday's appeal, a reader is lending me a copy of the Dunstan DVD set, and I hope to learn from it more about exactly what Ms. Dunstan thinks these five pre-cry "words" are like, in acoustic terms. Meanwhile, I'll explain a bit about past research on infant crying.

There's a long history of acoustic studies of infant cries, going back at least to the 1960s -- I recall learning about this from Eric Lenneberg in a course I took as an undergraduate, about 40 years ago. A sample reference would be O. Wasz-Höckert, J. Lind, V. Vuorenkoski, T. Partanen and E. Valanne, "The infant cry: A spectrographic and auditory analysis", Vol. 29 of Clinics in Developmental Medicine (Spastics International Medical), 1968.

A summary of four decades of back-and-forth on this topic can be found in Joseph Soltis, "The signal functions of early infant crying", Behavioral and Brain Sciences 27, 443-490, 2004:

Infant crying is clearly a means by which infants can communicate needs (e.g., hunger, pain, or discomfort) to caregivers, who may be alerted to appropriately satisfy those needs (e.g., by feeding, protecting, or soothing). It is a matter of some controversy, however, as to whether there are acoustically distinct cry types (e.g., hunger cries or pain cries) to which caregivers can respond specifically without additional contextual cues (reviewed by Gustafson et al. 2000).

Work in the 1960s by the so-called Scandinavian cry group is often cited in support of the cry types hypothesis (see Gustafson et al. 2000; Wasz-Hockert et al. 1985). Researchers recorded birth cries, pain cries (during vaccination), hunger cries (4 hours after feeding), and pleasure cries (after feeding). Listener subjects identified the four cry types better than chance (55% correct versus 25% expected). Gustafson et al. (2000) criticized this work, however, because the best exemplars of each cry type were preselected by researchers, and listeners were given the four a priori categories in advance, both of which conditions increased the likelihood of accurate assignment. Additionally, the results were collapsed across all four cry types, so that the positive effect could have been due to only the most easily distinguishable cries, such as the contented coos and babbles that constitute the “pleasure cry.” In a replication of earlier work, however, exemplars for each cry type were chosen at random, and results for the four cry types were presented separately (Wasz-Hockert et al. 1968). Additionally, the replication showed that the four cry types differed statistically along several acoustic dimensions, such as fundamental frequency and melody, although the differences were quantitative rather than qualitative. The accuracies in identifying birth, pain, hunger, and pleasure cries were 48%, 63%, 68%, and 85%, respectively (grand mean = 66%).

Other studies also show that subjects can distinguish between cry types. Wiesenfeld et al. (1981) showed that mothers could identify pain cries (rubber band snap) versus anger cries (taking away pacifier or physical restraint) of their own infants better than chance when given three categories from which to choose (pain, anger, or other; 66% correct versus 33% expected). Gustafson et al. (2000) also showed that mothers could identify pain versus hunger cries better than chance when given six cry categories from which to choose (44% correct vs. 17% expected).

When subjects are given open-choice tests, however, the evidence for cry types is not as strong. In an early study by Sherman (1927), cries were elicited from babies by late feeding (hunger cry) and by dropping, restraining, or pricking with a pin (pain cries). Non-mother subjects behind a screen were asked to judge the “emotional characteristics” of the cries, but there was no agreement among listeners, leading to the conclusion that infant cries were merely “undifferentiated noise” (Gustafson et al. 2000). More recently, Muller et al. (1974) played cries elicited by hunger (pulling the nipple away), pain (rubber band snap), and startling (clap of wooden blocks close to the ears). Again, without a priori categories, subjects could not differentiate the cry types, even of their own children. Participants tended to attribute all cries to hunger.

An alternative to the cry types model is the view of the infant cry as a graded signal (Gustafson et al. 2000; Murray 1979). According to this view, vocalizations vary quantitatively on some acoustic dimension, such as duration or frequency, and that graded change along the dimension reflects motivational or emotional state.

(Gustafson et al. 2000, often cited in this passage, is G.E. Gustafson, R.M. Wood and J.A. Green, "Can we hear the causes of infants' crying?", in R.G. Barr et al., Eds., Cry as a sign, a symptom and a signal. Clinical, emotional and developmental aspects of infant and toddler crying, MacKeith Press, 2000.)
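
For readers who want to check the "better than chance" arithmetic in the passage quoted above, here is a minimal sketch in Python. It assumes, as my reading rather than anything Soltis states, that the "expected" figure in each study is simply uniform guessing among the k categories offered to listeners, that is, 1/k. The study labels and the function names are my own; the percentages are the ones quoted.

```python
# Chance baselines versus reported accuracies for the forced-choice
# studies quoted above. Assumption (not stated in the source): the
# "expected" rate is uniform guessing among k offered categories, 1/k.

studies = [
    # (label, number of categories offered, reported proportion correct)
    ("Scandinavian cry group, 4 cry types", 4, 0.55),
    ("Wiesenfeld et al. 1981, 3 categories", 3, 0.66),
    ("Gustafson et al. 2000, 6 categories", 6, 0.44),
]


def chance_level(k):
    """Proportion correct expected from uniform guessing among k categories."""
    return 1.0 / k


def summarize(rows):
    """Return (label, chance as a whole percent, observed/chance ratio)."""
    out = []
    for label, k, observed in rows:
        chance = chance_level(k)
        out.append((label, round(100 * chance), round(observed / chance, 1)))
    return out


# Per-type accuracies from the replication: birth, pain, hunger, pleasure.
replication = [48, 63, 68, 85]
grand_mean = sum(replication) / len(replication)

for label, chance_pct, ratio in summarize(studies):
    print(f"{label}: chance {chance_pct}%, observed/chance = {ratio}")
print(f"grand mean of replication accuracies: {grand_mean}%")
```

On this reading the chance rates come out as the quoted 25%, 33% and 17%, the listeners in all three studies are doing roughly two to two-and-a-half times better than guessing, and the four replication accuracies do average to the quoted grand mean of 66%. None of this bears on the open-choice results, where no fixed set of categories defines a chance rate in the first place.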

So in addition to the billions of parents who have been listening more-or-less closely to their infants in the ordinary course of life, there have been hundreds of researchers (thousands, including all their students and assistants) who have examined and tested millions of recorded infant cries, natural and provoked, over decades of work, proposing and debating a wide variety of ideas about the cries' function and information content. It's possible that there's a simple, universal "lexicon" of five cry (or pre-cry) "words" that these billions of parents and thousands of researchers have missed -- but it seems implausible, frankly.

All the same, Dunstan's insistence that those five universal words are really there may not be a bad thing at all. As Rami Nader, Elizabeth A. Job, Melani Badali and Kenneth Craig say in a comment on Soltis' BBS article ("Infant crying in context"):

Our focus has been on the role of early cry as a commanding source of information about infant pain and distress that requires interpretation by an adult caregiver. Its inherent ambiguity may offer an adaptive advantage, as resolution requires adult presence and scrutiny of other behavioral, physical, and contextual factors.

And it makes sense that paying close attention to an infant's vocalizations, and trying to correlate them to other behavioral and contextual factors, can only be a good thing. There's some experimental evidence for this piece of common sense, in B.M. Lester, et al., "Developmental Outcome as a Function of the Goodness of Fit Between the Infant's Cry Characteristics and the Mother's Perception of Her Infant's Cry", Pediatrics 95(4) 516-521, 1995:

Objective. To determine whether the "goodness of fit" between infant cry characteristics and the mother's perception of the cry is related to developmental outcome at 18 months of age.

Design. This was a prospective, longitudinal study from birth to 18 months performed in a blinded manner.

Setting. The study was conducted in a maternity hospital, including normal and special care nurseries and a laboratory for developmental follow-up.

Patients. The 121 term and preterm infants and their mothers were selected to meet medical criteria.

Measurement. Acoustic analysis of 1-month infant cry and the mother's perception of the same cry was used to divide subjects into four groups representing matches and mismatches between infant cry characteristics and maternal cry perception. Primary outcome measures of cognitive, language, motor, and neurologic outcome were administered at 18 months. Caretaking environment measures were also recorded.

Results. Statistically significant (P < .05) findings showed that matched groups scored higher on measures of language and cognitive performance than infants in the mismatch groups, with a particular advantage for infants in the matched group in which mothers accurately perceived the higher-pitched cries of their infants. There were no differences between the groups in biologic or sociodemographic factors. Group differences were observed in social support and maternal self-esteem.

Conclusions. Matches and mismatches between infant cry characteristics at 1 month and the mother's perception of the cry are related to cognitive and language outcome at 18 months in term and preterm infants. This relation is probably due to transactional processes in which developmental outcome is affected by the clarity of the infants' signals and by the ability of the mother to accurately perceive her infant's signals. The mother's ability to read her infant's cues may be affected by factors such as social support and self-esteem.

[For a sample of previous analysis of the acoustic features of infant cries, and their perceptual analysis, see e.g. Athanassios Protopapas, "An assessment of the perceptual role of individual acoustic features of infant cries", Brown University M.S. thesis, 1993, or Athanassios Protopapas and Peter Eimas, "Perceptual differences in infant cries revealed by modification of acoustic features", J. Acoust. Soc. Am. 102(6) 1997:

Previous studies of infant cry acoustics and their perceptual significance have remained inconclusive as to the graded nature of cry production and perception and to the exact role and importance of particular acoustic features. In this study, a set of infant cries were digitally analyzed and resynthesized to form natural-sounding cries with varying fundamental frequency (F0), degrees of jitter (period to period variations in F0), and rise time (time for F0 to reach its maximum value). In a perceptual rating task, higher-F0 cries as well as cries with larger amounts of jitter tended to be given more negative ratings than were lower-F0 cries and cries with less jitter, respectively. The perceptual ratings of the rise time manipulations were inconclusive. This study demonstrated a perceptual effect of F0 and jitter independently of other parameters, consistent with current notions of infant cry gradedness. It was also shown that digital signal processing techniques can be fruitfully applied to infant cry research.

]

[And Paul J. Camp points out that The Simpsons -- of course -- have already explored this area of science (in episode 8F23, Brother, Can You Spare Two Dimes?):

Herb stands in a public park trying to figure out an idea, when a woman struggles to understand what her baby's trying to tell her.

Herb: [confronting her] Lady, you just gave me the idea of a lifetime! How do I thank you?
Lady: Please don't hurt me.
Herb: Consider it done.

___________________________________________________________________

Herb invites the family in for a presentation that will change the world, and brings out a drinking bird. Homer is in awe. Herb tells him to take it easy, but Homer continues to ogle the bird. Herb introduces his true plan: a baby translator.

It measures the pitch, the frequency, and the urgency of a baby's cry, and then tells whoever's around, in plain English, exactly what the baby's trying to say! Everything from "Change me" to "Turn off that damn Raffi record!"

___________________________________________________________________

He reveals his less-than-spectacular creation. Marge "oooh"s. Herb says "You don't have to humor me", but she retorts that it's pretty ingrained. Homer says it's the stupidest thing he ever saw. [...]

Just then Maggie reveals its ability.

Lavish attention on me, and entertain me.

Everyone is in awe. Lisa plays peekaboo with her.

Lisa: Maggie? Maggie? [covers her eyes]
Maggie: [babbles]
Translator: [monotone] Where did you go?
Lisa: Peekaboo! [uncovers eyes]
Maggie: [laughs]
Translator: [monotone] Oh, there you are. Very amusing.

]

Posted by Mark Liberman at 07:36 AM

December 06, 2006

More about cussing in Quebec

Language Log readers who responded to my December 5 post throw some more light on the use of obscenity in French-speaking Canada.

For one thing, any outsiders there who want to sound native apparently will need some coaching in pronunciation. Jean-Philippe Marcotte writes that many religious swear words in Quebec, even in their most euphemistic form, are phonetically differentiated from the actual religious terms. For example, if you want to try to achieve native swearing competence, you should lower your lax vowel in the second syllable of "tabernacle" to the ah vowel, producing something like "tabarnak," and the glide vowel, /ay/, in "Christ" should be the unglided lax vowel, /I/, sounding like the name, Chris. And while you're at it, you'll also need to drop the final consonant, /t/, in that word.

Another Montreal reader advises that body-part words are considered "off-colour but not taboo." The f-word, for example, "has no emotive power and is not bleeped from television" but prime-time programs usually substitute "colline" for "chalice" and "tabarouette" for "tabernacle." She also reports that a friend of hers who grew up in Montreal in the 1950s had his five-year-old mouth washed with soap after his father heard him say "tabernacle."

Sounds strong, eh? I don't know. Altered pronunciation and lexical substitution sound pretty familiar in the wonderful world of swearing amelioration. Darn it! Geez, I guess stuff happens.

Posted by Roger Shuy at 01:54 PM

News from Language Log Labs

Readers have been clamoring for our evaluation of the recent discovery from Down Under about the (so-called) universal language of babies:

After eight years of research, Australian mother Priscilla Dunstan says she has discovered a universal baby language, comprised of five distinct sounds.

Dunstan says babies produce the different sounds depending on their needs. 'Neh' means the child is hungry, while 'owh' indicates he or she is tired.

Other sounds include 'eh', 'eairh' and 'heh', which mean the infant needs burping, has wind or is uncomfortable.

Dunstan says babies make these sounds during the "pre-cry stage" - before they start crying hysterically - thus, parents who learn to identify the noises should be able to reduce the frequency of screaming outbursts.

Dunstan, who has always had a sharp listening skills, identified the five key sounds after spending hours listening to her own son and other infants.

She has since developed and released a Dunstan Baby Language DVD, which is available in Australia, America and is soon-to-be released in Britain.

Our crack team of researchers has been laboring around the clock to bring you the truth about this remarkable claim. Or rather, I've spent my breakfast hour this morning poking around on the internet, trying to decide whether to invest $59.95 in Ms. Dunstan's "Dunstan Baby Language 2 DVD Set", which appears to be the only source of information about her discoveries.

The Wikipedia entry is essentially a compilation of PR materials, and the various associated websites are essentially advertisements.

The operators at Dunstan Baby Language are standing by, ready to take my order, but apparently they don't send out review copies, and they appear to be unwilling to accept a free 2-year subscription to Language Log in trade. If you happen to own a copy, and would like to lend it to me briefly for testing purposes, please let me know. Ditto if you know anything about credible attempts to evaluate the hypothesis that there are "five universal words or sound reflexes used by infants" -- for example, the effort said to be underway at Brown University.

[Update: more here, including some information about possible Brown University connections.]

Posted by Mark Liberman at 08:25 AM

December 05, 2006

Oh tabernacle! What the wafer!

Taboo language does not go unnoticed by the eagle-eyed staff of Language Log. In the month of November alone, you can find the following posts on this topic: November 6, November 8, November 19, and November 20. Now comes an article in the Washington Post explaining that profanity in French-speaking Canada offers new and exciting vistas about taboo language.

Apparently the Quebecois turn to religion for their swear words. The f-word and the s**t word just don't cut it for them. No sexual cussing. No scatology. Just religion -- like "hostie" (host) or, for swearing in polite company, "tabar" (tabarnacle). The reason for this? For them, religion is taboo, not sex or body functions, says Professor Andre Lapierre of the University of Ottawa.

This led to a rather feeble effort at defensive language planning by the Montreal Archdiocese. It commissioned a bunch of billboards explaining the church meanings of the taboo words, "tabernacle" and "chalice." Not surprisingly, the effort didn't seem to work and the religious cussing continues.

Oh tabernacle! What the wafer!

Update: Martin Chesnay from Montreal informs me that the Washington Post was not entirely accurate about Canadian French speakers abandoning sexual swearing. He claims they use "the f-word" as much as it's used in the US, maybe even more. He also clarifies that the Archdiocese was NOT involved in language planning. It was "part of a fund raising campaign...trying to use shock to generate awareness for that campaign." Hmm. Well, at least I'm glad that it wasn't language planning, for there are better ways to plan such.

Posted by Roger Shuy at 02:01 PM

Happy-tensing and coal in sex

The papers have been buzzing with news about the Queen's English. These reports vary in tone and content, but many of them bang the drum for the decline of civilization. Thus Neil Tweedie in the Telegraph ( "How Queen's English has grown more like ours", 12/5/2006) begins like this:

As the common tongue continues its inexorable slide towards a new dark age of glottal stops and "innits", news comes that even the Queen is drifting slowly down river towards Estuary English.

A scientific study of Christmas broadcasts to the Commonwealth since 1952 suggests the royal vowel sounds have undergone a subtle evolution since the days when coal was routinely delivered to Buckingham Palace in sex.

The source of the fuss is a series of scientific papers by Jonathan Harrington and others. Most of the reporting deals with stuff that they published back in 2000, though there's some additional material in a paper that came out in October of this year. I'm always happy to see good phonetics research in the media spotlight, but my first question is, what happened on December 3 or so to make this news? Jonathan Harrington has just moved from a post at the University of Kiel to a position as Professor of Phonetics and Digital Speech Processing at the University of Munich, but I don't think that the PR department at a German university is likely to have promoted a faculty member's work so effectively. Handicapped as I am by ignorance of British culture, I can only guess that the timing of these stories has something to do with the schedule of the Queen's Christmas broadcasts.

There's more to say about the ideology of the news reports -- for a taste, since these things don't really change much over the years, you could see John Wells' presentation of the Guardian's coverage back in 2000.

As for the scientific content, the papers from 2000 were J. Harrington, S. Palethorpe and C. Watson, "Monophthongal vowel changes in received pronunciations: An acoustic analysis of the Queen's Christmas broadcasts", Journal of the International Phonetic Association 30 63-78, 2000; and J. Harrington, S. Palethorpe and C. Watson, "Does the Queen speak the Queen's English?", Nature 408 927-928, 2000. Here's the abstract of the 2000 Nature article:

The pronunciation of all languages changes subtly over time, mainly owing to the younger members of the community. What is unknown is whether older members unwittingly adapt their accent towards community changes. Here we analyse vowel sounds from the annual Christmas messages broadcast by HRH Queen Elizabeth II during the period between the 1950s and 1980s. Our analysis reveals that the Queen's pronunciation of some vowels has been influenced by the standard southern-British accent of the 1980s which is more typically associated with speakers who are younger and lower in the social hierarchy.

Here's a display that presents the key findings:

The three symbols '5', '8' and 'S' represent the average positions of different vowel types in the Christmas broadcasts of the 1950s and 1980s, and in standard southern British of the 1980s, respectively.

(No dog jokes, please -- the stuff about "Barks" is a reference to the psychoacoustic scale of that name.)
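For readers who want a feel for the Bark scale, here is a minimal sketch of a conversion from Hertz to Bark. It assumes Traunmüller's (1990) approximation, which is one of several published formulas and is not necessarily the one used by Harrington et al.; the formant frequencies in the loop are illustrative values only, not measurements from the Christmas broadcasts.

```python
def hz_to_bark(f_hz: float) -> float:
    """Convert a frequency in Hz to Bark.

    Uses Traunmueller's (1990) approximation, an assumption here:
    z = 26.81 * f / (1960 + f) - 0.53
    The small corrections he proposes at the extremes of the scale
    are omitted in this sketch.
    """
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


if __name__ == "__main__":
    # Hypothetical formant values in Hz, chosen only for illustration.
    for f in (300.0, 700.0, 2000.0):
        print(f"{f:7.1f} Hz -> {hz_to_bark(f):5.2f} Bark")
```

The point of plotting vowels in Bark rather than Hertz is that equal distances on the Bark axis correspond more closely to equal perceptual distances.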

The shift for the [æ] vowel is what's responsible for Neil Tweedie's little joke about "coal in sex". As John Wells, professor of phonetics at UCL, put it in a quote in the Guardian in 2000:

"We are all familiar with the change that has taken place in the vowels of words like 'that man' where, in the 1930s, we still had something like 'thet men,' " said Jonathan Wells, professor of linguistics at University College London. "She is only following along trends that exist in any case. She still remains well behind them, shall we say, and of course she still sounds upper-class, the way she always did."

In this case, the Guardian got the quote right, although they got John's name and position wrong.

The more recent work is Jonathan Harrington, "An acoustic analysis of 'happy-tensing' in the Queen's Christmas broadcasts", Journal of Phonetics, 34(4) 439-457, October 2006. Here's the abstract:

This paper presents a longitudinal analysis of some vowels from the annual Christmas broadcasts produced by Queen Elizabeth II over a 50-year period in order to investigate whether adults adapt to sound changes taking place in the community. The sound change that was analyzed in this paper, which is sometimes known as happY-tensing, concerns the tensing of the final vowel in words like ‘happy’ in British English Received Pronunciation over the course of the last 50 years. In the first part of the study, schwa vowels in Christmas broadcasts separated by 40–50 years were analyzed in order to exclude as far as possible any long-term acoustic effects due to vocal tract maturation. The results of this analysis show a large decrease in both the fundamental and F1, F2, and F4 from earlier to later broadcasts. It is then shown that the Queen's 1950s happY vowel is less tense than in a 1980s corpus of four female speakers of Standard Southern British. A subsequent comparison between the 1950s and 1990s Christmas broadcast happY vowels shows a small change towards the tenser position. It is argued that the vowels exemplified by KIT and happY have undergone phonetic raising in RP, with the latter also having fronted. The Queen has participated in the first of these changes and marginally in the second.

The result of happy-tensing (along with the previously-documented change in [æ]) is to turn IPA [ˈɦɛ.pɪ] to [ˈɦæ.pi], more or less -- or in the orthographic phonetics of the newspapers, "heppih" to "happee".

The Queen's vowel shifts are in the direction of the variants used by most Americans and Canadians. So I wonder, is the Queen's speech drifting slowly downriver towards Estuary English, or is it sublimating subtly into the more ethereal realm of World English?

[If you're interested in the ideology of accent -- which is the main news here -- some of the press discussion is listed below:

Roger Dobson, "Speaking the Queen's English: Me 'ubby and I, innit", The Independent, 12/3/2006;
Mark Prissell, "One's voice ain't that posh", The Sun, 12/4/2006;
Neil Tweedie, "How Queen's English has grown more like ours", Telegraph, 12/5/2006;
Catherine Jones, "One thinks one has lorst one's posh voice", Western Mail, 12/5/2006.
Sajeda Momin, "How the Queen's English has changed with the times", Daily News & Analysis (India), 12/5/2006;
Justin Lees, "Royal Vowels crossing Jordan", The Daily Telegraph (Canada), 12/5/2006.
"My word -- the queen's English is slipping", UPI (reprinted in the Daily Indian, 12/4/2006);
"Study: Queen Sounds More Like Subjects", AP (reprinted in the LA Times, 12/4/2006).

]

[Update -- David Eddyshaw writes:

Apropos of vowels and the Class War, I recently read a report of an old-style Labour Party activist here decrying the Tory supposed indifference to child care provision thus:

"they're the kind of people who think a creche is what happens when two four-by-fours* collide"

* "Chelsea Tractors". I think it's SUV in American.

]

[And Samuel Fox writes:

The true sign of changing times is not that the Queen's vowels have drifted, but rather that the authors of the paper could have referred to her as HRH, rather than the correct HM.

I have no clue about this sort of thing, but I should have thought that Nature's copy editors would be well informed. And a number of official-seeming sites, such as this page from the British Embassy in Buenos Aires, refer to "HRH Queen Elizabeth II".]

Posted by Mark Liberman at 07:57 AM

December 04, 2006

Unicode 5.0 is here

I just got my copy of The Unicode 5.0 Standard, by the Unicode Consortium (Addison-Wesley, 2006; ISBN 0-321-48091-0; 1470 pages plus a CD ROM; $59.99). The excellent Berkeley-trained linguist Ken Whistler is one of the 14 editors. This is the most spectacularly nerdy book I have ever seen. All the details about how all the writing systems in all the world are to be encoded in a standard way for computer systems. And I know who's going to love it: Bill Poser, of Language Log's Asian Writing Systems and Open Source Software departments, is going to be squealing with delight. He'll need two copies minimum, one for the office and one for the nightstand by his bed.

Details of the IPA phonetic symbols you sometimes see on Language Log can mostly be found in pages 591-601, though for a fuller introduction to their phonetic values you'll want either Phonetic Symbol Guide (University of Chicago Press; 2nd edition 1996; ISBN 0226685365; $21, excellent price) or Handbook of the International Phonetic Association (1999; Cambridge University Press; ISBN 0521637511; $27.99, very good price) or some suitable introduction to phonetics such as A Practical Introduction to Phonetics (by J. C. Catford; Oxford University Press; 2nd edition 2002; ISBN 0199246351; $32.95, pretty good price) or A Course in Phonetics (by Peter Ladefoged; Heinle; 5th edition 2005; ISBN 1413006884; $73.95 or more, a fine book at a disgraceful price). And for entering phonetic symbols on web pages, take a look at the wonderful page by John Wells (University College London) on IPA in Unicode, and click on Inserting IPA symbols in web documents.
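As a rough illustration of what "IPA in Unicode" means in practice, the sketch below looks up the code point and official character name of a few IPA symbols using Python's standard unicodedata module, and builds the hexadecimal numeric character reference that can be typed into an HTML page. The helper name describe and the sample string are invented for this example; the technique of numeric character references is standard HTML, but this is not a summary of John Wells' page.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name, HTML reference) for each character."""
    rows = []
    for ch in text:
        code_point = f"U+{ord(ch):04X}"
        name = unicodedata.name(ch, "UNKNOWN")
        html_ref = f"&#x{ord(ch):X};"  # hexadecimal numeric character reference
        rows.append((code_point, name, html_ref))
    return rows


if __name__ == "__main__":
    # A few IPA symbols: stress mark, voiced glottal fricative, two vowels.
    for row in describe("\u02c8\u0266\u025b\u026a"):
        print(*row, sep="  ")
```

Numeric references of this kind work regardless of the page's own character encoding, which is why they are a common fallback for phonetic symbols.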

Posted by Geoffrey K. Pullum at 10:21 AM

Massachusetts hold 'em

It's good to see that America's (cartoon) youth are sensitive to regional speech variants, as explained in the 12/3/2006 Foxtrot strip:

(Even if the vowels seem to be a sort of midwestern approximation...)

Posted by Mark Liberman at 08:13 AM

December 03, 2006

art, arts, arting, arted

Greetings from the desert terrace at LL Plaza! (The Plaza looks something like the Getty Center in LA, although the Pennsylvania landscaping has more azaleas and fewer palms.) I just arrived from Tucson and am hanging out here among the cacti to ease the transition.

For my first contribution, just to show I've been paying attention, here's another example of the 'X is a verb' snowtrope that I saw on a display panel at the Scottsdale Museum of Contemporary Art a week or two ago:

They mean to convey something like, 'Art is an activity we, the curators, and you, the museum goers, and they, the artists, all actively engage in,' as in the other uses of the 'X is a verb' formula discussed in these august pixels. Just another misguided linguistic metaphor. However, it reminded me of something else about 'art' as a verb that I'd been thinking of posting about.

When I was a kid in Newfoundland, we said the Lord's Prayer every morning at school. (It was a secular public school, but derived from the Protestant half of a historically denominationally organized school system; old habits die hard.) I knew 'art' was a verb, in "Our Father, who art in heaven", but I understood it as some verbal counterpart of the noun 'art', as in skill, work, magic, the opposite of the 'dark arts' -- you know, arcane, mysterious art. 'To art' in this sense would mean something like, 'to work (magic)'. So I thought we were intended to be addressing "Our Father, who works (magic) in heaven..." It wasn't until much later that it occurred to me that this was in fact just an arcane, mysterious form of the verb 'to be'.

But recalling my childhood confusion as I stood in the Scottsdale Museum of Contemporary Art, another puzzle about this use of art occurred to me. It is true that art is a former member of the present tense conjugation of the English copula be--but it's the wrong one. The relevant entry in the OED, for art, says this:

2nd sing. pres. ind. of BE. One of the remaining parts of the orig. substantive vb.; cf. AM.

That is, as a form of be, art is unambiguously second person singular. Consequently, its use in "Our Father, who art in heaven" is mighty peculiar. Relative pronouns like who or which inherit the person of the NP they modify,* and the modified NP, our father, is third person. Consequently, the verb should be a third-person (singular) form, that is, is: "who is in heaven". (The logic of my childhood misparse was similarly flawed; my verb should have been arts rather than art, unless I imagined it was irregular, of course, which I guess I must have).

So then I started wondering where the art came from.¹ In fact, in the New Testament Greek "original" version, there is no copula present; the line in Greek went like this (transliteration and interlinear gloss taken from the relevant page at the Center for Indo-European Language and Culture at the University of Texas Austin):

Pater hêmôn ho en tois ouranois;
O-father of-us he in the heavens
'Our Father which art in heaven,'

In the 'Standard Latin' translation of this, the second-person form of the copula, es, appears:

Pater noster, qui es in caelis

because, I suppose, the head of the relative clause is vocative, the case form used to address someone. (This is kind of interesting in itself! I didn't know that a vocative NP could trigger second person agreement within a relative clause modifying it.)

The first Old English translations were made from the Latin translation, rather than from the Greek, and an actual second person pronoun appears (OE and gloss taken from Cathy Ball's Old English web pages at Georgetown University):

Fæder ure Þu Þe eart on heofonum
Father our thou that art in heaven
"Our Father, you who are in heaven"

And of course, here the use of the second person form makes all kinds of sense, since the head being modified by the relative clause is itself the second person pronoun, in apposition to the initial NP Fæder ure, 'our father'.

What's interesting is that the art form of the verb persisted in official English versions of the prayer long after the "thou" had been dropped and regular rules of English agreement would have predicted a switch to the third-person form is. (NB: English does not have a vocative case form.) You can track the persistence of art at Cathy Ball's online collection of English forms of the prayer, here. Modern English translations based directly on the Greek almost never include any copula at all; translations into other Germanic languages which include a copula either use the 3rd sg form or, if they use the 2nd sg. form, include a pronoun. (For a complete description of the process employed in the creation of one modern English translation and side-by-side comparisons of ten different translations, both modern and older, check out this pdf.)

With the loss of the 'thou', the art became an anachronism, and its persistence illustrates an interesting point about ritual speech. Ritual speech is one place where archaic words linger on, long after they have fallen out of common use. Indeed over time, they often become unintelligible to younger generations uttering them (a situation which is conducive to misparses and eggcorns like mine). In ritual speech it's important to get the form of words exactly "right"--words that are just a paraphrase of the meaning won't do. By the time the 'thou' was dropped from the official version, the use of 'art' must have been completely formulaic, retained because it was the "right" form to use in this prayer, like the predicate-first subjunctive in the next line, 'Hallowed be thy name'.

The 'thou', although present in neither the Anglican nor the Catholic official versions, isn't completely gone. There are 12,900 Google hits out there for "Thou art in heaven" vs. 252,000 for "who art in heaven" and 94,500 for "which art in heaven". A search for "who is in heaven" turns up 247,000 hits, but only two of the first 10 hits have anything directly to do with the prayer, so I assume most of those aren't relevant. "Our Father, who is in heaven" has a measly 21,600. A lot of the modern translations just use 'in heaven', no copula or relative clause at all, and "Our Father in heaven" weighs in at 284,000 hits.
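To put those counts side by side, here is a small sketch that turns the hit counts quoted above into relative shares. It leaves out the bare "who is in heaven" figure, since most of those hits appear to be irrelevant. The search strings are not strictly comparable (some include "Our Father" and some do not), and Google counts are rough estimates, so the percentages are only a crude indication.

```python
# Google hit counts as quoted in the paragraph above.
hits = {
    "Thou art in heaven": 12_900,
    "who art in heaven": 252_000,
    "which art in heaven": 94_500,
    "Our Father, who is in heaven": 21_600,
    "Our Father in heaven": 284_000,
}

total = sum(hits.values())

# Share of each variant, as a percentage of the combined count.
shares = {phrase: 100.0 * n / total for phrase, n in hits.items()}

if __name__ == "__main__":
    for phrase, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{pct:5.1f}%  {phrase}")
```

On these numbers the copula-less "Our Father in heaven" and the archaic "who art in heaven" together account for roughly four fifths of the combined count, while the version with an overt "Thou" is marginal.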

Cautionary Postscript: This discussion is not about the Lord's Prayer itself, but rather about subject-verb agreement in free relative clauses, the English copula, problems of translation, formulaic speech and the genesis of misparses. The prayer is just an extremely well-documented source of data about these issues.

*Thanks to Mark Reed and Simon Cauchi for helping me to be clear about this point!

¹Be warned: I am not a scholar of any classical language, or of Old English; the information that follows is what I can deduce by looking at some paradigms, reading some websites I trust, and making a few educated guesses. I might easily be wrong about something. I expect someone will let me know if I am.

Update: Several readers have written with interesting remarks and replies, and I have directed them to the 'comments' section of the crosspost at my own little linguistics blog, Heideas. If you're interested in following up any of the grammatical points above, you might find that discussion interesting.

Posted by Heidi Harley at 11:26 PM

Fabricated but true?

Yesterday, in a post about the curious culture of modern science writing, I wrote:

As I've watched the reaction to Louann Brizendine's book over the past few months, I've concluded that "scientific studies" like these have taken over the place that bible stories used to occupy. It's only fundamentalists like me who worry about whether they're true. For most people, it's only important that they're morally instructive.

What would [various journalists] say, if presented with evidence that they've been peddling falsehoods? I imagine that their reaction would be roughly like that of an Episcopalian Sunday-school teacher, confronted with evidence from DNA phylogeny that the animals of the world could not possibly have gone through the genetic bottleneck required by the story of Noah's ark. I mean, lighten up, man, it's just a story.

From his bunker in Los Angeles, Omri Ceren wrote in response:

CBS invented that response with the post-Killian memo "fabricated but true" argument. You're sounding awful militant there, professor -- thinking of switching over to the other side ;-) ?

Omri's analogy is an apt one. But in the fall-out from Memogate, CBS News fired Mary Mapes (who produced the offending segment), Betsy West (the Senior Vice President who supervised primetime news programs), Josh Howard (the executive producer of 60 Minutes Wednesday), and Mary Murphy (Howard's second-in-command). And Dan Rather is now exiled to the remotest regions of HDNet.

In contrast, consider the 9/29/2006 segments on ABC's 20/20, titled "The truth behind women's brains" and "Gender myths: let science decide". No one at 20/20 is in even the slightest bit of trouble, although the sheer amount of fabricated evidence presented on those programs was a great deal larger. In fact, the folks responsible for those 20/20 segments probably got praise and credit from their employers, since the pseudo-science of sex differences is a very popular topic, and those segments were effectively presented and presumably got good ratings. The same thing can be said about the dozens, if not hundreds, of editors, producers, pundits, reviewers and reporters who have spread the same fabrications through the global media over the past few months.

My point here is that journalists still maintain the presumption that the news media ought to tell the truth about politics, economics, natural disasters, and so on. If it's shown that fabricated evidence has been presented as if it were true, someone ought to apologize or even get fired. However, it's clear that there's no such presumption in the area of science reporting, even when the issues have major public policy implications. Has any journalist ever been disciplined for publishing a source's fabrications about science, even when a small amount of research would have uncovered the problems? I've never heard of a case. Science writing is treated as a form of popular entertainment, of a vaguely utilitarian sort, and even when articles present quantitative "facts" that are completely fabricated, as has recently become common in the case of the "science" of sex differences, there are no consequences.

(For another example, of many, you could take a look at the infamous "email lowers your IQ twice as much as marijuana does" story.)

With respect to the question of switching sides, we here at Language Log like to think that you can be interested in the truth, independent of your political, cultural and religious allegiances. In the case of the Memogate controversy, we presented our mite on behalf of the truth. 60 Minutes showed the faked documents on 9/8/2004; LGF showed that they were forgeries on 9/9/2004; CBS News continued to defend the authenticity of the documents, vigorously, until 9/20/2004, when (as the wikipedia article puts it) they "stopped defending the documents and began to report on the problems with their story". On 9/22/2004, CBS conceded, in effect, by appointing an independent review panel. We started talking about the story on 9/15/2004 -- we were a bit slow on the uptake, I'll grant -- but our judgment was clear from the start:

"Typography, truth and politics" (9/15/2004)
"You couldn't have a starker contrast" (9/17/2004)
"Little Green Apples at the Blue Moon Bar" (9/24/2004)

Posted by Mark Liberman at 03:53 PM

Who remembers Ayds?

A while back in discussing Hormel's effort to defend the Spam trademark against association with unsolicited bulk email, I pointed out that the associations of a trademark can change in such a way as adversely to affect sales of the trademarked product without any violation of the trademark. Amy Forsyth, the person who actually runs the Linguistics department at the University of Pennsylvania, points out a particularly nice example of this.

Once upon a time there was an appetite-suppressing candy called Ayds. It came on the market in 1937 and sold well until the early 1980s. Around 1981, however, the disease AIDS began to gain the public's attention, to the detriment of Ayds, since the two had the same pronunciation. The manufacturer of Ayds tried changing the name to Diet Ayds, to no avail. The product was eventually withdrawn from the market.

Posted by Bill Poser at 03:23 PM

Does anybody have a word for this? We do now.

I wouldn't have thought that there was a great call for such a word in most people's lives, but then came my first sighting, in an Advocate interview (9/26/06) with Julian McMahon, one of the stars of the television show "Nip/Tuck" (and, before it, "Charmed"). McMahon is talking about his sexual adventures, when the interviewer asks about three-way sex (McMahon, a woman, and a buddy), which turns out not to be McMahon's thing:

I'm not good with the other-guy thing. I don't want to see my buddy's come face.

This is come face 'facial expression during orgasm'. It turns out that this is not the only word that's been coined for this meaning; we now have O-face as well.

There is a Robert Mapplethorpe come-face photograph -- of an ecstatic Larry Desmedt (1979) -- that serves as the frontispiece to the collection Certain People: A Book of Portraits, and you can of course see the expression in pornographic photography and film, but probably most people get most of their chances to observe it on their partner's face during sex, an occasion when their attention is likely to be elsewhere. I can't recall anybody's discussing come faces until recently, except in connection with my xxx-rated collages, where come faces are something of a theme -- and then no one seemed to have a word for them. Things have changed.

Googling on "come face" pulls up some cites, though you get a lot of irrelevant hits, including many involving "come face to face with". When we discussed the expression on the ADS-L back in October, Charlie Doyle suggested that searching on "cum face" would be easier. This turns out to be true, but you pull up a lot of references to cum face in a different sense, 'face with cum/semen on it', as a result of "facials" or "bukkake" (you can google up images, even, though I find many of them dismaying). What we have here is a partial differentiation in spelling between the verb denoting orgasm and the noun denoting ejaculated semen. This is a topic of some interest in itself, and I'll get to it, but first some words about O-face.

You'll get tons of webhits for "O face"/"o face"/"O-face"/"o-face", somewhat fewer for the variant spellings with "oh" instead of "o". As Matthew Gordon noted back in October, the expression goes back to the 1999 movie Office Space; imdb offers this quote:

Drew: I'm thinking I might take that new chick from Logistics. If things go well I might be showing her my O-face. "Oh... Oh... Oh!" You know what I'm talkin' about. "Oh!"

The movie might well not have been the source of the expression, but it certainly was the vector for its spread. It now beats come face all hollow.

For your entertainment: Details magazine has been printing O-face quizzes, with a display of twenty faces (of both sexes). In the October issue (p. 180) it's "Game Face or O-Face?", in which your task is to distinguish "an ace tennis player's expression of exertion and a porn star's look of ecstasy." In the November issue (p. 104) it's "Idol Face or O-Face?", which provides some "contorted expressions of an aspiring pop idol" and some "of a seasoned porn star." Now in the December issue (p. 132) it's "Guitar Face or O-Face?":

The disheveled mane and squeezed-shut eyes. The sweaty brows and parted lips. Without the audio cues, some emotive rockers bear an uncanny resemblance to porn stars. Take a closer look at these facial acrobatics and see if you can tell who's nailing a solo and who's straining to deliver a big finish...

(Answers available on the Details site.)

Back to come vs. cum. For lots of people (of whom I am one), differentiating in spelling between the verb come and the noun cum gives you a verb with the past form came (which is what I say), and a noun that clearly looks like a noun, and (since it has a non-standard spelling, an ear spelling) looks "dirtier" than the spelling come would for the noun.

Meanwhile, from the noun cum there's a (zero-)derived verb cum 'ejaculate on, shoot cum on', apparently seen mostly in the past participle: someone gets their face/ass/boobs/whatever cummed.

But the V-come/N-cum pattern isn't the only one around (though I suspect it's the dominant one, and it allows you to distinguish come face from cum face). Some people have cum for both, giving a past form cummed, as in

The other day i cummed for the first time. My male friends told me that i should have only cummed a droplet, but i cummed and it ran all down my penis. (link)

And some people have come for both. No doubt there are people with variation for one or both of these items, with the spellings belonging to different stylistic levels (with come as a bit more refined than cum, if you can talk about refinement on this topic). Someone should investigate this.

For all I know, there are people who have cum only as the verb and come only as the noun, though that looks bizarre to me.

In any case, it seems that there was a time, not long ago, when English had no expression of any currency for 'facial expression during orgasm'. Now we have two, both of them easily understandable in context on first hearing, so at least one of them is likely to endure -- unless, of course, our culture enters a phase of visual and linguistic modesty in sexual matters.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 01:58 PM

Every jot and tittle

This story is good enough that it deserves to be true. According to Radagast at Rhosgobel ("The Dead Seas scrolls", 12/1/2006):

While flipping around the TV channels today I stopped briefly on one of our local religious stations. The person preaching was rambling on about the glory of god and how the bible was the word of god (or something like that); to help make his point that the bible was the word of god, he introduced the Dead Sea scrolls. He said that they were 3,000 years old and that scholars had found that they were identical to the modern day bible. In fact, he said, "Every dot over every 'i', every cross of the 't', every comma, and every period is in the exact same place as in the bible in your hand" (quote paraphrased).

Can anyone provide chapter and verse, in the form of the name of the preacher, and the particulars of the broadcast? Or better yet, put the clip on youtube. If you have any further information, please send it to me. I sincerely hope that this one turns out to be not only a good story, but also true, unlike some similar quotes from earlier times. It would be nice to see that the preachers are still able to make up bigger lies than the science writers.

[Update -- Bradley Skaggs suggests that the preacher may have been Grant Jeffrey, who had a show at 10:00 p.m. on TBN on the date in question, and whose website mentions the Dead Sea Scrolls in a way consistent with Radagast's paraphrase:

I have had the privilege of exploring the Dead Sea Caves where thousands of ancient biblical manuscript fragments were found in 1947 that confirm the astonishing accuracy of the text of the Scriptures.

Another quote, however, suggests that Jeffrey has a more accurate picture of the date of the scrolls, and understands that they were not written in English:

If someone had asked a minister in 1947 to prove that the original Hebrew Scriptures from the Old Testament were reliably copied without error throughout the last two thousand years, he might have had some difficulty in providing an answer. The oldest Old Testament manuscript used by the King James translators was dated approximately A.D. 1100. Obviously, that old manuscript from A.D. 1100 was a copy of a copy of a copy, etc. for over two thousand years. How could we be sure that the text in the A.D. 1100 copy of the Scriptures was identical with the original text as given to the writers by God and inspired by Him? However, an extraordinary discovery occurred in the turbulent year before Israel became a nation. A Bedouin Arab found a cave in Qumran near the Dead Sea which ultimately yielded over a thousand priceless manuscripts dating back before A.D. 68, when the Roman legions destroyed the Qumran village during the Jewish war against Rome. [...]
The most incredible discovery was the immense library of biblical manuscripts in Cave Four at Qumran that contained every single book of the Old Testament with the exception of the Book of Esther. Multiple copies of several biblical texts such as Genesis, Deuteronomy and Isaiah were found in Cave Four. Scholars were able to reach back a further two thousand years in time to examine biblical texts that had lain undisturbed in the desert caves during all of the intervening centuries. The scholars discovered that the Hebrew manuscript copies of the most authoritative Hebrew text, Textus Recepticus, used by the King James translators in 1611, were virtually identical to these ancient Dead Sea Scrolls. After carefully comparing the manuscripts they discovered that, aside from a tiny number of spelling variations, not a single word was altered from the original scrolls in the caves from the much copied A.D. 1100 manuscripts used by the Authorized King James Version translators in 1611.

I can't evaluate Jeffrey's claim about the exactness of correspondence, but it differs mainly in emphasis from what the Wikipedia entry says: "Although some of the biblical manuscripts found at Qumran differ significantly from the Masoretic text, most do not." And this page at BYU suggests that the claims on Jeffrey's web site might be a bit exaggerated and misleading, but are not completely invented:

About a fourth of the scrolls are copies, in whole or in part, of every book in the Old Testament except the book of Esther. [...]

Some of the biblical texts from Qumran differ significantly from conventional wording and even among themselves. And there is evidence of additions and deletions in some texts, suggesting that in some instances scribes felt free to alter the texts they were working on. [...]
However, other biblical manuscripts are very close to the text found in the Hebrew Bible, known as the Masoretic text, which was composed by Jewish authorities centuries later, between A.D. 600 and the middle of the tenth century. This consistency is remarkable because these manuscript copies are at least a thousand years older than previously known biblical manuscripts and even predate the canonization of the Hebrew Bible!

So either Radagast misheard, or Jeffrey got carried away on TV and said stuff different from what he wrote on his web site, or Jeffrey wasn't the guy Radagast heard.]

[Update #2 -- A K M Adam comments:

I can't resolve the conundrum of which preacher your correspondent may have been listening to, but as a worker in the biblical vineyard I was tickled by the transcribed characterization of "the most authoritative Hebrew text, Textus Recepticus."
First, not surprisingly, the reliable manuscript to which Jeffrey adverts is not known by the name he uses. The conventional standard Hebrew manuscript is the Masoretic text, of which the earliest complete exemplar was the Aleppo Codex (now incomplete, leaving the Leningrad Codex as the earliest surviving complete version).
But (second) Jeffrey was presumably not referring to a Hebrew manuscript at all, but to Erasmus's edition of the Greek New Testament, which is known as the "Textus Receptus" (not "Recepticus," though Google shows an embarrassing superabundance of the erroneous form -- I will resolutely resist the temptation to tease out the linguistic logic that renders "recepticus" an intelligible ersatz equivalent of "receptus").
So, if Jeffrey really was comparing Old Testaments with Old Testaments, he was not using the Textus Receptus; and if he was using the Textus Receptus, he was not examining the Old Testament; and either way the Textus was Receptus, not Recepticus.
But apart from that, I leave it for you to estimate whether he was making any mistakes.
Grace and peace,
A K M Adam
Professor of New Testament
Seabury-Western Theological Seminary

Well, maybe Mr. Jeffrey is careless enough even in writing to license the unkind suspicion that he might, in preaching, have taken further flight from the facts in an attempt to inspire his audience. This is almost certainly the process that produced the "women talk three times more than men" and "men think of sex every 52 seconds" factoids. (Though just to keep things clear, that was not Jeffrey's doing, it was a more secular preacher.)]

[Update 2/20/2007 -- Radagast writes:

A few days after you made your post I did finally figure out who the preacher was; it was John C. Hagee. He was on at 1pm that day (link).
I sadly haven't been able to find a recording of the sermon, and there's precious little on his website about the Dead Sea scrolls, but I did find a PDF that includes the line "You think of Isaiah, who wrote the book in the Bible that was found in the Dead Sea Scrolls validating the reality of the Word of God." That sounds a lot like the same mindset that was behind what I heard that day.

]

Posted by Mark Liberman at 01:50 PM

You're a canned mine

My philosopher partner Barbara has just returned to California after spending November at the University of St Andrews in Scotland. A month in the Greenwich Mean Time zone, soaking up philosophy at the university's first-rate Arché research center, has afflicted her with first-rate jet lag. She dealt with the insomnia this morning by getting up before dawn to work at a writing task. But after a couple of hours a headache and a slight feeling of nausea convinced her she should go back to bed for a while, perhaps with a cup of tea. I said I'd bring the tea up to her, and as she headed for the stairs she expressed her gratitude, speaking slowly and sincerely: "You're a canned mine."

For a few moments we froze, staring at each other in utter disbelief, as if she had been possessed by an evil spirit and it had just spoken in her voice. But we both know a bit about psycholinguistics, and we soon realized what had happened. The vowel nuclei in the last two syllables had been interchanged in a speech production error. The [ai] of kind, the [æ] of man. It convinced us both she really did need to lie down for a while.

Spectacular speech errors of this sort are quite common, and not only in the speech of people currently located eight time zones away from where their biological clock has been set. The details of such errors have often been used by phonologists as evidence for phonological structure. After all, if you can accidentally switch the nuclei of two adjacent syllables when you're very tired, one obvious explanation would be that phonology is not a kind of fiction made up in the process of doing linguistic analysis; rather, there really are syllables, they really have nuclei, and your speech production mechanisms actually operate in a way that, in effect, makes reference to these units.

Posted by Geoffrey K. Pullum at 11:59 AM

E. Langdell

John Mayer of the Center for Computer Assisted Legal Instruction couldn't make it to the workshop I've been attending here in Pittsburgh, due to Thursday's snowstorm in the Midwest. So he quickly put together three screencasts of his planned presentations. One of them, under the title eLangdell, is about legal education, but it should be of interest to linguists and people in other fields as well. [His other two presentations are also interesting: an "Introduction to CALI" and some thoughts about a "Fantasy Supreme Court project".] There are a few things in John's presentation with a specifically linguistic cast, like his discussion of the use of speech synthesis, but the real message is what he has to say about opportunities in legal education for new kinds of educational ecologies.

The analogous possibilities are especially relevant to the field of linguistics, for two reasons: screencasts and other new media forms are especially useful for presenting analyses of speech and language; and network-accessible resources are especially helpful in letting people learn about disciplines like linguistics, where courses are not widely available.

Being able to integrate sound and interactive images conveniently, in packages that are easy to distribute and easy to access and use, fits the needs of linguistic education in an especially helpful way. Such screencasts are always kind of neat, but they're not always really necessary. I'm now sitting in the Pittsburgh airport, waiting to board my flight home to Philadelphia, and as I'm writing this entry in one window on my laptop, I'm watching and listening to John's presentation in another. It's nice to hear his voice and have the information presented in an interesting and well-paced way -- and the narrated presentation allowed John to be present at the workshop in virtual form -- but frankly, I could have gotten the same information pretty well in a purely textual form. But some things are very hard to get across without sound and interactive graphics, and much of what we need to teach in linguistics falls into this category. In phonetics, for example, it's invaluable to be able to play sounds, to show waveforms and spectrograms and pitch tracks, to point out and comment on various features of the displays, and so on, all arranged in time and space just as we do it for someone in real time, sitting together in front of a computer. Similar things apply for presenting ideas in syntax or discourse analysis or pretty much any other kind of speech or language analysis.

Engaging, easily-available material of this kind would make linguistic analysis accessible to many people who are now effectively prevented from learning about it. If you want to learn physics or biology, it's pretty well guaranteed that your high school, college or university has a whole sequence of courses. If you want to learn linguistics, you're probably on your own, unless you're lucky enough to be a student at one of the relatively few colleges and universities that have a linguistics program. Accessible course materials would make it easier for interested students to start learning, and would also help interested teachers in other fields to include more and better linguistics in their courses.

[Update -- John Mayer writes:

Thanks for blogging about eLangdell.
I have a longer version of that presentation that I gave at the 2006 CALI Conference online as well.
Here is the video...
..and here is the screencast...
I have never had to "phone in" a presentation like that before. I would appreciate your feedback on how well that worked in the room. Could folks hear what I was saying? Did it generate discussion? Was it too long or too short?

I thought that John's presentations worked pretty well, though I did miss the chance to ask him some questions (and hear the answers, of course). That could have been provided by a live voice connection, if we'd thought to set it up.]

Posted by Mark Liberman at 06:52 AM

December 02, 2006

Singular "their": public health edition

Yesterday, as the poisoning of former Russian spy Alexander Litvinenko began to broaden into a wider radiation scare, Great Britain's Health Protection Agency released the following statement:

The Health Protection Agency is continuing to provide expert advice on the public health issues surrounding the death of Mr Alexander Litvinenko.
The Health Protection Agency can also confirm it was informed this morning that tests have established that a further person who was in direct and very close contact with Mr Litvinenko has a significant quantity of the radioactive isotope Polonium-210 (Po-210) in their body.
This person is now to be investigated further in hospital.

For whatever reason, the HPA felt the need to conceal the identity of the "further person," even though British news organizations such as Reuters and the Guardian swiftly revealed that it was Mario Scaramella, an Italian security expert who met with Litvinenko the day he was poisoned. Even the gender of the person could not be disclosed, which led to a spot of trouble in choosing a possessive pronoun to modify "body." The writer of the statement decided to go with "their" instead of the wordier "his or her." It's an age-old solution to the lack of an epicene pronoun in English (and a Language Log chestnut, most recently discussed here).

Charlie Clingen spotted this usage in a Reuters report, commenting:

It still seems a little strange to me to find "one person" referred to by "their" in the same sentence, but I guess I'd better get used to it. In fact, I find myself doing the same thing more and more frequently these days. It seems to be the most pragmatic solution to the "gender camouflaging" problem, although in many situations it can be a transparent deception.

The gender camouflaging in the HPA's statement is, to my mind's ear, not entirely successful. Let's compare it with another recent official announcement with singular "their," from the (U.S.) Transportation Security Administration (discussed here and here):

We encourage everyone to pack gel-filled bras in their checked baggage.

In the TSA case, "their" refers back to the antecedent "everyone," a more comfortable fit since "everyone" is an indefinite quantifier, as opposed to the concrete (albeit concealed) individual specified by the HPA as "a further person." Furthermore, in the TSA announcement "their" modifies "(checked) baggage," a mass noun that does not indicate the number of the possessor (i.e., one person or many people could possess checked baggage). The HPA statement, on the other hand, has "their" modifying singular "body," which could only belong to a single person. So the TSA's use of singular "their" floats by rather unobtrusively (with commenters instead focusing disingenuously on the idea that everyone needs to pack gel-filled bras), while the HPA's usage is more conspicuous. Perhaps British public health officials could get together with American airline security to draw up some guidelines on gender camouflaging... but somehow I think they all have more pressing concerns at the moment.

(For a discussion of conditions on singular "their" in the context of Jane Austen's writing, see Henry Churchyard's informative page.)

[Update, 12/3/06: Readers have been emailing with some differing viewpoints. First, from Adrienne York:

I find I disagree with your contention that the HPA's use of singular their rings false while the TSA's use is appropriate. After all, the HPA's article was written to emphasize that they were not identifying Mr. Litvinenko's contact in any possible way, so the singular their obscuring gender makes sense.
On the other hand, it is reasonable to assume that anyone carrying a gel-filled bra is female, and so singular their, in order to obscure gender, strikes me as a case of overcorrection for gender neutrality.
My understanding of the use of singular they is not that it leaves unmarked the numbers to which it is referring, but the gender. If it is obscuring the numbers of people to whom one is referring, then it isn't really singular, is it? It's a standard issue plural they, and shouldn't one use a plural verb to refer to it?

My point above wasn't that the usage of "their" obscures a singular vs. plural distinction, but rather that "their" works better as a singular gender-neutral pronoun when the number of the antecedent isn't entirely explicit. So when the antecedent is an indefinite quantifier like "everyone," there is no conspicuous mismatch with the anaphoric use of "their." In the TSA example, one could even construe "everyone" as a plural quantifier if one were so inclined, agreeing with a plural reading of "their." In the HPA example, on the other hand, there's no getting around the fact that we're talking about a single known person, with a singular body contaminated by Polonium-210. In my native-speaker judgment (about which I make no claims of generalizability), "their" fits more comfortably with a singular antecedent when the semantic and syntactic context do not foreground the singularness of the referent. But as Henry Churchyard notes in connection to singular "their" as used by Jane Austen, we're talking about a gradient of acceptability, with the least acceptable end of the continuum occurring when singular "their" refers to "a strongly-individualized single person about whom there is some specific information."

Next, John Atkinson writes:

Maybe things are different in your dialect, mate, but for me "their" is just fine in this sentence, whether the antecedent is male, female, or epicene. That is, if they'd said
The Health Protection Agency can also confirm it was informed this morning that tests have established that a young woman who was in direct and very close contact with Mr Litvinenko has a significant quantity of the radioactive isotope Polonium-210 (Po-210) in their body.
that would have sounded just fine too. And this is certainly not a new development here.
Is it perhaps an American idiosyncrasy, that it's ungrammatical for a speaker to use "their" when the gender is known? How weird!

Explicit knowledge of gender helps to individualize and concretize the person in question, which makes singular "their" slightly less palatable — again, in my non-generalizable judgment. But contemporary American speakers are not particularly choosy in this regard, as noted in Geoffrey Pullum's post, "Singular they with known sex" (1/3/06).

And finally John Cowan writes:

I think what counts is not the indefiniteness of the quantifier, but the indefiniteness of the determiner in general. "A person feels bad when they ...", e.g.

Indeed — quantifiers like "anyone" and "everyone" are not the only type of indefinite determiner that works well with singular "their," but they're a particularly common type. And there's no question about the quantifiers' indefiniteness, as opposed to "a person," where there could be an ambiguity in construing whether the referent is a particular specified person or not.]

Posted by Benjamin Zimmer at 09:32 PM

Class consciousness

Via Neal Goldfarb, the 12/2/2006 Mother Goose & Grimm:

There is a remarkable degree of unanimity about this matter, around the English-speaking world and across nearly a century of time. As illustration, I can't resist reprinting the whom humor from LL of 4/18/2004:

Calvin Trillin, cited in Anne Lobeck, Discovering Grammar:

As far as I'm concerned, "whom" is a word that was invented to make everyone sound like a butler.

James Thurber, Ladies' and Gentlemen's Guide to Modern English Usage:

The number of people who use "whom" and "who" wrongly is appalling. The problem is a difficult one and it is complicated by the importance of tone, or taste. Take the common expression, "Whom are you, anyways?" That is of course, strictly speaking, correct - and yet how formal, how stilted! The usage to be preferred in ordinary speech and writing is "Who are you, anyways?" "Whom" should be used in the nominative case only when a note of dignity or austerity is desired. For example, if a writer is dealing with a meeting of, say, the British Cabinet, it would be better to have the Premier greet a new arrival, such as an under-secretary, with a "Whom are you, anyways?" rather than a "Who are you, anyways?" - always granted that the Premier is sincerely unaware of the man's identity. To address a person one knows by a "Whom are you?" is a mark either of incredible lapse of memory or inexcusable arrogance. "How are you?" is a much kindlier salutation.

The Buried Whom, as it is called, forms a special problem. That is where the word occurs deep in a sentence. For a ready example, take the common expression: "He did not know whether he knew her or not because he had not heard whom the other had said she was until too late to see her." The simplest way out of this is to abandon the "whom" altogether and substitute "where" (a reading of the sentence that way will show how much better it is). Unfortunately, it is only in rare cases that "where" can be used in place of "whom." Nothing could be more flagrantly bad, for instance, than to say "Where are you?" in demanding a person's identity. The only conceivable answer is "Here I am," which would give no hint at all as to whom the person was. Thus the conversation, or piece of writing, would, from being built upon a false foundation, fall of its own weight.

A common rule for determining whether "who" or "whom" is right is to substitute "she" for "who," and "her" for "whom," and see which sounds the better. Take the sentence, "He met a woman who they said was an actress." Now if "who" is correct then "she" can be used in its place. Let us try it. "He met a woman she they said was an actress." That instantly rings false. It can't be right. Hence the proper usage is "whom."

In certain cases grammatical correctness must often be subordinated to a consideration of taste. For instance, suppose that the same person had met a man whom they said was a street cleaner. The word "whom" is too austere to use in connection with a lowly worker, like a street-cleaner, and its use in this form is known as False Administration or Pathetic Fallacy.

You might say: "There is, then, no hard and fast rule?" ("was then" would be better, since "then" refers to what is past). You might better say (or have said): "There was then (or is now) no hard and fast rule?" Only this, that it is better to use "whom" when in doubt, and even better to re-word the statement, and leave out all the relative pronouns, except ad, ante, con, in, inter, ob, post, prae, pro, sub, and super.

And last, to demonstrate that whom has for a long time been defeating even very classy people -- P.G. Wodehouse, Jeeves in the Offing:

Normally as genial a soul as ever broke biscuit, this aunt, when stirred, can become the haughtiest of grandes dames before whose wrath the stoutest quail, and she doesn't, like some, have to use a lorgnette to reduce the citizenry to pulp, she does it all with the naked eye. "Oh?" she said, "so you have decided to revise my guest list for me? You have the nerve, the--- the---"

I saw she needed helping out.

"Audacity," I said, throwing her the line.

"The audacity to dictate to me who I shall have in my house."

It should have been "whom," but I let it go.

"You have the---"

"Crust."

"---the immortal rind," she amended, and I had to admit it was stronger, "to tell me whom"---she got it right that time---"I may entertain at Brinkley Court and who"---wrong again---"I may not. Very well, if you feel unable to breathe the same air as my friends, you must please yourself. I believe the 'Bull and Bush' in Market Snodsbury is quite comfortable."

Posted by Mark Liberman at 08:36 PM

Bible Science stories

Seeded by a breezy Daily Mail article that didn't even get the author's name and book title right, two pieces of quantitative psych-lore have been spreading through the world's media over the past few days: women talk three times as much as men, and men think of sex every 52 seconds, compared to once a day for women. These "facts", we've been told by Matt Drudge and fark.com and dozens of newspapers and CNN, the BBC and NPR, have been "discovered" or "confirmed" by Dr. Louann Brizendine's scientific studies.

The public reaction has mostly been that this is like doing experiments to discover that the sun rises in the east, or to confirm that animals deprived of food will starve. In fact, however, the "facts" about word counts and sexual thoughts are false: Louann Brizendine hasn't done any research on either topic, the sources she cites contain no relevant evidence, and existing studies contradict her claims. You can read about talking here and sexual thoughts here, and more on the pseudo-science of sex differences here.

But to insist on the concept of "fact" in this context is a recipe for frustration. As I've watched the reaction to Louann Brizendine's book over the past few months, I've concluded that "scientific studies" like these have taken over the place that bible stories used to occupy. It's only fundamentalists like me who worry about whether they're true. For most people, it's only important that they're morally instructive.

What would the producers of CNN Headline News, NPR's "Wait, wait, don't tell me" or the BBC's "Have I got news for you" say, if presented with evidence that they've been peddling falsehoods? I imagine that their reaction would be roughly like that of an Episcopalian Sunday-school teacher, confronted with evidence from DNA phylogeny that the animals of the world could not possibly have gone through the genetic bottleneck required by the story of Noah's ark. I mean, lighten up, man, it's just a story.

[Update 12/3/2006 -- Phil Resnik writes:

Peter Sagal of "Wait, Wait, Don't Tell Me" is an old college friend. I asked him what he thought of your blog entry, since you mentioned the show, and he replied with the following (with permission to include on the blog):

Your friend Mark is correct, although our reaction would be something slightly more like: "Ah, come on, it's too good to check." We do a fair amount of 'dumb scientific studies' stories -- in fact, every now and then we devote an entire segment to something like the IgNobels. We are, of course, a satire and comedy show, so we expect our audience to understand that we don't vouch for the study's accuracy, nor should our mention of it be taken by anyone to mean that it's scientifically solid -- no more than anyone listening to us should think that President Bush really is a poopyhead, as confirmed by peer reviewed double blind studies.
True that. Wait, Wait is a different beast than CNN Headline News, I think!

The trouble is, it's pretty clear that CNN Headline News also treats most stories in the human sciences as either "too boring to run" or "too good to check". And most other media outlets are basically the same: the word count and sex-thought frequency factoids have appeared in more than 100 wire service, newspaper, magazine and broadcast stories, at all levels of the journalistic food chain.

Wait, wait, here's an idea: amuse the audience by making fun of nonsense in the news! Nah, that would require insight as well as irony...]

[Update #2 -- fev from Headsup: the Blog emails:

Mark: You're nailing some excellent stuff of late, in particular your comment in pop-science stories:

"For most people, it's only important that they're morally instructive."

It's probly worth bearing in mind, tho, that the comment's equally true of much of what happens in coverage of politics, economics and the like; see how long it takes you to find an assertion in the press that Nixon was impeached because of the efforts of Woodward and Bernstein. Or, on the other side of the fence, that the Iron Curtain collapsed on Ronald Reagan's watch. Both sorts of morality play are, in fact and implication, false, but -- as you correctly note -- they're lessons in how the world ought to work more than accounts of how it does.

For better or worse, that's a function of journalism; it transmits cultural norms and empirical data in roughly equal proportions. The risk is that the audience can't (or doesn't have any reason to) tell the difference. We're trying to work on some of that. Meanwhile, keep on pummeling the Brizendine stuff.

Let's say, "lessons in how someone thinks the world ought to work".

Anyhow, fev has now revealed himself as an anthropologist working undercover as an editor -- I can't wait to read the monograph he'll produce when he drops the disguise. If you haven't picked out a title yet, fev, how about "Tristes Topiques"? (OK, "topique" is a false friend, but still...)]

Posted by Mark Liberman at 04:17 PM

Does anybody have a word for this? Probably not.

Here at the Queries Desk at Language Log Plaza, we get a lot of mail about words -- their meanings, uses, pronunciations, spellings, histories, social statuses, and so on. Often the appropriate response is just a pointer to a standard source (the OED, MWDEU, DARE, whatever); sometimes we are pleased to be offered intriguing data that we didn't know about; and occasionally we're at a loss. In particular, we're not usually prepared to give informative answers to questions of the form "Does anybody have a word for this?"

A little while back Owen Cunningham wrote us to ask "Is there a known language that has a word for this idea?", quoting an episode of the television show "Six Feet Under" in which the principal female character, Brenda Chenowith (played by the wonderful Rachel Griffiths), muses:

You know what I find interesting? If you lose a spouse, you're called a widow, or a widower. If you're a child and you lose your parents, then you're an orphan. But what's the word to describe a parent who loses a child? I guess that's just too fucking awful to even have a name.

My answer to Cunningham's question was: probably not, but not because the loss is so awful. I'll explain.

But first, two clarifications, one about what we're going to mean by "a word" here, one about the concept as described by Brenda.

We're going to have to allow not only simple words but also compound words of several types (crash course, father-in-law, stepmother) and some multi-word phrases (Dutch treat, second cousin). What we're really after is "fixed expressions", of whatever size, so long as they're not semantically transparent, that is "idiomatic fixed expressions"; that's an awkward phrase, so for the moment I'm going to stretch the meaning of word a bit. What WON'T count as a word here is an expression whose meaning is compositional, like aunts and uncles or male cousin; English doesn't have a word for aunts and uncles taken together, or a word for male cousins as opposed to female cousins, though of course we have ways of talking about these people.

A second proviso on words in this context is that they should be used in ordinary (as opposed to technical) language and that they should be reasonably widely known. Yes, there is an anatomical term philtrum for the groove between the mouth and nose, but it is neither an ordinary-language term nor widely known. Yes, some people have invented the (ordinary-language) expressions elbow pit and knee pit (on analogy to armpit), but these expressions haven't gained sufficient currency to appear in dictionaries, even the OED.

So what we're after is "ordinary-language fixed expressions of some currency".

Now, to Brenda's description of the missing word in English: "a parent who loses a child". I'm a fan of this series, and I remember being worried a bit about the way she framed things. For orphan you need to lose both your parents -- English has no word for someone who has lost just one parent, no matter which one (motherless child and fatherless child are too specific, since they cover a particular missing parent; and they are also too broad, since they cover cases in which the parent in question is not known as well as cases in which the parent is not living) -- but Brenda talks about losing A child, which is not parallel to the interpretation of orphan. Of course, English has no noun for either the single-child case OR the all-children case.

My guess would be that a word for the one-child case would be very rare in the languages of the world, not because the loss is so awful, but because until recently it was so very common (and still is, in many places). My Swiss grandfather was one of 14 children, only 8 of whom survived past the age of two. (My great-grandparents, frugally, recycled the names!)

For the all-children case, such a word would only be properly usable when the parent in question is no longer able to bear children, since before then the birth of new children is always possible. Well, I suppose you could have a noun meaning 'someone all of whose children thus far have died'. Whether either of these meanings is encoded as a word in any language, I don't know -- but it would require that the status in question be somehow culturally significant in the society, as the status of orphans and widow(er)s (and the childless) is in our society. Whether there are societies in which one or another of these statuses is significant is a question for anthropologists, not linguists.

But even if the anthropologists find some cultures like this, there's no guarantee that the associated languages will have words for the statuses in question. The fact is that, though the existence of a word (in the sense I'm using here) in a language indicates that the associated concept is significant in the society in question, languages don't get anywhere near the number of words that they need: a great many culturally significant concepts are not lexicalized. (One result of this fact is that anthropologists and sociologists are forever having to invent technical terminology to refer to these unlexicalized concepts.)

In some domains of meaning, there are whole clusters of missing words; this happens when culturally important semantic features are sometimes undercoded and sometimes overcoded. Take the domain of kinship. In our culture, people's sex is important, and, for relatives, it's important whether they are related to us by blood or by marriage (whether they are consanguineal or affine kin, as the anthropologists put it). Yet, the marking of these features in the ordinary English vocabulary of kinship is a puzzling patchwork.

Ideally, we'd have both more specific words, distinguishing relatives on these dimensions, and also more general words, disregarding one feature so that relatives can be grouped together. Parent vs. mother/father and child vs. daughter/son come close to this ideal situation.   Sibling vs. brother/sister is a more dubious case, since for many people sibling is a technical term. Then we get to cousin, which is undercoded (there's a sex-neutral word, but no sex-specific ones), and niece/nephew, which is overcoded (there are sex-specific words, but no sex-neutral one).

And to aunt/uncle, which is overcoded on one dimension (there are sex-specific words, but no sex-neutral one) and undercoded on another (there are no words distinguishing consanguineal aunts/uncles from affine aunts/uncles).

Then there's sister-in-law/brother-in-law, which are overcoded on the sex dimension, but undercoded in another way. These words encode both an affine and a consanguineal relationship, but with two different scopings: brother-in-law is either spouse's brother or sibling's husband. Many people feel that these two relationships are not equally close -- in marrying, your spouse's family is joined with yours, but when your sister marries, her husband's family is not joined with yours in this fashion -- so that these people find the use of a single word for them uncomfortable. (As a result of the familial closeness of spouse's brother, some people -- I am one -- are willing to extend sister-in-law to spouse's brother's wife.)

In any case, you can feel a need for a word and quite easily have none to hand.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 01:28 PM

Femail again

Having struck gold with Fiona Macrae's article "Women talk three times as much as men, says study" (11/28/2006), the Femail section of the Daily Mail tried again the next day with Carol Sarler's "Why we women will NEVER stop talking" (11/29/2006). Sarler takes a slightly different perspective on the topic: she spells Louann Brizendine's name correctly, and she argues with Brizendine's explanation for the alleged fact that women are several times more talkative than men:

The good doctor says it’s science; I say it’s sociology. Tucked away in the jungle, with identical patterns to their days and identical concerns - appetites and fears — you can scarcely put a syllable between David and Jan or Dean and Myleene.

But in the less rarefied social climate of the real world, there is no such even playing field. Traditional male lifestyles do not require perpetual motion of the mouth [...]

Traditional women’s lifestyles, by contrast, could not function like that. You cannot raise children in companionable silence, nor skip on a chat with the elderly.

You can’t shop without telling somebody what you want, nor yell at the gas company without parting your lips. And when it comes to work, it is noticeable that the ‘caring professions’, which demand a deal of soothing talk, have always been dominated by women.

So there we have it. We are programmed to talk more because we must, and because we must, we can’t be blamed. Men can write rude rabbit-rabbit songs about us all they like, it is not our fault.

This is all sensible and plausible. The trouble is, being more talkative than men is not only not women's fault, it's not even true. Sarler has presented 1200 words in attempted mitigation of a charge of which her sex is apparently innocent -- details can be found here.

Sarler also observes that Brizendine's book exemplifies a tendency to accept traditional misogynistic stereotypes while spinning them as positive feminist values:

What is our fault, however - and for this we do only have ourselves to blame - is the growing vogue for making a virtue out of necessity.

Where there used to be at least an awareness that you can have too much of a good thing, where a woman would once scold herself, "Oooh, listen to me, chattering away; I really must get on" - many of today’s women actually take pride in their excesses of verbal dribbling: "We are," they will boast to any who will listen, "so much better than men at communicating."

It's interesting to see so much ideology, positive and negative, erected on a foundation of ... nothing at all? Well, there are obviously some deep emotional currents here, and I don't mean to trivialize them. But wherever these convictions about male and female behavior come from, there's no evidence whatsoever that they come from the facts of how (or how much) actual men and actual women talk.

It's depressing to read the comments that these publications evoke. I've quoted a sample in some previous posts; Sarler's article gets similar treatment from its readers:

"Gk Ex-Pat, Australia": They may talk more than men but there is no depth to their conversations. Most women are full of wind and lack the ability to have a meaningful and quality conversation due to their natural desire to complain all the time and put others down.

"Janet H, Melbourne Australia": Can't say that I completely agree, I think the 'strong and silent types' are often the 'intense and disinterested types'. Some men just do not like talking - they've got this far away look in their eye and any attempt to attract interest is pointless. I've often been tempted to change the subject mid-sentence with ''I'm having an affair'' or ''I went shoplifting today'' - to see if I get a reaction.

There sure are a lot of unhappy people out there.

I'm going to adopt an optimistic interpretation: the folks who fill the comments sections in response to articles like these are a self-selected sample, unrepresentative of humanity as a whole.

[Update -- Ben Zimmer writes:

Carol Sarler, by the way, figured in another recent media-driven flurry of linguistic disinformation. See my Slate piece on the bogus trend of name-blending ("Keeping Up With the Smoneses", 8/16/2006), on which Sarler and other British commentators spilled much ink.

]

Posted by Mark Liberman at 12:20 PM

December 01, 2006

The name's Lastname. Firstname Lastname

James Bond made his special style of self-introduction famous: "Bond. James Bond", he says. (In the new film Casino Royale, which is terrific, Daniel Craig as Bond does it at least twice; one is in the very last frame.) But Richard von Busack, a James Bond fan who works as film critic for a San Jose and Santa Cruz area free newspaper, the Metro, says that in the 1946 film My Darling Clementine, Henry Fonda introduces himself as "Earp. Wyatt Earp". So, von Busack asks, since James Bond didn't start it, who did? How old is it?

Someone around Language Log Plaza will probably be able to make progress on dating it. Perhaps Zimmer. Ben Zimmer. [Update: Already Lew Furr has pointed out to me that in The Big Sleep, another 1946 movie, we get a character introducing himself as "Jones; Harry Jones". That doesn't push it back to before 1946, but Lew thinks that A. A. Fair/Erle Stanley Gardner's private detective Donald Lam might predate it: Lam regularly introduces himself as "Lam; Donald Lam." ]

By the way, another thing von Busack wants to know is what ELLIPSIS stands for in the new Casino Royale. It's a linguistic term for leaving stuff out because it's understood (as in I could kill you, but I won't ____, where the "____" means "kill you"). But in the film, as far as I can see, it's an arbitrary word that has been chosen as the password of the day for a locked door at the Miami International Airport. We see it when it gets texted to an assassin's cell phone. By guessing its intended use (and that part is easy to miss, I think), Bond is able to get into the staff-only areas of the airport and, in one of several stunning chase-and-action suspense scenes (plot spoiler!), thwart a terrorist attack on an airliner.

Posted by Geoffrey K. Pullum at 02:50 PM

Fun with co-voting percentages

[Warning: significant geekery ahead. But those who like this sort of thing will find it just the sort of thing that they like.] I'm at CMU for a workshop on approaches to the analysis of U.S. Supreme Court oral arguments. This comes out of an NSF-funded research project that aims to take the tapes and transcripts from the shelves of the National Archives, and turn them into an on-line resource accessible to the public -- and also to scientists. The group doing this includes Jerry Goldman (a political scientist who is the proprietor of the oyez.org web site at Northwestern), Brian MacWhinney (a psychologist who runs the CHILDES and TalkBank sites), Tim Johnson (a political scientist from the University of Minnesota), and me, along with others at our various institutions. We've made quite a bit of progress towards the goal, and the purpose of this workshop is to engage some of our fellow scientists and scholars in thinking about how to use the data in research, as more and more of it becomes available, in more and more usable forms. The participants are a wide range of interesting people of all kinds, from legal scholars to psychologists, computer scientists, speech pathologists and sociolinguists (including my fellow Language Logger Roger Shuy!)

At some point in this morning's discussion, Tim Johnson mentioned the historical records of how various justices voted on various cases. I wondered whether anyone had analyzed such data using multi-dimensional scaling or similar techniques; and Jason Czarnezki mentioned some work by Peter Hook, recently discussed on the Empirical Legal Studies blog, that used a "spring force algorithm" to create a spatial arrangement of justices, based on mapping higher cumulative co-voting percentages to stronger springs. This isn't exactly the same thing, though it's conceptually similar -- perhaps someone has tried MDS on this problem as well, I don't know. So Jason also sent me a link to a paper in the Harvard Law Review ("Nine Justices ten years: a statistical retrospective", 118(1), November 2004), which presented exactly the sort of matrix needed to try it:

So I typed in the percentages:

100 51.0 81.3 78.0 84.9 63.8 79.4 63.2 63.0
51.0 100 57.0 44.3 58.7 76.2 45.3 78.0 75.6
81.3 57.0 100 70.2 78.5 71.3 70.4 66.1 71.6
78.0 44.3 70.2 100 72.9 55.4 86.7 53.5 51.7
84.9 58.7 78.5 72.9 100 67.9 73.7 66.7 65.6
63.8 76.2 71.3 55.4 67.9 100 54.1 85.6 81.5
79.4 45.3 70.4 86.7 73.7 54.1 100 52.2 51.0
63.2 78.0 66.1 53.5 66.7 85.6 52.2 100 82.1
63.0 75.6 71.6 51.7 65.6 81.5 51.0 82.1 100

And wrote a little R script to try Joe Kruskal's isoMDS algorithm:

library(MASS)  # isoMDS lives in the MASS package
# Justice names, in the same order as the rows and columns of the table
Justices <- c("Rehnquist", "Stevens", "O'Connor", "Scalia",
              "Kennedy", "Souter", "Thomas", "Ginsburg", "Breyer")
Colors <- c("red", "black", "darkblue", "chocolate",
            "orange", "brown", "seagreen", "tomato1", "olivedrab")
A <- read.table("SCOTUS.txt") # the table above...
# turn co-voting percentages into dissimilarities between 0 and 1
D <- 1.0 - A/100
R <- isoMDS(as.dist(D), trace=TRUE)
xrange <- range(R$points[,1])
yrange <- range(R$points[,2])
# widen the x range by 10% on each side so that the names are not clipped
xinc <- .1*(xrange[2]-xrange[1]); xrange[1] <- xrange[1]-xinc; xrange[2] <- xrange[2]+xinc
#png(filename="Justices.png", width=700, height=700)
# set up empty axes, then draw each justice's name at the fitted position
plot(R$points[,1], R$points[,2],
     xlab="Dimension 1", ylab="Dimension 2", xlim=xrange, type="n")
text(R$points[,1], R$points[,2], labels=Justices, col=Colors)

The result:

(The superimposed pair in the upper right of the plot is Scalia and Thomas.)
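A quick sanity check on that observation, sketched in Python rather than R (the function and variable names are illustrative and are not part of the script above): ranking the pairs by co-voting percentage shows which justices any sensible layout ought to place closest together and farthest apart.

```python
from itertools import combinations

JUSTICES = ["Rehnquist", "Stevens", "O'Connor", "Scalia",
            "Kennedy", "Souter", "Thomas", "Ginsburg", "Breyer"]

# Co-voting percentages copied from the table above, same row/column order.
A = [
    [100, 51.0, 81.3, 78.0, 84.9, 63.8, 79.4, 63.2, 63.0],
    [51.0, 100, 57.0, 44.3, 58.7, 76.2, 45.3, 78.0, 75.6],
    [81.3, 57.0, 100, 70.2, 78.5, 71.3, 70.4, 66.1, 71.6],
    [78.0, 44.3, 70.2, 100, 72.9, 55.4, 86.7, 53.5, 51.7],
    [84.9, 58.7, 78.5, 72.9, 100, 67.9, 73.7, 66.7, 65.6],
    [63.8, 76.2, 71.3, 55.4, 67.9, 100, 54.1, 85.6, 81.5],
    [79.4, 45.3, 70.4, 86.7, 73.7, 54.1, 100, 52.2, 51.0],
    [63.2, 78.0, 66.1, 53.5, 66.7, 85.6, 52.2, 100, 82.1],
    [63.0, 75.6, 71.6, 51.7, 65.6, 81.5, 51.0, 82.1, 100],
]


def is_symmetric(matrix):
    """True if entry (i, j) equals entry (j, i) everywhere (catches typing slips)."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))


def ranked_pairs(matrix, names):
    """All unordered pairs of names, sorted from highest to lowest agreement."""
    pairs = [(matrix[i][j], names[i], names[j])
             for i, j in combinations(range(len(names)), 2)]
    return sorted(pairs, reverse=True)


pairs = ranked_pairs(A, JUSTICES)
print("symmetric:", is_symmetric(A))
print("most agreement: ", pairs[0])
print("least agreement:", pairs[-1])
```

The top-ranked pair is Scalia and Thomas, which fits their being superimposed in the plot; the bottom-ranked pair is Stevens and Scalia.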

I guess this layout is a reasonable one.

Of course, that's the trouble with techniques like this -- usually, either they show you something that you already knew, or they show you something that doesn't make any sense. Still, this was easy and fun. It's not linguistics, but I guess it's the sort of thing that you might be able to use as part of a system for trying to understand what's going on in an argument.

[Update -- Fernando Pereira writes:

I thought that a log transform might work better, since "distances" in frequency space are best thought of as log or log ratios (cf. KL divergence). Result attached. As you commented, these things are best to confirm one's prejudices ;)

If you're singing along at home, just substitute:

D <- -log(A/100)
R <- isoMDS(as.dist(D), trace=T)

]
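Both transforms are strictly decreasing in the co-voting percentage, so they put the pairs of justices in the same order; what changes is how far apart the low-agreement pairs are pushed relative to the high-agreement ones. A small Python sketch (illustrative names; the sample percentages are taken from the table above) makes this concrete. Since non-metric MDS fits the rank order of the dissimilarities, any difference between the two plots presumably comes from the starting configuration, which isoMDS takes from classical scaling by default.

```python
import math

# A few co-voting percentages from the table, highest to lowest.
sample = [86.7, 78.0, 63.2, 51.0, 44.3]


def linear_dissimilarity(pct):
    """The original transform: D = 1 - A/100."""
    return 1.0 - pct / 100.0


def log_dissimilarity(pct):
    """The suggested transform: D = -log(A/100)."""
    return -math.log(pct / 100.0)


for pct in sample:
    print(f"{pct:5.1f}%  linear={linear_dissimilarity(pct):.3f}  "
          f"log={log_dissimilarity(pct):.3f}")

# Sorting by either dissimilarity gives the same order of pairs.
linear_order = sorted(sample, key=linear_dissimilarity)
log_order = sorted(sample, key=log_dissimilarity)
print("same rank order:", linear_order == log_order)

# The log transform stretches the large dissimilarities more than the small ones.
linear_spread = linear_dissimilarity(44.3) / linear_dissimilarity(86.7)
log_spread = log_dissimilarity(44.3) / log_dissimilarity(86.7)
print(f"largest/smallest: linear={linear_spread:.2f}  log={log_spread:.2f}")
```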

Posted by Mark Liberman at 02:49 PM

Sprung from a common source

Mark Liberman's latest posting on l'affaire Brizendine follows the diffusion of misinformation from Brizendine's book through a recent review of it by Fiona Macrae in the Daily Mail and then on to (at last count) over 60 media outlets. Tracking this diffusion is made possible by idiosyncratic errors in Macrae's piece:

First, Macrae misspelled Dr. Brizendine's first name as "Luan" (instead of "Louann"), and second, she cited the book as The Female Mind (instead of The Female Brain). These scribal errors are as good as a fingerprint or a hyperlink, and they will allow future scholars of media influence to track the flow of misinformation from Brizendine via Macrae to all sorts of places around the globe, simply by text search.

Here we see an echo (surely intended by Mark) of the methods of historical linguistics, and before that, of studies of textual descent.

The crucial step is to use shared innovations to group languages (or texts) together, as likely to have sprung from a common source. The inference is stronger for a shared innovation that's unusual (no one will be much impressed by languages that share intervocalic voicing of consonants, or word-final devoicing, since these are such common changes; and no one will be much impressed by English texts that share the misspelling of the as teh, or of its as it's, since these are fabulously common errors), and it's stronger when more than one independent innovation is shared. The inference that takes many recent media reports on Brizendine back to Macrae is supported by both types of evidence.

Using "mind" for "brain" (or vice versa) is probably a reasonably common error, so let's put that aside for the moment. But "Luan" for "Louann" seems to be rare indeed: removing dupes, I get 25 webhits for "Louanne Brizendine" and 2 for "Louan Brizendine"; there's a huge pile for "Luan Brizendine", but all of them (so far as I can see) from the last few days.

The evidence for grouping the "Luan Brizendine" spellings together as likely to have sprung from a common source is even stronger than it might at first have seemed, since the "Luan" spelling is actually a composite of two separate misspellings: "u" for "ou" and "n" for "nn". All the other attested misspellings of Brizendine's first name ("Louanne" and "Louan") preserve the "ou" -- "Luanne Brizendine" and "Luann Brizendine" are not attested -- so "u" for "ou" stands out as an unusual error. As for "n" for "nn", the only moderately frequent misspelling of her first name (before Macrae's review), "Louanne", preserves the "nn" as well as the "ou", so this misspelling, too, is unusual.
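The decomposition can be made explicit with a small Python sketch. The function name and the three error labels are illustrative assumptions, and the checks are tailored to the handful of spellings discussed here rather than to misspellings in general.

```python
CORRECT = "Louann"


def spelling_errors(spelling):
    """Return the set of departures from 'Louann' found in a candidate spelling.

    Only the three departures relevant to the attested variants are checked.
    """
    s = spelling.lower()
    errors = set()
    if not s.startswith("lou"):
        errors.add("u for ou")
    if "nn" not in s:
        errors.add("n for nn")
    if s.endswith("e"):
        errors.add("added final e")
    return errors


for variant in [CORRECT, "Louanne", "Louan", "Luann", "Luanne", "Luan"]:
    print(f"{variant:8s} {sorted(spelling_errors(variant))}")
```

On this breakdown "Luan" is the only variant that combines "u for ou" with "n for nn", while "Louanne" and "Louan" each show a single, different departure.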

So much for the misspellings. The other error is "mind" for "brain", which is surely independent of the misspelling; absolutely nothing would predict that someone who makes one of these errors would be likely to make the other. So we have TWO shared independent innovations/errors, and stronger evidence of descent from a common source.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 02:41 PM

Contagious misinformation

If you're interested in how the press expresses, validates and amplifies popular prejudices, you've got a great opportunity. It starts in the "Femail" section of the Daily Mail on 11/28/2006, where Fiona Macrae reviewed Dr. Louann Brizendine's book The Female Brain ("Women talk three times as much as men, says study"). Macrae focused especially on two of Brizendine's quantitative bullet points:

[W]omen talk almost three times as much as men, with the average woman chalking up 20,000 words in a day - 13,000 more than the average man. [...]
Studies have shown that while a man will think about sex every 52 seconds, the subject tends to cross women's minds just once a day.

Now, just for the record, Dr. Brizendine has never done any research on either of these topics, and none of the sources that she cites provide any support for either the talking numbers or the sex-thoughts numbers. And the scientific studies that actually have counted words find that men and women appear to talk about the same amount, on average, with men sometimes a bit ahead; and the one study I've found that counted sexual thoughts reports that the frequency for males was every 12,300 seconds on average, compared to every 19,200 seconds for the females. If you care about the science, you can read about talking here and sexual thoughts here, and more on the science (and pseudo-science) of all sorts of sex differences here.
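To put the figures on a common footing, here is a small Python sketch that converts each "every N seconds" interval into a count per day. It assumes a full 24-hour day, which none of the sources specifies, and the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # assumption: a full 24-hour day, not waking hours


def per_day(interval_seconds):
    """Number of occurrences per day for an event that recurs every interval_seconds."""
    return SECONDS_PER_DAY / interval_seconds


# Figures as quoted in the Daily Mail article.
claimed_men = per_day(52)   # "every 52 seconds"
claimed_women = 1.0         # "just once a day"

# Figures from the one study that counted sexual thoughts.
study_men = per_day(12_300)
study_women = per_day(19_200)

print(f"claimed: men {claimed_men:.0f}/day, women {claimed_women:.0f}/day")
print(f"study:   men {study_men:.2f}/day, women {study_women:.2f}/day, "
      f"ratio {study_men / study_women:.2f}")

# The talking claim: 20,000 words a day, "13,000 more than the average man".
claimed_words_women = 20_000
claimed_words_men = claimed_words_women - 13_000
print(f"claimed word ratio: {claimed_words_women / claimed_words_men:.2f}")
```

Under that assumption the claimed sex-thought figures differ by a factor of more than a thousand, while the measured ones differ by a factor of about one and a half.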

But this post is not about the science of sex differences -- we're talking about the epidemiology of influence. And in that respect, Macrae's Daily Mail article contained two pieces of information of great value to science. First, Macrae misspelled Dr. Brizendine's first name as "Luan" (instead of "Louann"), and second, she cited the book as The Female Mind (instead of The Female Brain). These scribal errors are as good as a fingerprint or a hyperlink, and they will allow future scholars of media influence to track the flow of misinformation from Brizendine via Macrae to all sorts of places around the globe, simply by text search.

The rhetoric of the reactions is fascinating. I've cited some in an earlier post, from reader comments on the Daily Mail site and from fark.com, where the dominant reaction was "Why spend money on studying the obvious?" This is a deliciously ironic reaction, since the numbers were apparently invented without any studies being done, and the generalizations based on them appear to be false, and the only fiscal flow has been from the public to Dr. Brizendine and her publisher, Bertelsmann/Random House/Morgan Road.

But it's not just Joe and Jane Sixpack who have reacted in that way. Emil Steiner, blogging at the Washington Post, crafted a more sophisticated version of the same viewpoint in a post on 11/30/2006:

Women talk too much, and men only think about sex. You might expect to find such "high-brow" observations in the pages of Maxim, right next to the "A-B-Cs of B-R-As" column. But in her new book, The Female Mind, clinical psychiatrist and self-titled feminist Dr. Luan Brizendine seemingly uses science to prove what stand-up comics have been telling us for years. [...] Taken together, her work seems to indicate that in inter-gender communication women talk, and men zone out and think about sex. And you need a PhD to work that out?

(Actually, I believe that Dr. Brizendine's degree is an M.D. -- not that it matters.) I was able to find Steiner's post, and to be sure that he got his misinformation via The Daily Mail, because he misspells Dr. Brizendine's first name as "Luan", and calls her book The Female Mind.
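The text search itself is simple enough to sketch in a few lines of Python. The function name and dictionary keys are illustrative assumptions; the two test passages are a sentence from Steiner's post as quoted above and a phrase with the correct name and title.

```python
# The two scribal errors that serve as a fingerprint of the Daily Mail article.
FINGERPRINTS = {
    "name": "Luan Brizendine",   # for "Louann Brizendine"
    "title": "The Female Mind",  # for "The Female Brain"
}


def shared_errors(text):
    """Return the fingerprint errors (by key) that occur in a passage."""
    return sorted(key for key, needle in FINGERPRINTS.items() if needle in text)


steiner = ("But in her new book, The Female Mind, clinical psychiatrist and "
           "self-titled feminist Dr. Luan Brizendine seemingly uses science to "
           "prove what stand-up comics have been telling us for years.")
correct = "Dr. Louann Brizendine's book The Female Brain"

print(shared_errors(steiner))  # both errors present
print(shared_errors(correct))  # neither error present
```

A passage that shows both errors is much stronger evidence of descent from Macrae's article than one that shows only one of them.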

I already knew that Ann Althouse had blogged about the Daily Mail article, because she also linked to one of my posts, and so I saw referrals from her site in our server logs. Her commenters are a couple of intellectual steps up from the gang at fark.com, but (as with Steiner) this mainly changes the way they express themselves, not the opinions they express:

Hmmmm.
So. This scientific study suggests women like to talk and men like to fantasize about sex?
This is news?
I once told my wife
"your capacity to talk exceeds my capacity to listen"
Telling the truth CAN get you into trouble.
[timeout to fantasize about sex]
Wow.
Anyway, sometimes science needs to prove the obvious, at least to give us guys some cover.
They needed a study to tell us this?

However, one wag among Ann's commenters manages to mix in an ethnic stereotype, with only six well-chosen words:

Doctor Benzedrine niver met'un Irishmun.

If someone can figure out how to add racial stereotypes in a similarly pithy way, they'll win a sort of Trifecta of Received Opinion.

At the other end of the sophistication food chain, the now-familiar reactions are expressed in a slightly different way at the forum at soompi.com ("K-pop for the masses"):

Women talk more than men, what else is new?
hrmm... why am i not surprised...
lol no kidding. Is this even news?
i always thought it was higher than 3 times

And it's not just the Anglosphere. Brecht de Groote writes:

I have read your recent contributions to Language Log on Luan Brizendine's The Female Brain with interest. When a post about the Daily Mail's credulous journalism cropped up, I felt the storm approaching. Surely, worldwide press would not for the umpteenth repeat the ancient vice of treating English newspapers like the fount of knowledge? A quick check on Google News Belgium, France and Germany netted no results. I should have known better.

As I was skimming through www.onzetaal.nl today, I chanced upon a link to newspaper article. The link read, "Vrouwen worden soort van high van praten", which is a rather bizarre way of translating "Women get sort of high of talking". Following the link, I am led to a newspaper article in the Flemish newspaper Het Nieuwsblad. (link: http://www.nieuwsblad.be/Article/Detail.aspx?articleID=DMF29112006_034 ) The article, proudly proclaiming "women speak three times as much as men", citing as its references Daily Mail and Headline News. Now Het Nieuwsblad is not exactly the pivotal point of quality journalism in Belgium.

But the news is spreading. On querying for "Brizendine" on (Flemish) Google, no less than three hits come up. It appears Het Volk and De Standaard have joined the in-crowd. I was not to astounded to see Het Volk, another vestige of ignorance, joining in. I had expected more of De Standaard (link) , however, which is ranked as one of the better-quality papers with normally thorough fact-checking.

It won't take long for other newspapers to chime in with the leading press. Other news agencies across Europe will just suppose the fact-checking has already been done for them. Mischief, thou art afoot...

PS: While writing this e-mail, someone asked whether I had already read De Standaard. On replying I hadn't, the person was so kind as to quote the article on Brizendine at tedious length. See?

So far, news.google.fr only knows about one pickup in French ("Les femmes sont trois fois plus bavardes que les hommes!"), but of course several French media outlets have sued to prevent Google from indexing their news feeds.

[For readers who might have missed it, I should note that Stephen Moss of the Guardian reached Louann Brizendine by phone, for his 11/27/2006 story on sex differences in talkativeness -- and she (with considerable grace) retracted her book's assertions about word counts and speech rates. But the interesting point here is that I have yet to see any other media outlet pick up Brizendine's retraction, during the period that Macrae's sloppy replication of the original misinformation has been picked up by (at last count) more than 60 periodicals and several major broadcast organizations.

Now, can some enterprising journalist persuade Dr. Brizendine to retract the "52 seconds vs. once a day" business? Probably, since it's equally unsupported, and she seems to react honestly and forthrightly to being challenged on the facts. And will anyone notice? Probably not, since in this case, the truth is not nearly as much fun as the fiction.]

[Update -- Alex Baumanns observes that the news briefs from the Dutch e-zine taalpost.nl, expertly edited by "Marc van Oostendorp (Genootschap Onze Taal) en Ludo Permentier (Van Dale Lexicografie bv)", did pick up the Daily Mail piece, with a one-sentence description "Vrouwen praten driemaal zo veel als mannen". As Alex says:

It must be said that the section Taalnieuws usually contains all kinds of odd bits of news. Still, it now had the Stamp of Approval of bona fide linguists. Where will this end?

From the point of view of the public's beliefs, I suppose that it'll end pretty much in the same place that it started, but with a kind of dim collective memory of scientific support for popular prejudices.]

[Update -- the meme is spreading through the quiz shows as well. Nicholas Waller writes:

You may be interested to note that the women-say-3x-as-many-words-as-men "research" made it to a sort-of question on BBC1's comedy news quiz "Have I got News For You?" last night (1 Dec 2006). The contestants knew the "answer" as stated in the papers, but not any subsequent debunking from Language Log (and neither did the host, nor indeed the question-setter). But that round was on newspaper headlines.
Regular team captains are Ian Hislop, editor of Private Eye, the fortnightly satirical mag, and Paul Merton, a professional comedian, and the guest host this week, fyi, was Tory parliamentarian Ann Widdecombe, aka Doris Karloff.

And Arnold Zwicky, among others, informed me that both the more-words and the more-sexual-thoughts claims were retailed as facts/discoveries on "Wait, Wait, Don't Tell Me" 12/2/2006.]

Posted by Mark Liberman at 09:45 AM

English has no dialects????

"Italian has a lot of dialects." "Swiss German is almost a different language from Standard German." "The Moroccan speaker of Arabic can barely converse with an Iraqi."

Never mind that statements like these are rather like saying "Golly, there's more than one kind of elephant."

William Grimes' review of Jean-Benoit Nadeau and Julie Barlow's "The Story of French" in the Times on Wednesday dismayed me slightly, in its perpetuation of the misconception that when a language consists of several divergent dialects, it is an unusual circumstance rather than the norm.

The crucial passage in Grimes' review was:

"English speakers ... take a much more casual attitude towards their own language, perhaps because English spread through the British Isles much more rapidly than French did through France, a country where regional dialects persisted until the mid-twentieth century."

Not having gotten to the book yet, I cannot know whether this surmise is from Grimes or the authors of the book. However, it implies that one kind of English was brought to Britain by the fabled Angles, Saxons and Jutes and by itself "took over," while over in France, for some reason when Latin speakers settled there, the result was a sprouting of several "dialects," such that what we know as French has had to be defended as the "proper" variety in comparison to all of its bastard relatives.

But this neglects that the language brought to Britain "took over" as several distinct dialects. Old English documents are mostly West Saxon. But especially by Middle English, we see that the language of southwest England was vastly different (in Cornwall YOU was EE, and HE was AW!). Then in the east there was Kentish (where, famously, a woman asked by a traveller for EGGS thought he was speaking French since she was used to the local term EYREN). And never mind Scots up north, an "English" fitfully comprehensible to standard speakers (remember TRAINSPOTTING?), even today argued by some of its speakers to be a separate tongue.

The Standard English we know emerged only by the 1300's, almost a millennium after English was brought to Britain. And even after that, regional dialects lived on, in a society where reading and writing in standard English were marginal activities to all but a few until several centuries later.

In this light, Grimes continues:

"On the eve of the French Revolution only about 3 million French citizens out of a population of 28 million spoke French well, and as late as 1940 about half of the people spoke a regional dialect as their mother tongue."

But this is not as different from the equivalent situations in Britain as Grimes implies. In the late 1700s, certainly many people in France were more comfortable in regional dialects like that of Picardy, but then over in England vast numbers of people were raised speaking the English dialects of Cornwall or Yorkshire. The same was true in 1940.

And to the extent that in reference to France, dialectal diversity encompasses the likes of Occitan in the south, which is by all metrics a separate language from French, then Scots comes into the equation again, even if it is not entirely as divergent from standard English as Occitan is from Parisian French.

English in Britain, then, has always been a patchwork of dialects just as French has been. And while it is true that French speakers are less tolerant of "mistakes" than English speakers (as will be attested by any Anglophone traveller who has had a Parisian waiter switch to English the second they make a gender mistake, with the exception of attractive young women, whom waiters often serenade by pretending to suppose that they are simply from some exotic Francophone location...), Grimes' review suggests that English speakers have been much more laissez-faire about "proper English" than reality indicates.

Right around when France was Revolutioning, various self-appointed grammarian martinets were delineating what "good" English was, such that today, the use of "I" in BILLY AND I WENT TO THE STORE, produced spontaneously by no child, qualifies as a unique example of a grammatical construction that has become effortless to countless millions of adults via prescriptive psychological abuse.

The idea that English speakers are less vigilant about what "good" usage is also fails when we consider things like the highly cosseted register that actors in old movies in America were coached in. The "anything goes" orientation to English sprouted only a few decades ago, amidst the countercultural revolution in the sixties.

With all due respect to William Grimes and the authors of "The Story of French," this had nothing to do with a purported, and erroneous, vision of one kind of English taking Britain by storm in the fifth century. It is TYPICAL for languages to consist of a bundle of dialects, and English, spoken by so many for so long, has hardly been an exception.

Posted by John McWhorter at 03:42 AM

What's "spurious" in English?

In response to my post about having misunderstood his pronunciation of "air accidents" as "ear accidents", Jock McNaught wrote:

I'm flat-eared to have figured in your language log... Not the first time someone has had cause to do a double-attend in respect of my idiolect.

Of course, I suffer, after over a quarter of a century in England, of being accused of "talking like a Sassenach" when I go back to Scotland. Here in England students complain of my "broad Scottish accent". I take this with a pinch of salt nowadays, after a group interrupted one of my lectures to request me to "stop using scotticisms". I replied that I had not been aware of using any. Came the joint response: "yes, you used the word 'spurious' several times". (A sad comment on the state of English education, eh?)

My linguistic heritage is somewhat convoluted: parents from the Glasgow area (urban Glaswegian and West Central Scots), grew up in firstly an isolated fishing village in NE Scotland, then in an equally isolated farming community 20 miles inland. This was at a time when a dialect would change markedly just going from one fishing village to the next a few miles along the coast. The farming community had its own dialect, a variety of Mid Northern Scots (Buchan). The family then moved near Stonehaven, where South Northern Scots intruded (Mearns dialect: see Lewis Grassic Gibbon's works for examples of this - http://www.grassicgibbon.com/). I went to university in Aberdeen, home of "the Doric" (which is largely pronounced like MNS).

However, I rarely use any of these dialects in Manchester, and have moreover modified my pronunciation to be more readily understood by my customers (sorry, students). There's clearly a Scots-of-some-sort base. But the realisation changes according to interlocutor. Nothing new there, just normal language contact and change and adaption under differing sociolinguistic conditions, but it makes it difficult for some people to pin down where I am from (apart from Scotland). The only person who did pin down a more than reasonable linguistic provenance for me was the famed Stanley Ellis of the Survey of English Dialects -- who also had a keen interest in Scots, Welsh, Irish English. There's a good picture of him doing field work at: http://www.bl.uk/pdf/playback31.pdf on page 6 (also contains details of audio resources that may be of some interest to you if you had not known of them). Stanley was also involved in studying the "Yorkshire Ripper tapes". His party piece was to listen to someone then pin their accent down to within a few miles. He had a little difficulty at first with me, but then plumped for the far NE of Scotland, within 20 miles or so of what you might consider to be the main linguistic influence on me.

Here's a good site that gives details on the various dialects of Scots, including phonetic/phonological information, with useful references: http://www.scotsgate.com/

Check out also: http://www.scotsgate.com/survey.html which gives the results of a survey into who speaks Scots. As you'll see, 60% of respondents in the NE of Scotland speak a form of Scots, the highest regional number, and of course there is a higher proportion of older people who claim to speak Scots.

Just off to a seminar so will close. However, I know that if I meet someone in the lift (elevator) and they ask me what floor I want (on campus, floors tend to be lettered rather than numbered), they'll have difficulty understanding whether I want 'J' or 'G'...

[Guest post by Jock McNaught]

Posted by Mark Liberman at 12:46 AM