Group glee

One of the best things about teaching undergraduates is how much you learn. Yesterday, a student and I were discussing possible sources for his term paper on "intrinsically funny words", and as we poked around on Google Scholar, we stumbled over Lawrence W. Sherman, "An Ecological Study of Glee in Small Groups of Preschool Children", Child Development, 46(1) 53-61 1975. The abstract:

A phenomenon called group glee was studied in videotapes of 596 formal lessons in a preschool. This was characterized by joyful screaming, laughing, and intense physical acts which occurred in simultaneous bursts or which spread in a contagious fashion from one child to another. A variety of precipitating factors were identified, the most prevalent being teacher requests for volunteers, unstructured lags in lessons, gross physical-motor actions, and cognitive incongruities. Distinctions between group glee and laughter were pointed out. While most events of glee did not disrupt the ongoing lesson, those which did tended to produce a protective reaction on the part of teachers. Group glee tended to occur most often in large groups (7-9 children) and in groups containing both sexes. The latter finding was related to Darwin's theory of differentiating vocal signals in animals and man.

The exact definition of "group glee", from the body of the paper:

A general description of group glee was first established as a very intense, joyfully affective state maintained throughout a majority of the group (one-half or more). To isolate a critical incident of group glee, two criteria, noted in the codes as behavioral manifestation and ratio, had to be present.

Behavioral manifestation. -- Three categories of overt behaviors through which group glee manifested itself were laughter (Laf), screaming (Scr), and intense physical acts (Phys). Laughter was limited to instances of vigorous and joyful laughter. Screaming was limited to ebullient vocalizations which were emitted either in an organized, chantlike fashion or in random disarray. Intense physical actions were described as joyful physical behaviors such as hand clapping, jumping up and down, or other intense physical expressions. [...]

Ratio. -- If one of the behavioral manifestations or combinations thereof was recognized, a ratio of the number of children involved in the incident to the number of children present at its occurrence was calculated. If this ratio was 50% or more, an incident of group glee was noted as being present.
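For readers who like their definitions executable, here is a minimal sketch of the two-part test as the passage above states it. It is only an illustration of the quoted criteria, not Sherman's actual coding procedure: the `Incident` class, the `is_group_glee` function, and the field names are hypothetical, and the codes Laf, Scr, and Phys are simply borrowed from the quotation.

```python
from dataclasses import dataclass

# Behavior codes taken from the quoted definition; everything else
# in this sketch (class, function, field names) is hypothetical.
GLEE_BEHAVIORS = frozenset({"Laf", "Scr", "Phys"})


@dataclass
class Incident:
    behaviors: frozenset      # codes observed during the incident
    children_involved: int    # children taking part in the incident
    children_present: int     # children present when it occurred


def is_group_glee(incident: Incident, threshold: float = 0.5) -> bool:
    """Apply both criteria: a recognized behavior and a ratio of 50% or more."""
    # Criterion 1: at least one of the three behavioral manifestations.
    if not (incident.behaviors & GLEE_BEHAVIORS):
        return False
    # Guard against an empty room, which would make the ratio undefined.
    if incident.children_present <= 0:
        return False
    # Criterion 2: involved / present must reach the threshold.
    ratio = incident.children_involved / incident.children_present
    return ratio >= threshold


# Four of eight children laughing meets the 50% criterion; three of eight does not.
print(is_group_glee(Incident(frozenset({"Laf"}), 4, 8)))
print(is_group_glee(Incident(frozenset({"Laf", "Phys"}), 3, 8)))
```

Note that the threshold is inclusive, matching "50% or more" in the quoted text.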

Sherman found ten categories of "precipitating causes". All of them are familiar -- group physical activity (like dancing lessons), cognitive incongruity (painting with a string, or making a speech error), taboo-breaking (transgressing the teacher's authority or using a taboo word like "stinky-pu"), suspense-resolution and terminal points of activities, etc. The student's concept of "intrinsically funny word" came up in a couple of different categories: nonsense words and nonsensical phrases are examples of "cognitive incongruity", whereas words like "underwear" would be in the taboo-breaking category. (The commonest single category among Sherman's precipitating factors was the "me, me, me" response to requests for volunteers, which seems like a somewhat different kind of behavior from the other sorts of group glee discussed.)

What caught my attention was the table reproduced below:

This shows a slight but significant tendency for group glee to occur more often in mixed-sex groups. You may be able to see the pattern a bit more clearly if we redo the table in terms of the percent of lessons in which group glee occurred:

Group composition
Group glee
(percent of lessons)
All girls
Girls > boys
Girls = boys
Girls < boys
All boys

Sherman's paper was published in 1975, and there seems to have been relatively little work on the subject since then. If I was the Emperor of Academia, we'd have Departments of Group Glee Studies, Institutes for Interdisciplinary Group Glee Research, international workshops on Cross-cultural Group Glee Investigations, annual meetings of the American Group Glee Association (and its splinter group, the Association for Group Glee Science)... Well, maybe not. But at least there'd be some more research. Just think of the applications in stand-up comedy, for example.

Fear and loathing on Massachusetts Avenue

It's just three days before an invited lecture by a linguist at MIT's Brain and Cognitive Sciences (BCS) department, and suddenly (this was on Tuesday) someone from outside the MIT community sends a message to all the lists usually reserved for advertising talks to MIT Linguistics faculty, students, and visitors, and attempts to send it to the BCS department too (though that list turns out to be closed). But the message is not an announcement. It's a diatribe claiming that the invited lecturer is a liar who falsifies the cultural and linguistic evidence (even, curiously, when conflicting evidence is available in his own earlier published works). The sender is concerned that he and his friends might not get a chance to expose the lies and make all these allegations from the floor of the lecture room before being "cut off", so he is getting them in early by email. He ends his barrage of charges with a sarcastic mock advertisement for exploitation of native peoples for personal gain:

You, too, can enjoy the spotlight of mass media and closet exoticists! Just find a remote tribe and exploit them for your own fame by making claims nobody will bother to check!

What we have here is something I've never seen before: an attack ad against a linguist. What the hell is going on? Who is this hated linguist who is publicly libeled and branded as a self-promoting fabricator before he can even arrive at Logan Airport and take the taxi ride to MIT to give his talk?

The linguist at the sending end of this nasty piece of dirty campaigning will remain nameless here (and I repeat, he is not a member of MIT's excellent department). But I can name the linguist under attack: it's Daniel Everett, now at Illinois State University. He's a distinguished scholar in the field of Amazonian linguistics whom I first came to know when Desmond Derbyshire and I published, in Handbook of Amazonian Languages, Volume 1 (Mouton, 1986), a remarkable 200-page chapter of Everett's about the grammar of a fascinating and quite unusual language called Pirahã. Everett's recent work has been discussed in a number of magazine articles in the past few months (that's where the "spotlight of mass media" comes in), and several times here on Language Log (this post includes a reference list of our various mentions of him).

The work Everett will be talking about at MIT has to do with what he says are cultural factors that help to make the Pirahã and their language so unusual. He presents an account of his claims in print, in Current Anthropology 46(4), 2005, pp. 621-647, where you can also read critical comments by other linguists (pp. 635-641), plus Everett's reply to them (pp. 641-644). A very brief paraphrase of his claims follows (in my words, not his, but I'm doing my best simply to report his views).

According to Everett, the Pirahã place such a high cultural value on concreteness and immediacy that they have essentially no time for considerations of matters historical, artistic, or mathematical. They couldn't care less how the universe began or who was alive two hundred years ago; they don't engage in aesthetically motivated pictorial or dramatic art; and they don't do mathematics — they don't even count. This focus on the here and now, the real and concrete, is a positive value for them, not a lack or deficit, so it has great force, enough to account for the fact that their complete absence of interest in the hypothetical and the abstract has remained stable for some 200 years of contented and almost totally monolingual and monocultural life. Some of the features of their language are results of the influence of this culture: they don't have names for the colors of the rainbow, they don't have a system of names for numbers, and their language lacks certain grammatical features that most linguists think are universal — notably, there aren't any tenses that involve reference points distinct from the point of utterance (as in It had vanished, which says that the vanishing took place earlier than some reference point in the past, which itself is earlier than the moment of utterance), and there also isn't any recursive syntactic subordination (clauses [which are inside clauses [which are inside other clauses [which are inside other clauses]]], and so on). These, roughly, are the claims that Everett is prepared to defend in detail in his lecture.

So if Everett has it right, some languages are less like well-studied European languages than we thought any languages were. Fair enough, you might think. But what's all this about lying and self-promoting and exoticizing and misleading the gullible public? Why the hostility and abuse by advance email attack ad?

The attack message seemed to me to reveal a certain anxiety, even panic, which had spilled over into anger. I have ruminated on what is driving this, and I am led to consider three possible motivating forces, perhaps all simultaneously involved: (1) on the linguistics front, a certain defensiveness concerning cherished hypotheses about linguistic universals; (2) on the political and ideological side, a strong reaction against any perceived negative criticism of a Third World people's culture; and (3) with regard to religion, a prejudice against (particularly fundamentalist Protestant) Christianity. I am not going to try and adjudicate things here (though my revulsion against advance discrediting of visiting speakers by defamatory email should be clear). Let me just try to explain, as briefly and neutrally as I can, what I mean by (1)-(3), and where these forces seem to come from, and how they apply to this case. I'll leave it at that.

(1) Linguistic universals. The business about linguistic universals is perhaps the most reasonable of the causes for angst. For those many linguists who closely follow the thinking of Noam Chomsky, it is an article of their scientific creed that languages are really very similar under the skin. Superficially they look wildly different, and the feeling may not very rapidly recede even after some effort at trying to learn a new language, but if you can just attain a deep enough theoretical insight into how things work, you will be able to perceive that in profound ways all languages share the same organizing principles. It is not just that they constitute what philosophers call a natural kind, but that they are skeletally identical. Distracting lexical and morphological variety may blind us to that, but deep down it will be found (through difficult theoretical work) to be true. They regard it as extremely important for their whole branch of the field of linguistics that this should be so, and it threatens very deep beliefs of theirs when a linguist comes along saying something really shocking, like that he knows of a language with, say, no subordinate clauses of any sort.

Everyone should agree that remarkable claims should occasion remarkable amounts of debate. Such debates are what theoretical linguistics is all about. But wait: at MIT there are talks modifying or jettisoning hypothesized linguistic universals every week. Why has this particular talk attracted such a rare thing as a public message of abuse sent out in advance? I think we have to consider the other two points as well.

(2) Political sensibilities. The majority of academics today, especially in the social sciences (including linguistics), bend over backwards to express — and I have no doubt they actually feel — an enormously strong intuitive revulsion against saying anything that might be perceived as even remotely critical of another ethnic, racial, or cultural group or its cultural products — particularly criticism of a poor or Third World culture by people from a dominant Western culture. It is extremely unusual to find anthropologists who will come out and directly attack a culture they know well (it has happened, as in the case of Colin Turnbull's surprising condemnation of the Ik in his book The Mountain People, but it's very rare).

This sensibility is well-meant. People who are hostile to the rights and status of minorities talk about it scathingly using the term "political correctness" (PC), but I'm not interested in giving any coarse anti-PC tirade here; I just want to get a clear view of what's been going on.

The fact is that academics in fields like linguistics, anthropology, and comparative psychology have often spent years trying to be sensitive to the merits and capabilities of despised groups of people. They're not wrong to feel that way, and to some extent I hold views of the same sort myself. I have met ordinary people in Australia who don't know what they're talking about who would be happy to tell me how the Aborigines are vile, stupid, drunken, violent, worthless savages. This disgusts me; but I can just imagine how much more it must infuriate linguists who have devoted thousands of difficult hours studying the astoundingly wonderful languages of Australian Aboriginal tribes, and getting to know and like the people who speak them. Use of the dread phrase "primitive language", or even a hint of it, will often send a linguist over the edge with anger. I think some younger linguists may pick up this attitude of infuriatedness even before they have spent thousands of hours on fieldwork that has taught them just how complex little-studied languages of preliterate peoples can be. That may have happened here.

The malicious sender has picked an odd target in Everett, however. Everett has done his thousands of hours, and he is not saying anything denigratory about the Pirahã. He likes and admires them. He has found their language astonishing, and extremely complicated. Here's what he said about them in his Current Anthropology paper:

I thank the Pirahã for their friendship and help for more than half of my life. Since 1977 the people have taught me about their language and way of understanding the world. ... No one should draw the conclusion from this paper that the Pirahã language is in any way "primitive." It has the most complex verbal morphology I am aware of and a strikingly complex prosodic system. The Pirahã are some of the brightest, pleasantest, most fun-loving people that I know. The absence of formal fiction, myths, etc., does not mean that they do not or cannot joke or lie, both of which they particularly enjoy doing at my expense, always good-naturedly. Questioning Pirahã's implications for the design features of human language is not at all equivalent to questioning their intelligence or the richness of their cultural experience and knowledge.

That's what Everett actually says. Yet still, I think, it is so hard for a sensitive linguist or social scientist to hear claims that might be construed as critical of a culture that the deep revulsion may spring up unbidden before the actual claims are even put out on the table, before the lecture is even given, before the taxi has even left the airport.

(3) Religion. And so we come to anti-Christian prejudice. Earlier in his life, Everett was a missionary linguist working with the Summer Institute of Linguistics, the organization of fundamentalist Protestants founded by Kenneth Pike for the purpose of analyzing the remaining undescribed indigenous languages of the world and translating the Bible into each of them. Not everyone knows that Everett left SIL many years ago, and now does not believe Christian doctrines or practice a religious faith. Still, the SIL's work goes on, and it did provide Everett's original motivation to go to Brazil, back in the 1970s, and commence working on the Pirahã language.

In that context, consider these three propositions, which I think are true. (a) There is a tendency for more people to be atheists in academia than in the rest of the population (it doesn't matter why). (b) There is a tendency for social scientists to believe (with some justification, of course) that missionaries over the centuries have done great harm to indigenous people around the world, particularly in earlier centuries. (c) Christian fundamentalists have been doing their own cause great harm among intellectuals by repeatedly attempting crazy things like taking over school boards and pushing creationist or cryptocreationist ideas illicitly into science classes. If you put (a), (b), and (c) together, you have some basis for a certain amount of suspicion toward anyone who is thought to be an active, practising, Protestant-fundamentalist missionary operating or appearing within the academic sphere.

I say you have a basis for some suspicion; I don't say you have an excuse for being prejudiced. I personally think prejudice against Christians is as unedifying and immoral as prejudice against Jews. Suppose (contrary to fact) that Everett were still a missionary: suppose he really did still think that God had instructed him to ensure that the Pirahã can read the Gospel according to St Mark in their native tongue. Would this be grounds for a prejudice so deep that it would insist that everything he did was evil and twisted, and everything he said about the language was some devious lie? I've worked with linguists who are practising Christians. I don't share their religious beliefs, but they seem to be perfectly honest. I don't know why they would lie about something as banal as subordinate clauses (as opposed to the origin of the universe or the issue of whether we have immortal souls). Where's the gain for God in telling lies about tensed complements?

I think that if you now consider the effects of the linguistic, political, and religious points working together, you might be able to put together the rudiments of a sort of psychiatric explanation of the message sender's outburst of anxiety and rage — though not a justification for it.

Anyway, whether that's right or not, I do know this: the lucky people who live in the Boston area (I regret that I now do not) have a chance to hear Everett in person on Friday, because despite the hate campaign he still plans to get in that taxi at Logan Airport and take it to MIT's Building 46. His lecture is called "Culture and Grammar in Pirahã", and it's on Friday, December 1, from noon to 1:30 p.m., in room 46-3310 at MIT (that is, Room 3310 of building 46; MIT people do have a system of number names, and they use them to name buildings). Language Log readers in New England who get there early enough to find a seat can check out what Everett actually says, rather than what his enemies say he says, and then make up their own minds.

[Update: Dan Everett's talk did take place as scheduled on December 1; it was not boycotted by the linguists in the area; about 125 people showed up, in fact; and a good, spirited discussion followed in the question period. You can actually listen to it, and look at the handout, thanks to Ted Gibson's lab: handout in PDF form here, and audio for Windows Media Player here.]

Censorship at the Daily Mail?

This is amusing. Apparently whoever moderates readers' comments over at the Daily Mail doesn't want Fiona Macrae's carelessness and credulousness to be exposed. She's the writer who basically copied out the press release for Louann Brizendine's book The Female Brain, using as her lede a factoid (about women talking three times more than men) which has repeatedly been debunked, most recently the day before in the Guardian, and which Dr. Brizendine herself has withdrawn after I pointed out that no actual studies support any similar numbers. (See this Language Log post for a list of links that go into mind-numbing detail on the factual background -- which is that the numbers reproduced in the Daily Mail piece are a pseudo-scientific urban legend, unconnected to any actual study; and that the many studies that do correlate talkativeness and sex find only small differences, often in the direction of more words from men.)

As of this writing, there are some 20 comments on Macrae's Daily Mail article, generally along the lines of

Like we didn't know this already?
Only three times as much?
Why spend money on studying the obvious?
Someone had to do a study to figure this out?
You know... in the 90's they said that women spoke about 6000 words a day, while men spoke around 2000. It seems that the count is different, but the ratio stayed the same. Interesting.
I don't really think that it took several doctors doing a clinical study or writing a book to conclude that women talk more then men! Good Grief ask any husband or honest woman.

Several Language Log readers have sent me email to the effect that they attempted to submit a comment (using the facility at the bottom of the online Daily Mail article), referencing my Boston Globe article ("Sex on the brain", 9/24/2006), or the 11/27/2006 Guardian article, or some of the Language Log posts on the subject. Some have also attempted to correct Macrae's careless mis-copying of the book's name and author, which she renders as The Female Mind by Louan Brizendine instead of The Female Brain by Louann Brizendine -- but none of these comments have appeared on the Daily Mail's site, now nearly two days after the first of them was submitted. Perhaps the intern who moderates comments has gone on to other things.

The many unmoderated comments at Fark on Macrae's story are generally similar to those at the Daily Mail, though some of the farkers are even more straightforwardly misogynistic:

Nah, the male scientists just stopped paying attention at three.
The funny thing is, while women *TALK* three times as much as men, they don't really communicate three times as much information. Thus, a lot of what they say is either redundant or null data.
That is why our eyes glaze over, and we say "yes, dear".
Take all your clothes off and we will pay rapt attention to whatever you are saying.
I saw something about this on 20/20 weeks back - they said the female brain releases a chemical like endorphins when talking, so the vimens actually catch a buzz by yapping so much. Eventually they will evolve where their tongues are hung in the middle so they can flap on both ends...
I thought it just felt like three times as long. My wife's stories take longer for her to tell than the actual event being described.
Listening to women speak is like torture. It is the worst torture.
Its not the endless yapping that really gets to me, its the shrieking laughter that gets louder and more shrill the more women are in a group.
Just have her sit so there's a TV on behind her and over her shoulder, nod, smile, etc. Just try not to exclaim, "God, I'd like to fark that!" when a hottie appears on the TV and the GF is talking about her mother.
"There are, however, advantages to being the strong, silent type. Dr Brizendine explains that testosterone also reduces the size of the section of the brain involved in hearing - allowing men to become "deaf" to the most logical of arguments put forward by their wives and girlfriends."
I call shenanigans, This is feminist crap. I cannot become "deaf" when a harpy is around. "Logical". Ha. No doubt.
From observing my ex-girlfriends, it's like they just don't feel right if they aren't vomiting up an endless stream of words at all times. It's almost always due to expanding every posiible tangent in their "stories" into an avalanche of pointless detail.
STFU, women. Just STFU. The one-sided therapy sessions that you have the nerve to call "conversations" make your boyfriend/hubby fantasize about ditching your chatty ass.

I took the zipped-lip picture from a comment at fark, which was intended (I guess) as an imaginary solution to the wish expressed in the last comment. But a couple of the fark commenters did link to a Language Log post debunking the factoid.

Somewhere, I expect, there is a web forum where the anti-male counterparts of these comments are on display, in reaction to the same story. The old-fashioned version is something along the lines of "that's because we have to repeat everything three times to get it through men's thick skulls", but there are many newer ones, spinning the "women talk three times more" factoid in terms of men's stereotypically lower verbal ability, men's stereotypical inability to create rapport through communication, men's stereotypical difficulty in expressing or even understanding their emotions, and men's brutish characteristics in general. In fact, come to think of it, that's pretty much the theme of the part of Dr. Brizendine's book that discusses men.

It's interesting how much people enjoy hearing about "scientific studies" that confirm their prejudices, and how easily this allows pseudo-scientific urban legends to be established. The public's appetite for stereotype-affirming bamboozlement explains a lot about science journalism -- and pop psychology books as well. What with population increases and all, there are now thousands of suckers born every minute.

[Update -- Looking around the rest of the Daily Mail's site, it seems that they routinely stop accepting comments after the first ten or twenty, thus maintaining the traditional one-way correction-free newspaper model, while giving the impression of reader participation without the expense and trouble of actually allowing it to take place.]

[Update #2: this issue was picked up by Carol Lloyd at Salon ("'The Female Brain': It's ba-ack!", 11/30/2006). Her conclusion:

Normally when I hear about the latest study confirming some female stereotype, I don't bat an eye. So, we talk more than men, whatever. Maybe it's true, maybe it will be debunked. But peeling back the onion of the book's press coverage gave me pause. At a moment, when enthusiastic publicity is given to studies concluding women spend eight and half years of their lives shopping and proponents of single-sex classrooms argue that boys should be allowed to roughhouse while girls should not, the tenacity of idiotic stereotypes is unsettling. No doubt the study of differences between women's and men's brains will unravel untold wonders, but it's hard to underestimate how rife with scientific imposters the path there will be.]


Yet another epicene pronoun: Hu are we kidding?

On his excellent "Web of Language" site, Dennis Baron writes of the latest effort to introduce a non-gender-specific (or "epicene") singular third-person pronoun into English. D.N. DeLuna, a part-time writing teacher at Johns Hopkins University, has proposed using "hu" as a concise replacement for "he or she." As she told the Chronicle of Higher Education, DeLuna intends "hu" to be pronounced as "huh" — "except with not as much aspiration." (Huh?) In addition to the Chronicle, DeLuna has managed to get press attention from the Los Angeles Times and the Hartford Courant. But despite the flurry of interest, Baron convincingly argues that we shouldn't count on "hu" being any more successful than the dozens and dozens of other epicene pronouns that have been proposed over the past century and a half.

And besides, as has been noted in these precincts countless times (e.g., here, here, here, here, here, here, here, here, and here), English speakers already have a perfectly serviceable word for the job: "they." We even have it on divine authority.

The neuroendocrinologist formerly known as Prince

Trust Ann Althouse to come up with the best summary of the pop psychology of sex differences ("My brain as a hypodermic needle. Your brain as an international airport.", 11/28/2006):

I love when a book explains supposedly scientific information in language that approximates that Prince song "International Lover."

That's such a nice summary, it doesn't matter that Ann gets the name of the book wrong (it should be "The Female Brain", not "The Female Mind"), or that she quotes the Daily Mail's bullet points as if they were true.

If you care about the science as well as the rock lyrics, there are some links here.

Regression to the mean in British journalism

Ironically, just a day after the article by Stephen Moss in the Guardian, "Do women really talk more?" (11/27/2006), which quotes Dr. Louann Brizendine retracting her assertion that "A woman uses about 20,000 words per day while a man uses about 7,000", the Daily Mail published an article by Fiona Macrae, "Women talk three times as much as men, says study" (11/28/2006), which presents Dr. Brizendine's assertion as fact. (For more on the relevant science, see the links here.)

Russell Craig, who describes himself as "a devoted Language Log reader", sent in this note about some of the ripples from the Daily Mail's belated splash into the sex-words pool:

On the Drudge Report yesterday was the following headline: "Women 'talk three times as much as men'...". Of course Language Log has alerted me to be wary of such claims. I checked out the link, which is a story in the UK Daily Mail talking about Dr. Luan Brizendine's infamous book. The opening lines:

It is something one half of the population has long suspected - and the other half always vocally denied. Women really do talk more than men.

In fact, women talk almost three times as much as men, with the average woman chalking up 20,000 words in a day - 13,000 more than the average man.

The story doesn't mention that the study has its nay-sayers, so I decided to post a helpful comment with a link to Language Log. This web site reviews and filters comments, so when the comment had not appeared later in the day, I sent in a new comment, this time with a specific link to some of your articles specifically discussing this book. When I checked back this morning, the option to post comments on this story had been taken away.

This does not say much for the quality of journalism at the UK Daily Mail, particularly their "Femail" division.

Well, as I've often had occasion to remark, the traditional media will never be able to fulfill their undoubted promise as an information source until they can find a way to impose some of the elementary standards of accuracy and accountability that we take for granted in the blogosphere.

[By the way, the misspelling of Dr. Brizendine's first name, which should be "Louann" with two n's, is not Russell's fault -- he got it from the Daily Mail article.]

[Update -- John Lawler points out that

just noticed that there are already 9 pickups of the Daily Mail article on Google News (grouped with one about women's response to porn; one wonders how the content-matching program was trained), from all over the Anglophone world. You can't keep a good factoid down, apparently.

An hour later, the number of pickups is up to 11. Do I hear 20?]

An early New Year's resolution

Note to self: when talking to the press, never mention Eskimos and their words for snow.

In an otherwise clear and fair article on sex differences in talkativeness ("Do women really talk more?", The Guardian, 11/27/2006), Stephen Moss manages to misrepresent both me and the poor old Inuit:

In the end, [Liberman] concluded that the figures were probably based on guesswork, likening the "fact" that women talk more than men to the often stated "fact" that the Inuit have 17 words for snow. Both, he said, were myths. The Inuit actually have only one word for snow; and research shows only minute differences between the amount that men and women talk. "Whatever the average female v male difference turns out to be," he concluded, "it will be small compared to the variation among women and among men; and there will also be big differences, for any given individual, from one social setting to another."

Here's how I used the Eskimo-snow-words analogy in the Boston Globe article from which Moss takes that concluding quote:

EXPERTS TELL US the Eskimos have about four dozen words for snow. Or is it 200? Or seven? Or maybe four? Here's a hint: It's roughly the same number as in English. And here's another hint: Most of the people who throw Eskimo snow-word numbers around don't know anything about it, and haven't bothered to look it up.

A summary of the truth about Inuit snow vocabulary can be found in an earlier Language Log post by Geoff Pullum, "Sasha Aikhenvald on Inuit snow words: a clarification", 1/30/2004. I believe that Moss' reference to the number 17 comes indirectly from a Language Log post by Arnold Zwicky, "Only 17 words for snow", 1/9/2006, referring to a classic snowclone sighting in a book review by Christopher Buckley:

The Inuit language contains -- what? -- 17 different words for "snow"? The AD's must have twice that many for "vomit."

On the phone with Stephen Moss, I may well have mentioned "17" as one of the smallest of the many invented-and-unsourced counts for Eskimo snow vocabulary; and I may have tried to explain how a single Inuit root can give rise to an indefinitely large number of derived words, given the polysynthetic nature of the language. But I certainly would not have said that "the Inuit actually have only one word for snow".

Frankly, I don't remember exactly how that part of the conversation went. But in the future, I've decided, I'm going to swear off the Eskimos completely when talking with representatives of the fourth estate. No good can possibly come of it.

Word counts

When scientists want to support a factual assertion in print, they either present some experimental evidence, or they add a footnote referencing some earlier publication of evidence. Journalists have an analogous pair of methods: one is to report what they themselves experienced, and the other is to quote an eye-witness, an official spokesperson, or an expert. But every once in a while, journalists act like scientists and do an experiment. In yesterday's Guardian, Stephen Moss gives an example: "Do women really talk more?" For this article, he wired up a man and a woman -- Tim Dowling and Hannah Pool, who (I think) are Guardian staffers -- and recorded and transcribed everything they said for a day.

I think this started because back in September, I wrote a piece in the Boston Globe, "Sex on the brain", about Louann Brizendine's claim that women use about 20,000 words a day and men only about 7,000. This in turn followed up on some Language Log posts during the previous month, which you can find listed here. I noted that none of Brizendine's end-notes provided any factual support for the words-per-day claim; that versions of this claim are common in psychological self-help books and even religious tracts; and that the relevant parts of the experimental literature show no meaningful sex difference in talkativeness, with several studies even showing men as slightly talkier.

The Guardian's experiment was consistent with the literature:

Hannah said 12,329* words
Tim said 11,279 words
*Hannah accidentally turned off her recorder for two hours, however, so her real total could be 14,000.

And Stephen Moss even reached Louann Brizendine by phone -- in a picturesque location! -- and she graciously conceded the point:

When I reach Brizendine, just as she is crossing the Golden Gate bridge, she tells me that she has accepted the criticism of the numbers quoted in the book - on both volume of words and rate of speech - and will be deleting them from future editions. Nor will they appear in the UK edition, to be published by Bantam in April. "I understand Mark Liberman's point and I am grateful to him," she says. "He felt I was passing on data that was not nailed down, and thus perpetuating a myth, so it will be taken out in future editions." She admits language is not her specialism, and she had been reliant on the advice of others.

This is excellent journalism. And it warms a linguist's heart to see how engaged Moss gets in the details of the project -- he's learning to do linguistic research, and he seems to have enjoyed it. But I'm afraid that this wasn't very good science, all the same.

In fact, Moss understands this (some of the following is apparently quoted from observations by "our linguist, Dr. Jane Sunderland"):

This is one man and one woman sampled on one, not necessarily, typical day. Moreover, our man admits that he is naturally reserved, while our woman is noted for her effervescence and says she always feels the need to act as a facilitator in conversations. They might almost have been chosen to act out the urban myth of taciturn man and talkative woman. [...]

Tim spent the first part of his recording at home, watching television, not talking to his family, and made two 40-minute tube journeys alone. He spent the day in the Guardian offices - which he doesn't usually - surrounded by people he did not know particularly well, and with his head down. (Hannah was also in the office, but she works there every day and is very relaxed in the environment.) Despite this (and despite at one point describing himself as "a man of few words"), Tim produced more than 11,000 words over 14 hours. [...]

In contrast to Tim, Hannah was with people most of the day (the exception being shopping in Sainsbury's). When you are with people you usually talk to them. (Incidentally, Hannah's figure suggests that for anyone to produce 20,000 words in a day would be difficult.)

We should add that the two subjects in this case knew what the point of the experiment was, and were able to adjust their behavior to influence the results. If you really wanted to draw conclusions about men and women in general, you'd need to record a demographically balanced sample of people in a balanced sample of contexts. With one woman and one man, you could get almost any result at all. It's nice that the Guardian's result was a plausible one, but it's a puzzle for philosophers, I think, why people are so ready to be influenced by the results of single-trial experiments on phenomena they know to be highly variable.
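To make the single-trial point concrete, here is a minimal simulation sketch in Python. It assumes, purely for illustration, that men's and women's daily word counts are drawn from the same normal distribution; the mean of 12,000 words and the standard deviation of 5,000 are made-up parameters, not measurements from any study, and the function name `single_pair_trials` is hypothetical. Under those assumptions, any gap within a pair is sampling noise.

```python
import random


def single_pair_trials(n_trials=10_000, mean=12_000.0, sd=5_000.0, seed=0):
    """Simulate many one-man, one-woman "experiments".

    Both speakers are drawn from the SAME hypothetical distribution, so any
    difference within a pair is pure sampling noise.
    Returns a tuple:
      (share of pairs in which the woman out-talks the man,
       share of pairs in which the gap exceeds 5,000 words either way).
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    woman_wins = 0
    big_gap = 0
    for _ in range(n_trials):
        # Clamp at zero: nobody utters a negative number of words.
        woman = max(0.0, rng.gauss(mean, sd))
        man = max(0.0, rng.gauss(mean, sd))
        if woman > man:
            woman_wins += 1
        if abs(woman - man) > 5_000:
            big_gap += 1
    return woman_wins / n_trials, big_gap / n_trials


if __name__ == "__main__":
    share_woman, share_big_gap = single_pair_trials()
    print(f"woman talks more in {share_woman:.1%} of pairs")
    print(f"gap above 5,000 words in {share_big_gap:.1%} of pairs")
```

With no underlying sex difference at all, roughly half of such one-pair experiments will show the woman talking more, and a sizeable share will show a gap of thousands of words in one direction or the other. The exact shares depend entirely on the assumed spread.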

By a curious coincidence, another study featuring the interpretation of word counts from a small sample recently played a prominent role in a major English-language publication. This study was not a one-day journalistic lark, but a serious, decade-long study that has played a major role in influencing a public-policy debate that is central to our society. And yet, it has some issues in common with what Stephen Moss did.

I'm talking about Betty Hart and Todd Risley's classic research on social-class differences in language acquisition (Betty Hart and Todd Risley, "Meaningful Differences in the Everyday Experience of Young American Children", 1995; Betty Hart, "A Natural History of Early Language Experience", Topics in Early Childhood Special Education, 20(1), 2000; Betty Hart and Todd Risley, "The Early Catastrophe: the 30 Million Word Gap", American Educator, 27(1) pp. 4-9, 2003). This work was featured in Paul Tough's article in last Sunday's New York Times Magazine, "What it takes to make a student".

Here's the abstract from Hart and Risley (2003):

By age 3, children from privileged families have heard 30 million more words than children from underprivileged families. Longitudinal data on 42 families examined what accounted for enormous differences in rates of vocabulary growth. Children turned out to be like their parents in stature, activity level, vocabulary resources, and language and interaction styles. Follow-up data indicated that the 3-year-old measures of accomplishment predicted third grade school achievement.

This is obviously serious stuff. Here's some of Tough's discussion:

They found ... that vocabulary growth differed sharply by class and that the gap between the classes opened early. By age 3, children whose parents were professionals had vocabularies of about 1,100 words, and children whose parents were on welfare had vocabularies of about 525 words. The children’s I.Q.’s correlated closely to their vocabularies. The average I.Q. among the professional children was 117, and the welfare children had an average I.Q. of 79.

When Hart and Risley then addressed the question of just what caused those variations, the answer they arrived at was startling. By comparing the vocabulary scores with their observations of each child’s home life, they were able to conclude that the size of each child’s vocabulary correlated most closely to one simple factor: the number of words the parents spoke to the child. That varied greatly across the homes they visited, and again, it varied by class. In the professional homes, parents directed an average of 487 “utterances” — anything from a one-word command to a full soliloquy — to their children each hour. In welfare homes, the children heard 178 utterances per hour.

What’s more, the kinds of words and statements that children heard varied by class. The most basic difference was in the number of “discouragements” a child heard — prohibitions and words of disapproval — compared with the number of encouragements, or words of praise and approval. By age 3, the average child of a professional heard about 500,000 encouragements and 80,000 discouragements. For the welfare children, the situation was reversed: they heard, on average, about 75,000 encouragements and 200,000 discouragements. Hart and Risley found that as the number of words a child heard increased, the complexity of that language increased as well. As conversation moved beyond simple instructions, it blossomed into discussions of the past and future, of feelings, of abstractions, of the way one thing causes another — all of which stimulated intellectual development.

Hart and Risley showed that language exposure in early childhood correlated strongly with I.Q. and academic success later on in a child’s life. Hearing fewer words, and a lot of prohibitions and discouragements, had a negative effect on I.Q.; hearing lots of words, and more affirmations and complex sentences, had a positive effect on I.Q. The professional parents were giving their children an advantage with every word they spoke, and the advantage just kept building up.

This is certainly consistent with our expectations -- our stereotypes -- and unlike the 20,000-vs.-7,000 legend, it's based on experimental data. However, as Hart and Risley write:

All parent-child research is based on the assumption that the data (laboratory or field) reflect what people typically do. In most studies, there are as many reasons that the averages would be higher than reported as there are that they would be lower. But all researchers caution against extrapolating their findings to people and circumstances they did not include. Our data provide us, however, a first approximation to the absolute magnitude of children’s early experience, a basis sufficient for estimating the actual size of the intervention task needed to provide equal experience and, thus, equal opportunities to children living in poverty. We depend on future studies to refine this estimate.

They also tell us clearly that their sample was a small one:

Our final sample consisted of 42 families who remained in the study from beginning to end. From each of these families, we have almost 2 1/2 years or more of sequential monthly hour-long observations. On the basis of occupation, 13 of the families were upper socioeconomic status (SES), 10 were middle SES, 13 were lower SES, and six were on welfare.

Now, six is a bigger number than one, obviously, and it's big enough that it makes sense to do statistical significance tests on tables like this one, taken from Hart and Risley (2003):

Families' Language and Use Differ Across Income Groups

                                    13 Professional    23 Working-class    6 Welfare
Measures & Scores                   Parent   Child     Parent   Child      Parent   Child
Pretest score (a)                   41       -         31       -          14       -
Recorded vocabulary size            2,176    1,116     1,498    749        974      525
Average utterances per hour (b)     487      310       301      223        176      168
Average different words per hour    382      297       251      216        167      149

(a) When we began the longitudinal study, we asked the parents to complete a vocabulary pretest. At the first observation each parent was asked to complete a form abstracted from the Peabody Picture Vocabulary Test (PPVT). We gave each parent a list of 46 vocabulary words and a series of pictures (four options per vocabulary word) and asked the parent to write beside each word the number of the picture that corresponded to the written word. Parent performance on the test was highly correlated with years of education (r = .57).
(b) Parent utterances and different words were averaged over 13-36 months of child age. Child utterances and different words were averaged for the four observations when the children were 33-36 months old.

But six should not be a big enough number to lay our concerns to rest. We wouldn't try to predict the results of a national election based on an in-depth survey of six people in one city. Should we make national educational policy based on a similarly small sample, even if the data comes from 2 1/2 years of monthly visits? Does a sample of children from six poor families in 1980's St. Louis, as observed in a monthly visit from researchers with recording equipment, give a meaningful picture of the experience of the millions of people that Tough's article takes them to represent? In particular, it's not clear how to reconcile this picture of monetary poverty engendering linguistic poverty with the central role that "lower SES" people have always played in American linguistic creativity.

This is not a criticism of Hart and Risley, who did a marvelous piece of research. But I think that it amounts to a criticism of several related scientific disciplines, including my own. More than a decade after Hart and Risley's first publication, the "future studies" that they "depend on to refine [their] estimate" are mostly still just as hypothetical as ever.

[Update -- Mark McConville writes:

I remember the Guardian's Polly Toynbee using this research three years ago to argue for non-selective education -- "We can break the vice of the great unmentionable", 1/2/2004.

I'm not sure to what extent she has misrepresented the sample size though:

"Meaningful Differences in the Everyday Experience of Young American Children is one of the most thorough studies ever conducted."

"There is no room here to do justice to this epic analysis, but no one could fail to be convinced by it."

She didn't mention it was based on just "six poor families in 1980's St. Louis" :-)

Well, it was indeed an "epic analysis", especially for its time -- they collected 42*2.5*12 = 1,260 hours of recordings, transcribed and coded them in many ways, and then followed the same kids through their subsequent career in school, and cross-correlated everything with everything, more or less. That doesn't change the fact that the "welfare" group, whose kids are at greatest risk of low achievement in school, was N=6.]
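The recording-hours arithmetic can be broken down by group in a few lines of Python. This is a rough sketch rather than anything from Hart and Risley: it treats "almost 2 1/2 years or more" as exactly 2.5 years, assumes one hour-long observation per family per month, and takes the family counts from the table above. The names `FAMILIES`, `observation_hours` and `hours_by_group` are made up for the illustration.

```python
# Family counts per group, taken from the table in Hart and Risley (2003).
FAMILIES = {"professional": 13, "working-class": 23, "welfare": 6}

YEARS_OBSERVED = 2.5   # "almost 2 1/2 years or more", treated as exactly 2.5
HOURS_PER_MONTH = 1    # assumption: one hour-long observation per month


def observation_hours(n_families, years=YEARS_OBSERVED,
                      hours_per_month=HOURS_PER_MONTH):
    """Approximate recorded hours for a group of n_families families."""
    return n_families * years * 12 * hours_per_month


def hours_by_group(families=FAMILIES):
    """Map each group name to its approximate recorded hours."""
    return {group: observation_hours(n) for group, n in families.items()}


if __name__ == "__main__":
    per_group = hours_by_group()
    total = sum(per_group.values())
    for group, hours in per_group.items():
        print(f"{group:>14}: {hours:6.0f} hours ({hours / total:.0%} of total)")
    print(f"{'all families':>14}: {total:6.0f} hours")
```

On these assumptions the welfare group accounts for about 180 of the roughly 1,260 recorded hours, which is the N=6 point seen from another angle.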

Very very good pope

This morning I heard the voice of a Turkish woman, with a thick accent, being interviewed in an Istanbul street by an NPR reporter. The woman said:

Jampol very, very good.

Jampol is John Paul; she was speaking about the last Pope (who was popular and well received when he visited Turkey), and she was saying that she thought him a very, very good man.

And it made me realize that there was something amazing about what she said: her English was bad (note the lack of is in her utterance quoted above), but in one way it went beyond anything that had been described in English grammar textbooks by the end of the 20th century.

At least, I can say this much (and the part that follows has been slightly revised since I first posted it, to make it more accurate): I have never been able to find a published grammar which gives a full description of the following fact: pre-head modifiers of both adjective and adverb categories, in noun phrases and adjective phrases and adverb phrases, can be repeated to express intensification of the expressed quality, the number of repetitions being a signal of the degree of intensification (so that very, very good is better than very good; big, big, big problems are bigger problems than big, big problems, and so on).

Now, I am assuming that this is one generalization, not two. That is, I am taking very, very good and a good, good man to be instances of the same phenomenon. This could be wrong. But if it's right, I don't know of any prior grammar that points it out in full generality. The closest approach is in the best of the earlier large grammars, Randolph Quirk et al.'s A Comprehensive Grammar of the English Language (Longman, 1985), page 473: "Some intensifiers can be repeated for emphasis." (It's not really emphasis, it's intensification; I used the word "emphasis" in the first version of this post, but I shouldn't have.) An additional observation is made there: "the repetition is permissible only if the repeated items come first or follow so". That is, so very very nice is possible, and much much too kind, and very much too kind, but not *very much much too kind. Nice point.

What I don't find anywhere in Quirk et al.'s big book is the observation that attributive adjectives can repeat for intensificatory effect as well, as in a good, good man.

When I realized in 1999 that intensificatory reduplication (of both adjective modifiers in the noun phrase and adverb premodifiers in adjective phrases and adverb phrases) needed to be described in the Adjectives and Adverbs chapter of The Cambridge Grammar of the English Language, I rummaged around in all the earlier reference grammars I could find to see what they had said about it, and the answer was that the exact facts had apparently never been recorded. What Rodney Huddleston and I wrote for Chapter 6 of The Cambridge Grammar (pages 561-562) was apparently the first description that dealt with both adjectives and adverbs.

Yet the Turkish woman apparently knew how to do this kind of intensificatory repetition. At least, she knew the part of it that applies to adverbs like very. She has almost certainly never seen either the Quirk volume or The Cambridge Grammar (the latter has only been out four years and costs about $150). So how did she know you could repeat such words for intensificatory effect, when it is almost inconceivable that she could have learned it from a book?

You might say this is no big deal: it's not a difficult thing to learn, and hardly needs a description. I don't know about the first part of that, but the second part is not true. Here's why you need a description: not all adjectives can reduplicate, and which ones can does not follow from any basic principle known to me. This is not relevant for very, which really only occurs as a pre-head modifier; but there's an interesting general point about this not being a matter of mere basic common sense. Notice that in the following examples, the starred ones are not grammatical:

 Just then a huge huge huge spider appeared.
*The spider looked huge huge huge in comparison to the fly.
 Whether Airbus can overcome its major, major problems is not clear.
*Whether Airbus's problems are major, major is not clear.
 We need a good, good man to do this job.
*We need a man who is good good to do this job.

The generalization is simple: you can reduplicate an adjective for emphasis if it's in what The Cambridge Grammar calls attributive function, but not if the adjective is in predicative function. I don't see how you could have guessed that if you hadn't looked at the data and I hadn't told you what the answer was.

There are similar facts regarding adverbs. As pre-head modifiers they can reduplicate, but as (for example) verb phrase adjuncts they cannot:

 They did a really nice thing for me on my birthday.
 They didn't need to do anything for my birthday, really.
 They did a really, really nice thing for me on my birthday.
*They didn't need to do anything for my birthday, really really.
 He is totally awesome.
 The program wasn't eliminated totally.
 He is just totally, totally awesome.
*The program wasn't eliminated totally totally.

Well, the obvious answer to how the Turkish woman learned to reduplicate the modifier very is that she had heard people who spoke English saying very very, and she knew enough to imitate them in this regard.

Another possibility would be that reduplication of modifiers for emphasis happens to be a property of Turkish (I have been told that this statement is indeed true), and the woman tacitly knew just enough about English (namely that very was a pre-head modifier in an adjective phrase) that she was able to unthinkingly transfer it from Turkish to English, and by good luck she was right, because it is a feature of English too.

Somewhat less plausible, in my view, would be a Chomskyan line: that reduplication of modifiers for emphasis is a linguistic universal, held in common by all natural languages and built into human brains at birth or conception, so no one ever has to learn it.

I don't know which is right. But I have sometimes seen statements, by philosophers and other people who haven't done much close-up study of the language acquisition process, suggesting that foreign adults learn the rules of the target language out of books, or are told what the rules are by their teachers. In the case at hand, such statements seem extraordinarily implausible. If it's right that no grammarian had written any account of this simple feature of English before 2002, we can be sure at least that any foreign speaker of English who has learned emphatic reduplication of pre-head adjective and adverb modifiers learned it in some other way than by reading about it in a grammar textbook, and any teacher who has ever given a lesson on it deserves to be congratulated for having done some original research.

[Update: I know I'm going to be flooded with mail (I won't be able to answer it all) from people who insist you can reduplicate predicative adjectives, and they'll send me examples like You were wrong, wrong, wrong!. Briefly, let me point out that you have to distinguish among different but superficially similar phenomena. Certainly, you can pause at the end of a sentence after an adjectival predication, and simply repeat the adjective, like this: It is disgraceful. Disgraceful. But in fact it's the whole adjective phrase that's repeated here: It is sad to see. Sad to see.

And you can do the same with any kind of phrase; it doesn't need to be an adjective phrase, it could be a preposition phrase: It is beyond belief. Simply beyond belief. You can even interrupt yourself and do a repetition in the middle of a sentence, as in the famous line from Casablanca: "I am shocked, shocked, to find that gambling is going on here." But these examples involve the intonation breaks associated with parenthetical additions or interruptions (and they are often written with dashes rather than commas). They do not have the quality of the smoothly integrated arbitrary repetition for degrees of intensification that you find with attributive adjectives modifying nouns and adverbs modifying adjectives.

My examples above were carefully designed to be unsuitable for the sort of parenthetical restatement I'm referring to here. It's not that no one can find a predicative adjective being repeated; it's that close attention to the details reveals that attributive adjectives and pre-head adverb modifiers are being reduplicatively emphasized in ways that predicative adjectives and post-head adverb phrase adjuncts are not.

One other point: another undescribed feature of English I discovered in 1999, also described in The Cambridge Grammar (page 562), was the tautologous use of synonymous but distinct adjectives for intensification, as in tiny little bird (or little tiny bird) or great big hole. And again, this is restricted to attributives: no one says *The bird was little tiny, or *The hole was great big.]

Slurry accent II

OK, there's new data, and so I've got a new hypothesis about how Lawrence Henry came to refer to "a kind of commercial London speech known as 'slurry.'" It wasn't a slip of the ear, heard in place of "Estuary". It wasn't a malapropism, merging "estuary", "Surrey" and "slang". No, it was a textual misreading.

When I posted about this a few days ago ("Slurry", 11/24/2006), I also wrote a letter of inquiry to the editor of The American Spectator, where Henry's column was published. They put my note into their online reader mail (including my own slip of the fingers, "slip of the error" for "slip of the ear"!), and Lawrence Henry responded:

I learned of the accent I call "slurry" from none other than Dick Francis, can't now remember which novel. He described it in some detail as a kind of commercial affection based on suburban London, and gave extensive examples in a character's speech.

Dick Francis has written many books, I'm afraid -- too many for me to search, even if I owned them all. If anyone knows or can find where he used the word slurry to describe an accent of the right kind, please let me know. While waiting for the true citation, though, I have a guess about what has happened. If we search books on a9 for {"slurry accent"}, we get seven results. None of them are to Dick Francis books. But they're like the sample of four below:

Dorothy Garlock, The Listening Sky, p. 17: "If she's got her eye on the boss man, it'll do her no good:" This voice had the slurry accent of the South. "Who ain't got a eye on him? Lordy mercy."

A.J. Zerries, The Lost Van Gogh, p. 72: "Hello, Ryder, and not so tense -- it's your old friend, Aaron," said the man on the stoop in a slurry accent.

Kavita Daswani, The Village Bride of Beverly Hills, p. 38: I kept my eyes lowered as I heard these people in their relaxed, slurry accents talking about what had happened that morning or debating between Chinese and a sandwich.

Paul Garrison, Red Sky at Morning, p. 129: "We will speak English to spare your sailors unnecessary distress." "Admiral," the well-dressed Wong responded with a slurry accent. His English came less easily than Admiral Tang's.

It would be easy to misread a phrase like a slurry accent as if it were analogous to "a brummy accent" rather than to "a fussy accent". So my hypothesis is that Dick Francis described a character's affectation of Estuary English, using a phrase like "a slurry accent" or "his slurry accent", and Lawrence Henry misread slurry as a name rather than a description.

[Update -- Ben Zimmer writes:

I haven't found any references to a "slurry" accent in a Dick Francis novel, but he frequently uses "sloppy" to describe a disfavored southeastern (UK) speech pattern:

To The Hilt, p. 19
I'm not good at voices and accents, but I'd say his was sloppy southeast England.

Trial Run, p. 29
They had a rough, sloppy way of speaking, swallowing all the consonants. Southern England. London or the Southeast, I should think, or Berkshire.

Twice Shy, p. 41
I listened to the utterly English sloppy accent and thought that it couldn't have less matched the body it came from.

And it's not like Francis is unfamiliar with "Estuary" as a dialectal descriptor:

Shattered, p. 22
Her accent was Estuary, Essex or Thames: take your pick.]


Posted by Mark Liberman at 05:14 PM

Map of South Asia

I came across this terrific map on Wikipedia. It shows South Asia, with the names of the various countries, the states and union territories of India, and the provinces of Pakistan in the local language and writing system. The caption भारत in the lower right says "India" in Hindi, but the map actually includes Sri Lanka, Pakistan, China, Tibet, Nepal, Bangladesh and Burma as well. [Click on the map for a larger version.]

Map of South Asia with Native Names

In addition to being pretty, it's a nice reminder of some of the writing systems you (or at least I) have still to learn. Sinhala, in particular, continues to intimidate me. All those curlicues give me a headache.

Addendum 2006-11-27

Reader Vajra Chandrasekara points out that Sri Lanka is mis-spelled. It is written ශ්රී ලංකාව, which makes sense but is not the way the word is actually written, which is ශ්‍රීලංකාව. Stephen Carlson points out that the Chinese for "People's Republic of China" is in traditional characters, not the simplified characters in use in China today. For example, the last character, "country", would be 国 in simplified characters. I'm not sure why that is - the little bit of information on the creator's Wikimedia user page indicates that he is "from China". Anyhow, it's okay with me - I prefer traditional characters.

Greetings, comrade

This explains a lot of public discourse about language, at least in America:

You could start with Mark Twain's famous linguistic takedowns of Fenimore Cooper and Mary Baker Eddy, and work forward. The traditional British equivalent is mocking social inferiors who presume above their station -- or is that an unfair stereotype? Anyhow, both impulses together support the Bushisms business, whereby elitists can mock the powerful for being low-class.

Doublespeak and the War on Terror

A briefing paper entitled Doublespeak and the War on Terrorism by Timothy Lynch of the Cato Institute seems to be getting belated attention. It appeared in September, but this AP report by Calvin Woodward came out today. The briefing paper addresses the attempt of the Bush Administration to make more palatable its violations of civil liberties by using doublespeak, e.g. dubbing "warrants" "national security letters" in the hope that the courts will be fooled into thinking that judicial oversight is not required, or describing the suicide attempts of prisoners at Guantanamo Bay (referred to by the Bush Administration as "detainees", as if they were witnesses to a traffic accident asked by the police to remain until they could be interviewed) as "self-injurious behavior incidents".

The AP article adds a few examples from other areas, such as the use of "food insecurity" rather than "hunger", and "redeployment" rather than "retreat" in reference to Iraq. It also suggests that advocates of abortion rights speak of "choice" because "abortion" sounds unpleasant. It may be true that abortion rights advocates prefer to avoid the term "abortion", but I think there's more to it than that. Describing one's movement as "pro-abortion" suggests that one actually favors abortion, that is, considers that abortions are a fine thing. Few if any advocates of abortion rights take such a position. Their position is, rather, that women should have the right to have an abortion if they consider it the best choice: "pro-choice" really is a more accurate description than "pro-abortion". In the abortion debate, if one wants an example of the propagandistic use of language, it is the use of the self-designation "pro-life" by opponents of abortion rights. Opponents of abortion rights are not in general advocates of a "pro-life" stance: many of them are quite sympathetic to military activity and favor the death penalty, both of which are considered by many others to be "anti-life" stances. And those who oppose abortion under any circumstances, even when the life of the mother is threatened, are not "pro-life" even in this narrow context. Rather, they take a position that values the life of the foetus over that of the mother. So "anti-abortion" is a much more accurate term than "pro-life".

If it bothers you that the doublespeak addressed in the Cato Institute paper is all on the part of the right wing (that is, the authoritarian branch - the Cato Institute is itself considered right wing, but it represents the libertarian branch), a better example of left wing doublespeak would be "diversity", a replacement for "affirmative action" meant to persuade people that something different and better is intended.

Descriptivism in literature

While Mark was reporting on prescriptivism in Pynchon, I found a nice example of level-headed descriptivism in Philip Roth's I Married a Communist (not Roth's newest, but I've been catching up). It's on the first page of chapter 4:

It was like penetrating a foreign language and discovering that, despite the alienating exoticism of its sounds, the foreigners fluently speaking it are saying no more than what you've been hearing in English all your life.

Obviously, this is being used as a metaphor for something else; it's not as direct as Mark's example. Still, it's very clearly descriptive as opposed to prescriptive.

Please feel free to add more examples of this kind in the comments area.

Prescriptivism in literature

I've been reading Thomas Pynchon's new novel, "Against the Day", and found a dramatization of linguistic prescriptivism on the very first page:

"Oh, boy!" cried Darby Suckling, as he leaned over the lifelines to watch the national heartland deeply swung in a whirling blur of green far below, his tow-colored locks streaming in the wind past the gondola like a banner to leeward. [...] "I can't hardly wait!" he exclaimed.

"For which you have just earned five more demerits!", advised a stern voice close to his ear, as he was abruptly seized from behind and lifted clear of the lifelines. "Or shall we say ten? How many times," continued Lindsay Noseworth, second-in-command here and known for his impatience with all manifestations of the slack, "have you been warned, Suckling, against informality of speech?" With the deftness of long habit, he flipped Darby upside down, and held the flyweight lad dangling by the ankles out into empty space -- "terra firma" by now being easily half a mile below -- proceeding to lecture him on the many evils of looseness in one's expression, not least among them being the ease with which it may lead to profanity, and worse. As all the while, however, Darby was screaming in terror, it is doubtful how many of the useful sentiments actually found their mark.

Though I'm sure that there are many literary examples of prescriptivism -- several in Tom Sawyer and Huckleberry Finn alone, no doubt -- I can't actually call any to mind at the moment. Wait, I take that back, here's (a marginal) one from P.G. Wodehouse. Anyhow, send me your favorites, and I'll add them to this post.

[I thought that I remembered some prescription in Tom Sawyer or Huckleberry Finn, but I haven't turned it up. However, I did find this interesting passage from A Tramp Abroad, which presents the view that "bad grammar" is a natural cause for shame:

Animals talk to each other, of course. There can be no question about that; but I suppose there are very few people who can understand them. I never knew but one man who could. I knew he could, however, because he told me so himself. He was a middle-aged, simple-hearted miner who had lived in a lonely corner of California, among the woods and mountains, a good many years, and had studied the ways of his only neighbors, the beasts and the birds, until he believed he could accurately translate any remark which they made. This was Jim Baker. According to Jim Baker, some animals have only a limited education, and some use only simple words, and scarcely ever a comparison or a flowery figure; whereas, certain other animals have a large vocabulary, a fine command of language and a ready and fluent delivery; consequently these latter talk a great deal; they like it; they are so conscious of their talent, and they enjoy "showing off." Baker said, that after long and careful observation, he had come to the conclusion that the bluejays were the best talkers he had found among birds and beasts. Said he:

"There's more TO a bluejay than any other creature. He has got more moods, and more different kinds of feelings than other creatures; and, mind you, whatever a bluejay feels, he can put into language. And no mere commonplace language, either, but rattling, out-and-out book-talk--and bristling with metaphor, too--just bristling! And as for command of language--why YOU never see a bluejay get stuck for a word. No man ever did. They just boil out of him! And another thing: I've noticed a good deal, and there's no bird, or cow, or anything that uses as good grammar as a bluejay. You may say a cat uses good grammar. Well, a cat does--but you let a cat get excited once; you let a cat get to pulling fur with another cat on a shed, nights, and you'll hear grammar that will give you the lockjaw. Ignorant people think it's the NOISE which fighting cats make that is so aggravating, but it ain't so; it's the sickening grammar they use. Now I've never heard a jay use bad grammar but very seldom; and when they do, they are as ashamed as a human; they shut right down and leave.

On first reading this, I thought that "bad grammar" might be a euphemism for cussing, but I don't think it is, since Jim Baker goes on to explain that

Now, on top of all this, there's another thing; a jay can out-swear any gentleman in the mines. You think a cat can swear. Well, a cat can; but you give a bluejay a subject that calls for his reserve-powers, and where is your cat? Don't talk to ME--I know too much about this thing; in the one little particular of scolding--just good, clean, out-and-out scolding--a bluejay can lay over anything, human or divine.


Not necessarily literature, but my all-time favorite example of grammatical authoritarianism comes from the film "Life of Brian."

[Brian is writing graffiti on the palace wall. The Centurion catches him in the act]
Centurion: What's this, then? "Romanes eunt domus"? People called Romanes, they go, the house?
Brian: It says, "Romans go home"
Centurion: No it doesn't ! What's the Latin for "Roman"? Come on, come on!
Brian: Er, "Romanus"!
Centurion: Vocative plural of "Romanus" is?
Brian: Er, er, "Romani" !
Centurion: [Writes "Romani" over Brian's graffiti] "Eunt"? What is "eunt"? Conjugate the verb, "to go"!
Brian: Er, "Ire". Er, "eo", "is", "it", "imus", "itis", "eunt".
Centurion: So, "eunt" is...?
Brian: Third person plural present indicative, "they go".
Centurion: But, "Romans, go home" is an order. So you must use...?
[He twists Brian's ear]
Brian: Aaagh ! The imperative!
Centurion: Which is...?
Brian: Aaaagh ! Er, er, "i" !
Centurion: How many Romans?
Brian: Aaaaagh ! Plural, plural, er, "ite" !
Centurion: [Writes "ite"] "Domus"? Nominative? "Go home" is motion towards, isn't it?
Brian: Dative !
[the Centurion holds a sword to his throat]
Brian: Aaagh ! Not the dative, not the dative ! Er, er, accusative, "Domum"!
Centurion: But "Domus" takes the locative, which is...?
Brian: Er, "Domum"!
Centurion: [Writes "Domum"] Understand? Now, write it out a hundred times.
Brian: Yes sir. Thank you, sir. Hail Caesar, sir.
Centurion: Hail Caesar ! And if it's not done by sunrise, I'll cut your balls off.

Daniel Barkalow adds a relevant note about this example:

Of course, the centurion in Life of Brian is a great example, because he messes up the rules he's enforcing. The locative is "domi" and it's used for locations, not motion towards. The fact that "domus" takes the locative is relevant, but it just means that accusative appears without a preposition (much like "go home" in English, in fact).

And from Simon Cauchi:

You asked us to send you our favourites. Here are two of mine, the first relating to speech and the second to writing.

The poem "The Schoolmaster" (subtitle "abroad with his son"), by C. S. Calverley (1831-84), consists of eight stanzas. The fourth stanza goes like this:

The noise of those sheep-bells, how faint it
   Sounds here -- (on account of our height)!
And this hillock itself -- who could paint it,
   With its changes of shadows and light?
Is it not -- (never, Eddy, say "ain't it") --
        A marvellous sight?

And there's also Calverley's "Forever", which I will copy out in full because all the online texts seem to be corrupt:

Forever; 'tis a single word!
  Our rude forefathers deem'd it two:
Can you imagine so absurd
        A  view?

Forever! What abysms of woe
  The word reveals, what frenzy, what
Despair! For ever (printed so)
        Did not.

It looks, ah me! how trite and tame!
  It fails to sadden or appal
Or solace -- it is not the same
        At all.

O thou to whom it first occurr'd
  To solder the disjoin'd, and dower
Thy native language with a word
        Of power:

We bless thee! Whether far or near
  Thy dwelling, whether dark or fair
Thy kingly brow, is neither here
        Nor there.

But in men's hearts shall be thy throne,
  While the great pulse of England beats:
Thou coiner of a word unknown
        To Keats!

And nevermore must printer do
  As men did longago; but run
"For" into "ever," bidding two
        Be one.

Forever! passion-fraught, it throws
    O'er the dim page a gloom, a glamour:
It's sweet, it's strange; and I suppose
        It's grammar.

Forever! 'Tis a single word!
  And yet our fathers deem'd it two:
Nor am I confident they err'd;
        Are you?


[Artur Jachacy sends in a selection from Stephen Fry's autobiography Moab is My Washpot:

I had had good teachers. At prep school an English master called Chris Coley had awoken my first love of poetry with lessons on Ted Hughes, Thom Gunn, Charles Causley and Seamus Heaney. His predecessor, Burchall, was more a Kipling-and-none-of-this-damned-poofery sort of chap, indeed he actually straight-facedly taught U and Non-U pronunciation and usage as part of lessons: 'A gentleman does not pronounce Monday as Monday, but as Mundy. Yesterday is yesterdi. The first 'e' of interesting is not sounded,' and so on.

I remember boys would get terrible tongue lashings if he ever overheard them using words like 'toilet' or 'serviette'. Even 'radio' and 'mirror' were not to be borne. It had to be 'wireless' and 'glass' or 'looking-glass'. Similarly we learned to say formidable, not formidable, primarily not primarily and circumstance not circumstance and never, for a second would such horrors as circumstahntial or substahntial be countenanced. I remember the monumentally amusing games that would go on when a temporary matron called Mrs Amos kept trying to tell boys to say 'pardon' or 'pardon me' after they had burped. The same spin upper-middle-class families get into to this very day when Nanny teaches the children words that Mummy doesn't think are quite the thing.

'Manners! Say "pardon me".'

'But we're not allowed to, Matron.'

'Stuff and nonsense!'

It came to a head one breakfast. Naturally it was I who engineered the moment. Burchall was sitting at the head of our table, Mrs Amos just happened to be passing.

'Bre-e-eughk!' I belched.

'Say "pardon me", Fry.'

'You dare to use that disgusting phrase, Fry and I'll thrash you to within an inch of your life,' said Burchall, not even looking up from his Telegraph — pronounced, naturally, Tellygraff.

'I beg your pardon, Mr Burchall?'

'You can beg what you like, woman.'

'I am trying to instil,' said Mrs Amos, (and if you're an Archers listener you will be able to use Linda Snell's voice here for the proper effect, it saves me having to write 'A am traying to instil' and all that), 'some manners into these boys. Manners maketh man, you know.'

Burchall, who looked just like the 30s and 40s actor Roland Young — same moustache, same eyes — put down his Tellygraff, glared at Mrs Amos and then addressed the room in a booming voice. 'If any boy here is ever told to say "Pardon me", "I beg your pardon", or heaven forfend, "I beg pardon", they are to say to the idiot who told them to say it, "I refuse to lower myself to such depths, madam." Is that understood?'

We nodded vigorously. Matron flounced out with a 'Well, reelly!' and Burchall resumed his study of the racing column.

Some readers may need a refresher course in the mysteries of the whole U vs. non-U thing, which most Americans find roughly as familiar as the interpretation of West African scarification patterns. ]

[And how could I forget this previously posted passage from Wodehouse's Jeeves in the Offing:

Normally as genial a soul as ever broke biscuit, this aunt, when stirred, can become the haughtiest of grandes dames before whose wrath the stoutest quail, and she doesn't, like some, have to use a lorgnette to reduce the citizenry to pulp, she does it all with the naked eye. "Oh?" she said, "so you have decided to revise my guest list for me? You have the nerve, the--- the---"

I saw she needed helping out.

"Audacity," I said, throwing her the line.

"The audacity to dictate to me who I shall have in my house."

It should have been "whom," but I let it go.

"You have the---"


"---the immortal rind," she amended, and I had to admit it was stronger, "to tell me whom"---she got it right that time---"I may entertain at Brinkley Court and who"---wrong again---"I may not. Very well, if you feel unable to breathe the same air as my friends, you must please yourself. I believe the 'Bull and Bush' in Market Snodsbury is quite comfortable."


Cyber Monday vs. eDay

As countless media reports are informing us, tomorrow is "Cyber Monday," the day that supposedly kicks off the online holiday shopping season. The brazenly cynical coinage of "Cyber Monday" was recounted here last year, when the masterminds at saw "an opportunity to create some consumer excitement" by anointing the Monday after Thanksgiving with a new title modeled on "Black Friday." The idea was to make "Cyber Monday" a kind of self-fulfilling prophecy, boosting online sales on a day that had previously ranked as only the twelfth busiest on the shopping calendar.

So how much did last year's "Cyber Monday" hype pay off? Depends who you ask. According to press accounts relying on statistics from, the Monday after Thanksgiving was the second biggest day for online retail sales in 2005. But as far as I can tell by's holiday shopping report, all they can actually claim is that Cyber Monday received the second-most votes in a survey asking retailers, "What day during the 2005 holiday season represented the largest amount of revenue from sales?" Market research from comScore suggests that Cyber Monday was in fact the ninth busiest online shopping day last year, with $485 million in transactions. That paled in comparison to the real peak two weeks later: Dec. 12, 2005 saw $556 million spent online.

Just to confuse matters further, a company called Coremetrics says that the zenith of the online shopping season occurs not two weeks after Cyber Monday but one week after, or December 4 on this year's calendar. A Nov. 6 press release from Coremetrics seeks to debunk the "marketing myth" of Cyber Monday and introduces yet another neologism for what the company believes will be the busiest online shopping day: "eDay." In this battle of marketing coinages, "eDay" has certain advantages: the snappy "e-" prefix is a bit more au courant than "cyber-", William Gibson fans notwithstanding. (Really, when was the last time you heard anyone refer to "cyberspace" unironically? It sounds so Matrix-y and Y2K-ish.)  Plus, "eDay" has triumphal resonances with "V-Day" and "D-Day."

But I wouldn't count on "eDay" gaining the neologistic upper hand over "Cyber Monday." Media commentators have firmly latched on to the "Cyber Monday" concept, even as they acknowledge that it isn't really the busiest online shopping day of the season. Perhaps writing about Cyber Monday helps fill the post-Thanksgiving lull in the news cycle, and it's an easy followup to the boilerplate "Black Friday" shopping stories. I would also expect online retailers to continue transforming Cyber Monday into a legitimate shopping event by offering all sorts of sales and promotions for the Monday after Thanksgiving. It could take another year or two, but the self-fulfilling marketing prophecy of Cyber Monday might still come to pass.

Posted by Benjamin Zimmer at 11:15 AM

Dialect representation, resented

A couple of days ago, I commented on the use of "unusual spelling intended to represent dialectal or colloquial idiosyncrasies of speech" (from the OED's definition of "eye dialect"), noting that this is likely to be understood as expressing contempt.   A case in point, from a letter in the NYT Book Review, 5/8/05, from Butch Trucks (of the Allman Brothers Band), about a Rolling Stone story about the band written by Grover Lewis:

In Lewis's article, all the dialogue among members of our group seemed to be taken directly from Faulkner.  We are from the South.  We did and still do have Southern accents.  We are not stupid.  The people in the article were creations of Grover Lewis.  They did not exist in reality.

(The letter went to the Times because a review, by Roy Blount Jr., of Splendor in the Short Grass: The Grover Lewis Reader, which reprints the RS story about the band on tour, had appeared in the Book Review on 4/3/05.)

The reference to Faulkner is surprising.  If you go back and look at your Faulkner, you'll see that he is sparing in his use of special spellings of all types, including those representing ordinary casual speech (goin' for going, wanna for want to) and those representing dialect features (ma for my, brotha for brother).   I suspect that he NEVER uses unusual spellings for perfectly ordinary pronunciations (enuff for enough and the like).  He does indicate non-standard and dialectal features of morphology, syntax, and the lexicon, though, as in this dialogue from the black maid Dilsey early on in The Sound and the Fury:

Aint you got no better sense than that.  What you want to listen to Roskus for, anyway.

That's quite enough to let us "hear" the characters in our head and supply some version of the phonetics.  For the classier white characters, like the Compsons, we're pretty much on our own, though we can be sure that their speech had regional features.

As for Grover Lewis, I'll have to get hold of the book to see just how he represented the speech of the Allman Brothers and their crew.   What Trucks tells us in his letter isn't about pronunciation specifically:

We had a road manager that was a graduate of Georgia Tech and before coming with us had been a bank auditor. He was an educated and sophisticated man. Mr. Lewis quotes him as calling the desert as we flew over Arizona "a right smart of sand". I worked with this man for many years and never did I hear him use a phrase that even resembled this.

Lewis, by the way, was a Texan, complete with cowboy boots and a Texas accent.

Noting a previous Language Log post on extended senses of the neologism gaydar ("Lexical drift (1)", 3/27/2006), Pekka Karjalainen writes:

Just a while ago I came across the word sarcasmdar somewhere on the net. It gave 75 Google hits at the time of the writing, which indicate its meaning is something along the lines of a device for detecting sarcasm. Most of the time people are using it when claiming theirs or someone elses is broken.

Perhaps there is even a snowclone in the making - my X-dar is broken. The relatively low number of ghits might indicate it is quite new, or maybe just unpopular. I couldn't think of any other -dars to look for yet. Oh wait, of course:

grammardar : 5 hits (duplicates)
"My grammardar just imploded"

Pekka's morphemedar is clearly in working order. There are plenty of other instances: jewdar, blackdar, sexdar, fishdar, etc. My guess is that there has been a low-frequency process of spontaneous neologism-formation going on here for some time. The fact that radar -- though originally coined as an acronym for "radio detection and ranging" -- can be re-analysed as ra(dio)+dar means that the new morpheme -dar probably sprung into fitful existence soon after radar came into general use. A few of the -dar neologisms -- notably gaydar so far -- have caught on and spread, which presumably somewhat increases the productivity of the background process.

[Barbara Zimmer writes:

I found a few references to humordar, including this one:
It be strongly advised that ye turn on your 'humordar' t'distinguish between reality an' fiction or fantasy. Me opinions are simply that-me opinions


Posted by Mark Liberman at 07:33 AM

November 24, 2006

Mapuche is ours, not yours

Back in 2004, prompted by Bill Poser's report of a lawsuit in which a relative of the person who coined the term googol was suing Google over a property claim on the word Google, I satirically claimed personal ownership of the nouns crump, ether, parsley, helicopter, oligarchy, and rhodium, the preposition of, and all derivatives of the verb snuggle. I took it to be self-evidently hilarious that anyone could claim ownership of some ordinary non-trademarked dictionary word, especially on grounds of a family connection (and never mind the fact that Google and googol are not the same word). Now the Mapuches seek to claim ownership of their entire language, on the basis of a tribal connection, and they regard Microsoft's localization of its software by translating messages into Mapuche as theft of the Mapuche people's stuff. It really is very hard for a satirist to keep out ahead of real life, isn't it?

A couple of correspondents have suggested to me and Mark that the press reports are crazier than reality; they claim Spanish-language accounts of what is going on reveal that the Mapuche people are objecting to pre-emption of decisions about which alphabetic writing system to adopt for their language. Quite a few have been on offer, and the Microsoft decision to go for one of them, known as Azmfuche, was taken without general agreement by the Mapuche but will now be definitive. Published reports of the situation include this one and this one, and one page that does discuss the orthographic issue is here. But my general impression is that even the Spanish sources reveal plenty of fundamentally misguided political ranting on the part of the Mapuches. A language is not something that could be or should be controlled by a people or its political leadership, and making software available in a certain writing system or language is not a threat to, or a theft of, cultural patrimony. Not even if it does encourage a tendency toward standardization in some particular direction. So far I still see this story as having a tinge of the ridiculous about it. For some serious and informed discussion of the issue that takes a more sympathetic view, see this post by Jane Simpson.

Thanks to Henry Heller and Luis Casillas for informed correspondence about the Spanish sources.

Posted by Geoffrey K. Pullum at 06:30 PM

Language as property?

It's hard to know what is really going on in Chile, where Reuters tells us that "Mapuche tribal leaders have accused [Microsoft] of violating their cultural and collective heritage by translating the software into Mapuzugun without their permission" ("Chilean Mapuches in language row with Microsoft", 11/23/2006). In particular, it's not clear from the article what the basis for the suit is, or what relief is being sought. The theory may be that a language is a piece of property belonging to (some representative body of) the people who speak it. If this idea were really to be accepted into the system governing the usual laws of property, I suspect that the consequences would surprise and displease many of those who start out supporting it. For some discussion, see "The Algonquian morpheme auction" (3/3/2004).

I haven't seen much about these issues within the "free culture" movement -- but there are some links here, for example this (more details here). Here's a question: if the use of a language has to be licensed by the tribal elders, can they withhold this permission from someone who wants to criticize them, or to say something else that they don't approve of?

Posted by Mark Liberman at 03:14 PM


In response to our discussion of Lawrence Henry's discourse on accents ("A linguist's Thanksgiving" and "Why Americans can't learn foreign languages", 11/22/2006), Martyn Cornell writes:

... and what is this "slurry" accent Mr Henry claims some Londoners have? Does he mean Estuary?

I wondered about that myself. What Mr. Henry wrote (in "To Accent or No", The American Spectator, 11/22/2006) was this:

To the Pygmalion audience, a glottal "t" indicated a yob. Today's Brits have adopted it as part of a kind of commercial London speech known as "slurry."

The context and the quotation marks show that he thinks of slurry as a name for a type of speech, not a description. I've never heard any such term for a London-area dialect -- or any other variety of English -- though I certainly have heard of "Estuary English", and have even blogged about it ("Estuary English", 8/28/2004). The UCL Department of Phonetics and Linguistics has a web site devoted to Estuary English, which they define as "a name given to the form(s) of English widely spoken in and around London and, more generally, in the southeast of England — along the river Thames and its estuary".

Overall, what Mr. Henry has to say about "slurry" fits "Estuary English" pretty well, so I suspect that we're looking at a slip of the ear: Mr. Henry heard someone talk about "Estuary", and heard (or remembered) it as "slurry". I'm no kind of expert in the modern sociolinguistics of the British Isles, so I admit that I might be missing something here (though Google seems to be missing it too, as far as I can tell.) If anyone can provide evidence that slurry has wider use as a term for London-area speech, please let me know.

The reason for Henry to bring up "slurry" in the first place was his unhappiness with the speech of a TV sports personality:

MY BUGABOO, HOWEVER, IS THE GLOTTAL "T." Around here, you hear it especially in the phrase "at home," which becomes "a' home." A certain class of English speaker, heard especially on the BBC, employs glottal "t's" in a self-conscious way, as a cultural signal of knowingness or savvy or in-crowdism. Listen to a BBC reporter. He will not always use the glottal "t," but will suddenly begin to employ it the more insinuating becomes his tone.

Newly anointed CBS golf anchor Nick Faldo uses more glottals the more clever he becomes, a shame, because he is in fact clever, but the glottals render him almost incomprehensible to an American audience. You're a broadcaster now, Nick. Time for some speech lessons.

To the Pygmalion audience, a glottal "t" indicated a yob. Today's Brits have adopted it as part of a kind of commercial London speech known as "slurry."

Henry's sociolinguistic observations about "a certain class of English speaker" may well be correct -- in an earlier post, I quoted Kate Joester's opinion that

I think there's probably another dimension in prejudice against Estuary English in particular. It's associated with "youth culture" and with being a fake accent acquired by speakers who are "really" something else in order to be youthful and cool.

However, Martyn Cornell goes on to argue that Henry has misconstrued Nick Faldo's accent:

Nick Faldo, who comes in for a kicking from Mr Henry over his alleged glottal stops, grew up in Welwyn Garden City, Hertfordshire, about 10 miles from where I grew up and where I once worked as a reporter on the local paper - I interviewed Nick just after his first victory in a major, and to me, naturally, he has a perfectly fine lower middle class Northern Home Counties accent not that different from my own.

There's something interestingly typical going on here. Henry's article displays an intense interest in matters of pronunciation and an obviously acute faculty of observation; but it also displays an almost complete ignorance of the concepts, skills and background knowledge that are relevant to the kinds of linguistic description that interest him. This combination of intense interest and spectacular ignorance is, I think, unique to the area of speech and language. You don't find birders obsessively compiling a life list that includes insects and bats under the misapprehension that these are also members of the taxonomic class Aves. You don't find photography enthusiasts who think that f-number is a measure of film speed, or that nikon is a noble gas used to prevent condensation inside lens assemblies.

This point has come up before, of course -- and as always, I blame the linguists.

If you're familiar with its history, you might argue that The American Spectator is a special case. (If you're not familiar with its history, read the Wikipedia entry or Byron York's Atlantic article from November, 2001.) But I don't think so. There's no particular political connection here.

[Update -- Stephen Jones wrote that "Where the guy got the word [slurry] from is beyond me. Slurry and Estuary don't even sound alike". I agree -- though some pronunciations of "estuary" do have all but one of the phonetic segments in "slurry" -- but what else could it be? The Cupertino effect seems even less likely than a mis-hearing, unless there is a possible typo that hasn't occurred to me.]

[John Cowan has another idea:

I shouldn't be surprised if the confusion is semantic. Googling for "slurry estuary" (no quotes) shows that what's at the bottom of most estuaries (probably including the Thames) is in fact a slurry.


[Update 11/27/2006 -- Paul Farrington-Douglas has some useful information to offer:

I hadn't encountered the term, but suspected an eye-pun (as 'twere) on the slurring of sounds, plus a play on the assonance of slurry (the manure-derived fertilizer) and the county Surrey. Searching for Slurrey as an alternative spelling consistent with this hypothesis initially threw up lots of derogative references to Surrey, BC, but some narrowing of the search terms threw up some more appropriate references. Inevitably, there were several other references to the English Surrey as Slurrey, such as one to 'Slutton in Surrey' (a reference to the Surrey town of Sutton). On a talkboard I saw something suggesting a cultural reference, too, which may be indirectly relevant:

everyone knows about the slurrey sluts. ... that was the mid-19th century, when Surrey had different prostitution laws than London proper. Nowadays it's a pretty nice place.

There's also a couple called the Slurreys on the British TV comedy 'Stella Street'; it wouldn't take much to find out if they have an appropriate accent, but I can't play YouTube videos on this computer so I can't check myself. Most illuminating of all, though, was this note on the schedule for a pub crawl on

Green Dragon (near the top of Surrey Street market, opposite something called "The Ship"), Croydon, South London/Surrey (or "Slurrey").

Though I can't claim to have found much to suggest the term is exactly dominant, there's enough to say that it was probably more than merely a mishearing of 'Estuary' (a somewhat unlikely explanation anyway). The link to Surrey is more likely still if 'he has a perfectly fine lower middle class Northern Home Counties accent', of course -- this hypothesis would suggest the 'Slurrey accent' is not Estuary English at all.

When it comes to Martyn Cornell's objection to Lawrence Henry's analysis, it's worth bearing in mind that snobbishness is where you find it -- a 'lower middle class' accent is quite enough to qualify for derision among many, especially in the context of Middle England stereotypes: this is precisely the demographic stereotyped, for example, in the Dursley family in Harry Potter.

This is helpful background. But I don't think it's likely that Mr. Henry meant slurry to describe a traditional lower-middle-class accent different from Estuary English, since he describes it as "a kind of commercial London speech" that "today's Brits have adopted", as an artificial "cultural signal of knowingness or savvy or in-crowdism".

And John Wells, who ought to know if anyone does, wrote that he has "never heard of an accent called 'slurry'".

My conclusion? Mr. Henry might be referring to a clear and well-thought-through concept, precisely named by a term that the rest of us haven't learned yet; and then again, he might just be venting a few incoherent and ill-informed prejudices, and "slurry" might just be a careless blend of half-remembered words like estuary, Surrey and slang. We report, you decide.]

Posted by Mark Liberman at 12:10 PM

Non-standard pronunciations ate my IQ

The Zippy Thanksgiving cartoon:

Not only do non-standard pronunciations betray stupidity, the effect of using them is cumulative.  You've been warned!

The IQ Meter is also an entertaining conceit.

Posted by Arnold Zwicky at 10:17 AM

You should head on over to TstT, to read the Tensor's comments on Timothy Takemoto's suggestion that Japanese would make a good international language. Takemoto's arguments are really much funnier than anything that pundits in the Anglosphere have come up with recently.

My only comment: it's an interesting (and I think healthy) sign that new and more outward-looking forms of nonsense are replacing the traditional discourse of linguistic Nihonjinron (see "The Japanese are Japanese because they speak Japanese", 4/6/2005).

Posted by Mark Liberman at 08:49 AM

From a Sports Illustrated blog, by Stewart Mandell:

N.C. State fans don't take well to losing to the hated Tar Heels, nevertheless 23-9 to a 1-9 UNC team playing for a lame-duck coach.

This is nevertheless used as a negative-form additive connector, like not to mention, never mind, or to say nothing of.  John Schaefer, who found the example, speculates (reasonably enough) that it's a feature of speech that has found its way onto the net (though it's not a use of nevertheless I recall hearing before), and he wonders how you'd do a Google search to find more occurrences.  Good question.

[Addendum 11/24: Doug Wilson on ADS-L and Marilyn Martin by e-mail suggest that the target was probably "much less", a connective that follows negative clauses -- and ends in "less".  Bingo.]

So, first, a query: has anyone noticed similar examples?  Mail me if you have actual cites.

[Addendum 11/24: John Schaefer's found another one: "Miami can continue to have both and help kids that may never have seen the campus of college, nevertheless a private school." (from "dg", 7/21/06, here).]

Second, some thoughts on searching (on Google or elsewhere) for what is probably a very infrequent use of a very frequent word.  One way that might get the task down to something manageable would be to do it in two steps: first, determine what material follows not to mention and to say nothing of with some frequency (never mind might not work, because it so often stands alone); and then, search for nevertheless followed by this material. 
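Here is a rough sketch of that two-step idea in Python, run over a plain string rather than over Google. The function names and the sample sentences are mine, invented for illustration. The second step only narrows the haystack: perfectly ordinary uses of nevertheless will match too, so every hit still has to be read by a human.

```python
import re
from collections import Counter

# The two model connectives suggested in the post.
KNOWN_CONNECTIVES = ["not to mention", "to say nothing of"]

def following_words(text, connectives=KNOWN_CONNECTIVES):
    """Step 1: count the single word that follows each known connective."""
    alternatives = "|".join(re.escape(c) for c in connectives)
    pattern = re.compile(r"\b(?:" + alternatives + r")\s+([\w'-]+)", re.IGNORECASE)
    return Counter(m.group(1).lower() for m in pattern.finditer(text))

def candidate_hits(text, continuations):
    """Step 2: find 'nevertheless' directly followed by one of those words."""
    if not continuations:
        return []
    alternatives = "|".join(re.escape(w) for w in continuations)
    pattern = re.compile(r"\bnevertheless\s+(?:" + alternatives + r")\b", re.IGNORECASE)
    return [m.group(0) for m in pattern.finditer(text)]

# Invented sample text, standing in for a real corpus.
model_text = ("They lost, not to mention the cost. "
              "It was cold, to say nothing of the wind. Not to mention a delay.")
top_words = [w for w, _ in following_words(model_text).most_common(10)]
print(candidate_hits("He never saw a college, nevertheless a private school.", top_words))
```

Requiring whitespace (and no comma) after nevertheless already throws out the sentence-adverb use in "Nevertheless, we went", which is most of what a naive search would return.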

This is the sort of strategy that the Stanford ALL Project used recently in searching for instances of quotative all ("And she was all 'Were you in the church?'") in a gigantic database of postings on newsgroups that Google has accumulated: we refined the search by first determining the most frequent words immediately following an initial quotation mark, then used the top 40 words as part of a regular expression searching for a personal pronoun plus contracted copula, followed by all, all like, like, say, or go.  (The results of this research were reported on at the recent NWAV conference at Ohio State.)  Even so, the project took a lot of time and depended on considerable cooperation from colleagues at Google and on research assistants supported by Stanford.  Not a quick and easy task.
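For the curious, a search pattern of that general shape might look like the following. This is my own simplified sketch, not the project's actual expression: it covers only all, all like, and like (leaving out say and go), and the short list of quote-opening words is a made-up stand-in for the real top 40, which I haven't reproduced here.

```python
import re

# Made-up stand-in for the project's list of the 40 most frequent
# words found immediately after an opening quotation mark.
FREQUENT_QUOTE_OPENERS = ["were", "oh", "i", "you", "what"]

# Personal pronoun plus contracted copula.
PRONOUN_COPULA = r"(?:I'm|you're|he's|she's|it's|we're|they're)"
# Longer alternative first, so that "all like" is not cut short at "all".
QUOTATIVE = r"(?:all like|all|like)"
# A straight single or double quotation mark.
QUOTE = "[\"']"

def build_pattern(openers):
    """Pronoun + copula, then a quotative, then a quote mark and a frequent opener."""
    words = "|".join(re.escape(w) for w in openers)
    return re.compile(
        r"\b" + PRONOUN_COPULA + r"\s+" + QUOTATIVE + r",?\s+" + QUOTE
        + "(?:" + words + r")\b",
        re.IGNORECASE,
    )

pattern = build_pattern(FREQUENT_QUOTE_OPENERS)
print(bool(pattern.search("And she's all 'Were you in the church?'")))
print(bool(pattern.search("She's all alone now.")))
```

The point of the frequent-word filter is visible in the second example: without the quotation mark and opener, every "she's all alone" in the database would come back as a hit.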

A friend of mine's pet bear

Ben Zimmer, following up on the first of my recent postings on possessives in English, writes about the phrase in the header, which comes from the 11/22/06 "Mark Trail" cartoon, as critiqued on the Comics Curmudgeon site.  A poster on that site refers to "the semantic nightmare of a sentence coming out Mark's mouth in panel three": "You stole a friend of mine's pet bear!"

Actually, I can't see anything to object to in this sentence, unless you object in general to possessives of NPs (like a friend of mine) that don't end in their head nouns, though these have been around for centuries and are not hard to find in real life.

To see how we get to a friend of mine's pet bear I'll have to describe with some care how the determinative possessives of English NPs work.  The details are important.

There are two special cases and then one very big generalization.

Special case 1: (definite) personal pronouns.  If the NP is one of six (definite) "personal pronouns" (note: this is a technical term) -- 1sg (nominative I), 2 (nominative you), 3sg fem (nominative she), 3sg masc (nominative he), 1pl (nominative we), 3pl (nominative they) -- its determinative possessive form is suppletive, not a simple concatenation of morphemes: respectively, my, your, her, his, our, their.

There are several other items usually labeled as pronouns which go by the big generalization, rather than having their possessives stipulated, for instance:

generic one: One should never count one's chickens before they're hatched.

anaphoric indefinite one: The big cat's tail is shorter than the small one's.

compound indefinite pronouns: We're collecting everybody's opinions.

Special case 2: NPs without a possessive.

There are several classes of NPs that simply lack a possessive.  (Geoff Pullum and I talked about a number of these in a 1996 LSA paper, the handout for which is available on my website.)  Some are single words:

expletive there: *There's being no food in the refrigerator upsets me.

headless modifiers: demonstratives: This cat's tail is short. *That's is long.

headless modifiers: quantifiers: We interviewed many subjects.  *We took down each's opinion.

headless modifiers: independent possessives: Your cat's tail is short. *Mine's is long.

(Notice that in the last case we end up with unacceptable mine's.)

and some are longer phrases --

"nominal gerunds" (possessive + gerund): Your walking me home really pleases me. *Your walking me home's really pleasing me is a surprise to everyone.

infinitival clauses: For you to walk me home really pleases me. *For you to walk me home's really pleasing me is a surprise to everyone.

(These lists are merely illustrative, not exhaustive.)

The big generalization: Z.  Otherwise, the possessive form of a NP x has a Z suffix on the last word w of x.

I've said this with some care.  In particular, I did NOT say that the possessive form of x uses the possessive form of its last word w, since that wouldn't provide possessives for NPs that end in words that are not nouns (like the friend I was telling you about or everyone I know), since such words of course do not have possessive forms.

And now we make a prediction about the determinative possessive corresponding to the pet bear of a friend of mine (using an alternative expression of possession which has the preposition of): it should just follow the big generalization: a friend of mine's pet bear, with Z suffixed to the last word, mine, of the possessor phrase a friend of mine.  That's where we started.
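For readers who like their generalizations executable, here is a minimal sketch of special case 1 plus the big generalization. It is an illustration only: the function name is mine, the spelling 's stands in for the Z suffix, and special case 2 (NPs that lack a possessive altogether) is left out, since a list of words can't tell a headless that from the that of that cat.

```python
# Special case 1: the six suppletive determinative possessives.
SUPPLETIVE = {
    "i": "my", "you": "your", "she": "her",
    "he": "his", "we": "our", "they": "their",
}

def determinative_possessive(np):
    """Return the determinative possessive of a noun phrase given as a string.

    Suppletive forms apply to a bare personal pronoun; otherwise the
    suffix (spelled 's here) goes on the LAST WORD of the phrase,
    whatever its part of speech.
    """
    words = np.split()
    if not words:
        raise ValueError("empty noun phrase")
    if len(words) == 1 and words[0].lower() in SUPPLETIVE:
        return SUPPLETIVE[words[0].lower()]
    return " ".join(words[:-1] + [words[-1] + "'s"])

print(determinative_possessive("a friend of mine") + " pet bear")
print(determinative_possessive("everyone I know"))
print(determinative_possessive("she"))
```

Note that the function never asks whether the last word is a noun, or whether it has a possessive form of its own. That is exactly the content of the generalization.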

Just to wrap up this description, here's how Z is realized:

Realization of Z.  The possessive Z is suppressed if w itself ends in a Z suffix (the birds' wings).  Otherwise, possessive Z has the same phonology as plural Z and 3sg present Z:

the basic variant is z (bird's, Chicago's);

for a word ending in a sibilant, epenthesize schwa between it and the z (Max's, judge's);

otherwise, for a word ending in a voiceless consonant, devoice the z (cat's, Rick's).
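The realization rules can be written out the same way. In this sketch the caller supplies the final sound of w and says whether w already ends in a Z suffix, since spelling alone can't settle either question (boss ends in the letter s but not in a Z suffix). The function name and the two segment sets are my own assumptions about how to encode the description above, not part of any established library.

```python
# Sibilants of English, in IPA.
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}
# Voiceless consonants other than the sibilants (those are checked first).
VOICELESS = {"p", "t", "k", "f", "θ"}

def realize_possessive_z(final_segment, ends_in_z_suffix=False):
    """Return the pronunciation of possessive Z after a word.

    final_segment: the word's last sound, in IPA.
    ends_in_z_suffix: True if the word already ends in a Z suffix
    (e.g. plural 'birds'), in which case possessive Z is suppressed.
    """
    if ends_in_z_suffix:
        return ""            # the birds' wings
    if final_segment in SIBILANTS:
        return "əz"          # Max's, judge's
    if final_segment in VOICELESS:
        return "s"           # cat's, Rick's
    return "z"               # bird's, Chicago's

for word, seg in [("bird", "d"), ("Max", "s"), ("judge", "dʒ"), ("cat", "t")]:
    print(word, "->", realize_possessive_z(seg))
```

The order of the checks matters: s is both a sibilant and voiceless, so the sibilant test has to come first, just as in the prose statement.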

The careful reader will have noticed that I haven't said anything about the personal pronoun it.  I'm saving that for a future posting.

Posted by Arnold Zwicky at 09:11 PM

Eye dialect

In posting about Lawrence Henry's American Spectator column on accents, Mark Liberman refers in passing to Henry's "eye dialect", is challenged on this by Daniel Ezra Johnson, and defends his use of the term by saying:

Well, the OED glosses "eye dialect" as "unusual spelling intended to represent dialectal or colloquial idiosyncrasies of speech", which seems close enough in this case.

The problem here is that there are two distinct but related concepts, and we have only one widely used term to label them.

One concept is the OED's: a representation of dialect (or colloquial) pronunciations via unusual spellings.  It would certainly be useful to have a term for this, and "eye dialect" is a nearly transparent candidate for the purpose.

But there's another tradition, in which the term is used for unusual spellings for perfectly ordinary pronunciations, functioning to suggest that the speaker is uneducated or crude -- the sort of person who would spell the words that way.  AHD4's definition links the two (but gives examples only of the second):

The use of nonstandard spellings, such as enuff for enough or wuz for was, to indicate that the speaker is uneducated or using colloquial, dialectal, or nonstandard speech.

Using eye dialect (in the first sense) is a tricky business; no matter what the writer's intent (which might be just to provide local color), it's likely to be understood as expressing contempt, and in any case readers often find it tiresome.  Writers would be well advised to use it sparingly.

Using eye dialect (in the second sense) is pretty much by definition a put-down.

I've always used "eye dialect" in the second sense, so I'd suggest "dialect spelling" for the first sense.  But then who's going to listen to ME?

zwicky at-sign csli period stanford period edu

Obligatory adjectives and optional articles?

The sharp-eyed Language Log reader will have noticed that Arnold Zwicky's latest post begins with the phrase the sharp-eyed Éamonn McManus. Now, it is well known that proper names of people usually don't take definite articles, allowing for some quite rare exceptions (the Donald for Donald Trump; the Bill Clinton of 1992 for a temporal stage of Bill Clinton's life history; etc.). Arnold certainly could not have begun his post by saying *The Éamonn McManus noticed a gap in the list. Yet sharp-eyed appears to be just an ordinary adjective in attributive modifier function, as in simple Simon, poor Aunt Beth, lucky Pierre, good old John, fearless Evel Knievel, sweet Georgia Brown, Calvin Trillin's locution the wily and parsimonious Victor S. Navasky, and so on; and these are always optional: drop an attributive adjective and what's left is always a grammatical noun phrase. Yet dropping the adjective from the sharp-eyed Éamonn McManus does not leave behind a grammatical noun phrase. It produces something utterly unacceptable. So are attributive adjectives optional or not? How do we give an accurate description of what's going on here?

Don't stand there looking at me. I don't know. Syntax is hard, and people who think everything about English syntax is known already have no idea of the actual ignorance-riddled state of the art.

It's worse than I said, actually. Dropping the definite article from a singular noun phrase is normally impossible unless the noun can be construed as denoting some uncountable substance or stuff: in normal conversational English (I ignore newspaper headlines) we cannot drop the definite article in something like Yesterday the vice president flew to Iraq to get *Yesterday vice president flew to Iraq. I couldn't have begun this post by saying *Sharp-eyed Language Log reader will have noticed... Yet in the sharp-eyed Éamonn McManus we can drop the definite article: Arnold could have begun by saying Sharp-eyed Éamonn McManus noticed a gap in the list. So is the definite article obligatory with singular non-mass nouns or not?

I repeat: don't look at me. I don't know. I have only a few short decades of experience with this extremely difficult subject.

Thanks to Paul Postal for pointing out I needed to distinguish proper names of people from other proper names. Many proper names (like "the Mississippi") not only permit the definite article, they require it.

Let's meet at mine

The sharp-eyed Éamonn McManus noticed a gap in the list of independent possessive constructions in my recent "Overpossessive" posting: I illustrated the anaphoric zero, predicative, and double genitive constructions with both pronominal and non-pronominal possessives (mine, Sandy's), but gave only a non-pronominal illustration for the locative construction: Let's meet at Sandy's.  This was not an oversight -- I find Let's meet at mine unacceptable in a context where there's no antecedent for the missing head, and most other English speakers make the same judgment -- but now McManus provides some attestations of pronominal locative possessives (from the U.K. and Ireland), which suggests that some speakers are beginning to simplify their grammars by eliminating an odd constraint on one specific construction.

Which reminded me of Baker's Paradox:  although learners generalize ("project") from the language they hear, producing many utterances that are not directly modeled for them, in some cases they resist obvious generalizations and seem to conclude that things they haven't heard just aren't grammatical; they learn lexical exceptions and very specific constraints. 

The paradox gets its name from C. L. Baker, author of the 1979 Linguistic Inquiry paper "Syntactic theory and the projection problem", in which the issue (for lexical exceptions) was clearly presented.  (Baker was the first student to write a Ph.D. dissertation under my direction, back in the  Pleistocene Epoch, so it pleases me to refer to his work here.)  More recently, Peter Culicover's book Syntactic Nuts (1999) examined a series of puzzling constructions in English, using (apparently) arbitrary differences in syntactic behavior between lexical items to conclude that learning must be, among other things, "conservative".  And now, for her dissertation, Stanford student Liz Coppock is looking at the cases from this literature, plus some others, so Baker's Paradox has been very much on my mind.

The question is how people learn things like the following:

You can give $100 to the library, give the library $100, or donate $100 to the library, but not *donate the library $100.

You can be the likely winner, be likely to win, or be the probable winner, but not *be probable to win.

You can be happy, be a happy person, or be glad, but not *be a glad person.

(Searching on the net will get you small numbers of examples like the asterisked ones above.  While most people are conservative learners, a few are more adventurous.)

On to syntactic constructions, a world in which construction-specific (though systematic) constraints are rife.  A couple of well-known examples from English:

In main-clause wh-interrogatives, prepositions can be stranded or (in a rather formal style) fronted, but in wh-interrogative complement clauses, fronted prepositions are much less acceptable:

Which city did they fly from?  [main, stranded]
From which city did they fly?  [main, fronted]
I wonder which city they flew from.  [embedded, stranded]
?? I wonder from which city they flew. [embedded, fronted]

In two serial-verb-like constructions -- which I'll call GoV and TryAndV -- for most speakers the verbs must obey the Inflection Condition of Pullum 1990 ("Constraints on intransitive quasi-serial verb constructions in modern colloquial English", in OSU WPL 39.218-39), which requires that they be in a form identical to their base form (either the base form itself, or the non-3sg present); in other similar constructions, in particular GoAndV, there is no constraint:

I'll go and see what I can do. [GoAndV, base]
  I'll go see what I can do. [GoV, base]
  I'll try and see what I can do. [TryAndV, base]
I always go and see what I can do. [GoAndV, 1sg pres]
  I always go see what I can do. [GoV, 1sg pres]
  I always try and see what I can do. [TryAndV, 1sg pres]
He always goes and sees what he can do. [GoAndV, 3sg pres]
  *He always goes see(s) what he can do. [GoV, 3sg pres]
  *He always tries and see(s) what he can do. [TryAndV, 3sg pres]
I went and saw what I could do. [GoAndV, 1sg past]
  *I went see/saw what I could do. [GoV, 1sg past]
  *I tried and see/saw what I could do. [TryAndV, 1sg past]
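The pattern in these judgments can be stated as a small check. The sketch below encodes the Inflection Condition as I've worded it here (each verb's form must be identical to its base form) over a three-verb toy lexicon; the lexicon, the slot labels, and the function names are invented for the illustration, and the check makes no claim about speakers whose judgments differ.

```python
# Toy lexicon: the surface form of each verb in four inflectional slots.
FORMS = {
    "go":  {"base": "go",  "pres_non3sg": "go",  "pres_3sg": "goes",  "past": "went"},
    "try": {"base": "try", "pres_non3sg": "try", "pres_3sg": "tries", "past": "tried"},
    "see": {"base": "see", "pres_non3sg": "see", "pres_3sg": "sees",  "past": "saw"},
}

# Constructions subject to the Inflection Condition; GoAndV is not.
CONSTRAINED = {"GoV", "TryAndV"}

def obeys_inflection_condition(verb, slot):
    """True if the verb's form in this slot is identical to its base form."""
    return FORMS[verb][slot] == FORMS[verb]["base"]

def predicted_acceptable(construction, first, second, slot):
    """Predict acceptability of a two-verb sequence with both verbs in `slot`."""
    if construction not in CONSTRAINED:
        return True
    return (obeys_inflection_condition(first, slot)
            and obeys_inflection_condition(second, slot))

for slot in ["base", "pres_non3sg", "pres_3sg", "past"]:
    print(slot,
          predicted_acceptable("GoAndV", "go", "see", slot),
          predicted_acceptable("GoV", "go", "see", slot),
          predicted_acceptable("TryAndV", "try", "see", slot))
```

The three columns of output line up with the three rows of each block of examples above: GoAndV comes out acceptable in every slot, while GoV and TryAndV are acceptable only in the base and non-3sg present rows.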

In all such cases, the puzzle is why so few people generalize, why so few eliminate the wrinkles in the grammar.  Locative possessives provide yet another instance of the puzzle: the other three types of independent possessives are unconstrained, but the locative construction maintains its constraint against personal pronouns.

Until recently, that is.  In McManus's words:

Up until recently I would have assumed that nobody would ever say Let's meet at mine/yours/hers/ours/theirs but in fact it seems to be current usage in England and spreading to Ireland. I'm pretty sure nobody ever used that construction when I was growing up in Dublin, but my slightly-younger brother now uses it all the time. Google finds a few hits for Let's meet at mine, including [here], where it's the title of the page and the product it's selling. There are very few hits, though (none at all for ours, theirs, hers and only one possibly non-native one for yours), for what should be a very common phrase in chat forums so it may be that it's not yet all that widespread. But I'd bet on it spreading further because it's handy, immediately comprehensible, and logical.

Getting rid of the let's pulls in a modest number of examples, most of which seem to be from British sources, for example:

Looking forward to seeing the others I agreed to meet at theirs at 8pm.  (link)

(this from a blog full of British English features).  Searches varying other parts of the search string will no doubt yield many more examples.  The independent possessive pronouns are on the march!

[Addendum: two others have written to confirm that this usage is widespread in speech in the U.K. and also in Australia and New Zealand.  Both were under the impression that it spread fairly recently.]

Posted by Arnold Zwicky at 02:47 PM

Why Americans can't learn foreign languages

What struck me most of all about Lawrence Henry's piece on accents was something Mark didn't even mention. Mr Henry notes that in American English a totally unstressed vowel is reduced to a sound usually written down as "uh" (the sound linguists call schwa); and he goes on:

It's a rampant American fault and accounts for our relatively poor performance learning foreign languages. "Effect" becomes "uh-FECT." Cassette becomes "kuh-SET."

An accurate enough phonetic observation: the first syllable in these words is pronounced with a schwa, whereas many other languages have no schwas at all, in any words. My horse laugh at the quoted remark comes not from this phonetic fact but from the astoundingly dopey idea that it is a "fault" that provides the key to the riddle of why Americans don't do so well at learning foreign languages.

Steve Jones points out to me, for example, that western varieties of Catalan do not have schwa, but in Central Catalan (of Barcelona) there is reduction that makes schwa the most frequent vowel in actual speech; yet this doesn't correlate with any perceptible difference in language-learning ability among Catalan speakers from different regions of eastern Spain. Henry's remark about how vowel reduction to schwa "accounts for our relatively poor performance" really is astoundingly dumb.

Why we Americans, with our staggering wealth of resources and (for example) the most highly ranked graduate schools in the world, do so poorly by any measure on our command of foreign tongues is a complex question with a mainly sociological, political, historical, educational, and social-psychological answer. (Never forget that John Kerry is said to have had to attempt concealment of his fluent French to avoid bad press during his Presidential run, and Nebraska in the early 1920s had a law making foreign language instruction illegal, and in that very same state as recently as 2003 a father was threatened by a judge with loss of the right to visit his child if he didn't speak English during his visits... This country could not exactly be said to be uniformly friendly toward polyglotism. Nor does it always honor the accomplishment of those immigrants and Native Americans who speak a heritage language at home and English elsewhere — in fact punishment of Native American children for speaking their Amerindian language while in school used to be commonplace.) It's certainly quite a bit more complex than anything traceable to the reduction of unstressed vowels to schwa. Don't give up on taking foreign language lessons simply on the grounds that as an American you are doomed to failure by your learned vowel reduction habits.

A linguist's thanksgiving

Over the three Thanksgivings that Language Log has been in existence, we've marked the holiday by noting the layered semiotics of the Macy's parade ("Same-sex Mrs. Santa: 'the semantics are confusing'", 11/27/2003), Thomas Jefferson's wisdom in refusing to proclaim a national thanksgiving day devoted to "fasting & prayer" ("Thanks giving", 11/25/2004), and the singularity of the American polity ("Life in these, uh, this United States", 11/24/2005).

This morning, as I counted my blessings, public and private, I thought about how many of them are transformed curses, and gave special thanks for all that blogging has done for me in this respect. For me as an individual linguist, it can only be frustrating and depressing to observe the conjunction of intense public interest and unprecedented public ignorance with respect to matters of speech and language. But as a writer for Language Log, I can join H. L. Mencken in viewing this as a "daily panorama ... of private and communal folly inordinately gross and preposterous, so perfectly brought up to the highest conceivable amperage, so steadily enriched with an almost fabulous daring and originality, that only the man who was born with a petrified diaphragm can fail to laugh himself to sleep every night, and to awake every morning with all the eager, unflagging expectation of a Sunday-school superintendent touring the Paris peep-shows".

In other words, it's not just another sad example of our educational system failing to provide an intellectual with the tools needed for the job -- no, it's a topic for a Language Log post!

Today's example is provided by Lawrence Henry ("To Accent or No", The American Spectator, 11/22/2006). Christopher S. Mackay brought this article to Geoff Pullum's attention, and Geoff mentioned it yesterday in the break room at Language Log Plaza, observing that it's "a feast of layperson's efforts to talk about phonetics without having the phonetics", and that "it comes out with some very strange claims about accents and languages and sociolinguistics". One of our younger staffers, who has not yet entirely mastered Mencken's technique, remarked that "crap like that just makes my head hurt". But I agree with Pullum: it's a virtual Thanksgiving feast.

The first dish is Mr. Henry's version of the common opinion that an "accent" is what everyone else has:

Cursed with acute hearing, I have bequeathed my boy Bud unaccented speech. Bud talks…well, like Brian Williams. How did I do that? By making fun of local locutions and teaching Bud to hear.

If you look up accent in the dictionary, ignoring the stuff about stresses and diacritics, you'll find glosses like "a characteristic pronunciation, especially one determined by the regional or social background of the speaker"; "a way of speaking typical of a particular group of people and especially of the natives or residents of a region"; "a way of pronouncing words that indicates the place of origin or social background of the speaker"; "the mode of utterance peculiar to an individual, locality, or nation".

In that sense, Brian Williams has an accent, just like Tom and Ray Magliozzi do. At least, that would certainly be the opinion of a resident of London, Melbourne or Cape Town. But Mr. Henry feels that Eastern Massachusetts pronunciation is a deviation from a neutral norm:

This has cost Bud in the court of peer opinion. His confreres at school seem to regard him as a snob for correct speech.

In fact, I bet they say that poor Bud has a "snooty accent". Or maybe they use some other adjective -- but I bet they don't say "ain't it odd how Bud has no accent?" Bud's dad continues:

Massachusetts is like that. If we lived in Texas, would I have equally mocked the local tendency to say "awl" for "oil"? Something in New England speech grates me wrong, and has made me a stickler for diction.

What kind of speech grates Mr. Henry right? Well, he tells us in his last paragraph:

I would rather my boys talked like Bobby Jones than Archie Bunker. If I could choose an accent for my own, which I no longer can, I would talk like golf announcer and former Amateur champ Steve Melnyk, like Jones, a Georgian. But I strongly suspect that, like me, over time, my boys will end up talking without any real accent at all. My son Bud has noticed that his classmates' accents are less pronounced than their parents'. Absent some temporary fad, like slurry or Valley Girl, that is the established trend. I am really not sure if that is to be mourned or rejoiced.

In fact, there's some controversy about what the "established trend" is. Perhaps some social strata are becoming more homogenized -- the youth of (say) Andover MA and Alpharetta GA may be more similar in their speech than their parents are, I guess -- but in other cases, there's evidence that some regional and social dialects in America are diverging. In any case, even if all Americans ended up speaking in exactly the same way, this would not be "speaking without any real accent at all", no matter how plain and flat the participants in this unlikely confluence felt the results to be. It would still be the characteristic pronunciation of a particular class, place and time, even if the class, place and time were "all native speakers of American English", "all of the United States", and "the middle of the 21st century".

The second dish in this feast is Mr. Henry's presentation of the Law of Least Effort, prepared in a delicately-flavored reduction of the notion that standard speech is also the most highly optimized, and garnished with sprigs of eye-dialect:

Many of the characteristics of regional accents are very labor-intensive. Speech usually elides toward the easy. It is much easier to say "and" than the tortured New England "ee-und," much easier to say "ahn" than "oh-wahn" ("on"). Why do these pronunciations persist?

Now, we know that Mr. Henry knows that eastern New England speech is r-less, because he mentions it in the context of an interesting discussion of dialect ideology:

I overheard a girl from Charlestown, who was taking a speech class, say that she had a hard time saying the terminal "r" in "brother" or "sister," instead of her accustomed "brothuh" or "sistuh." "It sounds unfriendly," she objected.

To my ears, au contraire, Eastern accents sound thuggish, threatening, and aggressive. TV and radio commercial producers use those accents to suggest savvy, but usually in a working class character, like a plumber. My wife finds Southern accents threatening, in a macho sort of way. In commercials, those cultural markers, Southern accents signify much the same thing as the working class Easterner: savvy about something nitty-gritty, like motor oil.

But curiously, it doesn't occur to him to wonder why Brian Williams doesn't drop all those complicated final-r-related lingual contortions, in favor of the New Englanders' simpler and much less labor-intensive schwa. And why do "accentless" Americans insist on all that back-to-front and low-to-high tongue motion in words like "hi" and "bye", instead of the restful, open monophthongs of Southern States English?

For dessert, you won't be able to resist at least a taste of Mr. Henry's verbs. His accent may be American standard, but his use of verbs is distinctly innovative. For instance, he'll take a verb that usually comes with a prepositional complement, and use it as a plain transitive. His last sentence, for example -- "I am really not sure if that [trend] is to be mourned or rejoiced" -- implies that it's possible to rejoice a trend -- in this case, the alleged trend towards phonetic homogenization -- rather than to rejoice at a trend, or rejoice because of a trend. And as we noted earlier, he says that New England speech "grates me wrong". This seems to be a blend of "grates on me" and "rubs me wrong", but whatever the source, it creates a distinctly non-standard relationship between the grater and the writer.

And now, it's time to turn from these linguistic delicacies to preparations for the physical feast.

[Daniel Ezra Johnson writes:

i know you've moved on to gustatory pursuits today, but i thought i'd note that the 'phonetic' spellings in henry's piece were a) surprisingly on-the-money, as i hear eastern massachusetts speech, and b) not at all fairly called 'eye-dialect', as i understand that term.

Well, the OED glosses "eye dialect" as "unusual spelling intended to represent dialectal or colloquial idiosyncrasies of speech", which seems close enough in this case. And I agree that Mr. Henry does a creditable job of representing pronunciations, whatever you call the method he uses.]

[From Peter Howard:

Your recent post reminded me of a conversation I had with an audience member after a Joy of Six poetry performance in New York. At the time, one of our number was a San Francisco native; the rest of us were from various parts of England. I was asked, "Is Wayne an American?" and I confirmed that he was. "I thought he must be." came the reply. "He's the only one of you who doesn't have an accent."]


[And another amusing anecdote from Jay Cummings:

This article reminds me of the time I was in Brookhaven, NY, along with 3 of my colleagues. Two were a German Jew and a middle class Englishman, both of whom had lived long in the US, but strongly maintained (to my ears at least) their native accents. The other was a Texan, similarly unchanged in accent despite having lived in southern California for many years. And then there was me, a Minnesotan descendant of Swedes, Norwegians, Germans and English.

We were at a restaurant in town that featured a number of Greek dishes on the menu, and an obviously native Long Island staff. The waitress came to take our orders, and after I chose my entree, she asked me if I would like a Greek salad or a tourist salad with the meal. I did not know what a tourist salad was, but I didn't really like Greek olives and feta cheese, so I asked for the tourist salad, and she wrote this on her pad without comment. She left for the kitchen.

We looked at each other, and the Texan asked me what a tourist salad was. I replied I didn't know, and none of the rest of us had any idea either. Then a short time later, it dawned on me, and I laughed aloud, "Oh, she meant a _tossed_ salad!" We all chuckled a bit, and the waitress returned with our beverages.

To explain our laughter, I mentioned that we had not understood her accent, and had just figured it out. With great amazement she stared at us and said, with perfect justification I think, "Youse gennelmin think _Oi_ have an accint?"]


A Lawsuit over a Dictionary Entry

Back in October, on a radio show, pundit Norman Spector, former chief of staff to Conservative Prime Minister Brian Mulroney, referred to Liberal Member of Parliament Belinda Stronach, a former Conservative who crossed the floor a year ago, as a bitch (audio here; text here). This caused a bit of a furor. Spector defended his use of the word by saying that he had used the term correctly in the sense of "a treacherous or malicious woman", for which he relied on the Oxford English Dictionary.

In her column of November 18th, Vancouver Sun columnist Daphne Bramham wrote:

the former adviser and confidante to both a prime minister and a premier sanctimoniously tried to bluster his way out of it, claiming that he was using an arcane definition from the Oxford Dictionary meaning treacherous behaviour. I've not been able to find it in any of the versions of Oxford I've consulted.

Spector is now suing Bramham along with the owner, publisher, and editor-in-chief of the Sun for libel. Whether he will win is unclear for a number of reasons, including the fact that Bramham never explicitly said that Spector made up the definition but only suggested it by innuendo, but the facts at least are on Spector's side. In the online version of the OED after sense 1a "female of the dog" and 1b "female of the fox, wolf and occasionally of other beasts", we read:

2. a. Applied opprobriously to a woman; strictly, a lewd or sensual woman. Not now in decent use; but formerly common in literature. In mod. use, esp. a malicious or treacherous woman; of things: something outstandingly difficult or unpleasant.

Nor do I see that there is anything arcane about this sense. I'd say that it describes pretty well what the word means to me. The sense that I would call arcane is 2c "A primitive form of lamp used in Alaska and Canada", with which I was unfamiliar.

The really funny thing here is that they are fighting about whether Spector's use of the term was semantically correct when you'd think that the issue would be whether it was chivalrous.

Final-vowel thankfulness

Echidne of the Snakes (via Wonkette) suggests an ethnopoliticolinguistic "reason to be thankful" this Thanksgiving:

In our pride at having Democrats name the first woman as Speaker of the House we have forgotten two interesting and telling facts, Nancy Pelosi is the first person with a name ending in a vowel to be Speaker of the House.
She has also risen higher in power than anyone else with a name ending in a vowel in the history of the country.

Commenters tried to figure out exactly what counts as a vowel to "olvlzl" (who could use an extra vowel or two herself). Her definition of "vowel" isn't strictly orthographic, since she discounts surnames ending in silent "-e" like those of Presidents Coolidge, Fillmore, and Pierce. (Don't even mention Monroe!) And it's not strictly phonological, since "-y" doesn't seem to count when it's pronounced as /i/ (President John Kennedy, Speaker Tom Foley) or as part of a diphthong like /eɪ/ (Speaker Henry Clay, Chief Justice John Jay). But the blogger's point isn't really about vowels per se, but about Pelosi's Italian descent:

While we are looking at the facts of her gender and her party affiliation to explain her utter rejection by the Washington DC Establishment and the Republican media we shouldn't forget this fact could count for a lot of the snooty snark. We shouldn't forget that for people with a heritage from the Mediterranean basin, and elsewhere, she also represents a great leap forward.

This isn't the first time we've seen "person whose name ends in a vowel" used as code for "person of Italian (or southern European) descent." It came up a year ago when Samuel Alito was nominated for the Supreme Court. At the time, Matthew Continetti of the Weekly Standard took it as "a point of ethnic pride" that he had a vowel at the end of his name, just like Scalia and Alito (whose names were being fused into the derogatory nickname "Scalito"). As Eric Bakovic noted on phonoloblog, "Continetti's point seems to be that having a vowel at the end of your (last) name more or less identifies you(r name) as being of Italian (or at least 'ethnic') descent." As with the comment on Pelosi, "final vowels" are really ethnic markers masquerading as (folk-)phonological units. Linguists needn't concern themselves with definitional niceties in such cases... and for that we can be thankful.

[Update #1: Seth Finkelstein points out that "name ending in a vowel" as shorthand for Italianness is an old trope in American discourse on ethnicity, with Google News Archive turning up examples from the mid-'80s relating to such figures as Mario Cuomo and Geraldine Ferraro. Here's the earliest example I've found on the Proquest archive:

New York Times, Apr. 14, 1967, p. 23
Judge Di Lorenzo explained that his organization [sc. the American Italian Anti-Defamation League] was attempting to stop the press and television from using the word "Mafia" in crime stories and to abolish the stereotype criminal in movies: "He is always dark-complexioned, and his last name always ends in a vowel," the judge said.]

[Update #2: John Kroll writes:

I can't argue with the online cites that clearly limit "name ends with a vowel" to Italians or at least southern Europeans. But I've used it and heard others use it much more generally.
Although my name doesn't -- my Polish ancestors even tacked on an extra consonant when they came to America -- I've got plenty of ends-with-a-vowel cousins and I've used "ends with a vowel" to distinguish between people of English/Irish/Scotch/German descent and, well, pretty much everyone else -- or, at the least, almost all other European nationalities.
For me and the Poles and Italians I grew up with, "ends with a vowel" distinguished those of us whose ancestors largely arrived in the mass immigration of the late 1800s and early 1900s, by which time the early birds had locked up the power, good jobs and good neighborhoods; and from blacks, the only group we were aware of that was clearly far worse off.
Of late, I think my sense of it has even expanded to include all Asians and Latinos, in a broader sense of being those people who fall somewhere on the spectrum of American tolerance between the Mayflower offspring and the descendants of slaves. I can find at least one online backup for that: "As for Henry Bonilla, who will be introducing himself around Dallas next week, the GOP may be attracted by the fact that his last name ends with a vowel." (link)]

Posted by Benjamin Zimmer at 07:27 PM

Bird elevated

Hot news from the Association for Computational Linguistics: Steven Bird (of this parish) has been elected vice-president/president-elect (a pair of positions the association telescopes as "vice-president-elect") of the ACL, effective January 1, 2007.  There will be the usual manic elevation ceremony at Language Log Plaza, date and time to be announced.

Posted by Arnold Zwicky at 01:56 PM

Mixing idioms

Roy Hodson pointed out to me that a recent article by Larry Dignan on "The economics of Microsoft's kill switch" said:

A behind-the-envelope calculation illustrates why it makes sense for Microsoft to risk irking techies with its piracy battle.

Not quite clear whether Larry meant behind the veil, behind the curve, back of the envelope, pushing the envelope, back of the curve, behind the woodshed, back of the veil, pushing the veil, back of the woodshed, pushing the woodshed, or pushing the curve, is it?

A Google check suggests that Larry may be the only person ever to have used the phrase "behind the envelope calculation" in the history of the world. If I had found even one other occurrence, then under the OICTIQ principle I might have considered the possibility that we have a new idiom emerging here; but I think not. I'd say we're simply looking at a one-off mistake due to a confusion between two idioms. A sort of phrasal malapropism. And if you are surprised that a linguist would think a native speaker can make a mistake about the use of his own language, you shouldn't be.

Although, of course, we shouldn't forget that this is the kind of error that can sometimes act as a little seed from which a legitimate linguistic change might one day grow.

[Update: Dave Errington has made the very sensible suggestion that Dignan might have taken "back of the envelope" to relate to the phrase "in back of the envelope", meaning "behind the envelope" (as opposed to "on the back of the envelope"), and thus replaced the former by the latter either as a confusion or because he saw the two as synonymous. And John Cowan suggests that there might even have been an editorial intrusion here — a general substitution of behind for (in) back of, carelessly over-applied to a case that meant "on the back of"; there were a few (misguided) 20th-century usage handbooks that followed the opinionated grouchiness of Ambrose Bierce (Write It Right, 1909) and called (in) back of an illiteratism, for seventy or eighty years. (There is in fact nothing wrong with the phrase, though it is distinctively American rather than British.)]

Mehrabianian matters

KQED radio's "Forum" show continues to offer interviews on language-related subjects.  As reported here yesterday, on Monday it was Kitty Burns Florey on sentence diagramming.  Yesterday it was Anne Karpf talking about her recently published book The Human Voice.  Along the way she savaged the literature on the relative contributions of words, voice, and body language to communication -- both the original Mehrabian research and the "7 - 38 - 55" version that spread into folk knowledge (recently discussed here) -- and disputed claims that women talk a lot more than men (a topic that Mark Liberman has been returning to on Language Log again and again after his first postings on the subject this summer).  I haven't seen her book yet, but she sounds generally level-headed.  Meanwhile, you can listen to the Florey and Karpf interviews via the "Forum" homepage.

zwicky at-sign csli period stanford period edu

Freedom for data

Good news today for bloggers like us here at Language Log Plaza: the California Supreme Court has struck another blow against common law liability for republication. You can't be sued for libel (they claim) simply for reporting on a blog what another source has said (see this report). Ilena Rosenthal had been sued for publishing, on a web site she did not control, certain statements taken from an email by Tim Bolen about a couple of medical doctors, Stephen Barrett and Terry Polevoy, who run a web site that attacks alternative medicine (Rosenthal is a defender of alternative medicine). Among other things, she alleged that Barrett is "arrogant, bizarre, closed-minded; emotionally disturbed, professionally incompetent, intellectually dishonest, a dishonest journalist, sleazy, unethical, a quack, a thug, a bully, a Nazi, a hired gun for vested interests, the leader of a subversive organization, and engaged in criminal activity (conspiracy, extortion, filing a false police report, and other unspecified acts)".

What we learn from the Supreme Court's judgment is not just that she can't be sued for libel for reporting those judgments about Barrett in another forum, but also that I can't be sued for letting you know what she said. I suppose it could conceivably still be actionable for me to tell you that Strunk and White are dishonest, closed-minded, emotionally disturbed, professionally incompetent, unethical, fanatical, ignorant, linguistic charlatans and puppy torturers, because I'd be the primary utterer of that claim, not just a reporter of it. But hey, they're dead.

Posted by Geoffrey K. Pullum at 09:38 PM

W and Vietnam: together again, linguistically

Nathan Bierma sent in a link to a classic linguification from Jay Leno:

"President in Vietnam. I bet you never thought you'd hear those words in the same sentence. It's like saying Bill Clinton and celibacy in the same sentence." --Jay Leno

The interesting thing about this one is that the joke depends precisely on the fact that W and Vietnam have often been together linguistically, in numerous sentences discussing his efforts to stay out of Vietnam physically. During the previous two presidential campaigns, I'll bet that there were hundreds if not thousands of such sentences in the media, and probably quite a few in comedians' monologues as well. One of the issues was the way that W avoided service in Vietnam by enlisting in the National Guard, an opportunity that he was alleged to have gotten as a result of his family's political string-pulling. Another issue was W's allegedly casual attitude towards the requirements, such as they were, of his National Guard duty. (This was a time when the draft was used to supply manpower for the war, instead of the call-ups of Guard and Reserve units that are now normal.)

In a story by George Lardner Jr. and Lois Romano, published in the Washington Post on July 28, 1999, under the headline "At Height of Vietnam, Bush Picks Guard", we have these five sentences containing the words Bush and Vietnam:

1. Later, when Bush was commissioned a second lieutenant by another subordinate, Staudt again staged a special ceremony for the cameras, this time with Bush's father the congressman – a supporter of the Vietnam War – standing proudly in the background.
2. Vietnam was clearly a crucible for Bush, as it was for Bill Clinton, Al Gore and most other men who left college in the late 1960s.
3. Bush maintains that he joined the National Guard not to avoid service in Vietnam but because he wanted to be a fighter pilot.
4. As he drifted, Bush struggled with his own feelings about Vietnam and the turmoil he saw around him in America.
5. Bush says that toward the end of his training in 1970, he tried to volunteer for overseas duty, asking a commander to put his name on the list for a "Palace Alert" program, which dispatched qualified F-102 pilots in the Guard to Europe and the Far East, occasionally to Vietnam, on three- to six-month assignments.

If we include sentences with pronouns or full noun phrases referring to W, we get three more sentences from the same article:

6. He didn't dodge the military. But he didn't volunteer to go to Vietnam and get killed, either.
7. By enlisting in the Guard, his son not only avoided Vietnam but was able to spend much of his time on active duty in his home town of Houston, flying F-102 fighter interceptors out of Ellington Air Force Base.
8. "I'm saying to myself, 'What do I want to do?' I think I don't want to be an infantry guy as a private in Vietnam. What I do decide to want to do is learn to fly."

It's easy to find more like this -- the next election brought the whole CBS memogate business -- but this is enough to make the point. Leno's linguification is paradoxical: he can make a joke about how you don't expect to find Bush and Vietnam in the same sentence, precisely because Bush's efforts to avoid Vietnam have been so extensively and memorably discussed.

[Russell Borogove suggests that Leno's joke depended on the word "in" being part of the sentence we never expected to hear. Maybe so -- but I was relying on the parallelism implied by Leno's next observation, "it's like saying Bill Clinton and celibacy in the same sentence", which seems to set up the analogy Bush:Vietnam::Clinton:celibacy. Russell's idea, I think, is that the analogy should be Bush:in Vietnam::Clinton:celibacy, which seems forced to me. I construed the joke as relying on a sort of metaphorical connection between being linguistically close and being geographically close. I might be wrong -- but as we've seen many times in the past, people are not shy about making metaphorically-intended assertions about what words do (or don't) occur together that are obviously false, if taken literally.

I guess another option might be that Leno meant a generic (U.S.) president, and not George W. Bush in particular. I considered and rejected that idea, since Clinton visited Vietnam in November of 2000, when he was still president.]

[Jim Lewis writes:

Some years ago, in the 80s, I would guess, the New Yorker ran a humor piece by Veronica Geng -- I can't find the text itself on the web, but you can find it in one of her books -- called 'Love Trouble is My Business'. The piece begins with an epigraph quoting a Village Voice article, which itself quotes a Sunday Times story. The Times story said something about Ronald Reagan and Proust; the Voice writer suggested that it would be the only time the words "Mr. Reagan" and "read Proust" would ever appear in the same sentence. In Geng's piece, the words "Mr. Reagan" and "read Proust" occur in every sentence.

A quick search turns up "Love Trouble: New and Collected Work", for which the "search inside" feature is available. This shows that the piece entitled "Love Trouble Is My Business" starts on page 149. The opening quote is from Geoffrey Stokes (in the Village Voice, August 14, 1984). Stokes in turn quotes Francis X. Clines, writing in the Sunday Times, to the effect that "subjects such as the Soviet Union seem to haunt Mr. Reagan the way vows to read Proust dog other Americans at leisure", and comments that "This may be the only time in history in which the words 'Mr. Reagan' and 'read Proust' will appear in the same sentence".

Since Stokes was unwise enough to use the future tense, the door is open for Geng to write a story that begins:

I glanced over at the dame sleeping next to me, and all of a sudden I wanted some other dame, the way you see Mr. Reagan on TV and all of a sudden get a yen to read Proust. Not that she wasn't attractive, with rumpled blond curls and a complexion so transparent you could read Proust through it -- that is, as long as her cute habit of claiming a tax deduction for salon facials didn't turn up in some IRS stool pigeon's memo to Mr. Reagan.

And so on. ]

[Jay Cummings writes:

I think what is interesting about Leno's joke is that it would not be particularly funny if it was not a linguification. "I bet you never expected to see the President in Vietnam." Of course, a professional comedian manages to make some amazing things funny, but still, the linguified version seems funny even on the page.

Maybe this is some recognition of the absurdity of the formulation?]


[And Jim Lewis adds:

Now that I think about it, I should point out that Geng cheats a little bit. The original Times story, and the Voice article that refers to it, uses "read Proust" in a way which indicates that the "read" is present tense. Geng's piece shifts between present tense "read" and the past tense "read", which I assume counts as two different words, no? Maybe not, but I'd love to have heard the discussions between her and the New Yorker's celebrated fact checkers. (I doubt that humorous pieces are granted a pass on such matters: they fact-check poems over there.)

A great piece, though.

Happy Thanksgiving,

Same to you, Jim!

But if the New Yorker's fact checkers were ever very careful about matters of linguistic fact, they aren't now. A few examples: "Those slurry, sleepy southerners" (2/25/2004), "No hurr in Nellyville?" (4/4/2004), "AW+" (4/29/2004), "Grammatical complexity and electability" (5/6/2004), "And every lion tongue cast down" (8/1/2005), "Invariably followed by the phrase" (10/23/2005). ]

Snowclones in the New Scientist

Mentioned, but also used:

When Saddam Hussein claimed the first Gulf war would be "the mother of all battles", he coined an endlessly reusable formula that has given us the mother of all plagues, stink bombs, waves, firework displays and brain cells (all, alas, taken from the pages of the mother of all science magazines).

Posted by Mark Liberman at 04:34 PM


I can see how this happened, but the result looks odd indeed:

Then there are families like R.’s and his partner’s’ that from the outset seek to create a sort of extended nuclear family... ("Gay Donor or Gay Dad", by John Bowe, New York Times Magazine 11/20/06. p. 69)

Let's take this step by step.  First, we want families like X, where X is an independent possessive (one lacking a nominal head).  For personal pronouns, there are special forms for the independent possessive -- mine in families like mine -- while for other NPs the independent possessive is identical to the determinative possessive (which is in construction with a following nominal head), for which the default form is pronounced with a final Z (with three variants, according to phonetic context), spelled with final ’s; that gives us things like families like George’s, families like my best friend’s, families like my friend from Chicago’s.

Ok, now we want the X in families like X to refer to the family comprising R. and his partner, so we need the possessive of R. and his partner, and that would be, following what I just said, R. and his partner’s: there are families like R. and his partner's that...  This is fine, but it doesn't sound quite right to some people, because it seems to coordinate R. (non-possessive) with his partner's (possessive), which looks like a failure of parallelism.  How to fix that?  Make the first conjunct possessive as well.

(Notice that warnings against non-parallel coordination might have played a role in the development of these "distributed" possessives.  Proscriptions and prescriptions can have all sorts of side effects.)

Now we have families like R.’s and his partner’s, with possessiveness distributed across the two conjuncts.  This is also fine, though it might be understood as meaning "families like R.’s family and families like his partner’s family", referring to two families rather than one.  That is, for people who can distribute possessives, the resulting expressions are systematically ambiguous between reference to one thing (the distributed possessive) and two (coordination of ordinary possessives).  This is not the end of the world; as listeners and readers, we use context, background information, and reasoning about what is plausible to discern intended meanings, and we do this all the time, with enormous speed and (usually) considerable accuracy.  (I believe that I am not inclined to distribute possessives, but I'm not about to try to stop other people from doing it, and I have no trouble figuring out what they mean when they do it.)

So far we have two versions of the independent possessive: families like R. and his partner’s and families like R.’s and his partner’s.  This would be a good moment to quit hassling the possessive and go on with the rest of the sentence, but, alas, Bowe — or an editor — chose to think some more about families like R.’s and his partner’s.  Here's the problem: R.’s and his partner’s looks like a simple coordination of two possessives.  But we want to mark possessiveness on an entire expression referring to R. and his partner as a pair.  So we need a mark of possessiveness at the end of the whole expression R.’s and his partner’s.  This is where the reasoning runs off the tracks -- possessiveness is already adequately, perhaps more than adequately, marked -- but let's press on.

[Addendum later: well, maybe we shouldn't.  Daniel Ezra Johnson notes that the final apostrophe has disappeared in the on-line version of the story (I just checked, and he's right), which suggests that the whole thing might have been a cut'n'paste error.  I still have some useful things to say, but the original point is somewhat blunted.]

How would we indicate possessiveness at the end of R.’s and his partner’s?  Up above, I gave the default scheme, involving Z or 's, but there's a special case, for expressions in which the last word already has a Z suffix.  This happens most frequently when the last word is a regular plural of a noun, as in the NPs the birds and my friends: the birds’ wings, my friends’ advice (cf. my children’s advice).  This word does not have to be the head of the NP: The advice of my friends’ [not friends’s] being so helpful, I decided to...  In any case, the possessive suffix is suppressed in speech, its presence indicated in spelling by a final apostrophe.
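For readers who like to see a rule spelled out mechanically, here is a minimal Python sketch of the spelling scheme just described. It covers only the case in this paragraph (the default ’s, and the bare apostrophe after a word that already has a Z suffix). The function name and the ends_in_z_suffix flag are invented for this illustration; the sketch does no morphological analysis of its own, so the caller has to say whether the last word already carries a Z suffix.

```python
APOSTROPHE = "\u2019"  # the curly apostrophe used in the examples in this post


def possessive(phrase: str, ends_in_z_suffix: bool = False) -> str:
    """Spell the possessive of an NP (illustrative sketch only).

    ends_in_z_suffix is a hypothetical flag supplied by the caller:
    True when the last word of the phrase already has a Z suffix,
    as in a regular plural like "friends" or "birds".
    """
    if ends_in_z_suffix:
        # The possessive suffix is suppressed in speech; the spelling
        # marks its presence with a final apostrophe alone.
        return phrase + APOSTROPHE
    # Default scheme: final Z in speech, spelled with final 's.
    return phrase + APOSTROPHE + "s"


if __name__ == "__main__":
    examples = [
        ("George", False),
        ("my friend from Chicago", False),
        ("my children", False),   # irregular plural, no Z suffix
        ("my friends", True),     # regular plural, Z suffix present
        ("the birds", True),
    ]
    for phrase, has_z in examples:
        print(possessive(phrase, has_z))
```

The flag is passed in rather than guessed from a final letter s because what matters, on the account given here, is the presence of a Z suffix and not the spelling of the last word.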

The possessive suffix is suppressed not only by a plural Z suffix, but by other Z suffixes as well.  In particular, it's suppressed by another POSSESSIVE suffix.  It takes a little work, but you can devise examples in which two possessive suffixes would be expected but only one surfaces.  (By the way, none of the observations about English I'm making here are novel; they've been around for some time.)

Background: independent possessives occur in at least four constructions:

Anaphoric zero:  Kim’s essay was long, but mine/Sandy’s was even longer.

Predicative:  That book is mine/Sandy’s.

Double genitive:  friends of mine/Sandy’s

Locative:  Let's meet at Sandy’s.  = "Let's meet at Sandy's place/house."

These can be mixed with one another or with a determinative possessive.  I'll illustrate a few of the possibilities with double genitives:

Double genitive inside determinative:  Let's meet at that friend of Sandy’s/*Sandy’s’s place.

Double genitive inside anaphoric zero:  Kim’s essay was long, but a friend of Sandy’s/*Sandy’s’s was even longer.

Double genitive inside locative:  Let's meet at that friend of Sandy’s/*Sandy’s’s.

Now, the handbooks don't even contemplate such examples, so they don't tell you how to punctuate them.  I've chosen to minimize the number of punctuation marks, using ’s to stand for two possessive suffixes.  You could make a case for ’s’, extending the orthographic marking of a suppressed Z from the paradigm examples: Let's meet at that friend of Sandy’s’ place.  It looks ugly to me, but at least it's consistent.  This is in fact the spelling in the Times example we started with.  The spelling would be defensible, but the problem with families like R.’s and his partner’s’ is not the orthography, but the signalling of an entirely spurious possessive suffix at the end of the independent possessive.

While we're on the subject of Astounding Possessives, let me mention two problematic cases that John Singler and I and our students at NYU and Stanford, respectively, have been looking at over the years: the Coordinated Pronoun Problem and the You Guys Problem.

The Coordinated Pronoun Problem.  Suppose you are a married man, and you want to talk about the problems that you and your wife have been having; you want to talk about X problems, where X is a possessive expression referring to your wife and you as a couple.  What you get off the shelf (see discussion above) is: my wife and I’s problems.  A lot of people recoil from this (and similar examples with other personal pronouns as a second conjunct); the I’s sounds just wrong.  The easy solution is to distribute the possessive (again, see discussion above): my wife’s and my problems.  This risks losing the sense of your wife and you as a unit, a couple.  So you might be moved to combine the virtues of the ordinary possessive and the distributed possessive.

A number of people have stretched English grammar in search of a solution.  (Sightings of these non-standard variants go back at least to a 10/16/91 posting to the Linguist List by Steve Harlow.)  Such a solution will have a possessive 's at the end of X, as in the ordinary possessive, but it will avoid the ugly I’s, in favor of something less ugly — for instance, my’s, using the my from the distributed possessive: my wife and my’s problems.  (The parallel for the Times example would be families like R. and his partner’s’.)  Singler and I have collected examples, and you can google some up — 21 webhits for my wife and my’s -- though people have tried a variety of other solutions, covering all the morphological possibilities: my wife and me’s (2 hits), my wife and myself’s (35), my wife’s and mine’s (47).  (In contrast, I get 11,000 hits for my wife and I’s and 29,300 for my wife’s and my, though maybe half of the latter are irrelevant.)

Yet another solution is the exact parallel to the Times example: distributed possessives plus final ’s, that is, my wife’s and my’s problems.  Again, Singler and I have some examples, but this time Google is not our friend: no webhits for my wife’s and my’s or my wife’s and me’s, two for my wife’s and mine’s, 13 for my wife’s and myself’s.

[Addendum: Aaron Dinkin points out yet another resolution: my wife and my problems (for 'the problems of my wife and me').  It's hard to tell how common this one is, since you can really search for examples only with a head noun supplied.  But there are at least a few examples out there.]

The You Guys Problem.  The combination of a plural personal pronoun (you, we, or us) with a plural noun presents a puzzle in syntactic analysis: is the pronoun a determiner modifying the noun as head; or is the pronoun the head, with the following noun in apposition to it; or are they co-heads, in a kind of copulative compound?  Might different speakers have different analyses?  Might some speakers have more than one analysis?  Syntacticians have puzzled over these questions for years.  For the first person plural pronouns, the topic is especially vexed, since prescriptions about pronoun case interfere with attempts to collect judgments.

For one particular instance of this combination, the very frequent informal you guys, speakers exhibit much more variation in their choice of possessive forms than for others, in ways that suggest that they see the combination as having two equal parts AND that they treat the whole thing as an expression that doesn't necessarily involve an ordinary plural noun guys.

First, the off-the-shelf possessive would be you guys’, as in you guys’ ideas.  A lot of people shrink back from that; I myself am not particularly comfortable with it.  One pretty common alternative distributes the possessive: your guys’, as in your guys’ ideas = "the ideas that you guys have".  I collected my first examples at the 2005 Berkeley Linguistics Society meeting, where one commenter on a paper referred repeatedly to your guys’ analysis.  A little while later I heard Barry Bonds use this possessive (referring to the reporters at a press conference), then found piles of examples on the net, and collected some more examples from the speech of graduate students and colleagues.

An alternative is to treat you guys as an expression that just happens to end in /z/.  Then the off-the-shelf possessive would be you guys’s, and Singler and his students have plenty of instances.  About 11,700 Google webhits, which certainly isn't chopped liver.

Finally, you can do both at once: your guys’s.  Some informants report preferring this to the singly marked you guys’s, and it gets a lot of webhits (about 26,500), though many of these are probably references to the line "Could I use your guys’s phone for a sec?" in the 2004 film Napoleon Dynamite.
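The four possessives of you guys surveyed above can be seen as the product of two independent choices: whether to distribute the possessive onto the pronoun, and whether to treat guys as an ordinary plural or as something that just happens to end in /z/. The Python sketch below simply enumerates that two-by-two space; the function and parameter names are invented for the illustration, and nothing here is a claim about how speakers actually arrive at these forms.

```python
from itertools import product

APOSTROPHE = "\u2019"  # curly apostrophe, as in the examples in this post


def you_guys_possessive(distribute: bool, treat_as_plural: bool) -> str:
    """Build one possessive variant of "you guys" (illustrative only).

    distribute:      mark the pronoun as possessive too ("your").
    treat_as_plural: treat "guys" as a regular plural, so the possessive
                     is spelled with a bare apostrophe; otherwise treat
                     the whole thing as an expression ending in /z/ and
                     add the default 's.
    """
    pronoun = "your" if distribute else "you"
    ending = APOSTROPHE if treat_as_plural else APOSTROPHE + "s"
    return f"{pronoun} guys{ending}"


if __name__ == "__main__":
    for distribute, treat_as_plural in product([False, True], repeat=2):
        print(you_guys_possessive(distribute, treat_as_plural))
```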

In more formal speech and writing, of course, you don't use you guys at all, just you, an alternative that is also available in informal speech and writing, but at the risk of ambiguity between singular and plural.  In many cases, this ambiguity is actually troublesome, so you guys is a good thing to have, especially if you speak a dialect that lacks a distinguished plural like y’all.  Once you have it, though, you're stuck with finding a possessive form for it.

Posted by Arnold Zwicky at 02:43 PM

One of those people that care(s)

The second hour of KQED's "Forum" radio program this morning had as its guest Kitty Burns Florey, author of Sister Bernadette's Barking Dog: The Quirky History and Lost Art of Diagramming Sentences (Melville House, 2006), a charming and decidedly non-technical account of Reed-Kellogg sentence diagramming and those who have loved it.  She kept reminding her listeners that she was neither a linguist nor an English teacher, she carefully made no claims about the pedagogical values of sentence diagramming, and she was realistic about change in language (while struggling to recognize what was "technically" or "traditionally" correct).  But of course most of the phone calls were from people retailing their pet peeves about English grammar and usage, complaints that will be familiar to readers of Language Log.

Early on in the calls came one beginning firmly:

I'm one of those people that cares... that care.

(meaning that the caller cared about prescriptive correctness).  The caller laughed and then went on with her complaints, and nobody remarked on either of the usage points in her first sentence.

[Correction: now that I can access the recording of the show, I see that I got my transcription backwards: "Susan from Berkeley" says: "I was going to say that uh I'm one of those people that care [laugh]... that cares."  This is a bit more delicious than what I thought I heard the first time, as we'll see below.  (Thanks to Jonathan Lundell.)]

It's been five whole months since we wrote about the choice of singular or plural verb in a restrictive relative clause following one of + plural NP: singular to go with one, or plural to go with the plural NP?  (The plural variant is considerably older, but the singular has been around at least since Shakespeare and people have been complaining about it since around 1770, after it began appearing with some frequency in the works of respected writers; MWDEU suggests that there's a subtle difference in meaning or discourse function between the alternatives, so that both should be accepted as standard.)  The caller went for the singular first and then altered it to the plural, possibly recognizing the "correction" with her laugh.  Maybe she cares too much.  [Addendum: now we see that she started with the prescriptive standard (plural) and revised it to the sometimes-proscribed version (singular).]

People who maintain that they CARE about grammar very often care about that as a restrictive relativizer with human-denoting heads, maintaining that only who is acceptable in formal writing (or even acceptable, period).  MWDEU tells the convoluted story of relativizer that with reference to human beings: it came first, then fell out of favor, but was revived in the 18th century, though with a bad taste left over from its years in exile among the common people; John Simon and William Safire have deplored it.

In searching the Language Log archives for the link to my earlier posting "One of those who", I pulled up the postings in which this expression and some of its variants were used (rather than mentioned) by the bloggers.  Our usage on the singular/plural issue is divided: two to two for "one of those who" (singular in Mark Liberman's postings #2459 and #2466, plural in Geoff Pullum's #937 and Mark's #1347), an edge for the singular for "one of those people who" (Mark's #1209, #2381, and #3044, versus plural in Geoff Pullum's #1461 and someone I quoted in #3555).  In any case, we are not unhappy with the singular.

On the that/who issue, we seem not to have used that at all for reference to human beings in the contexts "one of those people..." or "one of those..."  So we're inclined to be who users.  But we wouldn't deride the "Forum" caller for her choice of that.  That's ok with us.

Posted by Arnold Zwicky at 09:27 AM

Wah piang eh! Si beh farnee!

Victor Mair writes:

I fell in love with Singaporean English when I heard it spoken in the delightful movie entitled "I not Stupid." It's an amazing mix of English, Hokkien and other Sinitic languages, Malaysian, Indian languages, and probably some other elements as well. One of the things that is most peculiar about Singlish, as it is fondly called by the natives, is the extensive use of Sinitic particles that add all sorts of nuances to an expression or sentence.

Here are a couple of examples of Singlish:

"Wah lau buay sai lah. The tee-cher say must put one. If not sure kena sai lah."

Translation: That's not possible. The professor implied otherwise. Therefore, a failure to do so would result in an unfavourable outcome for me.

"Aiyah my essay cheem meh? Where got cheem?"

Translation: Is my essay really difficult to understand? That can't be the case.

Be sure to read some of the entries in the extensive "Talking Cock" lexicon, and as background, the Wikipedia entry for Singlish.

I have an ex-Singapore army man in my Classical Chinese course. He's smart and very funny; I really like him. He told me about a movie called "Army Daze" that depicts how all young men in Singapore have to serve in the army, regardless of their background and character. To get a good taste of Singlish and the life of a Singapore army recruit, here's the whole film in nine parts:

Part 1, Part 2, Part 3, Part 4, Part 5, Part 6, Part 7, Part 8, Part 9

Wah piang eh! Si beh farnee!

Posted by Mark Liberman at 07:05 AM

English as a Quasi-Official Language of China

When I got back from Manchester, two guest posts from Victor Mair were waiting in my inbox. Here's Victor's first note:

In an earlier post, I observed the ubiquity of English-language teaching in Chinese schools, in most cases starting from the elementary grades. More evidence for the growing importance of English is its usage instead of Chinese at international conferences, meetings, and diplomatic events, and even more prominently in business with other countries.

Attached hereto is a photograph that accompanies an article about the signing of an agreement between China and Cambodia concerning cultural preservation. Wen Jiabao, the Premier of the PRC, is seated just to the left of center. The signing took place in Phnom Penh on April 8, 2006. I'm only coming across this now as I go through some of the back issues of Zhongguo Wenwu Bao (China Cultural Relics News) that I brought back from a recent trip to China. This report appeared on the front page of the April 12th issue.

Note that the Kampuchean hosts permitted the use of the older English spelling of the name of their country as "Cambodia."

One comment: since the event took place in Phnom Penh, wouldn't the banner have been prepared by the Cambodian hosts? And in that case, isn't this a question of Chinese diplomats permitting the use of English, and not insisting on Chinese being displayed as well, rather than a case in which the Chinese government itself prepared an English-and-Khmer sign for a bilateral meeting? But it's certainly striking to see the Chinese premier signing a bilateral agreement with his Kampuchean counterpart under a banner in English and Khmer.

[Update 11/27/2006 -- Josh Jensen writes:

I'm catching up on the recent LL posts, and I just read this one. It reminded me of a conversation I had with a young Chinese woman last year, a guide for our adoption agency group in Guangzhou, China. She complained that when she'd visited South Korea, there weren't enough English-language signs.]


Posted by Mark Liberman at 06:35 AM

No Dragon in that Sausage?

Trading standards officers have ordered the Black Mountains Smokery in Powys, Wales to change the name of its Welsh Dragon sausages on the grounds that they are made with pork, not dragon meat. I'm all for truth in advertising and proper labeling, but it is hard to believe that many consumers have been misled. Even the dullest consumer presumably knows that dragon meat is extraordinarily rare, and, at least in Wales, it seems reasonable to expect consumers to know that the dragon is a symbol of Wales. I'm not surprised that the manufacturer reports that it has received no complaints about the absence of dragon meat from its products.

Beyond the lack of common sense, what I find peculiar is the assumption that the mention of an animal in a brand name implies that the product contains that animal. Do people assume that Koala™ hose connectors are made from koalas, that Grizzly™ salmon oil is made from grizzly bears, or that Deer™ red chilis contain venison? I just hope that the trading standards people don't start regulating the Chinese names of foods. One of my favorite Chinese words, and foods, is 龍蝦 "lobster" (Cantonese luŋ4 ha1, Mandarin lóng xiā), literally "dragon shrimp".

Typographical bleeping antedated to 1591

It's common to disguise scatological or blasphemous language by replacing some letters with asterisks, hyphens, blanks or other typographical maskers. This avoids violating the letter of an explicit or implicit prohibition against printing certain words. Some might also see this as an instance of magical thinking, where it's safe to cause people to think of certain words, but saying them out loud, or writing them directly and completely, would invoke a sort of incantational power to harm. In any case, I've been curious about the history of this practice, and in some earlier posts ("The history of typographical bleeping", 6/10/2006; "The earliest typographically-bleeped F-word", 6/15/2006) we tracked English-language examples back to a poem by John Oldham published in 1680.

In response to my call for earlier cases, Simon Cauchi takes a form of this practice back almost a hundred years, to 1591. His note is beyond the jump.

I can offer three examples from the works of Sir John Harington (1560-1612), but note the use of parentheses rather than dashes or hyphens.

See Harington's 1591 translation of Orlando Furioso, Book 43, stanza 133:

In fine, he made to him the like request
As Sodomits made for the guests of Lot.
The Judge him and his motion doth detest
Who though five times repulst yet ceaseth not,
But him with so large offers still he prest
That in conclusion like a beastly sot,
So as it might be done in hugger-mugger
The Judge agreed the Negro him should ( )

Secondly, in Harington's epigram "Of a faire woman; translated out of Casineus his Catalogus gloriae mundi", the sixth couplet reads (in the printed edition of 1618):

A narrow mouth, small waste, streight ( )
Her finger, hayre, and lips, but thin and slender:

but there is no bleeping in the manuscript prepared for presentation to Prince Henry (Folger MS V. a. 249), where the text reads:

A narrow mouth, small waste, strayght privy member,
her fingers, hayr and lips, but thin and slender

(The spellings "strayght" and "streight" are of course to be understood as "strait".)

Thirdly, the epigram "Of Garlick. To my Ladie Rogers" is short enough to be quoted in its entirety. The printed edition reads:

If Leeks you like, and doe the smell disleeke,
Eate Onions, and you shall not smell the Leeke.
If you of Onions would the sent expell,
Eate Garlicke, that will drowne the Onyons smell.
    But sure, gainst Garlicks sauour, at one word,
    I know but one receit, what's that? (go looke.)

In the Folger MS the last line is also bleeped, but by other means:

I know but one receipt, what's that? Tobacco.

The Folger MS was intended for the eyes not only of Prince Henry but also of his father King James I, whose dislike of tobacco was well known.


Note that in these cases, the hint that allows the reader to infer the writer's intention is the (meter and) rhyme, rather than the initial letter.

The last example has a special twist: the reader is led to expect "a turd", and then sees "tobacco", a substitution that conveys an additional message. This is a familiar technique, which I know that I've seen several times in humorous songs -- but at the moment, all that I can remember are a couple of fragments of tune, without the words. If your memory is better than mine, let me know.

[OK, here's one -- Antoine Hervier writes

In the movie Shrek, when the Ogre and Donkey arrive in Duloc, they are greeted with a cute little song, with these lyrics :

Please don't step on the grass
Shine your shoes
Wipe your [pause] face

I do remember that one now, but it's not the one that was on the tip of my tongue. Nor am I trying to remember Sweet Violets, sent in by George Kesteven. However, this note from Daniel R. Tobias nails it:

Regarding your request in the Language Log, one famous example of a song where a "bad word" is substituted with something that doesn't even rhyme is "Shaving Cream", originally written in 1946 by Benny Bell, sung by Paul Wynn, and redone by Dr. Demento in 1975.

A schoolyard rhyme that I remember as "Miss Lucy", but Wikipedia's entry calls "Miss Susie", consists of a series of stanzas each of which seems to be leading to a slightly naughty word, which is then made part of an innocent starting word beginning the next verse, like:

"...Miss Lucy went to heaven
and the steamboat went to

Hello operator
Get me number nine..."

It's so old that perhaps it actually dates to a time when some people had single-digit phone numbers.

Yes, "Miss Susie" is the one that I was remembering. How could I forget?

Anyhow, there are dozens of these songs out there. In some cases, the taboo word at the end of the stanza is replaced by a completely separate word which also starts the next stanza; in other cases, a homonym is used in the same way.]

[Jacob Coughlan contributed this set of variant verse, all the way from Melbourne, Australia -- but I recall hearing several of them in Mansfield Center, Connecticut:

In response to your article "Typographical bleeping antedated to 1591", in which you asked readers to give examples of humorous songs with ribald lyrics substituted for something more innocuous, here are the lyrics to one such song from my childhood, as I knew it. There are, of course, many variations.

Anyway, in this particular song, the lines run into each other, so that the beginning of the suggested dirty word morphs into the innocuous beginning of the next verse. The joke is rammed home by the shocking inclusion of an (unexpected) actual dirty word in the final verse:

Aunty Mary had a canary,
thought it was a duck,
took it round the corner
taught it how to...

Fried eggs for dinner,
fried eggs for tea,
the more you eat,
the more you want,
the more you gotta...

Peter had a boat,
the boat began to rock,
up came Jaws
and bit off his...

ginger ales,
forty cents a glass,
if you don't like it
you can shove it up your...

Ask no questions,
tell no lies,
I saw a Chinaman
doing up his...

Flies are bad,
mosquitoes are worse
that is the end
of my fuckin' naughty verse.

Hope you enjoyed that as much as I did.]


[Eric's contribution:

I hope you're not getting flooded with rude, half-remembered schoolyard verse, but probably you are. I was struck by how much your quoted version resembled what I remember from childhood half a world away (Bronx, NY), and also how much it differed. So the real question is -- is someone tracking versions of Miss Susie, by time and place? Because with a little data, the next step is a cladogram!

The version I know has no "Susie" in it at all. And, in retrospect, it's almost certainly two independent pieces welded together at about "Engine engine number nine":

Ungowa! Shipowa!
Your mother don't take no shower!
I said it, I meant it,
I'm here to represent it.

Engine engine number nine
Sock it to me one more time!

Ikey and Dikey
Were playing in the ditch,
Ikey called Dikey a
Dirty son of a . . .

Bring along the children
And let them play with sticks
So when they get older
They'll know how to play with . . .

Dixie had a baby,
She named him Tiny Tim
She put him in the pisspot
To see if he could swim.

He swam to the bottom
He swam to the top
Along came a bumblebee
And stung him up his . . .

Cocktails, ginger ale
Five cents a glass
And if you don't like it
You can shove it up your . . .

Ask me no more questions
I'll tell you no more lies
A kid got hit with a bag of shit
Right between the eyes!

I like the idea of Miss Susie cladistics -- but a trendier name would be "memetic phylogeny". ]

[ Chris Conroy wrote:

In response to your request for songs with an expected dirty rhyme, here are two that came immediately to my mind.  The first is an unreleased Weird Al Yankovic parody, "It's Still Billy Joel to Me" (parody of "It's Still Rock n' Roll to Me" by, naturally, Billy Joel).

The relevant verse (the 'B' verse, if you're familiar with the song):

      Now everybody thinks the new wave is super
      Just ask Linda Ronstadt or even Alice Cooper
      It's a big hit, isn't it
      Even if it's a piece of junk

      It's still Billy Joel to me

It's a fun parody, at least if you're a Billy Joel fan with a sense of humor.  It's a shame Weird Al couldn't get the rights to release it.  (Apparently he always seeks permission from the songwriter, even though, as parody, he's not required to.)

The second example is more obscure.  It's a parody of an old hymn called Dies Irae, written about the "culture wars" going on in the Catholic Church between proponents of traditional hymnody and pop/folk-style contemporary music.  The writer is one of the former.  Most of the references won't mean much to someone outside of Catholic music circles, but I think you'll find this one verse is particularly clever nonetheless.  The three names mentioned in the last line are the three most prolific composers of contemporary music used in Catholic Masses today.  The final name does not actually rhyme with the other two line endings, though from the author's point of view, I suppose it might as well:

       Smite them, Lord, yet of thy pity
       Take their songsters to thy city:
       Even Haugen, Haas, and Schutte.

I bet that one has them ROSL (rolling in the sacristy laughing).]

[Daniel R. Tobias provides the version of Miss Susie from upstate New York in the 1970s:

All the variants of the Miss Susie / Lucy rhyme are interesting... does anybody know when and where it originated anyway? Occasionally, the versions have bits in them that seem to date them, like the "Jaws" reference in one of the quoted versions (that seems to refer to a mid-1970s movie). I note that the drink is five cents a glass in one version and forty in another; can that be charted against the Consumer Price Index in an attempt to date the variants? At least in the era before the Internet and such pop-cultural stuff as The Simpsons and South Park which like to make use of things like these rhymes, they spread entirely by kids teaching them to other kids without any assistance of the mass media.

Anyway, the version that I remember, from upstate New York in the mid 1970s, goes like this (similar but not identical to the Wikipedia-quoted version):

Miss Lucy had a steamboat
The steamboat had a bell
Miss Lucy went to heaven
and the steamboat went to

Hello operator
Get me number nine
If you disconnect me
I will kick your fat

Behind the refrigerator
There is a piece of glass
Miss Lucy sat upon it and
it broke her little [or: it went straight up her]

Ask me no more questions
Tell me no more lies
Boys are in the bathroom
pulling down their

Flies are in the pantry
Bees are in the park
Boys and girls are kissing in the
D-A-R-K, D-A-R-K...

Well, some people have pointed out that one piece of the song must have originated at the time when human operators were involved in making telephone connections, among a set of local numbers small enough that a single digit like "nine" would have been a normal selection among them (with connections made by human operators, not all numbers need to have the same number of digits...). Folklorists study this sort of thing, but I'm not sure whether there is a literature on Miss Susie.]

[Michael Mann offers a German example:

Your post "Typographical bleeping antedated to 1591" reminded me of a song that was quite popular here in Germany in 1978 (I wasn't yet born then, but I read that it was quite popular), sung by Rudi Carrell: "Goethe war gut". You can find the lyrics here:

Like the "turd"-example you gave, Carrell similarly played with the expectations of the audience, "rhyming":

"Lust" (lust, delight) not with "Brust" (breast) but with "Brille" (glasses);

"was jeder weiß" not with "der größte Scheiß" (the biggest shit) but with "der größte Segen" (the biggest blessing)

and so on. Which means, as he does so in every second line, the song doesn't rhyme at all.

(See also:


[Ray Girvan has another German reference:

I recall a German schoolyard equivalent:

Scheint die Sonne so warm
Nehm ich Papier untern Arm,
Scheint die Sonne so heiss,
Setze mich nieder und

Scheint die Sonne so warm
etc etc]


[Martyn Cornell sent in a URL for the "best-known (and much loved) British taboo-word-substitution-recital, It was Christmas Day in the Workhouse, which is a parody of the 19th century tear-jerker by George Sims, It Is Christmas Day in the Workhouse."]

Bitterest battles in the war on error

A peculiar feature of linguistic prescriptivism is that the most passionate assertions of rightness and wrongness often occur in precisely those areas of the language where there is the most ambivalence among native speakers. Several months ago we saw just such a case of manic overcodification when a newspaper reporter told us about an editor who preposterously insisted that the comparative form of strict should be more strict and never ever stricter. Now comes another pronouncement of rigid exactitude in the highly inexact arena of comparative and superlative inflection.

It started with this sentence in the Guardian, appearing in an article earlier this month about a poll that ranked President Bush as a greater threat to world peace in the eyes of the British public than either Kim Jong-Il or Mahmoud Ahmadinejad:

As a result, Mr Bush is ranked with some of his bitterest enemies as a cause of global anxiety.

This elicited a complaint directed to Guardian reader's editor Ian Mayes, reproduced in his Nov. 13 column:

"Surely," the reader asked, "your correspondent knows that the correct English form is 'most bitter'." I can sympathise to some extent with a writer who in this context felt driven to a new extremity. But what we are involved in here is the war on error and, following Mr Bush's example, we shall seek out errorists and bring them to justice.

But Mayes' bon mot about "the war on error" is, sadly, followed by this odd statement on the acceptability of bitterest:

One of the weapons in my arsenal is the wonderful Oxford English Dictionary on line, but it is at a total loss to find any recorded use of "bitterest".

Huh? If the esteemed reader's editor of the Guardian doesn't know how to use a damn dictionary, then all I can say is: the errorists have already won.

It's true that the online OED doesn't specifically mention bitterest as an inflected form of bitter. But guess what? It rarely specifies any comparative or superlative forms with -er/-est, unless there's something noteworthy involved. The OED is similarly mum about how to make a comparative or superlative out of dumb, but that doesn't rule dumber and dumbest out of the lexicon. (Dumberer is another matter.) Some dictionaries do in fact explicitly list comparatives formed with -er and superlatives formed with -est, and those that do, such as American Heritage and Random House, show bitterer and bitterest without comment. (Webster's Third New International hedges just a little bit, saying the inflected forms of bitter are "usually" -er/-est.)

With only a modicum of know-how in using the online OED's full-text search feature, Mayes could have quickly found no fewer than 41 citations throughout the dictionary featuring the word bitterest. The earliest of these is from Layamon's Brut, dating to the turn of the 13th century: "Her heo sculeð ibiden bitterest alre baluwen." (In a more modern rendering, that would be: "He shall therefore abide bitterest of all bales.") That citation even appears in the entry for bitter, so it's hard to miss. Elsewhere in the OED's text one can find bitterest used throughout the course of modern English, right up to the present day. Some notable examples from English literature:

Oh World, thy slippery turnes! Friends now fast sworn,
Whose double bosomes seemes to weare one heart,
Whose Houres, whose Bed, whose Meale and Exercise
Are still together: who Twin (as 'twere) in Loue,
Vnseparable, shall within this houre,
On a dissention of a Doit, breake out
To bitterest Enmity.
—Wm. Shakespeare, Coriolanus (1607)

How can He sweeten the bitterest providences, and give us cause to praise Him for dungeons and prisons!
—Daniel De Foe, Robinson Crusoe (1719)

There was nothing wrong in the sentiment; and yet I instantly reproached my heart with it in the bitterest and most reprobate of expressions.
—Laurence Sterne, A Sentimental Journey through France and Italy (1768)

The last, the bitterest pang to share,
For princedoms reft, and scutcheons riven.
—Sir Walter Scott, Marmion (1808)

It was the bitterest chillum I ever smoked.
—Wm. Makepeace Thackeray, Adventures of Major Gahagan (1839)

To these we could add many hundreds of attestations in English poetry, drama, and prose from Chadwyck-Healey's Literature Online database. Even a more modest online literary collection will give us a wealth of examples from Jane Austen, two Bronte sisters (Anne and Emily), Wilkie Collins, Charles Dickens, Thomas Hardy, Jack London, Mark Twain, and more Thackeray. (Dickens, Hardy, London, and Twain also use bitterer, by the way.)

Additionally, the OED has recorded usage of bitterest in a wide range of modern periodicals, from the Daily Chronicle to the Catholic World to Time to, whaddayaknow, the Guardian. (Under hep, one can find this 1960 comment from a Guardian writer: "Not even its bitterest critics could accuse the Labour party of being 'hep'.") In fact, bitterest shows up a whopping 304 times in the online archive of the Guardian, averaging about 40 appearances a year since 2000.

So where does the reader's insistence on the unacceptability of bitterest come from, and why is Mayes, who should really know better, so ready to accept this "new extremity"? This is not one of the typical prescriptivist bugaboos, as it does not occur in any of the prim grammar guides that I've checked. In fact, several guides from the late 19th century explicitly give the opposite advice, though some, such as Wm. Smith and T.D. Hall's A school manual of English grammar (1887), do note that "many of those compared by er and est take also more and most." Eduard Mätzner's magisterial Englische Grammatik (1860-65) also embraces bitterest and similar forms in this passage (from a later English translation):

Others also of the twosyllabled adjectives not named above frequently form their degrees of comparison by derivational terminations; thus adjectives in ow, el, il, er, ant, t (ct), st ... especially frequent in er: Bitterer remembrances (L. BYRON). In its tenderer hour (ID.). The proper'st observations (BUTLER). The properest means (GOLDSMITH). The soberest constitutions (FIELDING). With bitterest reproaches (CONGREVE). 'twixt bitterest foemen (L. BYRON). The tend'rest eloquence (ROWE). The cleverest man (LEWES).

For more recent advice, there is James Fernald's English Grammar Simplified (1963/1979):

Various other adjectives of two syllables are also compared by er and est, according to no very definite rule: bitter, bitterer, bitterest; clever, cleverer, cleverest; cruel, crueller, cruellest; handsome, handsomer, handsomest; tender, tenderer, tenderest. The correct usage in such words can be learned only by careful study of the dictionary and of the best authors.

So there's clearly no proscription against bitterest even among those who care deeply about such things. As with stricter vs. more strict, we are faced with a choice between two perfectly acceptable alternatives, and that choice may be dictated by a range of phonological, prosodic, stylistic, and pragmatic concerns rather than an overt grammatical rule. CGEL is, as always, enlightening on this point, as is Britta Mondorf's article "Support for more-support" in Determinants of Grammatical Variation in English (2003). Mondorf points out that "extensive amount of more-support for adjectives in <-r, re> can be attributed to the avoidance of phonological identity effects." That would explain a preference for more bitter over bitterer, but it does nothing to weaken the case of bitterest as an acceptable alternative to most bitter. (Interestingly, Mondorf also provides evidence that certain semantic criteria can override the phonological disposition against bitterer and similar forms, theorizing an affinity between concrete meanings and -er forms. She gives two examples of comparative bitter from the Daily Telegraph, contrasting concrete and abstract usage: "the beer is bitterer" versus "the more bitter takeover battles of the past.")

If you feel strongly one way or the other on this bitter debate, register your voice in this poll. Last I checked, 16% had voted for bitterer/bitterest, 38% for more bitter/bitterest, and 46% for more bitter/most bitter. There is, of course, no way to vote for "all of the above" or "depends on the context," since such wishy-washy acceptability judgments are hardly ever considered in the polarized discourse of verbal hygiene. In the war on error, you're either with us or against us.

[Update: Languagehat and his commenters continue the discussion. I particularly like LH's turn of phrase, "cockamamie ukase."]

What, that lynching stuff?

I meant to tell you a few days ago, when Trent Lott was chosen to return to the political spotlight as Senate minority whip in the next Congress, about how it reminded me of a Language Log post. On NPR this week I heard an interview with one of the Republican Senators who participated in the meeting where the decision on Lott was made. Was there any discussion, the NPR interviewer wanted to know, about the incident of 2002? (You'll recall that in 2002, at the 100th birthday party for Strom Thurmond, Trent Lott said that if the country had elected Thurmond to the presidency in 1948, "we wouldn't have had all these problems over all these years." Thurmond in 1948 was not only militantly in favor of segregation but also dead set against Federal anti-lynching laws. The inescapable implication of Lott's remark was that a Thurmond presidency would have prevented the rise of the civil rights movement; activist negroes would have been hung from trees rather than getting to demonstrate and vote and gain access to white schools the way they finally and disruptively did in the 1960s.) What caught my ear was that in response the Republican Senator being interviewed simply said: "It didn't come up". The exact words of the punchline from the hilariously spot-on Dilbert cartoon that Mark discussed (in connection with stereotypes of male empathy deficit) just two months before: the strip where Dilbert gets the numbers from Yvonne without asking about how she is coping with her sextuplets now that her house has burned down and she's had shoulder surgery. Sometimes linguistic life imitates linguistic art so beautifully.

November 19, 2006

Find that mystery mumbling phonetician

A further case of a linguist raising serious suspicions and almost getting arrested can be found in the biography entitled The real Professor Higgins by Beverley Collins and Inger Mees (Mouton, 1999; page 352). It was pointed out to me by John Wells of University College London (UCL). The story concerns Professor Daniel Jones, the distinguished founder of the Department of Phonetics at UCL (later the Department of Phonetics and Linguistics, where I taught for several years early in my career). Jones, one of the most important figures in the entire history of phonetics, was once taken for a spy -- in wartime, so this could have meant the firing squad.

During the Second World War (1939-1945), the operations of the Department of Phonetics were evacuated to Aberystwyth in Wales, because London was under constant German bombing. Professor Jones did not move his domicile to Wales, but had to go there sometimes, especially to conduct examinations. (The tradition at UCL has always been to examine phonetics in part through a process of live dictation of invented nonsense words which the examinees have to write down in the International Phonetic Alphabet, and this requires the simultaneous co-presence of the candidates with an expert practical phonetician who can pronounce arbitrary syllable strings perfectly and recognize accurate transcriptions of them.) Here's the way Collins and Mees tell the tale of Jones's curious incident:

On one of his rare trips to Wales, Jones was busily checking his phonetic transcriptions for the examinations, noting snatches of the Welsh conversation in the carriage, and practising "nonsense words" to himself. He was quite unaware that some perceptive passengers had been distressed by the strange activities of an elderly gentleman who was not only apparently muttering odd noises in a strange language which was neither English nor Welsh, but also writing down peculiar signs and symbols in his notebook. On his arrival at his destination, Jones was alarmed to find the local constabulary waiting to arrest him on suspicion of being a spy.

(By the way, you are probably so young and modern and part of the cell phone generation that it may not have occurred to you that the passengers couldn't have alerted the Aberystwyth police by cell phone calls from the train, because cell phones were still science fiction, some fifty years into the future. But the technology for radio communication from trains was known as early as 1914, and there were also techniques involving inductive coupling to telegraph lines running alongside the train; it would have been possible in principle for a passenger to go down the corridor of the train and alert railway personnel, and for them to pass on the alarm in some way while the train was in motion.)

Strange mutterings, notations in the International Phonetic Alphabet, attentive listening to fellow passengers talking in Welsh... Well, it doesn't exactly sound like Casino Royale, does it? But I guess in wartime people get really nervous. Perhaps (as George Kesteven points out to me) Jones was practicing his bilabials and the other passengers took the wartime slogan "loose lips sink ships" somewhat too literally.

Jones did not, however, cause the Aberystwyth post office to be closed. Barbara Citko, that Polish bioweaponry Mata Hari, still seems to be the only linguist ever to have seemed dangerous enough that a whole post office had to be shut down in the battle to stop her evil schemes. In case anyone is keeping score on such matters.

Thanks to Bill Poser and Barbara Zimmer for research on trains and radio.

Expletive inserted

No word taboo at The New Yorker, it would seem. Bill Buford casually drops the occasionally attested colloquialism lo and fucking behold (184 Google hits) into a description of his thoughts as he hides behind a bush and watches a male turkey appear in response to a slate-scratching device that makes an imitation of a female turkey call:

... I heard a deep slow trilling. A gobble. Lo and fucking behold. I peeked, ever so slowly, through the leaves of my bush and saw him. Whoa! A gobbler, puffed and tail spread, looking like the NBC logo. Wow! I'd called him in! I'd done it!

The New Yorker arrives in American homes just like any other periodical, and has all sorts of cartoons and ads that might encourage kids to look at it. It's puzzling to me why, when The New Yorker can risk dropping the prime obscene expletive of the English language in mid fucking idiom in a feature article about turkeys, so many newspapers are so astonishingly coy that they can't mention shit without at least a couple of asterisks. (I guess I mean that last clause in both its literal and idiomatic senses.)

Fomite: panacea or backformation?

An article by Martin Veitch ("Are dirty keyboards truth or fiction?", Inquirer, 11/17/2006) taught me a new word, and so I'll offer to teach him a research technique in return. Veitch wrote:

Being ignorant about the truth or otherwise of the dangers of dirty keyboards, I asked for expert readers to mail in. Hospital physician Dtaylor took up the filthy gauntlet, replying to suggest that:

“Any ‘fomite’ (medical term for an inanimate object that can transfer contagious disease) can be a problem. Keyboards are certainly one such, and they are hard to clean. (Not as hard to clean as stuffed animals and toys in a hospital playroom, but I digress.) ‘Hospital-grade’ devices that can be cleaned more easily might conceivably help, but are certainly no panacea.”

This is a rare opportunity to correct the OED, which says that fomite is a "rare" variant of fomes (from Latin fōmes, fōmites "touchwood, tinder")

a. The morbific matter (of a disease) (obs.). b. ‘Any porous substance capable of absorbing and retaining contagious effluvia’ (Mayne).

The first mistake here is in the gloss, which reflects an old-fashioned medical misconception. As I understand it, infectious agents are sometimes better transmitted by non-porous surfaces than by porous ones; in any case, restricting the concept to porous substances is wrong. The second mistake is in retaining the original Latin singular fomes, and suggesting that fomite is a rare variant. The OED gives one citation in its entry for fomite:

1859 R. F. BURTON Centr. Afr. in Jrnl. Geog. Soc. XXIX. 134 This must be an efficacious fomite of cutaneous and pectoral disease.

and four in its entry for fomes -- but three of the four are the plural form "fomites":

1773 Gentl. Mag. XLIII. 554 If this putrid ferment could be more immediately corrected, a stop would probably be put to the flux, and the fomes of the disease likewise removed.
1803 Med. Jrnl. X. 213, I cannot say that I have known it spread from fomites.
1851-9 A. BRYSON in Man. Sc. Enq. 248 Either simply through the medium of the atmosphere or by means of fomites.
1882 Quain's Dict. Med. s.v., The most important fomites are bed-clothes, bedding, woollen garments, carpets, curtains, letters, &c.

Presumably fomite is a backformation from the plural fomites. But the current situation seems to be that the back-formation fomite is in wider use than the original fomes.

MEDLINE has 75 hits for {fomite} and 217 hits for {fomites}. There are 54 hits for {fomes}, but 53 of them are instances of fungus species (such as Fomes cajanderi, Fomes fomentarius, etc.). Ironically, in the one article where fomes is used to mean "inanimate object that can transfer contagious disease" (KL Autio, S Rosen, NJ Reynolds, JS Bright, "Studies on cross-contamination in the dental clinic", J Am Dent Assoc 100(3) 1980), it's construed as a plural:

Use of 5% iodophor in 70% isopropyl alcohol was effective in sterilizing certain fomes in the dental operatory.

The Wikipedia entry for fomite also implies that this back-formation is now the standard medical term. Encarta has no entry for fomite, giving only fomites as a "plural noun" glossed as "inanimate objects capable of carrying germs from an infected person to another person". AHD gets it right, glossing fomite as "An inanimate object or substance that is capable of transmitting infectious organisms from one individual to another", and indicating that it's a back-formation from the plural fomites of fomes. Merriam-Webster gets it right as well -- the online version gives fomite as "an object (as a dish or an article of clothing) that may be contaminated with infectious organisms and serve in their transmission", also noting its source as a back-formation.

I didn't know any of this -- or rather, I remembered the fungus species name from an old hobby of mushroom hunting, but I had never heard of the medical term until I looked it up after reading Martin Veitch's article. After quoting Dtaylor's conclusion about fomites -- "'Hospital-grade' devices that can be cleaned more easily might conceivably help, but are certainly no panacea." -- Veitch continues:

Btw, have you ever heard of anybody using the word ‘panacea’ without ‘no’ or ‘not a’ before? But now it’s me that’s digressing.

Having learned a new word from Veitch's article, I'm going to offer to return the favor by pointing out that questions about the distribution of adjacent words can now be explored by using web search. Before I looked, my own memory certainly agreed with Veitch in reckoning that panacea usually gets a preceding negative. But a quick web search turns up plenty of uses like these:

No matter what problems you face in your life, meditation is really a panacea.
Locust-bean bark seems to be a panacea for anything from toothache to impotence.
This book is a panacea for all the misinformation being disseminated about exercise, diet, weight control, and training for sports.
Determination is a true panacea, and cancer cannot win without concession.

And even a W.S. Gilbert lyric (from The Grand Duke):

Come, bumpers – aye, ever so many –
        And then, if you will, many more!
This wine doesn't cost us a penny,
        Tho' it's Pomméry seventy-four!
Old wine is a true panacea
        For ev'ry conceivable ill,
When you cherish the soothing idea
        That somebody else pays the bill!

More interesting, though, web search brings up a kind of phrasal template, often used in headlines, that hadn't occurred to me but is instantly familiar:

X: panacea or Y?

Many examples feature alliteration in the Y position, and sometimes in X as well:

Prozac: Panacea or Pandora?
Pending POPs Pact: Environmental Panacea or Parody?
Trillion FER: Is it panacea or poison?
The ITIL Configuration Management Database: Panacea or Pandora's Box?
Xenotransplantation: panacea or poisoned chalice?
E.Society - panacea or apocalypse?
Total Quality Management: Panacea or Pitfall?
Research governance: panacea or problem?
Appropriate Technology : Panacea or Pipe Dream?
Omega-3 fatty acids - panacea or poison for prostate cancer patients
DNA Evidence: potential panacea or pandemonium?
Situationism: Panacea or Placebo
RFID - panacea or pain?
Digital images: panacea or problem?
Medicare--Panacea Or Death Potion?
Abiotic oil - panacea or pipedream?
E-learning: panacea or pandemonium?
Anti-D in Midwifery: Panacea or Paradox?
Open Access: a panacea or something to be panned?
Growth hormone: panacea or punishment for short stature?

But of course there are plenty of examples without alliteration, or with alliteration only in the X position:

Open access publishing -- panacea or Trojan horse?
Private equity: panacea or crisis in waiting?
Smart Drugs - Panacea or nightmare.
Basel II: panacea or a missed opportunity?
Quantum Theory: Spiritual Panacea or Red Herring?
Outsourcing: Panacea or Bogeyman?
CDM – panacea or niche player?
Broadband aggregation : Panacea or folly?
Flat Panel Displays - panacea or fad?
JavaServer Pages: Panacea or Quagmire

There are lots more -- Google claims 215,000 pages containing the string "panacea or", and another 38,200 for "or panacea". And there's the usual penumbra of variant structures.

These snowclonish headlines may not be counter-examples, however -- it's characteristic of "negative polarity items" that they work in question contexts as well as negative ones, presumably because questions and negations share a property of "non-veridicality".

As the positive examples that I gave earlier showed, panacea is by no means a true negative polarity item. But it's got a sort of preference for non-veridical contexts, all the same.
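
For readers who would rather check a text sample than trust their memory, the tally can be roughed out in a few lines. This is only a sketch under stated assumptions: the function names, the three-word window and the short list of negators are my own choices rather than any standard test for polarity, and the sample sentences are taken (one of them lightly abridged) from the examples quoted above.

```python
import re

def is_negator(token):
    """Crude test for a negative word: 'no', 'not', 'never', or an n't contraction."""
    return token in {"no", "not", "never"} or token.endswith("n't")

def negated_uses(sentences, target="panacea", window=3):
    """Return (negated, plain) counts of target, judged by the preceding window of words."""
    negated = plain = 0
    for sentence in sentences:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        for i, token in enumerate(tokens):
            if token != target:
                continue
            # Look only at the few words immediately before the target.
            if any(is_negator(t) for t in tokens[max(0, i - window):i]):
                negated += 1
            else:
                plain += 1
    return negated, plain

# Sample sentences from this post (assumption: a real study would use a corpus).
samples = [
    "Hospital-grade devices might conceivably help, but are certainly no panacea.",
    "No matter what problems you face in your life, meditation is really a panacea.",
    "Locust-bean bark seems to be a panacea for anything from toothache to impotence.",
    "Old wine is a true panacea for ev'ry conceivable ill.",
]
print(negated_uses(samples))
```

A window-based count like this ignores question contexts such as the "panacea or Y?" headlines, so it would understate the non-veridical uses discussed above.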

Statisticians and their conjunctions

Matthew Cheney's "Rules for Writing":

If you use adjectives in your prose, do not use nouns. If you use nouns, you must not use verbs. If you use verbs, try to avoid verbs that specify a particular city.

When specifying particular cities in fiction, do not use cities that have been specified in poems. Poems have so few things left of their own anymore that we should let them have their own cities. [...]

If you write about the weather, use as many adjectives as you can, or else your nouns will wilt and become adverbs.

Some coaches insist adverbs are stronger than nouns, but an independent panel of statisticians has proved otherwise. Despite appearances, though, statisticians don't like nouns so much as they adore conjunctions.

I believe that this is the best stylistic advice dealing with parts of speech that is now available, pending the publication of Ben Yagoda's "When You Catch an Adjective, Kill It: The Parts of Speech, for Better And/Or Worse".

One emendation, though: statisticians only adore conjunctions of statistically-independent events.

[And you will need to interpret Ben Yagoda's title (though not the contents of his book) in the light of Geoff Pullum's classic LL post "Those who take the adjectives from the table", 2/18/2004.]

Find that mystery linguist woman

Linguists often arouse suspicion. They make field trips to unusual regions; they sit down to have lengthy conversations with members of minority populations who speak strange languages that the security forces do not know, and they make copious notes involving strange phonetic symbols. Who knows what they are really up to. They have frequently aroused the interest of police and intelligence services in various countries of the world. African linguist Jack Mapanje spent time in jail in Malawi (he still doesn't know why); Harvard linguistics student Victor Manfredi was arrested as a spy and spent three days in jail before a judge set him free (Victor spoke Igbo, and that turned out to be the judge's native language; piece of luck!). MIT-trained syntax and semantics specialist Tanya Reinhart has been jailed in Israel more than once for pro-Palestinian activism. But, to my knowledge, only one member of the theoretical linguistics profession in the USA has been taken for a domestic terrorist to an extent so serious that a major US post office had to be shut down in direct response to the threat posed. I'll tell you the story, if you wish.

The scene (picture it) is a post office in Salt Lake City three or four years ago. A dark-haired woman approaches the post office counter, and says in a slight foreign accent (ah! an alien!) that she wishes to mail a number of slightly bulky letter envelopes, all to American universities (just like the Unabomber!). But something else was noticed about the envelopes, and about the woman. Something that chilled their blood.

The mystery woman paid for the postage, and remembers being slightly surprised to see as she left the post office that the envelopes had been left on the counter untouched, rather than tossed into the outgoing mail bin. The staff seemed not even to be going anywhere near them, in fact.

Once mystery woman had exited, the whole place went into panic mode. The postmaster was called; customers were hustled out; the entire post office was closed and sealed. Specialist teams in protective suits were called in to pick up those envelopes. Police were contacted, and set off to track down the foreign-accented dark woman of mystery.

For what the post office counter clerk had noticed was that the envelopes bore light traces of a white powder, and the garments of the mystery woman had showed traces of it too. Obviously it was weapons-grade anthrax.

Only it wasn't anthrax. The mystery woman was theoretical syntactician Barbara Citko. In her first year of teaching, she had neglected a cardinal rule of our profession, which she never forgets today: don't wear black when teaching a class using white chalk.

The white chalk dust got not only on her clothes, but also onto the envelopes she had rushed to the post office straight after class. The contents were not lethal doses of lung-destructive anthrax spores, but job applications.

The police soon got to the bottom of things, and she was not charged with terrorism. Just one more day in the exciting life of a linguist. And the letters were not destroyed in an effort to kill the putative spores; they were ultimately delivered unharmed. Through one of those letters she got a job teaching at Brandeis, and after a year of that she secured a permanent tenure-track position at the great linguistics department of the University of Washington in Seattle, where I had the pleasure of seeing her again when I visited there to give a talk a few days ago. She told this tale over dinner. With its happy ending.

Linguistics in the service of Plane English

No, it's not a typo. This is about the English used by pilots and control towers -- "Plane English." Geoff Pullum's recent post Linguistics in the service of astrophysics prompted me to describe one way that linguistic service also extends to the field of product liability. Illustrating this is a 1987 Chicago case about a private Lear Jet that crashed seven years earlier in New Orleans, killing the pilot and his four passengers. It's a story about Plane English.

The lengths to which insurance companies will go in order to avoid payment seem to have no end. In this case, the company wasn't satisfied that pilot error caused the crash. Instead, it tried to put the blame on the Garrett Corporation, manufacturer of the airplane's engine. If it wasn't the pilot's fault, what else could have caused him to veer off course as he tried to land? The insurance company came up with the idea that it was a toxic gas called trimethylol propane phosphate (TMPP) that leaked from the engine into the cockpit, disorienting the pilot and impairing his sense of judgment.

One problem with this theory was that TOXLINE and MEDLINE searches showed that little was known about the effects of TMPP on any large, living being, including humans. The only research available at that time was on small animals, such as rabbits, mice, and rats. There was nothing showing that it affects the cerebellum, motor pathways to the brain stem, basal ganglia, or the descending pyramid of the cerebral cortex.

Another problem was that no evidence of TMPP was found at the crash scene so the insurance company had only a theory to work with (sound familiar?). But it reasoned that TMPP is a class of bicyclophosphates that are GABA inhibitors and since GABA inhibition affects speech in diseases like Huntington's Disease, the insurance company theorized that it had the same effect on the pilot's behavior. If they could prove their case, it could shift the blame for the accident from human error to a product liability case against a successful and presumably prosperous company.

Garrett's defense attorneys were then faced with the unusual task of combating a theory rather than physical evidence, which is the usual basis for such lawsuits. Enter the service of linguistics. The only accident evidence available was the air-to-ground communications of the pilot between Milwaukee and New Orleans. The pilot's voice sounded okay to the defense lawyers but to verify their suspicions, they called me to analyze the intermittent tape recorded communications from the time the plane taxied down the runway in Milwaukee, as it passed over the control towers in Chicago, Kansas City, and Memphis, and as it approached its final destination. The defense had to combat one theory -- that there was TMPP in the cabin -- with another theory -- that if such happened, the pilot's speech would give evidence of it.

I analyzed the pilot's syntax, word frequency, speech acts, pause fillers, and other evidence of cooperative conversation, theorizing that if he was being overcome with TMPP, these features would be likely to show it. There is pretty good evidence that this happens when people ingest large amounts of alcohol or drugs. But who knows whether the same thing obtains when TMPP enters the system? Neither side had real proof to back up the theories proposed.

In order to determine if there were aberrations in the pilot's syntax, I first needed to study a number of other pilot-to-tower communications to find out what the normal syntax patterns of Plane English are, including variability within the optional and obligatory syntactic slots. I found the following: first comes an optional acknowledgment ("okay," "Roger," etc.), followed by an optional self-identification ("Mitsubishi seven two seven," "Mike Alfa," "Six Golf Hotel," etc.), followed by an optional early closing ("out," "okay," etc.), followed by an obligatory subject ("we," "five thousand feet," etc.), followed by an obligatory predicate ("climbing to five thousand," "ready to go," etc.), and, finally, a second closing slot (used when a subject and predicate occur). I found no aberrations from this formula in any of the pilot's communications over the three tower-contact segments of the flight beginning in Milwaukee, going over the Midwest, and ending in New Orleans. His syntax didn't appear to be confused.

The pilot didn't show any loss of language ability in his use of compound sentences either. They remained constant from the beginning of the flight to the end. Nor did he start using shorter sentences. His words per utterance remained fairly constant throughout the three segments of the flight, averaging 9, 8.27, and 7.75, respectively.

I also examined the pilot's speech acts. He reported facts, such as his location, altitude, and flight course without confusion or hesitation. Failure to do so could have been interpreted by ground control as erratic behavior but no such complaints occurred. He replied to all of ground control's questions, acknowledged all instructions, repeated all information accurately, and even thanked the control tower once. His most telling speech act, which came as the pilot was trying to land, was to correct the tower's error when it misidentified him (I'll come back to this later).

And what about pause fillers, those "uh," "um," "er" features that most of us use in daily conversation? Did the pilot use them excessively, perhaps suggesting that he was getting sluggish or beginning to be overcome by toxic fumes? There are two types of pause fillers. One is used when speakers are trying to get the attention of a listener or holding onto their turns of talk. The other type occurs when speakers are uncertain or forgetful about how to say something. The latter type might indicate a decrease in cognitive ability. The pilot used five of these but, interestingly, they all came when he was still on the ground in Milwaukee as he prepared to taxi down the runway. His "hey, listen to me" attention-grabbing pause fillers all came at the end of the flight, as the pilot was struggling to set the plane down in a torrential rainstorm.

Now for the most obvious feature. Did the pilot begin to slur his speech, especially his fricative, affricate, and interdental sounds, at some time in the flight? If alcohol or drugs can cause this, couldn't TMPP do the same? If it could, it didn't. The pilot had to pronounce words like "Mitsubishi," "seven," "that's," "Kansas," "thousand," "the," "taxi," "its," and "Memphis" throughout his flight. No diminution of speech ability is noticeable here.

Finally, I looked at the pilot's conversational cooperation (relevance, informativeness, sincerity, and clarity) to see if it got worse as the flight progressed. Perhaps it would take an air-traffic specialist to testify about whether any of the pilot's communications were less informative than they should have been, but the taped evidence shows no discomfort or complaints by ground control to any of the pilot's statements during any segment of the flight. Throughout, ground personnel treated the pilot's reports of his readiness, destination, movement, and flight level as though they were relevant, informative, sincere, and clear.

Okay, so why did the plane crash? Oddly enough, this wasn't the focus of the trial. It was only to show that TMPP did or did not affect the pilot's judgment. Other aspects were not considered relevant but I'll go there anyway, because this is Language Log, not a trial, and I can do whatever I want here.

At the very end of the flight, the pilot radioed the local control tower in New Orleans that he was on his approach to land. Just before this, the radio picked up another aircraft, Six Golf Hotel, whose pilot requested permission to abort his landing because of the heavy rainstorm. After this the transmission went:

Pilot: We're on the approach.

Tower: Six Golf Hotel, you're on the approach now?

Pilot: No, Mitsubishi seven two seven Mike Alfa. We're on the approach.

Perhaps realizing the mistake, the tower then gave Mike Alfa a weather report and asked him to report Alger when he passed it. Alger is a specified checkpoint in the landing approach. It's difficult to know what happened next but Mike Alfa's response was, "Okay, Mike Alfa," and we never heard from him again. He had already passed Alger by that time and the tape gives no indication that he had ever reported it.

Fighting the elements, being misidentified by the tower, and having already passed the Alger checkpoint, the pilot was pretty busy trying to figure out what to do. As it turns out, he was far off course and crashed on the shore of Lake Pontchartrain. It would appear that the pilot was, indeed, confused and disoriented, but there seems to be no language evidence suggesting that this condition was caused by ingesting TMPP. And if the pilot was confused, ground control seemed to be just as confused. Note that it was the pilot who corrected the tower's misidentification, hardly evidence of cognitive impairment.

The insurance company did not prevail in this Plane English trial.

Dueling stereotypes

People seem remarkably comfortable with inconsistent stereotypes. Not long ago, we commented on a Zits cartoon whose point was that when teen girls hang out, they all talk at once all the time, whereas teen boys hang out together in near-silent isolation. In contrast, the strip for 11/16/2006 is based on the idea that teen boys talk enthusiastically among themselves, but have nothing to say to their parents:

If you think of all this as a collection of hypotheses about how teen boys (or girls) behave, it's pretty incoherent. If you think of it as a collection of hypotheses about how (certain) adults react to teen behavior, though, you can find a pattern: whatever the kids do, it's inappropriate.

Of course, no one wants to read about people (of any age) acting and reacting appropriately. And an easy way to get an audience is to appeal to the greatest peeves of the greatest number.

[Update -- Language Hat writes:

I read that strip too, but didn't at all have your reaction -- to me the point was not that "teen boys talk enthusiastically among themselves, but have nothing to say to their parents," but rather that teen boys refuse to share their exciting adventures with their parents. In other words, it was not about language use but secret-keeping: what happens in dudeland stays in dudeland.

In fact, I interpreted the intended message of the cartoon in exactly the same way that Hat did. But for the purposes of this post, I was focusing not on the behavior's attributed motivation, but on the depicted behavior itself. What's shown here is a 15-year-old boy who's chatty with a male friend, in contrast to the group of silent boys in the earlier cartoon.]

Mrs. Olsen gets a D

John Vann sent in this Frazz strip, which ran 11/17/2006:

John's comment: "One might append '... or the Language Log'."

In this case, there's a Wikipedia article, which discusses the situation from many angles -- and provides ammunition for smart-mouthed elementary-schoolers everywhere by giving a long list of exceptions, including "oneiromancies", which breaks the rule twice, once in each direction.

And in fact, this rule (whatever its pedagogical value) performs badly as a predictor of English letter sequences, because of the high frequency of words like "their", "science" and Germanic names like "Einstein" and "Bruckheimer". In two random stories from today's NYT, I count:

        [^c]__   c__
  ie      29      5
  ei      11      0

This is a total of 29 right vs. 16 wrong, for a grade of 64 on a scale of 100, or a D.

If we evaluate the performance in the terms usually used in modern AI, machine learning and similar disciplines, we'll get an F-measure of .78. I calculate this by defining the problem as predicting cases in which 'i' precedes 'e'. Then we can re-label the table in terms of predicted and observed positive ('i' before 'e') and negative ('e' before 'i') instances:

                        observed positive   observed negative
  predicted positive           29                  11
  predicted negative            5                   0

Then the "precision" of the test (otherwise known as "positive predictive value") can be calculated as the number of true positives divided by the sum of true positives and false positives, which here is 29/(29+11) = 0.725. This is the proportion of the time that the rule is correct when it predicts a positive outcome, i.e. that 'i' precedes 'e'.

And the "recall" of the test (otherwise known as "sensitivity") is the number of true positives divided by the sum of true positives and false negatives, here 29/(29+5) ≅ 0.85. This is the proportion of the observed positive outcomes (i.e. where 'i' precedes 'e') that is predicted by the rule.

We usually take the harmonic mean of these two figures in order to get a combined score known as the "F-measure", which here is 2*.85*.725/(.85+.725) ≅ 0.78.

This might look a bit better than the elementary-school grade of D -- after all, you'll find plenty of machine-learning papers, in the best journals and conferences, with F-measures in the upper 70s. However, the referees don't let these papers get by without comparing their performance to the obvious trivial baselines, such as predicting the commonest outcome all the time. In this case, that amounts to the rule 'i' before 'e' no matter what -- and this rule actually works quite a bit better:

                        observed positive   observed negative
  predicted positive           34                  11
  predicted negative            0                   0

Now we get precision of 34/(34+11) ≅ 0.76, and recall of 34/(34+0) = 1.0, for an F-measure of 2*.76*1.0/(.76+1.0) ≅ 0.86.

In terms of elementary-school grading, that would be 100*34/(34+11) ≅ 76 -- a solid C.

So Mrs. Olsen's rule, however hallowed by tradition, is empirically pathetic.
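
The arithmetic above is easy to reproduce. The following is a minimal sketch rather than anything I actually ran at the time: the function name `score` and the variable names are illustrative assumptions, and the four counts are the ones implied by the figures quoted above for the two NYT stories. Because it carries unrounded values through the calculation, the third decimal place can differ slightly from the hand-rounded numbers in the text.

```python
def score(tp, fp, fn, tn):
    """Return (precision, recall, F-measure, grade out of 100) for one rule."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F-measure: harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    # Grade-school grade: percentage of all cases the rule gets right.
    grade = 100 * (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f_measure, grade

# Counts implied by the text: 'ie' after a non-c, 'ie' after c,
# 'ei' after a non-c, 'ei' after c.
ie_other, ie_after_c, ei_other, ei_after_c = 29, 5, 11, 0

# "i before e except after c" predicts 'ie' after non-c and 'ei' after c.
olsen = score(tp=ie_other, fp=ei_other, fn=ie_after_c, tn=ei_after_c)

# Baseline "i before e no matter what" always predicts 'ie'.
baseline = score(tp=ie_other + ie_after_c, fp=ei_other + ei_after_c, fn=0, tn=0)

for name, (p, r, f, g) in (("except after c", olsen), ("no matter what", baseline)):
    print(f"{name}: precision={p:.3f} recall={r:.3f} F={f:.3f} grade={g:.0f}")
```

Swapping in a different set of four counts is enough to re-score either rule on another sample.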

[Update: a reader wrote to complain that two random news stories is not a very big sample. So I wrote a little program to calculate the numbers for a random month of NYT newswire, from 2001 (a total of about 8.7 million words):

        [^c]__     c__
  ie   110,430    7,405
  ei    53,241    3,640

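Predicting i before e except after c gives us:

                        observed positive   observed negative
  predicted positive         110,430              53,241
  predicted negative           7,405               3,640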

for precision = 110,430/(110,430+53,241) ≅ 0.67, and recall 110,430/(110,430+7,405) ≅ 0.94.

The F-measure is then 2*0.67*0.94/(0.67+0.94) ≅ 0.78.

So the precision was lower, the recall was higher, but the F-measure was the same.

And the grade? (110,430+3,640) = 114,070 right, (53,241+7,405) = 60,646 wrong, for a grade-school grade of: 100*114,070/(114,070+60,646) ≅ 65% correct. Again, a D.

And the alternative rule i before e no matter what?

                        observed positive   observed negative
  predicted positive         117,835              56,881
  predicted negative               0                   0

Now we get precision of about 0.67, and recall of 1.0, for an F-measure of 0.81. Not as good as before, but still better than the conventional rule. The "grade" of 117,835 right, 56,881 wrong, or about 67% correct, is also a bit better than the grade of the conventional rule.

Of course, any bright fourth-grader ought to be able to work out a simple rule that works a lot better than either of these. (Hint: supplement the default order with a list of the N commonest exceptions...)]
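
For anyone who wants to try this on their own text, here is a rough sketch of the kind of counting involved. It is not the program I used: the function name `count_sequences`, the tokenization (lowercased runs of a-z) and the sample sentence are all assumptions made for illustration, and a real run would read a corpus file instead.

```python
import re
from collections import Counter

def count_sequences(text):
    """Tally 'ie' and 'ei' by whether the preceding letter is 'c'."""
    counts = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        for i in range(len(word) - 1):
            pair = word[i:i + 2]
            if pair in ("ie", "ei"):
                # A pair at the very start of a word has no preceding letter,
                # so it is counted in the non-c column.
                after_c = i > 0 and word[i - 1] == "c"
                counts[(pair, "c" if after_c else "other")] += 1
    return counts

# Hypothetical sample sentence, chosen to hit all four cells of the table.
sample = "Their friend received a weird piece of science news about the ancient ceiling."
table = count_sequences(sample)
for pair in ("ie", "ei"):
    print(pair, table[(pair, "other")], table[(pair, "c")])
```

This counts every token, so frequent words like "their" weigh heavily, which is the point of scoring the rule against running text rather than a dictionary list.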

[Mark Baker raises a different point:

I've always heard the rule as "I before E except after C, when the sound's E"; I didn't think anyone had ever suggested that the rule might apply to things that don't have an E sound until I saw people discussing it online. There are still lots of exceptions, but many less: does this now outperform your alternative "I before E always" rule?

The wikipedia article (which I linked to above) offers two augmented versions, identifying Mark's as "British":

An augmented American version is:

i before e
except after c
or when sounding like a
as in neighbor and weigh

which excludes many of the exceptions but still fails to correctly handle many others.

A lesser known addendum in America is: Neither financier seized either species of weird leisure.

A British version is:

when the sound is ee
it's i before e
except after c

which excludes most exceptions, as well as excluding some words (e.g. friend) which are correctly handled by the American version. The most frequent everyday failures of the British form of the rule are seize, caffeine, protein and, for those who pronounce the initial vowel sound ee, either and neither.

Obviously the expanded versions are going to work better. However, since they're dependent on the alignment between spelling and pronunciation, it's going to be harder to score them. And since they make predictions about different sets of cases, using different numbers of clauses of differing complexity and generality, it's not easy to compare their scores.

In any case, the point about the Mrs. Olsens of the world is that they promulgate such rules not because they accurately describe the facts, but because of some long-ago assertion felt to have been authoritative.]

[And in a comment over at Pharyngula, Oolon Colluphid posted this:

"I" Before "E" Except After "C"
by Duncan McKenzie

It's a rule that is simple, concise and efficeint.
For all speceis of spelling it's more than sufficeint.
Against words wild and wierd, it's one law that shines bright
Blazing out like a beacon upon a great hieght,

It gives guidance impartial, sceintific and fair
In this language, this tongue to which we are all hier.
'Gainst the glaceirs of ignorance that icily frown,
This great precept gives warmth, like a thick iederdown.

Now, a few in soceity choose to deride,
To cast DOUBT on this anceint and venerable guide;
They unwittingly follow a foriegn agenda,
A plot hatched, I am sure, in some vile haceinda.

In our work and our liesure, our homes and our schools,
Let us follow our consceince, sieze proudly our rules!
Will I dilute my standards, make them vaguer and blither?
I say NO, I will not! I trust you will not iether.]


[And this from Stephen Jones:

The wikipedia article is American, and thus biased against the British rule:
'i' before 'e'
except after 'c'
when the sound is 'ee'.

which actually works for all but a handful of words ('seize' and 'protein' and 'Sheila' are the only ones I can find doing a search of the SOED).

Now, Wikipedia suggests that British pronunciation of 'sheikh' and 'either' is the result of applying the spelling rule to pronunciation. I am most dubious of this. It seems much more likely to me that the rule was imported to the USA from the UK, altered because of differences in American pronunciation, but, like most prescriptive rules never discarded.

For my pronunciation it is the most effective spelling rule I know.


Linguistics in the service of astrophysics

I happened to recall today, while socializing with some astronomer friends, that some time around fifteen years ago a couple of UC Santa Cruz astrophysicists came to me in my capacity as a linguist and asked me if I thought I could coin for them a one-word term to denote the ratio between rate of rotation of a cloud of objects rotating in 3-dimensional space such as an accretion disk (this quantity is known as vorticity) and the average number of objects visible in a unit area of the 2-dimensional outer surface of the 3-dimensional cloud (known as the surface density). I thought about it overnight, and soon got back to them with my suggestion: vortensity. The term was promptly used in a scientific paper, with a footnote credit for linguistic assistance, and it caught on. I was pleased to note just now that my term gets over 230 Google hits. Not bad for a technical word used in such a rarefied discipline.

I did that piece of terminology coinage pro bono. There are companies (Lexis Branding, for example) that charge fees in the thousands or tens of thousands of dollars when doing similar work for corporations. I work cheap for my friends and colleagues at UCSC's Astronomy and Astrophysics department. The permanent place in the scientific literature is its own reward. But in addition it's a nice feeling to know that I have a cast-iron response ready for any arrogant physical scientist (in the unlikely event that an arrogant physical scientist should ever come along; I know it's implausible) who might dare suggest that linguistics has no role to play in serious science. It'll be one of those finger-in-the-chest "oh-yeah-lemme-tell-you-something-pal" moments, won't it?

Yet Another Fieldwork Sci Fi Book

Going through my books I came across another work of science fiction that involves linguistic fieldwork that might be added to our previous lists. It is The Color of Distance by Amy Thomson (ISBN 0-441-00632-9). It is also one of the best discussions of contact with a radically different intelligent species that I have read.

Quintuple quote embedding

More on recursive quotation embedding: David J. Swift of Jackson, Wyoming, writes to point out a quintuply embedded quote in a story by Garrison Keillor. The sentence ends in a wonderful (and perfectly grammatical) string of seven successive punctuation marks (! ” ’ ” ? ’ ”). Here's the whole paragraph:

“TEEN LEADERS VOW ANTI-ROCK DRIVE, AIM SMUT BAN IN AREA,” the Gazette reported the following morning. “Longtime youth worker Diane Goodrich enjoys having as much fun as the next person [the story went on], but Monday night, watching a local rock band rip into a live chicken with their teeth at the 4-H Poultry Show dance, she decided it was time to call ‘foul.’ Evidently, more than a few people agree with her. Last night, at a meeting in the high-school auditorium attended on a word-of-mouth basis by literally dozens of parents, not to mention civic leaders and youth advisers, she spoke for the conscience of the community when she said, ‘Have we become so tolerant of deviant behavior, so sympathetic toward the sick in our society, that, in the words of Bertram Follette, “we have lost the capacity to say, ‘this is not “far out.” You have simply gone too far. Now we say “No!” ’ ”?’ ”

Keillor may have constructed this with malice aforethought, but it's really quite natural-sounding, and basically understandable.

[Update: In the first version of this post the paragraph above was presented with a mistake: the left quotation mark before Have we become was double. The first version of this update wrongly said the mistake was there in the original book. Not so. Rechecking the paragraph from a scan provided by David Swift reveals that the book was correct, and has the quote marks exactly as above. Thanks to Sridhar Ramesh in Berkeley for the first email pointing out that the quotes didn't match up before (because the left ones didn't alternate in type). This forced me to do a recount and get it right. Gratitude and apologies as necessary.]

A number of people have written to bring to my attention a story that doesn't quite amount to natural use of the language. It's called "Menelaiad", and it appears in John Barth's Lost in the Funhouse, which goes into seven nestings of quotation marks. As Jeff Binder explains in an email:

It starts out with a frame-tale, in which the main character is telling a story to some of his friends, but the story he tells turns out to be another frame tale, and so forth, until we get monstrosities such as this:

“ ‘ “ ‘ “ ‘Why?’ I repeated, ” I repeated, ’ I repeated, ” I repeated, ’ I repeated, ” I repeat.

This story is obviously very self-conscious about what it's doing, so I don't know if you would consider this use to be "in the wild," but it does show recursion taken to an extreme.

It does indeed look deliberate and artificial to me --- a literary experiment, to be classed with experiments like Italo Calvino's If on a Winter's Night a Traveler rather than an ordinary piece of literature; and that lessens its interest a bit. But it must technically be regarded as entirely grammatical, given the assumption that written English allows recursive use of quotation marks to arbitrary depth, and the rule that you alternate quotation-mark types.
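The alternation rule is mechanical enough to check by program. The following is a rough sketch, assuming curly quotation marks only and assuming that ’ is always a closing quote and never an apostrophe (true of the passages above, but not of English text in general). The function name is hypothetical.

```python
OPEN_TO_CLOSE = {"“": "”", "‘": "’"}
CLOSERS = set(OPEN_TO_CLOSE.values())

def max_quote_depth(text):
    """Return the deepest level of quotation nesting in text.

    Raises ValueError if an opening mark has the same type as the
    quotation enclosing it, or if the marks fail to match up.
    """
    stack = []
    deepest = 0
    for pos, ch in enumerate(text):
        if ch in OPEN_TO_CLOSE:
            # An embedded quotation must switch mark type.
            if stack and stack[-1] == ch:
                raise ValueError(f"quote types do not alternate at position {pos}")
            stack.append(ch)
            deepest = max(deepest, len(stack))
        elif ch in CLOSERS:
            # A closing mark must match the most recent opening mark.
            if not stack or OPEN_TO_CLOSE[stack[-1]] != ch:
                raise ValueError(f"unmatched closing mark at position {pos}")
            stack.pop()
    if stack:
        raise ValueError("unclosed quotation")
    return deepest

print(max_quote_depth("“a ‘b “c” d’ e”"))
```

A stack is the natural choice here because each closing mark has to pair with the most recently opened quotation, which is exactly the property that makes the embedding recursive.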

I can't even spell "linguification"

Over at Watch Me Sleep, my good friend Ed Keer decided to make public our little disagreement this past weekend over whether the snowclone "can't even spell X" is an example of linguification. Ed's characterization of the argument makes it seem as if my position was just "Because Geoff Pullum said so!", and so I feel I must clarify.

(In fact, I had had enough to drink by that point that I couldn't remember anything that Geoff ever wrote except "Turkish exhibits a classic counterbleeding relationship between epenthesis and deletion" -- which, of course, is just plain wrong. But that's neither here nor there.)

Also, I want to add some more examples to the linguification mix.

Ed's position, in his own words, is that "the 'Can't even spell X' snowclone is a comment on the extent of a person's knowledge about a particular subject, so it does not qualify as a linguification". My position is that Ed's conclusion doesn't follow from his (inarguably correct) premise. In fact, most if not all linguifications that Geoff has discussed here on Language Log are (meant to be) comments on the extent of something; Geoff's point is that linguifications make rather curious comments of that sort, in particular because they fail as good examples of metaphor or hyperbole.

Now that I'm sober, I know better than to try to reconstruct Geoff's arguments from scratch; I'll start by quoting what Geoff writes about the distinction between linguification on the one hand and metaphor or hyperbole on the other hand.

Pullum on linguification

To linguify a claim about things in the world is to take that claim and construct from it an entirely different claim that makes reference to the words or other linguistic items used to talk about those things, and then use the latter claim in a context where the former would be appropriate.

Pullum on metaphor (a.o.t. linguification)

[E]ven if some linguifications can indeed be said to fall within the domain of metaphorical usage, that misses my point. In general, for most kinds of metaphors, it is easy to understand why people use them. They get the point across briefly and vividly. To say that the new office manager is a pussycat establishes instantly, just as a good caricature might, that the man's general demeanor and behavior suggests a cute, cuddly, playful, non-serious, easy-to-deal-with, tractable, non-fearsome nature that otherwise might take a considerable amount of time-wasting careful description to get across.

But I simply do not understand why people use linguification. If it gets the point across at all, it does it only indirectly and clumsily: we have to infer from [a] statement about [e.g.] word distributions, usually one that is false, some underlying statement that is only very imperfectly connected to it.

Pullum on hyperbole (a.o.t. linguification)

Hyperbole takes a claim and exaggerates it, so that if the hyperbolic version were true, the original claim would be true a fortiori. I do know about humorous uses of hyperbole. I believe I used it in the first line of this post ["About a million people have written to me ..."]; wouldn't you say so? Take that as an example. My underlying claim is that lots of people wrote to me. If my exaggeration in calling it millions were really true, the underlying claim would be all the truer. [...]

As I have tried to explain, patiently but fruitlessly, [...] Gilbert's figure of speech is very different. Take his underlying (broadly true) claim that parents don't get to go to the theater much after the kids are born. If he had said that parents forget what going to the theater is like, that would be hyperbole (they don't forget, it just becomes a tiny bit unfamiliar as far as recent experience is concerned). And if the hyperbolic claim were true (if they completely forgot what happens in theaters), then the underlying claim would be all the truer. However, he says instead that parents actually forget how to pronounce words like "theater". If they did, that would not make the underlying claim true. The loss of this snippet of pronunciation information would not mean that they had forgotten their experiences of theater. Nor would it mean they couldn't go: they get in a taxi, take it to Broadway, and point; or they could tell people they wanted to go to the big building downtown with the lights and the curtain and the actors.

Now take one of Ed's examples: "Accountability? They Can't Even Spell it!" The idea is clear: the "they" referred to here are not (held) accountable for something. But it's not hyperbole: if it were true that "they" could not spell "accountability", it wouldn't be all the truer that they aren't (held) accountable for something. It's also not a good example of a metaphor, unless you make the very improbable assumption that the first (or somehow primary) step in being (held) accountable is to be able to spell "accountability". (This could also be the first step in not being (held) accountable, though, so the metaphor fails anyway.)

Ed closes his post with "You see what I'm sayin? Or do I have to spell it out for you?" Now in this case, I think "spell it out for X" is a great metaphor. Typically, you spell a word out for someone when you know or suspect that they don't know how to spell it; likewise, you painstakingly explain something to someone when you know or suspect that they don't understand it. It's a good metaphor, then, to say that painstaking explanation of something is like spelling out a word. Nothing linguificational about it, really.

Another example came up yesterday while I was talking to some friends. One of them was noting that he didn't have any cash, and I remembered that he had just spent some money at Guitar Center (a.k.a. "the Evil Empire" -- no link for you!), so I added: "... because you've just been to Guitar Center." Another friend said: "Those two statements often follow each other", by which I assume he meant that the Guitar-Center statement often follows the lack-of-cash statement. That's another example of linguification, similar to the kind Geoff describes here.

Finally, I'd like to suggest that my puzzlement over the "that's why they call it X" snowclone (follow-up here) is because this is also an example of linguification, though perhaps of a different kind than those that Geoff has identified. Certainly the number of comments and e-mail responses I've gotten to those posts (see also here), with completely unhelpful analyses of and excuses for this puzzling snowclone, are very much like the millions of unhelpful responses that Geoff has gotten to his various attempts to explain the difference between linguification and metaphor/hyperbole.

And so, on that hopeful note ...

"Their their," I needed to hear them say

A note from Sarah Bagby:

I'm grateful for your Language Log posts on the spelling nonsense in Scotland and New Zealand, not least because they've spurred me to read once again the George Starbuck poem appended below. It's harder to type than you might think, but I hope you'll agree it's worth the trouble.

I do agree. Sarah is responding to "Plain spelling", 11/3/2006; "Partial credit for 'pigeon English': not new in New Zealand", 11/10/2006; "Alarming decline in literacy among publicists and journalists", 11/12/2006; "When life is funnier than the funnies", 11/14/2006; and "Wanna: neither slang nor language murder", 11/14/2006. And the poem, as supplied by Sarah, is below the fold.

George Starbuck

(a poem to be inscribed in dark places and never to be spoken aloud)

My favorite student lately is the one who wrote about feeling clumbsy.
I mean if he wanted to say how it feels to be all thumbs he
Certainly picked the write language to right in in the first place.
I mean better to clutter a word up like the old Hearst place
Than to just walk off the job and not give a dam.

Another student gave me a diagragm.
"The Diagragm of the Plot in Henry the VIIIth."

Those, though, were instances of the sublime.
The wonder is in the wonders they can come up with every time.

Why do they all say heighth, but never weighth?
If chrystal can look like English to them, how come chryptic can't?
I guess cwm, chthonic, qanat, or quattrocento
Always gets looked up. But never momento.
Momento they know. Like wierd. Like differant.
It is a part of their deep deep-structure vocabulary:
Their stone axe, their dark bent-offering to the gods:
Their protoCro-Magnon pre-pre-sapient survival-against-cultural-odds.

You won't get me deputized in some Spelling Constabulary.
I'd sooner abandon the bag-toke-whiff system and go decimal.
I'm on their side. I better be, after my brush with "infinitessimal."

There it was, right where I put it, in my brand-new book.

And my friend Peter Davison read it, and he gave me this look,
And he held the look for a little while and said, "George..."

I needed my students at that moment. I, their Scourge.
I needed them. Needed their sympathy. Needed their care.
"Their their," I needed to hear them say, "their their."

You see, there are Spellers in this world, I mean mean ones too.
They shadow us around like a posse of Joe Btfsplks
Waiting for us to sit down at our study-desks and go shrdlu
So they can pop in at the windows saying "tsk tsk."

I know they're there. I know where the beggars are,
With their flash cards looking like prescriptions for the catarrh
And their mnemnmonics, blast 'em. They go too farrh.
I do not stoop to impugn, indict, or condemn;
But I know how to get back at the likes of thegm.

For a long time, I keep mumb.
I let 'em wait, while a preternatural calmn
Rises to me from the depths of my upwardly opened palmb.
Then I raise my eyes like some wizened-and-wisened gnolmbn,
Stranger to scissors, stranger to razor and coslmbn,
And I fix those birds with my gaze till my gaze strikes hoslgmbn,
And I say one word, and the word that I say is "Oslgmbnh."

"Om?" they inquire. "No, not exactly. Oslgmbnh.
Watch me carefully while I pronounce it because you've only got two more guesses
And you only get one more hint: there's an odd number of esses,
And you only get ten more seconds no nine more seconds no eight
And a wrong answer bumps you out of the losers' bracket
And disqualifies you for the National Spellathon Contestant jacket
And that's all the time extension you're going to gebt
So go pick up your consolation prizes from the usherebt
And don't be surprised if it's the bowdlerized regularized paperback abridgment of Pepys
Because around here, gentlemen, we play for kepys."

Then I drive off in my chauffeured Cadillac Fleetwood Brougham
Like something out of the last days of Fellini's Rougham
And leave them smiting their brows and exclaiming to each other "Ougham!
O-U-G-H-A-M Ougham!" and tearing their hair.

Intricate are the compoundments of despair.

Well, brevity must be the soul of something-or-other.

Not, certainly, of spelling, in the good old mother
Tongue of Shakespeare, Raleigh, Marvell, and Vaughan.
But something. One finds out as one goes aughan.

Ear accidents

I'm in Manchester for a couple of days, serving as one of the external moderators for an "externally moderated reflective self-examination" at the National Center for Text Mining (NaCTeM). Although the formal process doesn't start until tomorrow morning, I spent this afternoon learning about some of the technical work in and around the center. One stop on the tour was Bill Black's office, where he and Jock McNaught told me about the CAFETIERE information-extraction system. Jock was describing his experience in supervising undergraduate students to adapt this system to new topic areas, and one of the examples involved creating a database of information drawn from news stories about ear accidents.

As I tried to assimilate his description into my gradually-developing image of how the system works and what it would be like to adapt it to a new domain, it came to me that I had stumbled on an interesting cultural difference between the U.S. and the U.K. I don't believe that I've ever seen a news story in a U.S. publication about an ear accident. No doubt such accidents do occur -- piercings gone wrong, and the like -- but in my experience, they don't make the news. Could this be an aspect of pub culture that has so far escaped my notice; or an unanticipated side effect of playing cricket? I looked puzzled, I guess, because Jock repeated the phrase, "news reports about ear accidents". "Ear accidents", I echoed meditatively, tugging on my earlobe.

Well, of course, Jock was talking about "air accidents".

[Update -- Simon Cauchi writes:

I wonder if Jock McNaught is by any chance a New Zealander.

See Margaret Batterham's article, "The apparent merger of the front centring diphthongs — EAR and AIR — in New Zealand English", in New Zealand English, edited by Allan Bell & Koenraad Kuiper (Victoria University Press, 2000), pp. 111-145.

No, Jock is a Scot.

And I don't think he merges ear and air (though I didn't check). Rather, I believe that he pronounces air as [er], with a vowel perhaps on the high side of [e] and a trilled [r], whereas I'm used to hearing something closer to [ɛɹ] or perhaps [ɛeɹ], with a much lower nucleus.


Bill Gates's new TA

No, not teaching assistant. At Microsoft, "Bill's TA" means "technical assistant", the person who helps Bill Gates respond to requests for technical feedback and guidance across the company.

According to this entry on the Microsoft Office Natural Language Team Blog, the new person in this job is Joshua Goodman. As folks who do natural language processing know, Joshua's got a strong track record in statistical NLP (ranging from n-gram language models to parsing) and he's been a leader in organizing anti-spam efforts in the research community.

I'll echo what Mari Broman Olsen writes on the blog: "Pretty cool to have someone who knows about natural language work in that role!" Way to go, Joshua.

When stereotypes hang out

Here's a striking example of popular ideas about laconic guys and gabby girls, sent to me by Arnold Zwicky:

I don't know of any evidence, one way or another, about the relative talkativeness of American male and female high-school students in hanging-out situations like those in the strip. But as discussed in a number of earlier posts, whenever people have measured sex differences in amount of talking in a variety of other settings, the result has generally been no difference, or a small difference in favor of more talk from males:

"Sex-linked lexical budgets", 8/6/2006
"Yet another sex-n-wordcount sighting", 8/14/2006
"The vast arctic tundra of the male brain", 9/6/2006
"Gabby guys: the effect size", 9/23/2006
"Sex on the brain" (Boston Globe, 9/24/2006)
"Secrets of the BBC sexes", 9/29/2006
"Guys are a bit gabbier in Dutch, too", 10/16/2006
"Word counts", 11/28/2006

It's fair to observe that all the cited measurements have been made in contexts where talk is expected. There might be a sex difference, at least for some groups, in whether or not talk is expected when hanging out. And then again, there might not be.

Given all the quantitative social science out there, you'd think that someone would have measured this. It wouldn't be a very hard kind of experiment to do, except for the generic problem of the Observer's Paradox. If you know of any relevant research, please tell me.

If there isn't really a sex difference in talkativeness, why would so many people think that one exists? Well, once a stereotype is well established, confirmation bias kicks in. And maybe some people think that women are more talkative because they wish that certain women would say less; and maybe some people think that men are less talkative because sometimes they wish that certain men would say more. Or something like that. In any case, it's suspicious that there are apparently no actual word or talk-time counts that confirm the stereotype.

[Update -- Jim Roberts provides an anecdote, which is consistent with my own observations of groups of young men playing video games together:

I am a white American (well, Canadian, but I’ve lived in the States for eleven years) male and, consequently, have had occasion to gather together with a group of like-minded men for many an evening of video gaming. And, good God, are we loud. Several times the guys have been in one room playing games while the women are gathered in another room . . . doing whatever it is they do. Talking, I think, and perhaps baking or knitting. Too busy gaming to really notice, frankly. And those times when we’ve been separated by gender, at least once in the course of festivities a representative from the women is sent forth to the men to tell us to shut our collective traps before the neighbours call the cops. It’s been this way since my early teens. Men are possibly at their loudest and most vocal when playing video games, in my estimation only outstripping themselves when watching a sporting event.

And Theo Vosse writes:

In a political debate (there are elections next week) between what the British would call back-benchers, the speech rate for men was quoted as being 7,000 versus 23,000 for women. The reporter in question considered his statement well researched and hence this piece of knowledge could be used to discuss the role of women in politics. The consequence seemed to be that women were more factual and placed a larger weight on arguments than men. Thus the country would be better off with more women in politics.

A little knowledge is a dangerous thing...

"A little misinformation" would be closer to the mark, in this case. (And I presume that Theo's source means "speech rate" to be denominated in words per day, no doubt estimated by the standard Eskimo snow-words technique, otherwise known as "making up numbers".)

Anyhow, I've gotten no pointers to any systematic factual comparisons of talkativeness in "hanging out" situations, as yet.]

[Eleanor Wroblewski writes:

Well, I'm certainly a gabby girl (and just a year older than Jeremy Bucket), and with some of my female friends there's definitely a replication of the whole "everyone-talk-at-once" thing. And there are male friends who wish I would shut up sometimes, and male friends I wish would talk more sometimes. However, in my social group, there are definitely quieter people and louder people of both genders, and although admittedly I have not witnessed male-only social interactions often, I have reason to believe that the differences in discourse between groups of different gender compositions, if you ignore specific individuals, probably has a lot more to do with the "texture" of conversation, rather than the wordcount.

Well, if "texture" means how much overlapping talk there is, then that should come out in the collective wordcount as well, though I agree that it would be better to measure it more directly, in terms of the distribution of number of simultaneous talkers at regularly-sampled (or randomly-sampled) time points. Eleanor has raised the hypothesis (if I can put words in her mouth) that a group of her female friends might show higher counts of two (or three or four) people talking at once, compared to a group of their male counterparts. The measurement is easy enough to make, though testing the hypothesis is made harder by the fact that different groups and different circumstances doubtless vary widely.
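To make the proposed measurement concrete, here is a minimal sketch of the regularly-sampled version. It assumes the conversation has already been annotated as a list of talk spurts, each a (speaker, start, end) triple in seconds; the function name, the data format, and the toy example are all hypothetical, and no real recordings are involved.

```python
from collections import Counter

def simultaneous_talker_distribution(turns, duration, step=1.0):
    """Tabulate how many people are talking at regularly sampled times.

    turns: list of (speaker, start_sec, end_sec) talk spurts.
    duration: total length of the recording in seconds.
    step: sampling interval in seconds.
    Returns a Counter mapping number of simultaneous talkers to the
    number of sample points at which that many people were talking.
    """
    counts = Counter()
    n_samples = int(duration / step)
    for i in range(n_samples):
        t = i * step
        # A speaker counts as talking at time t if t falls inside
        # one of their spurts (start inclusive, end exclusive).
        talking = {spk for spk, start, end in turns if start <= t < end}
        counts[len(talking)] += 1
    return counts

# Toy example with three hypothetical speakers.
toy = [("A", 0, 4), ("B", 2, 6), ("C", 3, 5)]
print(simultaneous_talker_distribution(toy, duration=8, step=1.0))
```

Comparing two groups would then come down to comparing these distributions, keeping in mind the caveat above that groups and circumstances vary widely.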


[But Eleanor replies:

Well, you did sort of put words in my mouth; I'm not sure exactly how to describe "texture", but I think it's a combination of who says what when, and how it's discussed, and the same things being talked about in different ways. The classic is the fixation on emotions for girls, but really I think that's just me being stereotypical . . . Also conversational topics to a certain extent. But really a lot of it varies on individuals and I just happen to fit the girl stereotype pretty neatly except I talk about boys with physics metaphors (e.g. "delta boy since August until now is perilously close to zero") and say things like, "Well, I bet you don't know what the capital of Azerbaijan is . . . Oh burn," and sometimes will make an absolutely opaque grammar joke. But, you know, I talk all the time and about things like food and romance and makeup, so I'm like a stereotype personified.

OK, point taken. I guess you could interpret the Zits strip as an imaginative (i.e. false) reconstruction of such "texture" differences in terms of other stereotypes. Meanwhile, Rita Rouvalis Chapman wrote in to describe in verse her recent impressions of the reality, from a different perspective:

I'm a high school (English) teacher.

Do 16-year-old boys ever shut up?
Oh no, they do not.

They will talk if I pout.
They will talk if I shout.

Shall I give them a detention?
They do not think this even worth a mention.

They will quote the latest movie.
They will bust the latest Jay-Z.

Should I put them in the hall?
No, out there they have a ball!

They don't care if it's a test.
That's the time they like to whisper best.

Are 16-year-old boys ever quiet?
I'm afraid I find the suggestion quite the riot.


November 15, 2006

No longer the subject of that mighty verb

Douglas Davidson submits to the linguification desk a sentence from Simon Jenkins in The Guardian, an utterly baffling piece of linguifying that ends by saying of President Bush and Prime Minister Blair that

They are no longer the subject of that mighty verb, only its painful object.

I not only have no idea what he means, I am inclined to doubt that even he has any idea what he means. Here it is again with a paragraph of context:

Bush and Blair are men in a hurry, and such men lose wars. If there is a game plan in Tehran it will be to play Iraq long. Why stop the Great Satan when he is driving himself to hell in a handcart? If London and Washington really want help in this part of the world they must start from diplomatic ground zero. They will have to stop the holier-than-thou name-calling and the pretence that they hold any cards. They will have to realise that this war has lost them all leverage in the region. They can insult and sanction and threaten. But there is nothing left for them to "do" but leave. They are no longer the subject of that mighty verb, only its painful object.

To start with, which "mighty verb", and why is it mighty? Does he mean the mighty verb leave? Because making Bush and Blair the object of that verb would give us sentences like this:

The journalists decided to leave Bush and Blair, and went off to look for Britney Spears.

I can't make anything sensible out of the idea that Bush and Blair have switched from being in a position to be described by true sentences in which Bush and Blair would be the subject (as in If Bush and Blair leave Iraq, lots of people will be happy) to being in a position to be described by true sentences in which Bush and Blair serves as direct object.

Could Jenkins have meant the verb do, which he oddly puts in greengrocer quotation marks (WE HAVE "FRESH" TOMATOES!), as if they constituted a way of showing emphasis? Same problem. Doing Bush and Blair (in any sense) just doesn't seem to be what he's talking about.

And why "its painful object"? In what sense can a grammatical object be painful? What exactly is painful here? [Don't tell me you think that "do" is intended in the sexual sense, and that's what's painful. Jenkins doesn't mean that; does he? At least one person who has already emailed me thinks that he does: he means Iraq is going to "do" Bush and Blair in the sense of bugger them. I guess what this reminds me of most is the crude remark by John McClane, the cop played by Bruce Willis in the 1988 movie Die Hard, when he points out to the recently humiliated police chief, "You're the one who just got butt-fucked on nationwide TV." Could Simon Jenkins really have that in mind? It doesn't seem plausible to me.]

Simon Cauchi has pointed out to me what is probably the right answer (I am modifying this from the earlier version of the post). He points to an earlier sentence (with another linguification I hadn't noticed!). I missed it because it is an astonishing nine paragraphs back from what I quoted above; but it is the key. Jenkins says that in Iraq:

It is total anarchy. All sentences beginning, "What we should now do in Iraq ... " are devoid of meaning. We are in no position to do anything. We have no potency; that is the definition of anarchy.

So here is what is going on. The "mighty verb" is do. And Jenkins is equating "subject" with "person who does things and is in control of actions", and "object" with "person or thing that gets things done to it and is not in control of actions", and "verb" with "action". He means Bush and Blair (surrogates for the whole of the West) are not in control, and will soon not be able to decide to take the action of doing something to get out of Iraq; rather, they will have something done to them — they will be pushed out of Iraq by the actions of others.

In grammar, "subject" doesn't mean "actor", and "object" doesn't mean "undergoer", and "verb" doesn't mean "action". Those confusions, much beloved of traditional pedagogical grammar, bedevil serious attempts to teach syntax. Jenkins' double effort at linguifying here has stuck him with a sentence that is hard to make any sense of at first. It's one of the most ill-judged attempts at effective writing I've seen in quite a while.

It all makes me think that perhaps Simon Jenkins needs a little rest from writing columns to deadline. (Jenkins was also responsible for some of the "entertaining foolishness" concerning the recent great spelling brouhaha.)

If it was good enough for King Alfred the Great...

Do you own a copy of Merriam-Webster's Concise Dictionary of English Usage? If not, go immediately to your favorite bookseller and buy one. Believe me, it'll be the best $13.22 (or even $16.95, if you pay list price) that you've spent in a while. Geoff Pullum recommended it last year ("Don't put up with usage abuse", 1/15/2005), in response to a reader's question about what references or authorities to trust with respect to style and usage. Geoff used blurb-worthy phrases like "the best usage book I know of" and "this book ... is utterly wonderful", and I agree with him.

Why am I plugging this book today? Because it provides a perfect answer to a note from a reader about the use of less and fewer.

Matt Cockerill sent in a link to an article in the Guardian (John Mullan, "M&S: the pedant's store", 10/6/2006). Apparently a customer complained about apostrophe placement ("I do not care to dress my child in a top containing a glaring grammatical giraffe gaffe"), and after an appeal to their "childrenswear technologist", M&S withdrew the offending item from their stores, apologized, and sent a refund. Matt focused on the article's passing mention of an earlier M&S capitulation to customers' grammatical prejudices:

M&S, of course, likes to project a classy image and this confession of grievous fault rather neatly confirms it as the favoured shop of those with high standards, in grammar as in everything else. A few years ago it changed its "6 items or less" checkout signs for replacement signs declaring, more correctly, "6 items or fewer", reportedly after customers had grumbled.

Matt observed that this "crops up as a standard example of the shoddy grammar of our modern age in newspaper articles here all the time", and registered a counter-grumble:

... the weird thing is, I've never once seen anyone point out that there's nothing grammatically wrong with '5 items or less', and in fact it's much more natural and less stilted sounding than '5 items or fewer'.

The key, as far as I'm concerned, is to realize that it's quite valid to think of '5 items or less' to imply an ellipsis:
"5 items or less... [than that amount of shopping]"

and in that it's no different from any number of standard grammatical usages which make use of ellipsis.

I'm also always tempted to ask whether they would replace the sign outside a kids playground to indicate that it may be used only by children who are "5 years old or fewer"...

Matt's grammatical instincts are exactly right. He's also correct in observing that with ages -- and in certain other cases of countables as well, which MWCDEU summarizes as "distances, sums of money, units of time, and statistical enumerations" -- less is generally preferred to fewer. And Matt's observation about a possible construal of "5 items or less" also seems valid to me, although I think it's a secondary point. The primary point is that the now-standard pedantry about less/fewer is in fact one of the many false "rules" that have recently precipitated out of the over-saturated solution of linguistic ignorance where most usage advice is brewed.

But not the usage advice at MWCDEU. This is the start of its entry on less/fewer:

Here is the rule as it is usually encountered: fewer refers to number among things that are counted, and less refers to quantity or amount among things that are measured. This rule is simple enough and easy enough to follow. It has only one fault -- it is not accurate for all usage. If we were to write the rule from the observation of actual usage, it would be the same for fewer: fewer does refer to number among things that are counted. However, it would be different for less: less refers to quantity or amount among things that are measured and to number among things that are counted. Our amended rule describes the actual usage of the past thousand years or so.

As far as we have been able to discover, the received rule originated in 1770 as a comment on less:

This Word is most commonly used in speaking of a Number; where I should think Fewer would do better. No Fewer than a Hundred appears to me not only more elegant than No less than a Hundred, but strictly proper. --Baker 1770

Baker's remarks about fewer express clearly and modestly -- "I should think," "appears to me" -- his own taste and preference. [...]

How Baker's opinion came to be an inviolable rule, we do not know. But we do know that many people believe it is such. Simon 1980, for instance, calls the "less than 50,000 words" he found in a book about Joseph Conrad a "whopping" error.

The OED shows that less has been used of countables since the time of King Alfred the Great -- he used it that way in one of his own translations from Latin -- more than a thousand years ago (in about 888). So essentially less has been used of countables in English for just about as long as there has been a written English language. After about 900 years Robert Baker opined that fewer might be more elegant and proper. Almost every usage writer since Baker has followed Baker's lead, and generations of English teachers have swelled the chorus. The result seems to be a fairly large number of people who now believe less used of countables to be wrong, though its standardness is easily demonstrated.

MWCDEU then gives a couple of pages of illustrative examples in both directions, dealing especially with the "common constructions" with countables where less continues to be used more often than fewer "in present-day written usage". The concluding advice:

If you are a native speaker, your use of less and fewer can reliably be guided by your ear. If you are not a native speaker, you will find that the simple rule with which we started is a safe guide, except for the constructions for which we have shown less to be preferred.

I've scanned the whole less/fewer entry, and made it available here. Now validate my stretching the boundaries of fair use, and go buy the book!

I'm going to add a couple of observations based on web searches. First, Google News validates MWCDEU's observation about the difference between countables in general -- where journalists and their editors prefer fewer about 2-to-1 -- and things like units of time and amounts of money, where less is preferred by a whopping margin. Here are counts for a few different countables, in the "less/fewer than N items" construction:

votes people players pages hours minutes seconds dollars
less than N __
fewer than N __

And the same tendencies can be seen on the web in general, except that the ratios are generally shifted in the direction of less. No doubt this is due to the effects of copy-editing on the Google News sample.

                    votes     people  players    pages      hours     minutes    seconds    dollars
less than N __    272,000  2,570,000  183,000  696,000  9,110,000  13,400,000  3,840,000  2,220,000
fewer than N __   182,000    837,000   61,500  112,000    552,000      92,200     22,300        921
less/fewer           1.49       3.07     2.98      6.2       16.5       145.3        172      2,410

But interestingly, in that "N items or less/fewer" construction, the less/fewer ratios generally shift away from fewer and towards less. At least, that's clearly true for the cases of countables where fewer is reasonably common to start with. Here are the counts from Google News:

votes people players pages hours minutes seconds dollars
N __ or less
N __ or fewer

And here are counts from the web at large:

                  votes   people  players    pages      hours    minutes  seconds  dollars
N __ or less     14,200  134,000   13,200  215,000  1,890,000  2,100,000  944,000  248,000
N __ or fewer       495   30,600    1,570   19,500     24,200        757    6,760      125
less/fewer         28.7     4.38     8.41     11.0       78.1      2,774      140    1,984
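For anyone who wants to re-check the arithmetic, here is a minimal sketch that recomputes the less/fewer ratios from the web counts quoted in the two tables and lists the nouns whose ratio moves towards less in the "N __ or less/fewer" construction. The counts are copied from the tables; the function and variable names are my own, invented for illustration.

```python
# Web hit counts copied from the two tables above.
NOUNS = ["votes", "people", "players", "pages",
         "hours", "minutes", "seconds", "dollars"]

COUNTS = {
    # construction: (counts with "less", counts with "fewer")
    "less/fewer than N __": (
        [272_000, 2_570_000, 183_000, 696_000,
         9_110_000, 13_400_000, 3_840_000, 2_220_000],
        [182_000, 837_000, 61_500, 112_000,
         552_000, 92_200, 22_300, 921],
    ),
    "N __ or less/fewer": (
        [14_200, 134_000, 13_200, 215_000,
         1_890_000, 2_100_000, 944_000, 248_000],
        [495, 30_600, 1_570, 19_500,
         24_200, 757, 6_760, 125],
    ),
}


def less_fewer_ratios(less, fewer):
    """Map each noun to its less/fewer ratio of hit counts."""
    return {noun: l / f for noun, l, f in zip(NOUNS, less, fewer)}


def shifted_towards_less(before, after):
    """List the nouns whose less/fewer ratio is higher in `after` than in `before`."""
    return [noun for noun in NOUNS if after[noun] > before[noun]]


THAN_RATIOS = less_fewer_ratios(*COUNTS["less/fewer than N __"])
OR_RATIOS = less_fewer_ratios(*COUNTS["N __ or less/fewer"])

if __name__ == "__main__":
    for noun in NOUNS:
        print(f"{noun:8s} {THAN_RATIOS[noun]:10.2f} {OR_RATIOS[noun]:10.2f}")
    print("shift towards less:",
          ", ".join(shifted_towards_less(THAN_RATIOS, OR_RATIOS)))
```

On these counts the ratio rises for six of the eight nouns and falls only for seconds and dollars, the two where fewer is rare to begin with.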

I'm not sure whether Matt's ellipsis theory is the reason for this shift, though it's a reasonable possibility: when someone writes or says "1,000 votes or less", they may well mean "1,000 votes or less of a margin than that", rather than "1,000 votes or less votes than that". But King Alfred says that they'd be OK either way, and so do most other English writers in the millennium between his time and ours.

[A small pedantic confession: King Alfred had the genitive case at his disposal, and so his use of less with a count noun is actually a partitive construction of a type that we can't copy idiomatically in modern English -- "less of words":

c888 K. ÆLFRED Boeth. xxxv. §5 [6] Swa mid læs worda swa mid ma, swæðer we hit ȝereccan maȝon.  ("whether we may prove it with less words or with more")

But still. And I'll make up for doubting Alfred's modern relevance by filling in one data point from the intervening centuries -- a footnote by Alexander Pope, to book XIV, verse 291 of his translation of the Iliad:

But whoever considers his Circumstances will judge after another manner. Priam, after having been the most wealthy, most powerful and formidable Monarch of Asia, becomes all at once the most miserable of Men; He loses in less than eight Days the best of his Army, and a great Number of virtuous Sons; he loses the bravest of 'em all, his Glory and his Defence, the gallant Hector.

The use of less with a count of time-units has always been preferred, as MWCDEU observes. But I was surprised by that "bravest of 'em all", in a footnote no less. ]

Posted by Mark Liberman at 08:41 AM

November 14, 2006

Wanna: neither slang nor language murder

The Guardian (11/1/2006) obediently repeats from a press release that the non-standard features that will be permitted in examination answers (as Mark recently noted) range "from the slang ‘wot’ and ‘wanna’, to the short cut ‘CU L8R’". Can't Guardian journalists even look up the meaning of the word "slang" in a dictionary that's free on the workstation on the desk in front of them?

Webster says this about slang:

1 : language peculiar to a particular group: as a : ARGOT b : JARGON 2
2 : an informal nonstandard vocabulary composed typically of coinages, arbitrarily changed words, and extravagant, forced, or facetious figures of speech

The clearest point here concerns wanna, a standardized spelling (constantly used, for example, in representing dialog in novels) for a kind of derived amalgam of want and to about which linguists have written reams over the last thirty years (since David Lightfoot suggested it provided crucial evidence for a certain theoretical point in syntax [Linguistic Inquiry 1976] and Paul Postal and I took out after this false claim in a whole series of papers [Linguistic Inquiry 1978, 1979, 1982, 1986] there have been dozens of papers on the topic; two have been in Language, one by me in 1997 and one in the latest 2006 issue by Dick Hudson). This isn't a slang form by any conceivable definition of slang: it isn't peculiar to a particular group, it's familiar to every American speaker (and I think most British-derived speakers too). It isn't non-standard at all; it's part of informal style in Standard English (which is why it's treated in The Cambridge Grammar of the English Language). It isn't a recent coinage; it isn't "arbitrarily changed"; it isn't "extravagant, forced, or facetious". Nobody who knows what slang is could think wanna is slang.

But then I'm not sure that any testing authorities actually gave any of the examples, or characterized them as slang. The Guardian says: "The Scottish Qualifications Authority (SQA) said the use of phrases like "2b r nt 2b" or "i luv u" in exam papers would be allowed as long as candidates showed that they understood the subject." Maybe you truly believe that a staff member from the Scottish Qualifications Authority solemnly told education journalists that "2b r nt 2b" is now an acceptable spelling of the first line of Hamlet's famous soliloquy as far as the testing of knowledge of Shakespeare is concerned, but I don't. I think the journalist tossed that in as a gratuitous illustration. My guess is the SQA did little more than announce (or remind people of the prior existence of) a policy that says an unconventional spelling will not automatically lead to a student who sees what the answer is being graded the same as one who didn't know. (This seems sensible. Preserving the distinction between students who do know the answer and students who don't is surely rather important educationally. But I suppose that makes me a dangerous linguistic libertine.)

Katie Grant of The Sunday Times (11/5/2006) uses the texting-is-OK story as part of her evidence that "Our language is being murdered." It's not all of her evidence; her rant touches on a variety of different subjects relating to the supposed disastrous slippage of linguistic standards in Britain's schools. She is aware that language changes, but she dismisses this briskly by saying, "what the "anything goes" brigade refuse to acknowledge is that there is a difference between developing language and abandoning it."

Our language is being murdered and/or abandoned because your is occasionally spelled ur (as opposed to the common older abbreviation yr) by a teenager writing in a hurry? You know, we do try to exaggerate the dumbness of newspaper stories about language here at Language Log, for a little humorous leavening; but it isn't very easy, because what the journalists say is often far out beyond where satire can reach.

Posted by Geoffrey K. Pullum at 11:45 AM

When life is funnier than the funnies

The world's media recently saw a flurry of stories about a non-story: exam-grading authorities in Scotland and New Zealand explained, when asked, that they give partial credit for correct answers that are wrongly spelled -- a long-standing policy that also covers occasional intrusions of abbreviated spellings from the culture of text messaging. This led to blasts and counter-blasts of end-times rhetoric from pundits and politicians. There was an especially funny exchange between Bill English, the spokesman for education of the National Party in New Zealand, and Steve Maharey, New Zealand's minister of education:

English: This kind of pigeon English is fine for young people organising their social lives, but it is not an acceptable way of expressing an academic argument or idea.
Maharey: The statement is understandable, despite pidgin being spelt p-i-g-e-o-n, as in a bird from the dove family, rather than p-i-d-g-i-n, as in simplified language used between persons of a different nationality. But we will give him credit.

Although I'm a big fan of User Friendly, I'm afraid that in this case the politicians are funnier:

It's normal, if unfortunate, that politicians and the mass media get this kind of thing wrong. But I expect better from the cartoonists. [Tip of the hat to Robin Shannon]

Satirical cartoon uptalk is not HRT either

There's a widespread false belief that "uptalk" -- the phenomenon of final rising intonation used on phrases that aren't yes/no questions -- involves terminal pitch contours that start high and rise. As a result, some people use the unfortunate technical term "High rising terminal", abbreviated HRT, for this way of talking. But as I've argued earlier ("Uptalk is not HRT", 3/28/2006), the informal term uptalk is a better choice, since it avoids the often-false claim about the shape of this contour that's implicit in the term HRT.

Some additional evidence emerged yesterday on Fox's Family Guy (Episode FG-435 "Whistle While Your Wife Works", Air Date: Sunday, November 12, 2006), when Stewie tries to persuade Brian to break up with Jillian, who is described in the episode's press release as "very attractive but intellectually challenged". Stewie tries to make his point by satirically imitating Jillian's (alleged) uptalk, adding some annoying little nods and grimaces:

There are five rise-ending phrases in this short clip. In each case, I've given a transcribed pitch track of the end of the phrase, and also some numbers showing the pitch value (in Hz.) in the middle of some selected syllables -- or in the case of final rises on final accented syllables, in the middle of the initial lower-pitched region, and then at the location of the peak.

The first four examples are in Stewie's little "dump her" speech.

Alright, Brian, you can do this.
You can dump her.
Because once it's done, never again will you have to listen to her talk like thi......s?
                                                               199 202  182  162..334

In this first case, the low part of "this" is fully 40 Hz. lower than the value of the previous accented syllable, "talk". (The even lower pitch in the low-amplitude region at the start of "this" is the consequence of the restricted air-flow during the voiced fricative [ð].)

You know, where everything has a question mark at the end of it? 
               213         210     169                      321

Again, the low pitch value on the accented syllable of "question" is 44 Hz. lower than the value of the accented syllable of "everything".

With an upward inflection? 
       205        152 335

And in this case, the low value on the accented syllable of "inflection" is 53 Hz. lower than the previous accent on "upward".

at the end of every sentence?
      198          159  347

This time there's a 39 Hz. difference -- the low point on the stressed syllable starting the final rise is still the lowest accented syllable by almost 20%.

That's the end of Stewie's contribution. Now Brian responds in kind:

Yeah, I don't know what I was thinking.
119           128   142   105  123 313

Oh, dammit, now I'm doing it too!

Brian's uptalk is more ambiguous. The stressed syllable of "thinking" is indeed about 19 Hz. lower than the previous accented syllable "what" -- but the pre-stress dip on "was" seems to represent a genuine low-pitched target (rather than simply the pitch-depressing effect of the obstruent), and "thinking" starts up fairly rapidly. So you might believe that this is a type of accent whose low point is aligned before the "beat" of the accent, rather than on or after it -- and that's one of the patterns that might plausibly be described as a "high rise", since the strong syllable of the accent is at a mid pitch value rather than at a local minimum.
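The drops quoted for the five rises are easy to recompute. The sketch below takes, for each phrase, the pitch on the previous accented syllable, the pitch at the low start of the final rise, and the pitch at the peak, all read off the figures given with the examples. The tuple layout and the function names are illustrative choices of mine, not part of any analysis tool.

```python
# (phrase-final word, Hz on the previous accented syllable,
#  Hz at the low start of the final rise, Hz at the final peak),
# as quoted with the five examples above.
RISES = [
    ("this",       202, 162, 334),  # previous accent: "talk"
    ("question",   213, 169, 321),  # previous accent: "everything"
    ("inflection", 205, 152, 335),  # previous accent: "upward"
    ("sentence",   198, 159, 347),  # previous accent not named in the text
    ("thinking",   142, 123, 313),  # previous accent: "what" (Brian's line)
]


def drop_hz(prev_accent, low):
    """How far the start of the final rise sits below the previous accent, in Hz."""
    return prev_accent - low


def drop_percent(prev_accent, low):
    """The same drop expressed as a percentage of the previous accent's pitch."""
    return 100 * (prev_accent - low) / prev_accent


def rise_hz(low, peak):
    """Size of the final rise from its low start to its peak, in Hz."""
    return peak - low


if __name__ == "__main__":
    for word, prev, low, peak in RISES:
        print(f"{word:11s} drop {drop_hz(prev, low):3d} Hz "
              f"({drop_percent(prev, low):4.1f}%), rise {rise_hz(low, peak):3d} Hz")
```

If the low point of each rise were at or above the previous accent, the drops would come out zero or negative; a positive drop is what argues against calling the contour a "high" rise.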

And Brian's attitude towards Jillian is more ambiguous as well -- the episode's press release says that he "can't close the deal because she is so hot". But the funny thing is, though Jillian is certainly depicted as less than brilliant, she doesn't actually use uptalk very often. In the segment below, there are a couple of examples around 1:10, and a couple more around 6:00, but most of Jillian's talk is not uptalk at all. (And there's nothing like Stewie's accompanying head and face gestures.) Apparently even a cartoon stereotype of a female airhead is not as intonationally stereotyped as the other cartoon characters' stereotyped image of her is.

[Tip of the hat to Vishy Venugopalan]

Posted by Mark Liberman at 12:12 AM

November 13, 2006

Plain English creeps into police radio transmissions

An Arlington, Virginia, policeman uses his high-tech radio to call for help, shouting "ten thirteen," meaning "officer down." In nearby Bethesda, Maryland, other officers ignore his message because to them it means "request wrecker." Hmm. This could be a problem, to say the least.

The Washington Post reports that the Virginia State Police have had enough of such confusion and are instituting a radically new policy that calls for abandoning "10 codes" used in daily transmissions in favor of -- you guessed it -- plain English.

There is a history behind the language planning currently used on police radios. It started back in the 1920s, when police had only one channel to work with. But over the years, in a Tower of Babel fashion, separate police departments began to develop their own meanings for their "10 code" numbers. In the densely populated area around the District of Columbia the separate law enforcement agencies of the states and counties, along with the Pentagon, ATF and FBI, gradually created their own "10 code" meanings. Fine and dandy -- except when they communicate with each other across jurisdictions, which turns out to be very frequently.

Sociolinguists describe three types of language planning: corpus planning, status planning, and acquisition planning. Corpus planning is what the Virginia State Police seem to be trying to carry out here. This involves creating new forms of language, modifying old ones, or selecting from among alternative existing forms. This is not the same as status planning, which involves such things as deciding on an official or national language, thereby assigning status to that choice. Many Language Log posts (here), (here), (here), and (here), for example, have dealt with America's ongoing efforts to make English the official language, an excursion into status planning. It doesn't seem likely that the Virginia State Police are aiming at status here. As for acquisition planning, it remains to be seen whether this effort in language planning will succeed in teaching the new forms effectively or create an incentive for officers to learn how to use plain English instead of their more familiar "10 codes."

Language planning changes aren't easy to accomplish. On May 19, 2006, the US Senate, influenced by a group called U.S. English, voted to make English the "national" (interestingly, not "official") language of the country. But 27 states have elevated "national language" to their state's "official language." This is a clear example of status planning (proclaiming English to have more status than any other language) but acquisition planning may prove to be a bit more difficult. Already there is resistance to this venture, as the Post reports. Some cops say that they're more comfortable with the old "10 code" system and they think this in-group jargon is nifty because it marks their status as police. They reason that if doctors and lawyers can have their language codes, why can't police? Other officers express some difficulty in even remembering what the plain English is for their codes. Still others worry that their transmission will now become understandable to the general public (as if they weren't already available on the internet).

We'll have to see what happens on this one.

Update: Grant Barrett writes that one side-aspect of the dropping of "10 codes" is that the trunk radio systems now so prevalent in policing make the masking of police intent and action less necessary, since it's more difficult to monitor trunking than it is in the old analog systems.

So maybe this Virginia State Police language planning is just a practical matter after all.

Posted by Roger Shuy at 04:27 PM

7 - 38 - 55!

I'm not calling a football play; those are the famous Mehrabian numbers, giving -- in the usual citations of this research -- the percentages that verbal content, paralinguistic features (vocal quality, prosody, etc.), and kinesic features ("body language", broadly construed) contribute, respectively, to the total impact of a message.  When I last mentioned this research, I noted that the great avalanche of bizlore -- the lore of corporate trainers, motivational speakers, advertising advisers, and the like -- using the Mehrabian numbers went drastically far beyond Mehrabian's own claims, which were that these figures applied only to the communication of attitudes and emotions.  As it turns out, the actual results of Mehrabian's 1967 studies are much more modest than even this, as Ed Keer noted in his blog back in February (building on a longer discussion by Richard Sproat on Linguist List in 2001).

Check out Keer and Sproat for details.  The fact is that the 1967 studies weren't about the communication of attitudes and emotions in general, but about the communication of one specific set of attitudes and emotions, liking and disliking.  Ok, you say, I had no earthly idea how anyone could study the relative contributions of verbal content, paralinguistics, and kinesics to the total impact of a message (whatever that means), but I still don't see how this much more modest question could be investigated experimentally: what do you measure, and how?

Good question.  What Mehrabian did was pit features (expressing liking/disliking) in the three modes against one another to see which mode prevailed, and how often, when they were in conflict.  Even if we accept his results at face value -- and there are many details of the experimental design and the interpretation of the data that a reasonable person could fret about -- all that Mehrabian did in 1967 was, in Keer's words, to discover sarcasm, in this case the conveying by extralinguistic devices of a meaning opposite to the plain meaning of the words.

That's stage one.  In stage two, these results morph into a global generalization about language use, which then spreads into all sorts of places outside the academic world.  The details of this transformation and diffusion would be worth looking at.  (With luck, Mehrabian himself has relevant materials from the 60's and 70's.)  No, no, don't look to me to do this research; I'm the guy with over a hundred postings in his queue for Language Log, and I'm not a cultural historian.

In any case, I'd imagine that science writers for the general press, and their editors, had a hand in the spread of the Mehrabian numbers to a wider world.  (I mention editors, because many a science writer has had an original text altered, in small or large ways, to make it conform to the beliefs of editors -- whatever the content of the original.  And then, famously, headlines are often attached that seriously distort that content.)

That's stage two.  In stage three, the Mehrabian numbers become part of bizlore, indeed part of a larger set of folk beliefs.  Most people are no longer aware of the source of the numbers, and most people who cite Mehrabian haven't looked at the original studies or any careful summary of them;  it's "common knowledge" now.

Bizlore is just one part of an enormous enterprise of popular advice literature -- on education, child-rearing, exercise, diet, relationships, gardening, and more, including grammar, usage, and style.  Bizlore focuses on persuasion, power, and the fostering of positive emotions, with the aim of helping people achieve success in business dealings of all sorts. 

All sorts of popular advice literature, not just bizlore, appeal to "common sense" and folk beliefs; rely heavily on personal opinions and impressions (of the advisers and their audiences); and get points across largely via particular examples, often by telling exemplary stories of personal experience (we all love stories).  Notice that this is not at all the way scientific inquiry proceeds -- but it IS the way ordinary people reason about their world and their lives.  "Science" appears in popular advice literature mostly for its value as dressing: there are numbers, real numbers; and actual researchers or institutions, of some prominence (or apparent prominence), can be appealed to, however spuriously.  "Science" is just one more element in the rhetoric of popular advice literature, rarely an actual contributor to it.  (There are some honorable exceptions, of course.)

In any case, you can see why bizlore loves the Mehrabian numbers.  They're wonderfully impressive.  So exact, and from a real scientist!

The Mehrabian numbers also plug into a powerful folk belief about how human interaction works -- that we are "communicating" (passing back and forth) "messages" to one another.  Ordinary people (and some social scientists) conceptualize interaction in terms of the "conduit metaphor" discussed in several places by Michael J. Reddy (most recently, I think, in the 2nd edition (1979) of Metaphor and Thought, edited by Andrew Ortony) and made famous by George Lakoff in many of his writings.  Now, everyone, including social scientists as a group, recognizes that paralinguistics and kinesics contribute a lot to the texture of interaction, so it's natural for ordinary people to think that linguistic expressions, paralinguistic features, and kinesic features are just three different modes of communicating the same messages, and it then makes sense to ask what their relative contributions are.

Two problems, one of substance, one of method.

The first is that there's no reason to think that the three modes are ways of conveying the SAME "meanings", or even that CONVEYING meanings is what's going on.  I would maintain, with many others, that there are many different kinds of "meaning" at issue here, and that it would be more accurate to say that the features of behavior in question (depending on the occasion) express, reflect, perform, or construct these meanings than to say that they simply convey them.

The second problem is that the question being posed -- what are the relative contributions of (strictly) linguistic content, paralinguistics, and kinesics? -- is, as I suggested above, one of those impossibly over-global questions that almost surely can't be answered.  The methodological difficulty here is that what happens in each of the three modes is exquisitely context-dependent.  I can't see any way to sample behavior, from the whole world of human interactions, while controlling for these differences in context; without such controls, what we see might well just follow from differences in the frequencies of the various contexts, rather than from some intrinsic difference between the modes.  (There's also the problem of individuating contexts.  Where do we get an inventory of the relevant types of context, even in one culture?)

What I'm saying here is that there are some questions about language and behavior that are easy to formulate but so global that they are probably unanswerable in principle.  (At least some of the questions about differences between the sexes, such as Louann Brizendine's claim that women use many more words per day than men -- now discussed here in a long series of postings by Mark Liberman -- are almost surely unanswerably over-global.  I hope to post on that eventually.)

A semi-final remark: linguists will probably be struck by what counts as (strictly) linguistic (vs. paralinguistic or kinesic) in Mehrabian's research and everything that cites it or alludes to it: apparently, only aspects of utterances that contribute to literal meaning.  To a linguist, this is a desperately impoverished view of language use, excluding most of the subject matter of entire subfields of linguistics.  Almost all variation in the linguistic system is ignored, thus neglecting the ways in which, through their use of particular linguistic variants, people express, reflect, perform, and construct social group affiliations and personas; the ways they express or reflect attitudes and opinions towards their audiences (including liking/disliking!), about the nature of the interaction, etc.; and the ways in which they use linguistic choices to structure their discourses (via discourse particles, for example).  These are "social meanings" and "discourse meanings", if you want to put everything under the umbrella of "meaning".

Also missing is everything to do with non-literal meaning: for instance, implicatures of all sorts, fresh figures (especially metaphors and metonyms), and other rhetorical devices.  Emotions and attitudes can be expressed or revealed through all these means, too.

On a more constructive note, I can remind you that linguists, psychologists, sociologists, and anthropologists have long concerned themselves with the ways in which linguistic content, choices of variants, discourse organization, paralinguistic features, and kinesic features are coordinated with one another and can combine into suites of behaviors associated with "meanings" of all sorts.  For a beautiful recent example of research along (some of) these lines I recommend Rob Podesva's Stanford Ph.D. dissertation, Phonetic Detail in Sociolinguistic Variation: Its Linguistic Significance and Role in the Construction of Social Meaning, completed this summer (it will be available eventually, in chapter-sized chunks, on his website).  Podesva looks at the way three speakers' uses of one segmental variable (realization of word-final coronal stops) and two paralinguistic variables (prosody and voice quality, in particular falsetto) are associated with different personas in different contexts.  These associations are very much local, in that they are tied to particular social groups and to particular contexts, as well as to individual speakers (the three speakers -- all friends -- use the variables in different ways).

(Full disclosure: I was a member of Rob's dissertation committee.)

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 03:38 PM

Punctuational linguificatory hyperbolicity

Charles Belov points out to me a punctuational linguification that does fall under the heading of hyperbole. It's in a piece of writing about dance by Eva Yaa Asantewaa:

Some enormously gifted people contributed to Francesca Harper's Modo Fusion Lounge showcase up at Makor/Steinhardt Centers intimate café space on West 67th Street. For starters, there was the stunning Harper herself — the kind of artist and performer whose pile-up of talents quickly exhausts a keyboard's hyphen or comma keys.

Did you instantly parse that connection between talent pile-up and key exhaustion?

The idea seems to be that the comma and hyphen keys on your keyboard will get worn out when you try to write about Harper's talents. Asantewaa says Harper is "‘a conceptual pop artist,’ film director, lyricist, dancer, singer, and actor currently understudying two roles in The Color Purple", and the Modo Fusion Lounge features "music, ... dancing ..., film, poetry, humor, and a whole lot of fun." Ten commas there. My comma key survived the pounding. I suspect Asantewaa's did too, refuting her literal claim. She exaggerates. But at least this linguification is intelligible once you see that: if Ms Harper did billions of different things (the actual array of projects can be seen at; it doesn't really run into the billions), and you tried to list them all in a multiple coordination, then at an average word length of about 6 characters (roughly the right figure for English), for every n keystrokes on a typical letter key you would need roughly 4n comma keystrokes (the six letters of each item are spread over 26 letter keys, while every item puts one keystroke on the single comma key), which might cause the comma key to wear out before the letter keys. It's a fairly silly piece of over-writing (perhaps not the first in enthusiastic arts reviewing), but at least it does make perfect sense as an exaggeration. As I have previously pointed out, many linguifications don't.

Posted by Geoffrey K. Pullum at 11:49 AM

November 12, 2006

prancing about with jack mcconnells pants on your head does not a news story make

... but the Tensor's Snowclonatron will find it for you anyway. Over at Tenser, said the Tensor, the Tensor has just released a perl script that tabulates instances of snowclones. So I downloaded his program, and executed 'X does not a Y make' >DoesNotA

and hey presto (well, after a couple of minutes), I had a list of 640 variant instantiations, sorted according to Google count. In this case, the top ten (with their counts) are

one game does not a season make 916
one election does not a democracy make 482
prancing about with jack mcconnells pants on your head does not a news story make 404
a bottle of water does not a rider make 402
benchmarks does not a majority make 195
one month does not a trend make 158
an os does not a platform make 143
one year does not a trend make 141
one win does not a season make 131
one day does not a trend make 85

The business about Jack McConnell's pants is a self-refuting line from a much-copied recent news story: Murdo MacLeod and Eddie Barnes, "McConnell under pressure over Bute House video fiasco", The Scotsman, 11/5/2006. Well, actually, it's from a reader's comment on the news story, so it's not so much self-refuting as reader-refuted.

The Tensor's algorithm for finding the boundaries of the phrasal template in particular cases is simple but generally effective. There's just one mistake in the top ten list: the phrase "benchmarks does not a majority make" omits an initial "several" in quotation marks in the source.

In this list, the Xs such that X "does not a trend make", in frequency-sorted order, are one month, one year, one day, one data point, one quarter, one week, one number, two months, one point, three weeks, one person, one season, one example, five people, one quarter, four days, an exception, one movie and one failure.

The Xs such that X "does not a democracy make" are one election, back room politics, voting, one poll, voting alone, a single election no matter how successful, majority rule, holding elections alone, the right to vote alone, and white boys in power.

The Xs such that X "does not a season make" are one game, one win, one bad loss, and April.
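For readers who want to try this kind of tabulation on their own text, here is a minimal Python sketch. It is not the Tensor's perl script, and its boundary-finding is cruder: it assumes each snippet starts where X starts, and the names `PATTERN` and `tabulate` and the sample snippets are invented for illustration.

```python
import re
from collections import Counter, defaultdict

# Hypothetical template for "X does not a Y make". X is assumed to run from
# the start of the snippet; real boundary-finding is harder than this.
PATTERN = re.compile(r"^(?P<x>.+?) does not a (?P<y>.+?) make\b")


def tabulate(snippets):
    """Count each instantiation and group the X fillers by their Y filler."""
    counts = Counter()
    by_y = defaultdict(Counter)
    for text in snippets:
        match = PATTERN.search(text.strip().lower())
        if match:
            x, y = match.group("x"), match.group("y")
            counts[f"{x} does not a {y} make"] += 1
            by_y[y][x] += 1
    return counts, by_y


# Made-up input snippets, standing in for search results.
snippets = [
    "One game does not a season make",
    "one game does not a season make.",
    "One month does not a trend make, of course",
    "April does not a season make",
    "Nothing to see here",
]
counts, by_y = tabulate(snippets)
for phrase, n in counts.most_common():
    print(n, phrase)
print(dict(by_y["season"]))
```

Sorting by count gives a frequency-ordered list like the one above, and `by_y` gives the per-Y groupings of X fillers.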

Posted by Mark Liberman at 10:02 PM

Be a famous footnote

The snowclone "I for one welcome our new * overlords" is the subject of some of the most enduringly popular LL posts ("Giant space ants win control of Congress?"; "The memetic phylogeny of 'Our new * overlords'"; "I, for one, welcome our new * overlords".) So now, if you've got a collection of back issues of the Utne Reader during the period 1990-1994, and some time on your hands, the result could be cyberspatial fame beyond your wildest dreams. Or maybe not, depending on what your dreams are like. Anyhow, Martin Keegan writes:

The line "... and let me be the first to welcome our new Centaurian overlords" appeared in a three-frame comic strip in Utne Reader some time between 1990 and 1994 (when my parents got a subscription to Utne and when I left Australia and stopped reading it). The scene is the same as the Simpsons episode: a TV announcer covering an alien invasion switching sides mid-sentence. It's highly unlikely that one was not inspired by the other. Unfortunately I don't have the magazines to hand, so can't be as precise as I was over "butt-crack of dawn".

If you can track down this reference, please send me the citation -- and if possible a scan of the cartoon.

[Martin's earlier precision of attribution is on display here.]

Posted by Mark Liberman at 11:36 AM

Alarming decline in literacy among publicists and journalists

In a couple of earlier posts, I commented on the fuss in Britain and elsewhere created by the revelation that some exam-grading authorities give partial credit for correct answers that are wrongly spelled. This post gives some background, including a psycholinguistic backstory that also brings out some interesting things about the ecology of science journalism.

Now, the fact seems to be that no exam-grading policies have actually changed. To the extent that there was any news here at all, it was just that some new kinds of misspellings, following the new conventions of text messaging, are now found in what a spokesman for the Scottish Qualifications Board called a "very small" percentage of exam papers. These new types of misspelling are treated just as the old ones were: "pupils would still be given [partial] credit if expressing a valid idea".

Nevertheless, some writers proclaimed cultural Armageddon. Thus Katie Grant, "Our language is being murdered", The Sunday Times, 11/5/2006:

Those marking exams are no longer presented with neat, comprehensible scripts, but with pages and pages of C U l8r, heavily illustrated with emoticons, those smiley or gloomy faces so beloved of teenagers, who probably have no idea that emoticons were originally made up of punctuation marks. In Scotland today, children presenting such scripts go unpenalised.

Others called on lovers of liberty to rally to the cause of orthographic reformation. Thus Simon Jenkins, "A million fingers are tapping out a challenge to the tyranny of spelling", The Guardian, 11/3/2006:

Thank you, Scotland. First John Knox, then the Enlightenment and now the Scottish Qualifications Authority. In a direct challenge to the English at their most reactionary, the authority has declared that it will accept text-messaging short forms in school examinations. The dark riders of archaism will protest and the backwoods will howl. No spell is cast as dire as spellcheck. But the champions of reason are massing north of the border and need our support.

(By the way, is it only in the Anglosphere that discussions of orthography so readily tap a vein of apocalyptic imagery? Is this connected to the phenomenon of word rage, discussed here, here and here?)

One of the early reports about the SQA grading controversy found a lower-keyed defense of tolerance for texting in a recently-featured research report ("Exam board attacked for approving text message answers", The Guardian, 11/1/2006):

Exam chiefs in Scotland were branded "ridiculous" today after admitting that answers written in text message language will be acceptable in English tests as long as they are correct.

The Scottish Qualifications Authority (SQA) said the use of phrases like "2b r nt 2b" or "i luv u" in exam papers would be allowed as long as candidates showed that they understood the subject.

The admission follows research from Coventry University, released in September, which suggested that sending text messages - from the slang "wot" and "wanna", to the short cut "CU L8R"- may actually be improving, not damaging, young children's spelling skills.

So I thought I'd look into this research. The first thing I found was that there had been quite a few news reports about it: "Texts 'do not hinder literacy'", BBC News, 9/8/2006; Alexandra Smith, "Texting slang aiding children's language skills", The Guardian, 9/11/2006; Alexandra Frean, "Y txtng cn b v gd 4 improving linguistic ability of children", The Times, 9/9/2006; etc. And even a blog post or two, e.g. Helen Keegan at Musings of a Mobile Marketer, "textin iz gud 4 ur lang skilz", 9/11/2006.

But it turns out that the research was not exactly "released in September", in the sense that in September there was a paper to read, or even a preprint. The news reports came about because of a press release.

On Friday, September 8th, at the annual conference of the British Psychological Society's Developmental Section, held at the University of London, there was a poster-format presentation with the title "Cognitive Factors in Text Messaging and Literacy Links". The authors were Beverly Plester and Clare Wood, of Coventry University.

Unfortunately, the BPS does not put papers or even abstracts on line for its many conferences ("around 100 conferences and events each year"), but it has an active publicity department, who distribute press releases for particularly juicy items from these events, and the flacks chose the Plester and Wood poster as one of 16 things to tell the world's press about in the month of September. I believe that this was the only item that they chose from the program of 102 posters, 108 individual presentations, 16 symposium presentations and 4 keynote presentations at the Developmental Section's September meeting.

For obvious reasons, PR departments and scientific program committees have different criteria for ranking research. In this case, the program committee put the Plester/Wood paper among the poster presentations, which is the lowest rank of acceptance at such a conference. The BPS's PR department chose it as the only one of the roughly 230 presentations at the conference to tell the world about. I'm not suggesting that either the program committee or the PR department made a mistake, nor that one set of criteria is intrinsically better than the other. I'm just observing that the program committee and the PR department clearly value different things.

For a sense of what the BPS PR department values, we can list some of its other titles from September: "Identity key to race relations"; "Do terrorist threats increase Islamophobia in Britain?"; "Sacked Rover workers can only find harmful 'bad' jobs"; "Larger mobs carry out more violence"; "People don't deal directly with threats to their way of life"; "Is sex at work the kiss of death for your career?"; "Young children think TV is real"; "Mobile phones: addictive, causes of stress"; "Exercise beats nicotine cravings"; "Young people reveal role of alcohol in their lives"; "Keep fitness fun to lose weight"; "Health risks of smoking ignored by women"; "Texts strengthen exercise plans"; "Hand tied and tongue tied"; etc. In other words, the usual things: sex, violence, race, fitness, smoking, drinking, and so on. Drugs, global warming and celebrities were left out due to sampling error, I guess.

I'm sure that the BPS PR department is doing its job well, in the sense that its operatives understand what the press is looking for, and act as an effective filter in picking out the items that will sell. Let's note in passing, though, that as a result, some fascinating-looking stuff from that same BPS Developmental conference went completely unreported, even in the world's most intellectual media. For example, there was an invited symposium with the title "State of the Art in Theory of Mind: old problems, new data", with Josef Perner, Paul Harris and Michael Tomasello; and another one with the title "Workshop on methods for analysing children's interaction", convened by Margaret Harris.

A second problem with this system is that the PR operatives who write the press releases, focused as they are on getting the attention of reporters and editors, aren't always very careful to present the facts clearly. The BPS press release for the Plester and Wood poster came out under the title "Do U no wot Im Sayin?". Here it is -- do you know what it's saying?

Contrary to popular assumptions, the use of text messaging abbreviations is linked positively with literacy attainment, a study conducted with eleven-year old children has found.

Mrs Beverly Plester and Dr Clare Wood of Coventry University presented their research on Friday 8 September 2006, at the British Psychological Society’s Developmental Section Annual Conference being held at the Royal Holloway, University of London.

The study was designed to explore how the use of text abbreviations might be related to the skills children need in reading and writing, in response to concern from parents and teachers about whether texting might damage children’s ability to use standard English. The children were quizzed about their use of mobile phones and asked to translate messages between standard English and text language, as well as complete tasks to reveal their English writing, reading and spelling abilities.

It was found that children use their mobile phones more for sending text messages than for talking, the majority of which are sent to friends. Most text abbreviations were phonetically based, such as ‘wot’ for ‘what’ and rebus types, such as ‘C U L8r’. Many also used what the researchers describe as ‘youth code’, casual language such as ‘dat fing’, ‘gonna’ or ‘wanna’. Surprisingly, the children who were better at spelling and writing used the most ‘textisms’.

Mrs Plester said; "So far, our research has suggested that there is no evidence to link text messaging among children to a poorer ability in standard English and those children who were the best at using ‘textisms’ were also found to be the better spellers and writers."

"Texting could be used positively to increase phonetic awareness in less able children, and perhaps increase their language skills, in a fun yet educational way."

A couple of initial problems:

1. When I google the authors, I find that Coventry University lists Dr. Beverly Plester as "Senior Lecturer in Psychology", with a Ph.D. in psychology from Sheffield University -- exactly the same job title and level of academic qualifications as Dr. Clare Wood. So why does the press release describe the authors as "Mrs Beverly Plester and Dr Clare Wood"? Simple carelessness?

2. Were the children asked to compose and send text messages? The description of the study doesn't say so -- we're told that they "were quizzed about their use of mobile phones and asked to translate messages between standard English and text language, as well as complete tasks to reveal their English writing, reading and spelling abilities". But then how could they tell that "the children who were better at spelling and writing used the most ‘textisms’"? Is this just a misleading way of restating Dr. Plester's assertion that "those children who were the best at using ‘textisms’ were also found to be the better spellers and writers"? I suspect so, though I can't tell from this description.

A web search didn't turn up any published version of this study, nor any preprints, but it did turn up the abstract for a talk given two months earlier, at the thirteenth annual meeting of the Society for the Scientific Study of Reading, in Vancouver, July 6-8:

Beverly Plester (Coventry University ); Bell, Victoria; Wood, Clare - Exploring the Relationship between Text Messaging and Literacy Attainment
A pilot study revealed that although high levels of texting on mobile phones was linked to lower levels of literacy attainment in a sample of 12 year old children, their use of text abbreviations when messaging was positively associated with their literacy attainment at school. Ongoing research is attempting to understand the nature of the positive association between textism use and literacy attainment. In particular, the question of whether phonological awareness may be implicated in the apparent ability to use text abbreviations will be considered.

This is probably a report of the same research -- it seems unlikely that an additional study on the same topic could have been completed between July 8 and September 8. However, there are some worrying differences. The biggest one is that this abstract says that "high levels of texting on mobile phones was linked to lower levels of literacy attainment". I interpret this to mean that kids who reported that they did more texting when "quizzed about their use of mobile phones" scored lower on the tests given "to reveal their English writing, reading and spelling abilities" (using the phrases from the 9/8/2006 press release). But this correlation was not mentioned in the BPS press release, nor in the popular-press articles based on it.

Instead, what was featured was Dr. Plester's observation that "those children who were the best at using ‘textisms’ were also found to be the better spellers and writers". I interpret this to mean that kids who performed better when "asked to translate messages between standard English and text language" also scored higher on the tests given "to reveal their English writing, reading and spelling abilities".

Now, there's no contradiction between these two results. It could be that kids who do more texting (or at least report doing more of it) score lower on tests of spelling and writing; and at the same time, kids who are more skillful at translating between texting and standard orthography also score higher on tests of spelling and writing. (In fact, it would be hard to measure skill at translating between texting and standard orthography in a way that did not automatically guarantee that kids who score higher are also better at using standard orthography...) Then again, it could also be true that the researchers did collect samples of the kids' texting, and found that kids who used more abbreviations when texting were also better spellers in tests of standard orthography.
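A toy calculation shows how both correlations can hold in one sample. The numbers below are invented purely for illustration (they are not data from Plester and Wood or from anyone else), and the variable names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# INVENTED numbers for six imaginary children, not real study data:
texts_per_day = [5, 10, 20, 30, 40, 50]   # self-reported texting volume
textism_skill = [8, 7, 6, 5, 3, 4]        # score on a translation task
literacy = [90, 85, 75, 70, 60, 65]       # score on a spelling/writing test

r_volume = pearson(texts_per_day, literacy)
r_skill = pearson(textism_skill, literacy)
print(f"texting volume vs. literacy: r = {r_volume:+.2f}")
print(f"textism skill vs. literacy:  r = {r_skill:+.2f}")
```

In this made-up sample the first coefficient is negative and the second positive, so the two reported findings can coexist. Whether anything like this is what the actual study found is exactly what can't be told from the press release.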

The July 8 abstract says that the children in the study were 12 years old. The 9/8/2006 press release doesn't mention the subjects' ages, but the 9/8/2006 BBC story, whose author interviewed Dr. Plester, talks about "[a] Coventry University study of 35 11-year-olds". So maybe there were two studies? Unfortunately, zero studies have been published, as far as I can tell, or even described clearly in an informal document. So all we can say, as usual, is that without knowing what the researchers actually did, it's hard to tell what the results actually mean.

My own image of a more perfect society is agnostic about the level of spelling skills. Instead of dreaming about a world of perfect spellers, I like to imagine a world where stories about scientific research provide (or link to) a clear and simple account of the researchers' methods and results, and where reporters and editors have the skills to make this happen. Since this fantasy of mine is pretty implausible, I admit, here's a goal that we might actually reach: how about a world where all organizers of conferences in science and engineering routinely require, and publish on the web, the sorts of four-page "extended abstracts" that many conferences already require? This would improve refereeing as well as communication within our disciplines. It would also make it possible for the interested public to go to the source, by-passing the (usually misleading) presentation in the mass media.

Despite this post's jokey title, I don't think that scientific literacy -- by which I really just mean common sense and clear thinking -- is any lower among publicists and journalists than in earlier times. But it's pretty low, and I think we'd all be better off if it were higher.

Posted by Mark Liberman at 10:40 AM

November 11, 2006

Unblogged snowclones

On returning to the world of snowclones with my discussion of The New Y, I was dismayed to see how many figures or formulas had piled up in my files of unblogged snowclones; the first came in in 2000!  Here's the inventory, with my sources, and very minimal commentary. 

Note: some of these are without question snowclones, but others might be patterns of playful allusions, idioms, playful morphology, or clichés; several of them deserve a discussion of some length, which I'm not now able to provide.  Nor do I have the time now to trace the histories beyond what I say below.  This is the best I can do at the moment.

1.  "Now if you will excuse me I have a X to Y", e.g. "... I have a plane to catch" (Aaron Dinkin, mail of 9/24/05)

2.  "I'm from X and I'm here to help (you)", e.g. Ronald Reagan's mockery of "I'm from the government and I'm here to help (you)" (Ben Zimmer on ADS-L, 7/13/05, citing a query of the March before from Geoff Nunberg)

3.  "not the Xest Y in the Z", e.g. "not the sharpest tack in the box" (me to Language Loggers, 8/30/06, with a response from Ben Zimmer pointing to on-line lists of "what to call dumb people", for instance this one)

4.  "Don't X me because I'm Y", e.g. "Don't hate me because I'm beautiful", from an 80's shampoo commercial that is possibly the source for the snowcloning (mail from Tim Shock, 10/13/05)

5.  "X-y McXerson", e.g. "Drinky McDrinkerson" for someone who likes to drink a lot (mail from Don Porges, 10/13/05; playful morphology)

6.  "Hardly/Not a X goes by without Y", e.g. "Hardly a week goes by without a Nunberg citing in the New York Times" (mail from Benita Bendon Campbell, 10/13/05; possibly an idiom?)

7.   "We don't need no stinking/stinkin'/steenkin' Xs", e.g. "We don't need no steenkin' snowclones" (mail from Chad Sanders, 10/17/06; discussion on the Subjunctivitis blog, and a whole web site devoted to the figure and its history)

8.   "If that's X, every Y should be so lucky", e.g. "If that's being discriminated against, we all should be so lucky" (mail from Marilyn Martin, 10/23/05)

9.   "Yes, Virginia, [mildly improbable statement is true]", e.g., "Yes, Virginia, the moon isn't made of green cheese" (mail from Vishy Venugopalan, 10/26/05; the source of this one -- an 1897 editorial in the New York Sun -- is well known)

10.  "X does not a Y make" and assorted variants, e.g. "One chapter does not a dissertation make" (mail from Brendan McGuigan, 11/2/05; this one turns out to go all the way back to Aristotle, on swallows and summers)

11.  "X-lorn", e.g. "luck-lorn" (ADS-L posting by Ben Zimmer, 12/15/05; playful morphological extensions from "lovelorn")

12.  "X gone wild", e.g. "Greco-Roman boys gone wild" as a description of Fellini's Satyricon (William Salmon on ADS-L, 5/1/06, with follow-ups by me and Larry Horn; based on the Girls Gone Wild videos)

13.  "Take X and shove/stick it", e.g. "Take this job and shove it" (Doug Wilson discussion on ADS-L, 10/22/06, with examples going back over fifty years)

14.  "There's a lot we don't know about X", e.g. "There's a lot we don't know about the unconscious" (Lee Rudolph on soc.motss, 1/22/04, citing my use of the figure and suggesting that the original had "mirrors")

15.  "As a X, N is a great Y", e.g. "As a baseball player, he's a great linebacker" (Mark Mandel on ADS-L, 8/22/00)

16.  "busier than a X [someplace]", e.g. "busier than a one-armed man in an ass-kicking contest" (Barry Popik on ADS-L, 5/1/04, citing some "busier than a cranberry merchant" examples going back to the 19th century)

17.  "That's not an X; this is an X", e.g., "That's not a screw-up; this is a screw-up" (Jason Parker-Burlingham in conversation, 12/15/05)

18.  "N is the M of X", e.g. "Eric Raymond is the Margaret Mead of the Open Source movement" (e-mail from John Cowan, 9/13/05, and previously blogged here)

19.  "There's no rest for the X" and variants, e.g. "There's no rest for the Clinton-obsessed" (me on ADS-L, 5/21/06; this goes back to Isaiah and "no rest for the wicked", with later variants with "peace" for "rest" and/or "weary" for "wicked")

20.  "Whatever Vs your X", e.g. "Whatever bangs your shutters" (many participants on ADS-L, 10/10-11/04)

21.  "X me no Ys", as in "Petition me no petitions" from Fielding (Mark Mandel on ADS-L, 8/22/01; David Crystal in his 2006 Words Words Words, pp. 70-1, takes it back to Shakespeare)

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 02:28 PM

Fully awesome!

Today's Zits cartoon takes on the march of intensifiers (beyond GenX so, beyond intensifier all, beyond totally), and also works in an instance of the "X is the new Y" snowclone (last discussed in the halls of Language Log Plaza here):

Since this last posting about The New Y (as I'm now labeling this snowclone), citing Chocolate is the new black, I've been collecting instances in the wild, with the following finds:

Blue is the new red. (a variety of meanings, some of them opaque to me)

Gray is the New Blonde. (hair color for women)

Nudity was clearly the new black. (model Kate Moss naked at a photo shoot)

How long will... an article about how taupe is the new black. (fashion colors)

Folk is the New Black (Janis Ian album released earlier this year)

After all, 60 is the new 50. (porn actor Peter Berlin at 60)

And as the proper accessory for the well-dressed man of a certain age, a bulging crotch is the new bifocals. (ditto)

... rugby is the new polo. (shirts)

Pink: the New Black. (anal bleaching -- would I make something like that up?)

Small is the new big. (economic developments in the energy world)

I hope you're eating organic!  Because organic is th' new "Fifty" and th' new black. (Zippy on food)

Fat is the new black. (designer Isaac Mizrahi on men's fashions)

Forty is the new 30 (price per dish at some upscale restaurants)

Sicily is the new Tuscany. (vacation destinations)

Here's to 50! ... The new 40! (women's ages)

College is the new high school. (preparation for careers)

Chefs are the new rock 'n' roll stars, cookbooks are the new pornography. (food and sex)

How long can this go on, before the attractions of The New Y wane and it crashes, the way Color Me ("Color me surprised" 'I am surprised') eventually did?  Or will it live on as a durable but no longer ubiquitously fashionable formula, the way One Man's X ("One man's terrorist is another man's freedom fighter") seems to have done?  Is The New Y going to be the new Color Me or the new One Man's X?

[Addenda: Ben Zimmer supplies a link to a site with a pile of The New Y examples and a pointer to the Wikipedia page, where the figure is taken back to Gloria Vanderbilt asserting, in the 60's, that "Pink is the new black."  And Jim Lewis pulls up around 46,000 hits for, omigod, "Black is the new black".  Shannon Casey notes the popular gossip blog Pink is the New Blog.  And Martyn Cornell tells me that the British satirical magazine Private Eye has been running a column called "The Neophiliacs" for several years that reprints "ever-more ridiculous examples" of The New Y, without, apparently, having any effect on its popularity.]

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 12:35 PM

The Historian

Having mentioned Elizabeth Kostova's novel The Historian (as I just did), let me add that as a true aficionado of Bram Stoker's wonderful 1897 epistolary novel Dracula, with which Kostova's is intimately interwoven, I would not be one to think well of any cheap ripoff of it; but Kostova has crafted a serious, well-written, and ingenious homage to Stoker's imaginary world and the historical truth about Vlad Drakul that partly inspired it.

In addition to being talented, Kostova is, as her picture shows, very beautiful. If there is any Language Log reader who happens to know how to get in touch with her, I would be grateful if they would kindly convey to her the following private message.

Elizabeth (if I may): I wish to take you to dinner at some time and place in the future that is mutually convenient. On the eve of St George's Day, perhaps. I thought we might have paprika hendl, as did Jonathan Harker when he broke his journey at the Hotel Royale in Klausenburgh (Cluj as it is now called). We might also enjoy some impletata (he must mean patlagele impulute, the stuffed eggplant dish he spoke of as having been offered to him at breakfast; it should do well as part of the dinner). Some cheese, and a salad, and a bottle of old Tokay, the wine Dracula himself served for Harker with his first dinner at the castle. Then, after dinner, a taste of slivovitz, the plum brandy offered to Harker by Dracula's mysterious carriage driver on that cold May 4th night when he was driven to Castle Dracula.

At the end of the evening I would like, if it is acceptable, to bite you very gently on the neck. (You will find that when a grammarian kisses you, you stay kissed.)

Perhaps you might even do me the honor of visiting my own humble home.

Welcome to my house! Enter freely and of your own will! Come freely. Go safely; and leave something of the happiness you bring!

Grateful acknowledgments to Leonard Wolf, whose beautiful book The Annotated Dracula (New York: Clarkson N. Potter, 1975) is indispensable for an appreciation of Stoker's world. Spellings of place names in Transylvania used above are those found in Stoker's novel, not the modern Hungarian or Romanian spellings.

Posted by Geoffrey K. Pullum at 12:55 AM

November 10, 2006

More Field Linguistics Books

Here are some additions to Arnold's and Mark's suggested readings on linguistic fieldwork.

  • Bob Dixon's 1983 Searching for Aboriginal Languages: Memoirs of a Field Worker (New York: University of Queensland Press. Reprinted 1989 by the University of Chicago Press.) is a fascinating memoir of his work in Australia.
  • Also about Australia is Forty Years On: Ken Hale and Australian languages, a collection of essays about the work of Ken Hale and related topics, including a piece by Ken's widow Sally and one by Geoff O'Grady, another of the greats of that generation of Australianists. Further information is available here.
  • Science Fiction/Fantasy writer Sheri Tepper has written two books that deal with linguistic fieldwork. The more recent of the two, The Companions (New York: Harper Collins, 2003) is explicitly about linguistic fieldwork on an alien planet. Her earlier book After Long Silence (New York: Bantam Books, 1987), which is one of my favorites, is not explicitly about fieldwork but turns out to be. I can't reveal more without spoiling it. If you think your tastes are like mine, read it.
  • Some textbooks of field methods contain interesting anecdotes or philosophical discussions. You might enjoy Bert Vaux and Justin Cooper's Introduction to Linguistic Field Methods. The first edition was published by LINCOM Europa. There's a new edition of whose status I am uncertain. Anvita Abbi's A Manual of Linguistic Field Work and Structures of Indian Languages (Munich: LINCOM EUROPA) is interesting for its focus on fieldwork in India.
Posted by Bill Poser at 09:18 PM

Recursively nested quotation marks

The theory of quotation marks in printed Standard English says that whether you use single quotation marks (‘ ’) or double quotation marks (“ ”) as the default, if you have to enclose one quotation within another you switch to the other kind, and so on recursively, alternating quotation mark types: either ‘...“...‘...’...”...’, and so on, or “...‘...“...”...’...”, and so on, to any depth of embedding of quotes. But of course, even just a quoted string that is within a quoted string that is itself within a quoted string is very rare. If you would like to see one in the wild, I can tell you one place to look.

In Elizabeth Kostova's best-selling vampire novel The Historian (Time Warner Books, 2005), starting four lines from the bottom on page 365 of the paperback edition (8th reprint) that I purchased in England this summer, you can see a sequence like this (I omit a long portion just before “The epitaph”):

   “‘“A Traveller” had visited the monastery in Snagov in 1605. He had talked a good deal with the monks there [. . .] The epitaph, which I copied down with care — out of what instinct I didn't know — was in Latin.’ Hugh dropped his voice, glanced behind him, and stubbed out his cigarette in the ashtray on our table.

    “‘After I'd written it down and struggled with it a while, I read my translation aloud: “Reader, unbury him with a —” You know how it goes [. . .]

   [. . .substantial amounts of text omitted here —GKP . . .] My father looked very upset.’ Here Hugh lit another cigarette, and the match shook in the gathering darkness. [. . . much more text omitted here —GKP . . .] 

What is going on here is that the character named Hugh James is telling a long story (shown here in green) which is embedded within a longer story being told by the father of the narrator of the whole novel (shown in red). The father's words are signalled by opening double quotation marks (in red), which, in a style familiar to those who know 18th and 19th-century epistolary novels, are repeated at the beginning of each paragraph but only closed once, at the end of the whole section in that person's voice (so the closing red quotation marks are not shown above; they occur on page 376, at the end of the chapter). Hugh James's words are shown in single quotation marks (green), also not closed at the end of each paragraph but only at the end of a complete section of direct quotation in his voice (as, for example, just before “Hugh dropped his voice”; the second green left single quotation mark above is actually not closed until a couple of paragraphs later, on the lower half of page 366, after the words “My father looked very upset”). The first double quotation marks in blue are scare quotes; there is a character in a manuscript identified only as “A Traveller”, and the novelist uses scare quotes in the written form of Hugh James's spoken utterance to make it clear that, in the part shown here in blue, Hugh James is not using an indefinite noun phrase in his own voice but rather using a repetition of the manuscript's way of identifying a certain definite individual. Later the green type is interrupted by another blue section, in double quotation marks, where Hugh quotes the epitaph.

Also embedded in the narrator's father's double-quoted sections of the novel are letters from another character, the father's mentor Bartolomeo Rossi, and these are in italics. Had they been shown in quotes instead, those would have been single quotation marks.

This is really a novel of very complex narrative structure. One has to keep one's eye on the ball, and one's recursion-depth counter on the level of quotations in which one is currently embedded. This kind of complexity will not be found very often in any kind of literature. But at least I have been able to show you a paragraph that opens with the sequence <Left Double Quotation Mark> <Left Single Quotation Mark> <Left Double Quotation Mark>. And I could have put things differently by saying that the paragraph begins with ‘“‘“’, giving you a four-quotation-mark sequence to read (the outermost single quotation marks would be mine, to indicate that I am quoting a string composed of the other multicolored ones). And if someone else quoted me, they would need to add yet another set of quotation marks (those should be double quotation marks, since I used single). In principle, there is no limit.

Nerd note: I leave it as an exercise for those readers who are acquainted with the methods of formal language theory to turn what I have just explained into a rigorous argument that the set of all possible properly punctuated English texts cannot be accepted by any finite-state automaton.
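For readers who would rather watch the recursion-depth counter at work than prove anything about it, here is a minimal sketch in Python. The function names are my own inventions, and the sketch makes simplifying assumptions: every opened quotation is closed (so it ignores the novel's habit of re-opening quotes at each new paragraph), and the right single quotation mark is never used as an apostrophe.

```python
# Hypothetical helpers for illustration; not a full model of English punctuation.
PAIRS = {"\u201c": "\u201d", "\u2018": "\u2019"}  # left mark -> matching right mark
CLOSERS = set(PAIRS.values())

def max_quote_depth(text):
    """Return the deepest quotation nesting in text, or None if the marks do not balance."""
    stack = []      # right marks we still expect to see, innermost last
    deepest = 0
    for ch in text:
        if ch in PAIRS:
            stack.append(PAIRS[ch])
            deepest = max(deepest, len(stack))
        elif ch in CLOSERS:
            # A closer must match the most recently opened quotation.
            if not stack or stack.pop() != ch:
                return None
    return deepest if not stack else None

def nested(depth):
    """Build a toy string with `depth` levels of alternating double and single quotes."""
    openers = ["\u201c", "\u2018"]
    opened = [openers[i % 2] for i in range(depth)]
    return "".join(opened) + "x" + "".join(PAIRS[m] for m in reversed(opened))

print(max_quote_depth(nested(3)))
print(max_quote_depth("\u201c\u2018x\u201d\u2019"))  # closed in the wrong order
```

The stack grows with the depth of embedding, and that is the heart of the matter: a finite-state automaton has only a fixed number of states, so for any given machine there is some depth of nesting it cannot keep track of. The usual pumping argument turns that observation into a proof.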

For more nerdy typographical stuff about quotation marks in various languages, see this post by John Cowan.

Posted by Geoffrey K. Pullum at 02:06 PM

Partial credit for "pigeon English": not new in New Zealand

A few days ago, there was a small bubble of news reports to the effect that the New Zealand Qualifications Authority was planning to follow the example of the Scottish Qualifications Authority and allow free-form spelling and "texting" abbreviations on the NCEA examinations. But according to a story by Claire Trevett in today's New Zealand Herald, this was all a mistake -- yet another example of the danger of trusting what you read in the mass media, a creative but undisciplined arena that has yet to work out how to impose the checks and balances that we in the blogosphere take for granted.

Bali Haque, deputy chief executive of the authority, said there had been no change to guidelines and there was no specific policy about text language.

However, he warned: "If people are expecting they can come up with an exam script full of text and pass, then they're dreaming.

"Examiners will be expecting the use of the English language in full. I think students are intelligent enough to understand that. Most would know the difference between using formal language in an exam and informal with friends on the weekend."

(If this is really what Mr. Haque said, by the way, it's an interesting example of the word text apparently being used to mean "writing of the kind found in cell-phone text messages".)

The best part of this story, in my opinion, was the statement issued by Bill English, who is described in the story as "National education spokesman Bill English", which apparently means that he is the spokesman for educational matters of the National Party. Mr. English's statement read in part:

This kind of pigeon English is fine for young people organising their social lives, but it is not an acceptable way of expressing an academic argument or idea.

The Education minister, Steve Maharey, used this to explain what the story calls "NZQA's policy of forgiving minor mistakes that were understandable in an otherwise strong answer":

The statement is understandable, despite pidgin being spelt p-i-g-e-o-n, as in a bird from the dove family, rather than p-i-d-g-i-n, as in simplified language used between persons of a different nationality. But we will give him credit.

This is a sensible implementation of a sensible policy, familiar to anyone who has graded a set of college essays. I wonder if the Scottish Qualifications Authority case was a similar non-story, blown up by the Guardian and other British papers to fill a hole on a slow day. As we've often observed, the traditional media will never be able to fulfill their undoubted promise as an information source until they can find a way to impose some elementary standards of accuracy and accountability.

[Update -- Ben Zimmer points out that:

Through the late 19th century, "pigeon" was a common variant for "pidgin" -- in fact, "pigeon (English)" predates "pidgin (English)". An early citation for "pigeon English" from 1857 (which I contributed to the OED's latest draft entry) can be found here:

Train, George Francis, An American merchant in Europe, Asia, and Australia. New York, G.P. Putnam, 1857. (p. 101)

On every side of you, Pigeon English - that horrible jargon of multilated baby talk which custom has made law - meets you. From boatwomen to shopmen - house boy to compradore - you hear nothing else. I endeavored to get a copy of Hamlet's soliloquy, which was translated into Pigeon English, but I have failed to do it. I can only remember its commencement.
"To be or not to be" reads: "Can - no can."

"Pidgin English" shows up about a decade later:

Tileston, W. M., "Tea Leaves", Overland monthly and Out West magazine. Volume 3, Issue 6, Dec 1869, pp.539-544

(p. 539) The "pidgin English" which followed, was too much for our untutored intellects to comprehend.

(p. 543) We asked Ah Lum to translate one of the songs for us; but the effort to put the words of one of his native poets into "pidgin English" was too much.

Somehow I doubt that Mr. English was attempting to spell following pre-1869 norms. But this is one more reason to be tolerant of spelling mistakes -- as someone who often makes such mistakes, I certainly have a personal interest in the availability of forgiveness. Though I think that we are still allowed to enjoy the display of self-refuting hypocrisy on the part of the intolerant.]

Posted by Mark Liberman at 10:44 AM

Field linguists at work

Now that I've gotten around to posting suggestions of books that show linguists at work, correspondents are writing to fill in gaps in my list.  My list had a couple of items depicting field linguistics, but I missed several good books about field work.

Curtis Booth writes to nominate Leanne Hinton's wonderful collection of essays on California Indian languages, Flutes of Fire.  To which Peter Austin adds Paul Newman and Martha Ratliff's Linguistic Fieldwork, a collection of essays by a number of accomplished field linguists about all aspects of the field experience, and Mark Abley's Spoken Here: Travels Among Threatened Languages, specifically on working with endangered languages, including revitalizing them.

For sociolinguistic fieldwork, I know of nothing quite like these volumes about traditional field linguistics.  My current best recommendation is Penny Eckert's 1989 volume Jocks and Burnouts, because it treats both quantitative research and ethnographic description, and because it's engagingly written.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 09:56 AM

The film that dare not speak its name

I went this evening to a Berkeley screening of a new documentary called Fuck. Or make that F*ck, or F**k, or ****, depending on which newspaper the film is being advertised in (when it opens at the Nuart in LA tomorrow, the marquee will read simply "FOUR-LETTER MOVIE"). It's a funny and fascinating farrago of four-letter fact and fable -- and you can quote me on that. Though in the interest of full disclosure I should add that I appear in the movie, along with Jesse Sheidlower of the OED and a supporting cast of talking heads that includes Bill Maher, Drew Carey, Sam Donaldson, Billy Connolly, Ice T, Alan Keyes, Alanis Morissette, Chuck D, Hunter S. Thompson, and Pat Boone (whose presence entitles me to claim a Kevin Bacon number of 3). The film will open in LA and New York tomorrow, and over the following weeks in selected cities, including San Francisco, Berkeley, Minneapolis, Chicago, Boston, San Diego, Portland, Seattle, Atlanta, Santa Cruz, Washington DC, and St Louis. See you at the movies!

Posted by Geoff Nunberg at 02:32 AM

November 09, 2006

Ill-judged word choice lost Congress for GOP?

When Senator George Allen (R, Virginia) announced today that he had given up his attempt at re-election to the US Senate and conceded to his Democratic opponent, it became clear that the Democratic party will control the Senate as well as the House in the next Congress. The margin by which Allen lost was only about 7,000 votes (roughly half a percent of the voters). For many voters, the decision was influenced by Allen's foot-in-mouth problem, and particularly the incident in which he twice referred to S. R. Sidarth, a brown-skinned Democrat campaign tracker whose parents came from India, by the derisive nickname Macaca (previous Language Log epithet-watch coverage here and here and here). The word was almost certainly intended as a racial epithet. It is familiar among French colonists in Africa in that capacity (Wikipedia has the basic facts about the word here), and Allen's mother is a French-speaking Tunisian Jew who would have been quite likely to use the word that way in referring to North African Arabs. (Allen seemed to think Sidarth was an immigrant. He is not; he is Virginia-born and raised.) The subsequent brouhaha ultimately necessitated a public apology from Allen. All in all, it seems unlikely that the number of voters swayed by the Macacagate affair was less than the 7,000 margin. And if that is right, then the control of the US Senate and thus the entire legislature may have been turned over to a different party because of one thoughtless nickname choice by a tired and irritated candidate. (That's not an exculpation, by the way. Tired and irritable he was, but reprehensible nonetheless.) It was surely one of the biggest consequences of an on-the-fly nickname choice in all of history. Watch your mouth, politicians. It's a linguistic jungle out there.

You know, putting this incident alongside various others (like the birthday babbling that cost Trent Lott his job), it sometimes seems to me that politicians get insufficient training in choice of words and idioms. It is as if they have not yet grasped the nature of the huge change that has taken place with respect to racism in the United States over the past forty years. Those who want to get their language use in line with current standards should understand it very clearly. It is not that racism has gone away (good heavens, surely nobody thinks that will ever happen). And it's not that racist talk has been made illegal, or ever could be: the First Amendment is simply not going to allow that. You can speak your opinions in this country, and express anything you want about the racial inferiority or utter subhuman vileness of any racial group you may want to take out after. No, it's not illegal to say racist things, it's not even a misdemeanour; it is something much worse, for racists, that has happened. Racism has become not just unfashionable (itself almost a kiss of death for those in public life) but unacceptably disgusting to most thinking people. And that's much more serious.

If you're a political candidate, then for you to say something on camera that suggests racist attitudes or beliefs is comparable to, oh, something like putting your hand down the back of your pants to scratch your asshole and then sniffing your finger. Nothing illegal there. But your campaign will take a downswing from the moment that video clip hits YouTube.

This is not about the mythical political-correctness "word police" of which the right-wingers disingenuously complain. This is about thinking people simply seeing what you do and turning away in disgust. If it were just illegal to say "nigger" or "spic", a politician could perhaps survive it (politicians do survive drunk driving arrests, and surely drunk driving is enormously more serious and dangerous than having negative opinions about some racial group). But it's worse than illegal. It picks you out as someone to stay away from. It identifies you as disgusting and fit only to be shunned. A person who would never be invited to dinner. And you won't survive that in modern American politics.

Posted by Geoffrey K. Pullum at 09:05 PM

Two upticks in a classical allusion

Who caused the spike on June 7? This guy.

I shouldn't have to tell you who caused yesterday's spike.

Moral equivalence? Certainly not. The same metaphor? It's a fact.

Posted by Mark Liberman at 03:56 PM

Only 17 words for snow

I'm not sure what the current record is for Eskimo N, the number of words the Eskimos are claimed to have for snow, but this Sunday's New York Times Book Review yielded an unusually modest Eskimo N, 17.

From Christopher Buckley's review (p. 18) of Chris Miller's The Real Animal House: The Awesomely Depraved Saga of the Fraternity That Inspired the Movie, on the vocabulary of the members of Alpha Delta Phi at Dartmouth in the early 1960's:

There are... a few relatively innocent terms, like the synonyms for breasts: "jehoshaphats," "baboos," "wazookies," "ka-hogas" and of course "gabongas."  The Inuit language contains -- what? -- 17 different words for "snow"?  The AD's must have twice that many for "vomit."

Buckley has obviously pulled the number 17 out of his, um, hat.  This number is what you're likely to come up with when you're asked to pick a random number: it's the smallest prime number without any special cultural significance.  The numbers 2, 3, 5, 7, and 13 are clearly special; 11 is not quite so special, though it is the number of players on a football team (American or Association), and then you're up to 17.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 10:01 AM

Progress in malignancy tagging

There's some new news from BIOIE, an NSF-sponsored research project on information extraction from biomedical text, which is one of my day jobs. I share faculty responsibility with Fernando Pereira and Aravind Joshi at Penn, and Pete White at Children's Hospital; but as usual in such projects, most of the real research is done by graduate students. One of those students, Yang Jin, has just had a paper accepted by BMC Bioinformatics: "Automated recognition of malignancy mentions in biomedical literature".

Last month, I posted about "Fable 2.0", an on-line system that automatically tags articles with mentions of genes, normalizes the mentions so that various different ways of referring to the same gene are connected, and lets you search millions of articles to find genes associated with arbitrary boolean combinations of keywords. Yang's new paper applies the same named-entity tagger to finding clinical descriptions of malignancies, as part of a larger strategy to link molecular and phenotypic observations, both in reports of laboratory research and in clinical records.

Yang used the "same tagger" as Fable does, in the sense that he used a general-purpose program that will attempt to learn how to "tag" any sort of text regions at all, generalizing from a body of hand-tagged training material. To make a gene tagger, the program was trained on text hand-annotated for genes. To make a malignancy tagger, it was trained on text hand-annotated for malignancies. (This tagger was developed by Ryan McDonald while he was a grad student at Penn -- Ryan is now at Google -- based on the Mallet machine learning toolkit.)

Yang's malignancy tagger works pretty well: 0.84 precision, 0.83 recall, 0.84 F-measure. ("Precision" is the proportion of hits that are valid; "recall" is the proportion of valid mentions that are found; the "F-measure" is the harmonic mean of precision and recall. These days, across various entity types and document collections, such taggers generally have F-measures in the 0.7-0.9 range.)
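For anyone who wants to see the arithmetic behind those three figures, here is a minimal sketch in Python. The function names and the toy counts are mine, invented purely for illustration; only the 0.84 and 0.83 figures come from the paper as quoted above.

```python
def precision(true_pos, false_pos):
    """Proportion of the tagger's hits that are valid mentions."""
    return true_pos / (true_pos + false_pos)

def recall(true_pos, false_neg):
    """Proportion of the valid mentions that the tagger found."""
    return true_pos / (true_pos + false_neg)

def f_measure(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# Toy counts (hypothetical, not from the paper): 8 correct hits,
# 2 spurious hits, and 2 valid mentions that were missed.
p = precision(8, 2)
r = recall(8, 2)
print(p, r, f_measure(p, r))

# Plugging in the rounded figures quoted above gives roughly 0.835.
print(f_measure(0.84, 0.83))
```

The harmonic mean of the two rounded figures comes out at about 0.835, so the reported 0.84 was presumably computed from the unrounded precision and recall; that is my guess, not something the post states.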

Yang's tagger also worked notably better than the obvious baseline of string-matching against a term list. Yang took the National Cancer Institute's neoplasm ontology, a term list of 5,555 malignancies, and tested it (on a random subset of abstracts from the larger test set) using case-insensitive string matching. Of the 202 malignancy mentions in this subset, the term-list method found only 85, for a recall of 0.42, while his tagger found 190, for a recall of 0.94. The mentions missed by term-list matching but found by the tagger included some variations in form for items already on the NCI list (e.g. "leukaemia" vs. "leukemia" or AML vs. "acute myeloid leukemia"), but also quite a few that simply weren't on the list in any form, such as "temporal lobe benign capillary haemangioblastoma" and "parietal lobe ganglioglioma".
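To make the baseline concrete, here is a rough sketch in Python of case-insensitive matching against a term list, together with the recall arithmetic from the counts above. The function name and the two-item term list are hypothetical stand-ins of my own; this is not the NCI ontology and not the code used in the paper.

```python
def term_list_matches(text, terms):
    """Return the terms that occur somewhere in text, ignoring case."""
    lowered = text.lower()
    return [t for t in terms if t.lower() in lowered]

# Toy stand-in for a term list (an assumption for illustration only).
terms = ["leukemia", "acute myeloid leukemia"]

print(term_list_matches("Childhood Leukemia was diagnosed.", terms))
# A spelling variant that is not on the list slips through unmatched.
print(term_list_matches("Childhood leukaemia was diagnosed.", terms))

# Recall from the counts reported above: mentions found / mentions present.
term_list_recall = 85 / 202
tagger_recall = 190 / 202
print(round(term_list_recall, 2), round(tagger_recall, 2))
```

Plain string matching can only find what is literally on the list, which is why variant spellings, abbreviations, and unlisted malignancies all count against its recall.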

One of the most interesting and promising results was an essentially negative one. Yang trained the tagger in one trial with a completely generic set of features (words, character n-grams, and so on), that could be used for any entity tagging task at all, and in another trial with additional cancer-specific feature sets, in particular the NCI term list and a list of indicative suffixes. The generic tagger scored an F-measure of 0.834, while the addition of the cancer-specific feature sets only improved its performance to 0.838. This suggests that for some biomedical tagging tasks, domain-specific lexicons and other task-specific feature sets may not be needed.

But the single most important part of this story, in my opinion, is who Yang is. He's a graduate student in neuroscience, not in computer science or even in bioinformatics. We're beginning to enter an era when text-mining techniques are just another scientific tool, like a centrifuge or perhaps more analogously a package of software for fMRI analysis, available for use by researchers whose goals have no intrinsic connection to the analysis of language.

Posted by Mark Liberman at 07:50 AM

November 08, 2006

Double disingenuousness

Stephen Rowland has made me very happy. He has found a linguification phrased as an embedded rhetorical question. An astonishing blend of two of my recent topics of interest. And in fact it actually co-occurs with a third trope, irony. It's from an article by Michael Gove in the Times (London) on November 8, 2006:

[A]ll true fans of The Sound of Music know that the most important role in the production, the moral centre of the show, is the Captain.

Before I go any farther, I know that I have to pause while some, mildly perturbed, readers wonder how one can shoehorn the words "moral centre" into a sentence which also contains the phrase "Sound of Music".

Isn't that wonderful? Two disingenuousnesses compounded. He does not really think his readers will be wondering about how a musical can be described as having a moral center, he is being ironical, and pretending he thinks it is uncontroversial shared knowledge that musicals don't have moral centers (the mutual knowledge of what the answer is supposed to be is what makes it a rhetorical interrogative); and he doesn't really mean to raise the question of whether the word sequence "moral center" might occur in a properly formed sentence where "Sound of Music" also occurs (that's the linguification). I know that I have to pause while some, mildly amused, readers wonder how one can shoehorn three tropes into one short but rhetoric-heavy sentence.

Thanks, Stephen!

Posted by Geoffrey K. Pullum at 10:31 PM

Try and stop me, FCC

The Language Log news department has learned that the FCC has just decided to reverse itself on certain cases where they had ruled against uses of obscene language on broadcast media. The CBS Early Show will not be fined for broadcasting an occurrence of the word "bullshitter". (The New York Daily News is remarkably coy, though, printing it as "bulls-er"; comment later from Arnold Zwicky of the Language Taboo Desk). It seems that the FCC is going to allow more freedom for use of filthy language on news programs than is allowed for similar cursing on other programs.

Well, Language Log is, of course, part of the news media, so we have even more freedom now to say whatever we want, and broadcast it if we want to. Which gives me the right to tell you about something that just arrived in the mail that I thought I would share with you. Something to convince you that we professors, even Kantian moral philosophy professors, are not all prim and fusty and severe; we are open, red-blooded, and always ready for a laugh.

Professor Jeffrie Murphy of Arizona State University has just published the presidential address that he gave in consequence of his receiving the honor of selection as President of the Pacific Division of the American Philosophical Association ("Legal moralism and retribution revisited," Proceedings and Addresses of the American Philosophical Association, 80.2, November 2006, 45-62). And about being awarded the presidency by his peers, he says (p. 45):

The very day after I received notification of my selection as president, my wife and I went to see the Francois Ozan film "The Swimming Pool." Early in that film, the Charlotte Rampling character — commenting on literary and academic awards — makes this remark: "Awards are like hemorrhoids — eventually every asshole gets one."

And we can report filthy talk like that, you see, without fear of prosecution, because we are Language Log, and we are part of the media in a free nation.

Posted by Geoffrey K. Pullum at 04:13 PM

Linguists at work

Reading Anatoly Liberman's Word Origins ...and How We Know Them: Etymology for Everyone (Oxford University Press, 2005) has brought me back to some unfinished Language Log business from long ago, a 2004 query about books that "could give a potential linguist some sense of what it's like to be a linguist, to do linguistics".

Back then I said:

I found this a surprisingly difficult question. Not-bad introductions to linguistics aren't hard to come by, and there are some pretty good surveys of what has (or, actually, had) been done in the field: some of the chapters in Shopen's set Language Typology and Syntactic Description and in Newmeyer's Cambridge Survey of Linguistics, for example. But such works present the product of doing linguistics, not the activity.

For a feel for what it's like to do syntax, maybe Green & Morgan's Practical Guide to Syntactic Analysis.

For a sense of what it's like to do fieldwork and to discover something about the structure of a language, the two Shopen volumes Languages and Their Speakers and Languages and Their Status.

And for thought-provoking reasonably brief essays, the two books that I most often give to non-linguist friends who are interested in language: Bauer & Trudgill's Language Myths and, especially, Pullum's Great Eskimo Vocabulary Hoax.

Then some weeks ago a correspondent who was working his way through the Language Log archives from the very beginning wrote to ask if I had ever answered the question (alas, no), and right after that my copy of Liberman's etymology book arrived.

So let's start with the Liberman book, which I think is wonderful at showing, in detail, how word histories are uncovered (or, as is often the case, not).  Along the way you get a lot of fascinating etymologies, plus accounts of sound symbolism, borrowing, sound change, semantic change, comparative reconstruction, and much more.  You should carry away an appreciation of just how HARD etymology is, what an immense store of background knowledge is required to do it well, how provisional many of the histories are, and how much of history is probably not recoverable at all.

The whole book might be a bit much for some readers, but chapter 13 ("A Retrospect: The Methods of Etymology") gives a nice summary, and the two chapters that follow, on sound change and semantic change, illustrate well the etymologist in action.  The enormously entertaining chapter 16 ("The Origin of the Earliest Words and Ancient Roots") could, I think, be read on its own.

Now back to the 2004 question and replies to it.  Several people seconded my nomination of The Great Eskimo Vocabulary Hoax.  But two suggestions dominated the responses I got: Steven Pinker's The Language Instinct (which won Pinker the first Linguistics, Language, and the Public Award from the LSA in 1997) -- I can't imagine how I could have left this book off my list -- and Language Log itself.   (Remember: unlike Steve, we offer a full money-back policy to anyone who's dissatisfied with the services we provide.)  And now some of our stuff has been published in Far from the Madding Gerund (see ad on front page, and buy the book!).

Adam Parrish nominated Thomas Payne's Describing Morphosyntax, saying: "It provides an outline for a morphosyntactic description of a language and instructions on how to fill in the details.  It's billed as a guide for fieldworkers, but I just like to read it and marvel at how languages are simultaneously diverse and similar."

And Matt Post suggested, for computational approaches to language, the first few chapters of Daniel Jurafsky & James Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition.

Some further suggestions that have occurred to me: Pinker's 1999 book Words and Rules: The Ingredients of Language, which shows an experimental psycholinguist grappling with a lot of messy details about language, in particular about inflectional morphology; George Miller's 1977 Spontaneous Apprentices: Children and Language, in which you get to watch Miller and Phil Johnson-Laird struggle to do research on child language acquisition; and Miller's 1991 The Science of Words, about all things having to do with words, with much discussion of experimental work.  Miller is an especially engaging writer, by the way.

Finally, Mark Liberman recommended a very different sort of writing, fiction with linguist characters:

None of these are by linguists. All of them involve sympathetic central characters who turn out to be better at analyzing the structure and content of exotic languages than the structure and content of their own lives.

Ted Chiang's Story of Your Life, discussed here.

Mary Doria Russell's The Sparrow, discussed here.

Malcolm Bradbury's Rates of Exchange, discussed here.

(There's quite a lot of fiction with linguists in it, but these are works in which you get to see some actual linguistic analysis being done.)

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 01:26 PM

Vote for the woman whose mother uses this verb

Claire McCaskill, who ran successfully in Missouri for a US Senate seat and beat out incumbent Jim Talent yesterday, said to Renée Montagne on NPR this morning that part of the explanation for her success among the typically conservative people in the rural areas of the state was that her mother was a native of the Ozarks, and is so rural that "she's the kind of woman that says ‘hornswoggle’ as part of her ordinary vocabulary." The inference from having the verb lexeme hornswoggle to being appealing to Missouri farmers was apparently supposed to be completely obvious, and for a moment I thought that was completely nuts. Why, I thought, would anyone imagine that a person was worth voting for because her mother knew a certain lexical item? Let's say I have the word psephologist in my active vocabulary; does that help you in deciding whether you would cast a vote for my son Calvin?

But I guess the reasoning goes like this: "McCaskill's mother uses hornswoggle; I use hornswoggle; hornswoggle is rare or unknown in standard dialects, but familiar in dialects of rural people like me; so the mother is probably a rural person like me; so the mother probably has similar values to mine; and mothers teach their values to their daughters; so the daughter probably has them too; so a vote for her will probably be a vote for someone with values like my own." Far from being a foolproof reasoning chain, but not entirely as irrational as at first one might think. Voting is so often a matter of looking at a brief resumé of a person you don't know, plus some repellent negative allegations about them in an opponent's TV ads, crossing your fingers, and hoping the electee won't turn out to be just another rascal. Using a statistically unusual lexical item as a possible indicator of membership in a social group with values you like might be one small way to make the process less irrational.

Of course, the sociolinguistic judgment about the item in question may not be right; as Ben Zimmer remarked to me at the water cooler in Language Log Plaza this morning:

I don't know if "hornswoggle" is such a reliable sociolinguistic index. Ann Coulter, who hails from New Canaan, CT, once said that President Bush has shown "how easy it is to hornswoggle liberals." Somehow I don't think those rural McCaskill voters would feel much social or political kinship with Coulter.

How would a farm housewife in Wright County, MO, react to a quintessentially urban blonde bombshell who makes her living as rabid liberal-baiter, hostile TV personality, fire-breathing columnist, self-parodist, and ultraconservative performance artist? I don't know. Some psephologist is probably working on it.

Update: We put an intern onto checking Coulter's biography, and it turns out her mother was born in Paducah, Kentucky! Maybe you can take the girl out of the country but you can't take the country out of the girl. We are now trying to lure Coulter to Philadelphia so we can get her into our basement sociolinguistics lab, where we have... umm... equipment suitable for robust and forceful interrogation. We use it for eliciting information about speakers' dialect backgrounds. More news as we manage to extract it.

Posted by Geoffrey K. Pullum at 11:30 AM

Swear it

To celebrate the publication of Keith Allan and Kate Burridge's Forbidden Words: Taboo and the Censoring of Language (Cambridge University Press, 2006), I reproduce today's poem on the Writer's Almanac, "Swear It" by Marge Piercy (from The Crooked Inheritance, 2006):

Swear It
My mother swore ripely, inventively
a flashing storm of American and Yiddish
thundering onto my head and shoulders.
My father swore briefly, like an ax
descending on the nape of a sinner.

But all the relatives on my father's
side, gosh, they said, goldarnit.
What happened to those purveyors
of soft putty cussing, go to heck,
they would mutter, you son of a gun.

They had limbs instead of legs.
Privates encompassed everything
from bow to stern. They did
number one and number two
and eventually, perhaps, it.

It has always amazed me there are
words too potent to say to those
whose ears are tender as baby
lettuces--often those who label
us into narrow jars with salt and

vinegar, saying, People like them,
meaning me and mine. Never say
the K or N word, just quietly shut
and bolt the door. Just politely
insert your foot in the Other's face.

The new Allan & Burridge follows their 1991 Euphemism and Dysphemism: Language Used as Shield and Weapon, the two volumes together making a thorough survey of the field of taboo language, its uses, and its regulation, discussed "with deep erudition and a light touch" (as Steve Pinker puts it in his blurb on the back cover).  Full disclosure: I am one of the helpful friends and colleagues thanked in the Acknowledgements.  (I was surprised to see my name so early in this list -- third, right before Bill Bright -- but then I realized it was alphabetized by first name.  Bill Leap comes last, my usual place in such lists, because he appears under the name William Leap.)

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 10:43 AM

Giant space ants win control of Congress?

Jonah Goldberg's comment on the election results:

Let Me Just Say...
I for one welcome our new Democratic overlords. I'd like to remind them that as a trusted rightwing personality, I can be helpful in rounding up others to toil in their underground sugar caves.

The "explainer" that he links to is here, but it offers only the picture. Except for the announcer's simpsonian (simpsonical? simpsoniacal? simpsonistic? simpsonish?) yellow skin, there's no indication of the actual source of the joke.

That's because Goldberg has used this allusion before, to frame another electoral outcome that he didn't like, and he expects his readers to remember the explanation that he gave on that occasion. The date was 2/03/00, and John McCain had just defeated George W. Bush in the New Hampshire primary.

Goldberg explained "The Meaning of McCain" this way:

Recall, if you will, the episode of the Simpsons when Homer is selected to be a space shuttle astronaut. News anchor Kent Brockman is scheduled to interview the shuttle crew while they are in orbit.

But just before they "switch live" to the crew of the corvair craft, there's a mishap on board. Homer, unaccustomed to weightlessness, is veering, out of control, straight toward the ant farm the crew brought along for study. [...]

When news anchor Kent Brockman cuts to the live feed from the shuttle, the ants float by the camera lens — momentarily appearing gigantic. Then they lose the picture. Brockman instantaneously reports:

"Ladies and gentlemen, er, we've just lost the picture, but, uh, what we've seen speaks for itself. The Corvair spacecraft has been taken over — 'conquered', if you will — by a master race of giant space ants. It's difficult to tell from this vantage point whether they will consume the captive earth men or merely enslave them. One thing is for certain, there is no stopping them; the ants will soon be here. And I, for one, welcome our new insect overlords. I'd like to remind them that as a trusted TV personality, I can be helpful in rounding up others to...toil in their underground sugar caves."

When it becomes clear that the bugs are in fact not a "master race of giant space ants", Brockman quickly removes his "Hail Ants" sign hanging just behind him, covering the station logo. [...]

The moral of the story is that journalists (and party hacks) love power. Whether it's a new insect overlord or a candidate suddenly surging at the polls, the chattering class works under the assumption that whoever has power now will have it for a long time.

And some people doubt the benefits of a classical education!

Goldberg continued:

If that's not highbrow enough for you, consider George Orwell's 1946 observation that "Power-worship blurs political judgement because it leads, almost unavoidably, to the belief that present trends will continue."

As far as a few minutes' web research allows me to determine, Jonah has reserved the "overlords" allusion specifically for apparent defeats of W, whose opponents are all thus classified as temporarily magnified insects. Rumor has it that researchers at the Rockridge Institute have been toiling through the night to develop an effective counter-allusion. So far the leading candidates are "Nyah, Nyah: We're Back" and "Attach the Stone of Triumph!" More on this as it develops.

Posted by Mark Liberman at 08:45 AM

November 07, 2006

Attested subordinate rhetorical interrogatives

Almost as soon as I mentioned that it would be interesting to find actual examples confirming Ivano Caponigro's suggestion that interrogative subordinate clauses could have rhetorical-question interpretations, Mark Liberman noticed one in something he was reading. Well, further overwhelmingly convincing evidence has been coming in of actual examples showing that interrogative content clauses can indeed express rhetorical questions.

Bruce Rusk, of Cornell University's Department of Asian Studies, contributed one from the early 18th century:

The new prophesying Sect, I made mention of above, pretend, it seems, among many other Miracles, to have had a most signal one, acted premeditately, and with warning, before many hundreds of People, who actually give Testimony to the Truth of it. But I wou'd only ask, Whether there were present, among those hundreds, any one Person, who having never been of their Sect, or addicted to their Way, will give the same Testimony with them?

[From: Anthony Ashley Cooper, Earl of Shaftesbury, 1671-1713: Characteristicks of Men, Manners, Opinions, Times (1711-1714), Volume I, Treatise I, Section VI.]

He also points out that rhetorical questions can be embedded in rhetorical questions. One structure that often shows this is the rhetorical "Dare I ask...":

And, dare I ask who could not benefit from reading certain experienced Illinois trial lawyer's tips and tactics. ;^>

Says Bruce: "The writer knows that everyone would benefit from reading these tips (and of course dares to ask). And doesn't even bother with the question mark."

Bruce offers another similar expression favoring rhetorical interpretation: "Should I even ask...":

Should I even ASK how the field trip was? :)

Bruce says: "The commenter assumes it was bad and that he/she should not ask. I think (tone can be hard to judge)."

A similar structure that Bruce points out is "Do I have to ask...":

LMAO at the fools that take up politics. Do I have to ask what party is doing this??? and like the previous post said, the union mafia (dimwit votes) is probably going to get P.O.'d about this.   [Comment at]

He notes that the continuation with "and" heightens the rhetorical force of the question, and adds:

Actually, in the last case I think the rhetorical nature of the question is only apparent if it's so framed. In the first case ("who could not benefit..."), the rhetorical force is apparent even without the framing. In the second, it's less clear, though it could be read into "And how was the field trip? :)" In this last case, however, I don't think it would be apparent to a reader (though it might be, to a speaker, from tone) that the question was rhetorical: "What party is doing this???" just sounds (again, in writing) like a "real question."

Can a question be rhetorical precisely because it's embedded in another?

Ora Matushansky wrote from Paris with another pair of attested examples, found by looking for "makes me wonder" + "could possibly" on Google:

which kind of makes me wonder how you could possibly find less funny videos

which is definitely an improvement, but makes me wonder what exactly was the point in adding the RFID chip in the first place?

Let us agree, then, that Ivano Caponigro's intuition is correct: the device of the rhetorical question should perhaps be referred to more broadly as the rhetorical interrogative, since interrogative clauses that do not directly ask questions can indeed have the flavor that independent-clause rhetorical questions have.

Posted by Geoffrey K. Pullum at 01:05 PM

The Duff curriculum

Arnold Zwicky recently wrote that "I recalled with pleasure [C.C.] Fries's careful development of a system of parts of speech via distributional analysis, using as raw data some fifty hours of (covertly) recorded conversations". Tom Duff emailed an interesting suggestion in response:

I wonder if there's a primary education hook here (and a way to promote general Linguistics awareness.) Unless the math is too heavyweight, it sounds like a research program that schoolkids could replicate: taking down each other's speech, analyzing the data, discovering the grammar of the language as used by their peers. I would have been so stoked by this when I was 9 or 10.

It sounds like a complete primary education program -- English, science & math all rolled together. And talking in class!

I think this is an absolutely terrific idea. There are many difficulties, some of which I'll sketch below, but the opportunities are even greater. And much of the needed computational infrastructure could be shared with other projects, pedagogical and otherwise.

First, let's generalize the idea. Although distributional analysis of word classes would be a fine thing to do, you wouldn't want either to start there or to stop there. Students could learn some acoustic physics with their math, while looking at pitch contours or measuring formant frequencies and segment durations. They could learn some simple statistics, especially if data is available to them from multiple classes and schools, by looking at the effects of age, sex, region and so on. They could analyze the rhetoric and the performance of speeches, or the dynamics of conversation, looking at how gestures and facial expressions are aligned with words and phrases. They could compare vernacular and formal speech. They could look at different languages, for example to see how differently words with similar meanings are used.
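To make the distributional part concrete, here is a minimal sketch of the kind of exercise a class might try. It is not Fries's actual procedure: the four-utterance sample, the one-word-either-side "frame", and the function names are all assumptions made up for illustration. The idea is simply to group words by the contexts they occur in, and see which words turn up in the same slots.

```python
from collections import defaultdict


def frames(tokens):
    """Yield (word, (left_neighbor, right_neighbor)) for every token.

    Utterance boundaries are marked with the hypothetical symbols <s> and </s>.
    """
    padded = ["<s>"] + tokens + ["</s>"]
    for i in range(1, len(padded) - 1):
        yield padded[i], (padded[i - 1], padded[i + 1])


def words_by_frame(utterances):
    """Map each frame to the set of words observed inside it."""
    table = defaultdict(set)
    for utterance in utterances:
        for word, frame in frames(utterance.lower().split()):
            table[frame].add(word)
    return table


# Toy data, invented for this sketch; real transcripts would be far larger.
sample = [
    "the dog is good",
    "the cat is good",
    "the dog was here",
    "the cat was here",
]

table = words_by_frame(sample)

# Frames filled by more than one word are candidate evidence for a word class.
shared = {frame: sorted(words) for frame, words in table.items() if len(words) > 1}
for frame, words in sorted(shared.items()):
    print(frame, words)
```

On this toy sample the only shared slots are the ones after "the", which "dog" and "cat" both fill. Getting anything like a full set of word classes out of this would take far more data, and a richer notion of context, than four sentences can offer.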

I can say from personal (though informal) experience that bright nine- or ten-year-olds are interested in this kind of thing, at least at the level of looking at waveforms, spectrograms and pitch tracks of their own speaking, singing and assorted weird noises, or using web search to try to figure out what the right way to say something in Spanish is.

And as a technical matter, it would be fairly easy to make such analyses available to kids. Most of the needed infrastructure is already available, as free software on generic personal computers -- though you'd need to create more kid-friendly (or teacher-friendly) versions in some cases. There's one thing that's still missing, however: support for sharing data and for conveniently accessing shared data. The main motivation for this is that many interesting things, including distributional analysis of word classes, require more data than one class could collect; but even sharing data within a group of 30 or so students could be challenging without an appropriate system.

Here's one idea about what you'd want: a server where anyone can upload audio (and video too) with appropriate metadata; an Ajax-based tool for creating, editing and viewing transcriptions (and other time-aligned annotations), also saved and accessed on the net; a mechanism for defining virtual corpora out of sets of these annotated audio/video files; and a user interface (and an API) for searching such virtual corpora.
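As a rough illustration of the data model such a system might rest on, here is a small in-memory sketch. Everything in it is hypothetical: the class names, the metadata keys and the sample recordings are assumptions for the sake of the example, and a real service would add storage, a web front end and access control. It shows time-aligned annotations attached to recordings, a "virtual corpus" defined by filtering on metadata, and a simple word search over that corpus.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    start: float  # seconds from the beginning of the recording
    end: float
    text: str     # transcription of this stretch of audio


@dataclass
class Recording:
    recording_id: str
    metadata: dict
    annotations: list = field(default_factory=list)


def virtual_corpus(recordings, **criteria):
    """Select the recordings whose metadata matches every given key/value pair."""
    return [
        rec for rec in recordings
        if all(rec.metadata.get(key) == value for key, value in criteria.items())
    ]


def search(corpus, word):
    """Return (recording_id, start, end, text) for each annotation containing word."""
    word = word.lower()
    hits = []
    for rec in corpus:
        for ann in rec.annotations:
            if word in ann.text.lower().split():
                hits.append((rec.recording_id, ann.start, ann.end, ann.text))
    return hits


# Invented sample data, for illustration only.
recordings = [
    Recording("rec-001", {"school": "A", "grade": 4},
              [Annotation(0.0, 1.8, "well I think so"),
               Annotation(1.8, 3.1, "look at that")]),
    Recording("rec-002", {"school": "B", "grade": 4},
              [Annotation(0.0, 2.2, "oh well never mind")]),
    Recording("rec-003", {"school": "A", "grade": 5},
              [Annotation(0.0, 1.5, "well done")]),
]

fourth_grade = virtual_corpus(recordings, grade=4)
for hit in search(fourth_grade, "well"):
    print(hit)
```

The point of keeping the corpus "virtual" is that the same uploaded recordings can belong to many overlapping collections (one class, one school, one age group) without being copied.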

This would be useful for education through the graduate-school level, and for many scientific and engineering projects as well. I think that anyone who's ever taught or done research in this general area can see how it might be used.

OK, enough enthusiasm. Now for some of the (very serious) problems with the idea.

1. Most elementary-school and high-school teachers don't have the background needed to understand and teach such stuff, much less to create course materials based on it.

2. There are ethical and legal problems, in the general area covered by "human subjects" regulations, that are more acute in dealing with kids. You'd have to worry about how to prevent students from releasing information about personal identity, or inappropriate information about themselves and their families, or slander about their classmates, or whatever. This is related to the problems that myspace and facebook have, except that in this case, (some of) the material would be created or used under the authority of schools, who need to be much more cautious.

3. Even if problems (1) and (2) were dealt with, my guess is that the hardest problem here is how to create "lab exercises" that would work for students of different ages, backgrounds and interests, as presented by a similarly diverse set of teachers.

All the same, the general idea is a wonderful one. The (additional) infrastructure is worth implementing for other reasons -- more on this later. And I guess the way to make progress on the pedagogical problems would be to try it out with some kids in pilot projects, which could be in schools or in other contexts, like a summer camp or a museum program.

[Update -- Mike Maxwell and Bill Poser remind me that Ken Hale had the idea, more than 30 years ago, of using study of the Navajo language to teach the scientific method to Navajo students. Bill mentioned this work in an earlier LL post ("Reintroducing diagramming", 11/7/2004). Ken wrote a (still unpublished) textbook in support of this idea. Mike also cites Josie White Eagle, "Teaching Scientific Inquiry and the Winnebago Language", International Journal of American Linguistics, 48 306-319, and a paper by Michael Barkey, "Linguistics and Scientific Inquiry" (ms. dated 9/4/2006), which includes a brief review of "what others have done" (pp. 26-28), including Nigel Fabb's "Linguistics for ten year olds" (MIT Working Papers in Linguistics, 6, 45-61, 1985). Josie White Eagle's paper in turn refers us to a 1970 paper by Samuel Jay Keyser, "The role of linguistics in the elementary school curriculum", Elementary English, January 1970, 39-45. A bit of internet searching also turned up a brief review by Wayne O'Neil, "Linguistics in the Science Classroom: Progress and Prospects".

Since the general concept has been around for more than a generation, we need to ask why it has never been adopted to any significant extent. My speculation would be that it's because curricular innovation is hard; because most teachers lack the knowledge and skills needed to teach such material; and because the cultural trend has been strongly against teaching any analytic skills at all, at least in the area of language and communication.

Are things any different now? Well, the anti-analytic tide may have turned; the internet's "long tail" effect makes it easier for enterprising teachers to find and use curricular materials; and it may be possible to design interactive web-based materials and tools that can help teachers (and students) develop the concepts and skills that they need to make such ideas work. Also, we might get some added traction from the use of corpus-based rather than intuition-based methods, especially for kids who are already used to internet search.]

Posted by Mark Liberman at 07:49 AM

Leanne Hinton Wins Lannan Award

Yesterday's New York Times carried a full-page ad announcing the 2006 winners of Lannan Awards for Cultural Freedom. One recipient is Leanne Hinton of the University of California at Berkeley, arguably the world's most effective and influential advocate for language preservation and revitalization. Leanne has long worked with California Indian tribes who are on the point of losing, or have lost, their heritage languages. Her famous Master-Apprentice program has been adopted by communities in which a few elders still speak the tribal language fluently; her regular Breath of Life workshops at Berkeley are an important resource for communities whose languages are no longer spoken but are sufficiently well documented that they can (with hard work and some luck) be revived. Shortly before Ken Hale died, he and Leanne co-edited the influential sourcebook The Green Book of Language Revitalization in Practice. Everyone who works with Native American tribes, and with other communities around the world whose heritage languages are endangered or moribund, is greatly indebted to Leanne for her work and her inspiration. And with the most optimistic estimates predicting the death of 50% of the world's 6,000 or so languages by the end of this century (the most pessimistic estimates range up to a 90% extinction tally by 2100), all linguists ought to respect Leanne's work and to congratulate her on her Lannan Award.

Posted by Sally Thomason at 07:26 AM

November 06, 2006

Charles Carpenter Fries

I had occasion recently to refer a graduate student to Charles Carpenter Fries's 1952 book The Structure of English.  She's working on a cluster of issues having to do with syntactic categories and subcategories, and I recalled with pleasure Fries's careful development of a system of parts of speech via distributional analysis, using as raw data some fifty hours of (covertly) recorded conversations.  Though many linguists are now looking at syntactic categories and subcategories through the lens of the constructions words can and cannot occur in, and though a great many linguists now draw their data from corpora, Fries's work is scarcely known.  He has no Wikipedia page, except for a place-filler ("Diese Seite existiert noch nicht") on the German Wikipedia site.

Well, I think it's time for people to pay some attention to C. C. Fries.

I never met Fries, or Paul Roberts, whose 1956 textbook Patterns of English is a presentation of Fries's system for classroom use.  But the two books are an important part of my intellectual history: one of my high school English teachers used Patterns as a text in English grammar -- quite a remarkable step, then as now -- and so gave me my first taste of linguistics.  It was delicious.  A couple of years later, at Princeton, I took intro linguistics (with the Gleason text) first chance I got, even though I was a math major.  I was hooked.  On to the intro to historical linguistics (with the Hockett text) and reading Sapir, Bloomfield, Fries's Structure book, Harris's Methods in Structural Linguistics (1951), and, yes, Syntactic Structures.

The Fries system has four major syntactic categories, called "parts of speech", in "classes" numbered 1 through 4 (Roberts maintains Fries's notation, but is willing to label the four classes Noun, Verb, Adjective, and Adverb), plus fifteen minor categories of "function words", in "groups" lettered A through O.  Some of the groups have only one member (Group C, not; Group H, expletive there), and several gather together words that are largely ignored in traditional English grammar (Group K, comprising utterance-initial well, oh, now, and exclamatory why; Group M, comprising the discourse markers look, say, and listen).  There are extended treatments of sentence patterns, immediate constituents, the syntactic functions "Subject" and "Object", and much else.

Well worth looking at now.

But what happened?  Why did Fries pretty much disappear from sight?

Look at the dates.  While Fries was getting his book to press, Chomsky was writing The Logical Structure of Linguistic Theory; he finished the manuscript in 1955, the year before the Roberts book was published, and Syntactic Structures appeared the year after that.  By the time Fries died, in 1967, generative grammar was flourishing and American structuralism was increasingly marginalized.  Fries's careful procedures and concepts defined from (real-life) data had no place in the world of Universal Grammar.  Well, they're back, and it's time to say some good words for Charles Carpenter Fries.

zwicky at-sign csli period stanford period edu

[Update from Mark Liberman -- Dan Everett writes:

Ken Pike told me many stories about Fries that support Arnold's statements. Ken's first presentation in linguistics was on tone languages, to the plenary session of the LSA in 1936. There were only 12 people in attendance, but on the front row were Bloomfield, Sapir, Trager, Bloch and Fries. In the second row was the new PhD, Charles Hockett. After his presentation, Pike said that Sapir wanted him to do his PhD with him at Yale. Bloomfield offered him a spot at Chicago. And Fries talked to him about coming to Michigan. Pike said that he chose to work with Fries over Bloomfield and Sapir because Fries' work was more concerned with helping people learn to do linguistics and apply it.]


Posted by Arnold Zwicky at 04:00 PM

Taboo avoidance in Dilbertland

Scott Adams has turned his attention to taboo avoidance, and he doesn't like what he sees.  In his blog of 11/4/06, he declares that "the most obscene letter in the alphabet is the asterisk."  But to balance that judgment, he explains how the asterisk protects us:

Naked naughty words can destroy your brain and also society as a whole. However -- and one would think this is obvious -- It's completely safe to THINK naughty words. And it's safe to cause other people to think naughty words. But if you spell those naughty words without the asterisk loin cloth to protect your victims, you're a danger to society. I know this to be true because I heard it from lots of people who have sh*t-for-brains.

(Hat tips to Susan Harrelson and Edward Wilford.)

zwicky at-sign csli period stanford period edu
Posted by Arnold Zwicky at 12:50 PM

From one non-proscriptivist to another

Bill Weinberg recently offered a linguistic argument against the use of "open source" as a verb ("Commentary: Open Source is not a verb", NewsForge, 11/4/2006):

I am a linguist by training. Long before I delved into free software and was snagged by the quagmire of marketing, I pondered the marvels of morphology, the grimness of grammar and the splendor of semantics. It is only natural then that my wrangling criticism of industry-speak, in both technical and literary modes, is informed by ingrained linguistic sensibilities, descriptive and proscriptive. Given my background, I find it vexing when open source is used as a verb.

In my travels with OSDL, I frequently hear our eponymous Open Source employed as a transitive verb. As in "My company open sourced our product." Now, I am no petty proscriptivist.

Me neither. But I'm happy to offer some free editorial advice to a fellow linguist.

First, I think you mean "prescriptivist", not "proscriptivist". The OED tells us that a prescriptivist is "An adherent or advocate of prescriptivism", and that prescriptivism is "The practice or advocacy of prescriptive grammar; the belief that the grammar of a language should lay down rules to which usage must conform". There's no OED entry for "proscriptivist", but we could regard it as a regular derivation from proscriptive, which is glossed as "Characterized by proscribing; tending to proscribe; of the nature or character of proscription". The verb proscribe has the glosses

I. 1. trans. To write in front; to prefix in writing. Obs. rare. Perhaps a scribal error for prescribe.
II. 2. To write up or publish the name of (a person) as condemned to death and confiscation of property; to put out of the protection of the law, to outlaw; to banish, exile. Also fig.
 b. To ostracize, to ‘send to Coventry’.
3. To reject, condemn, denounce (a thing) as useless or dangerous; to prohibit, interdict; to proclaim (a district or practice).

And the noun proscription is glossed as

1. The action of proscribing; the condition or fact of being proscribed; decree of condemnation to death or banishment; outlawry. Also fig.
2. Denunciation, interdiction, prohibition by authority; exclusion or rejection by public order.

"Word rage" may be common among English-language prescriptivists, but ostracizing, death, banishment and confiscation of property are not really on the agenda here. Rejecting, condemning, denouncing, prohibiting and interdicting might be, so it's not nonsensical to use "proscriptivist" to mean "someone who condemns or denounces others for misuse of words". But it's confusing to coin a new word, when there's an old one that's almost identical in sound and essentially equivalent in meaning. And you're likely to make a bad impression on your readers, who may suspect you of having committed a malapropism or created an eggcorn. So my advice is to proscribe the use of "proscriptivist", and to prescribe "prescriptivist" instead.

Weinberg's commentary continues:

English is a dynamic, productive language in which nouns can become verbs, and verbs can return the favor. Consider the word source (n. from Middle English sours, from Anglo-French surse spring, source, from past participle of surdre to rise, spring forth, from Latin surgere). Today, source is as often uttered as a verb as it is a noun, as in the dreaded labor term, outsource.

Uh oh, the frequency illusion strikes again! As Arnold Zwicky put it, "once you've noticed a phenomenon, you think it happens a whole lot".  And as Arnold asked,

Why do people ... who propose to offer authoritative advice to educated people not use standard sources of information? ("You could look it up", as Casey Stengel is reported to have said, with reference to his claim that most people his age were dead.)

One way to "look it up" in this case would be to check examples of the word source on the web. So I read through the first ten pages of Google's returns for a search on source, without finding any examples of source as a verb. This suggests to me that it's unlikely to be true that source the noun and source the verb are equally common these days.

But Weinberg wrote "uttered", so maybe we need to check conversational use. Well, in the 26 million words of English-language conversations indexed at LDC Online, there are 392 instances of the word source, of which 391 are nouns, and one is a verb:

now everybody that can out source to a cheaper you know find a cheaper worker somewhere
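For anyone who wants the arithmetic spelled out, here is a small sketch using only the counts just given (26 million words, 392 instances of source, 391 nouns, one verb). The variable names are my own, and the corpus size is treated as exactly 26 million for the purpose of the calculation.

```python
# Counts taken from the paragraph above; the corpus size is rounded to 26 million.
corpus_words = 26_000_000
total_hits = 392
noun_hits = 391
verb_hits = 1

# Sanity check: the two categories account for every instance.
assert noun_hits + verb_hits == total_hits

verb_share = verb_hits / total_hits
millions_of_words = corpus_words / 1_000_000
hits_per_million = total_hits / millions_of_words
verb_per_million = verb_hits / millions_of_words

print(f"verb share of 'source': {verb_share:.2%}")
print(f"'source' per million words: {hits_per_million:.1f}")
print(f"verb 'source' per million words: {verb_per_million:.3f}")
```

The verb accounts for about a quarter of one percent of the instances, which is a long way from "as often".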

(By the way, Weinberg's sentence offers another classic example of the McKean/Skitt/Hartman Law of Prescriptive Retaliation. When he wrote that "source is as often uttered as a verb as it is a noun", I think he left out an "as". I could be wrong about this, because both wordings give me a sort of unpleasant headachy feeling, but I believe that the clause ought to read "source is as often uttered as a verb as it is as a noun." In any case, it's probably not a good idea to create a 14-word clause involving either three or four copies of the word as. A better option might be something like "source the verb is now as common as source the noun". It's still false, but it reads better.)

We haven't quite gotten to Weinberg's real point yet, but we're getting closer. He continues:

What I find nettling is the presumption of what syntacticians call agency. In pragmatic grammar (as opposed to case grammar), the subject of a transitive verb is the agent that performs some act upon the patient or direct object of the verb. Dog [agent] bites [verb] man [patient]. The dog bites the man because it wants to, because it can. (Maybe a better example for software is "Cat throws up hairball"). But is it meaningful to say that the owner of a piece of code can open source that code, by fiat?

This is confusing. Is he saying that all transitive verbs have agents as subjects? In all flavors of grammar that I'm familiar with, some transitive verbs have agentive subjects and some don't. Here are a few examples where transitive verbs have subjects that are causes or themes or experiencers, not agents:

A fallen tree blocked the road.
The noise bothered her.
The bullet entered his chest and lodged near the spine.
Everyone in the room heard the explosion.

In those sentences, "blocked", "bothered", "entered" and "heard" are perfectly good transitive verbs, although none of their subjects are agents by the usual linguistic definition. Certainly none of them "performs some act upon the patient ... because it wants to". But in any case, someone who makes software available under an open-source license is both legally and linguistically a sentient agent who intends the result that is achieved.

Well, let's put all the grammar aside, because we're about to get to the real point:

There are actually four distinct stages for source code, only one of which I consider open source. ... The first is source code as documentation ... The second is source code as bait ... The third is source code under an OSI license ... The fourth and canonical scenario that embodies the true meaning of open source is a community of developers and users cooperatively building, deploying and maintaining project code.

OK, fair enough. Weinberg's idea seems to be that we shouldn't say "(person or company) A open-sourced (software system) X" just to mean "A made X available under an open source license", because X won't really be true "open source software" until and unless it comes to have an active community of developers and users.

I agree with Weinberg that "Without community, the source code behind open source is just a dusty tome, lifeless, static and unread". But lifeless, static and unread open source software is still open source software.

And therefore it's not only meaningful, but also true, to say that the owner who released some code under an OSI license "open sourced" it. You can object to this usage on aesthetic grounds, if you want to, but the business about subjects and agency is beside the point. English syntax and semantics are neutral on this one.

[Hat tip to Tiego Tresoldi.]

[Update -- Bill Poser writes:

Bill Weinberg seems to be confusing a canonical association between transitive subjecthood and agency with a rigid implication. There are languages that have something stronger than what English does. In Japanese this association is sufficiently strong that, as Susumu Kuno pointed out years ago, it is generally not possible for a transitive verb to have an inanimate subject. To say "History repeats itself", for example, is bizarre in Japanese.

If he were writing about Japanese, I would just have pointed out that sentences like "IBM open-sourced UIMA" in fact have perfectly good agentive (and even quasi-animate) subjects.

What he's really saying is that the real agent of the open-sourcing process is the developer and user community, not the software owner. The trouble is, that's a moral or political judgment, not a linguistic one.

The usage fact is that phrases like the following are often found:

Yahoo! made this library available under an open source license.
The Redmond company made WiX available under an open-source license on
...we made it available under an Open Source license available at [10].
The CMU Sphinx project ... has made Sphinx2 available on SourceForge under an Open Source license.

There's nothing wrong with these sentences, in terms of syntax, semantics or word usage. And it's also sanctioned by the norms of English to refer to software that is available under an open-source license as "open source". As a result, it would be perfectly natural, from the point of view of English morphology and syntax, to rephrase each of those sentences using the causative neologism to open-source <something>, meaning to make <something> open source. Thus "The CMU Sphinx project has open-sourced Sphinx2 on SourceForge".

You might decide against this rephrasing because you don't like to make a new causative verb out of a complex nominal of the form adjective+noun -- that would be a reasonable stylistic preference. But to reject such usage on the grounds that "it takes a village to open-source a program" (as we might paraphrase Weinberg's argument) confuses morphosyntax with politics.]

Posted by Mark Liberman at 12:04 AM

November 05, 2006

White Horses

Language Hat's discussion of the Chinese spelling of "Africa", in which the character 非 is used phonetically, triggered a few thoughts about Chinese philosophy. The word 非 is usually translated as "not", or, in compounds, as a negative prefix such as "un-" or "i(n)-", as in 非法 fēi fǎ "illegal". This has led some people, on encountering the statement 白馬非馬 in the writings of philosophers of the 名家 míng jiā "Logicist" school, as, for example, in the title of a famous work by 公孫龍子 Gōngsūn Lóngzǐ known in English as the White Horse Dialogue, to interpret it as "A white horse is not a horse", which appears to be a contradiction. A key point is that in Classical Chinese 非 can mean "different", so the statement can be read as "A white horse is different from a horse". The White Horse Dialogue plays on the ambiguity created by the two meanings of 非.

The Logicist tradition in Chinese philosophy, which bears a much closer relationship to the Western scientific tradition than does the Confucian tradition, was largely submerged by Confucianism and was for a long time poorly known. It was brought to prominence by Hu Shih 胡適, who, after a traditional education in Chinese philosophy, became a student of John Dewey, in his doctoral dissertation The Development of the Logical Method in Ancient China. To my astonishment, it is available. Hu is probably better known as one of the leaders of the shift to the use of the vernacular in written Chinese and as the ambassador of the Republic of China to the United States from 1938 to 1941.

[Addendum: A nice discussion of the White Horse Dialogue is to be found in the Stanford Encyclopedia of Philosophy.]
Posted by Bill Poser at 06:00 PM

Newroz Píroz be!

Although Turkey has taken some steps toward reducing its oppression of the Kurds in hope of being admitted to the European Union, it keeps on backsliding. It is reported that Osman Baydemir, a prominent human rights activist who is now the mayor of Diyarbakır, is being prosecuted for sending out cards containing New Year's greetings in Turkish, Kurdish, and English. "Happy New Year" in Kurdish is Newroz Píroz be!, the publication of which violates Act 1353 of November 1, 1928 on Adoption and Application of Turkish Letters, which forbids the use of any letters not found in the Turkish alphabet. Turkish does not use the letters q, w, or x.

The Constitution of 1982, incidentally, is unusual in that, while it contains provisions guaranteeing freedom of expression, it explicitly empowers the government to prohibit the use of languages not once but twice (Article 26, par. 3, Article 28, par. 2) and declares (Article 174, par. 6) the Act of November 1, 1928 to be constitutional. As amended in 2001 [the amendments are marked in this Turkish text], the Constitution no longer explicitly empowers the government to ban languages but continues to enshrine the Act of November 1, 1928. As I pointed out in my comment on a previous incident of this type, Turkey enforces this law selectively, using it only for the purpose of suppressing Kurdish. Meanwhile, one of the few good things that can be said to have come out of the invasion of Iraq is the improvement in the status of the Kurds and Kurdish.

Posted by Bill Poser at 04:46 PM

George Lakoff, Card-Carrying Chomskian

George Lakoff has taken a lot of knocks for his theories about political language and the cognitive roots of political orientations, which is not unexpected for someone who has courted controversy throughout his career. Some of the criticism is justified, I think -- in fact I've gotten in my own two cents in my book Talking Right and in a recent post on Open University, the New Republic's academic blog. But some of the charges are really loopy, and none is so weirdly off-the-wall as the description of Lakoff offered in a piece by the Bloomberg columnist Andrew Ferguson claiming that Lakoff's influence is fading, which also ran in the New York Sun and was picked up by various conservative bloggers:

A disciple of the notoriously anti-American Massachusetts Institute of Technology professor Noam Chomsky, Lakoff first earned a wide public audience -- inadvertently -- with his essay "Metaphors of Terror," published a few days after 9/11.

Now call George Lakoff what you will -- and a lot of people have done just that -- "a disciple of Noam Chomsky" he ain't, not in his politics and certainly not in his linguistics.

True, you could argue that Lakoff's work bears the methodological traces of his Chomskian upbringing; virtually no modern linguist has escaped Chomsky's influence, after all. But Lakoff and Chomsky have been at theoretical and personal loggerheads since the 1960's, and to call Lakoff a Chomsky disciple is like calling Dave Winfield a disciple of George Steinbrenner.

As nonsensical as it is, though, this little factoid has been making its way around the influential conservative blogs. From Little Green Footballs, for example:

I'm sure it will come as no surprise to learn that Lakoff is an admirer of Noam Chomsky.

And from Richard Bennett's The Original Blog:

George Lakoff, the Chomsky protege who's a big fave with Democrats these days, has a new book out urging leftish politicians to spin more.

It isn't hard to see what's going on here. Many people assume that there's some connection between Chomsky's politics and his linguistics, and a lot of them go on to conclude that linguistics itself is constitutively a leftish discipline. So when Lakoff emerged as an influential political figure, it seemed natural to blur both his politics and his linguistics with Chomsky's, particularly for those who didn't know jack about linguistics. Whatever your political views, it's a depressing reminder of how widespread the ignorance about the field of linguistics is (not that we exactly needed another one). But then it's probably asking too much to expect people who find it expedient to conflate Lakoff's garden-variety liberalism with Chomsky's anarcho-syndicalism to take the trouble to learn the difference between Chomsky's minimalism and Lakoff's cognitive linguistics. Oh well, they have the sense they were born with.

Posted by Geoff Nunberg at 01:46 PM

Clifford Geertz, 1926-2006

Jeff Weintraub has a nice appreciation of Clifford Geertz, who died last week, along with links to other appreciations of this great scholar. Geertz's work displayed a rare combination of depth and subtlety, and a resolute refusal to accept easy answers or facile generalizations about culture. It's a testimony to his influence that "thick description" became a term of art widely used not just in anthropology but in the social sciences and humanities, including linguistics (it gets 188,000 Google hits), even if he sometimes raised an eyebrow at the way people deployed the phrase. I knew Cliff a bit, and fondly recall both his personal warmth and intellectual generosity. He will be missed.

Posted by Geoff Nunberg at 01:23 PM

Aut blog aut mori

The motto of NaBloPoMo ("National Blog Posting Month") is a nice example of dog latin. (Though in former times "blog or die" might have been dog-latined as "aut blogare aut mori", with some approximation to an infinitive form of the verb blog, more closely echoing the original "aut vincere aut mori" = "conquer or die".) No matter what the pseudo-Latin morphology might be, though, I don't agree with the sentiment, since I make it a principle not to blog unless it feels like fun. Life has enough necessities without adding another.

The alternative motto "Blog today -- tomorrow you may be eaten" is funnier but even less attractive. I'd go along with "Blog longa, vita brevis", especially if you add the rest of the quote.

Posted by Mark Liberman at 08:53 AM

November 04, 2006

Grammar on the gay beat

Genre, a lifestyle magazine for gay men, has an advertising section every month with interviews of "Genre men", one from each of several cities.  Questions about favorites: gym, bar, restaurant, retailer, place to hang.  (Picture of gay male life: we work out, go to gay bars, like to eat and shop and hang with other gay men, cruising them.)  Other more specific questions, some silly ("If you were a cocktail, what would you be?" and "Who is the chick you'd switch for?"), some serious ("When/How did you come out?" and "What are you afraid of?").

New York City is represented in the November issue (p. 71) by Arnie Plotnick, a 46-year-old cat veterinarian who's inclined to smart-aleck answers:

First thing you do in the morning?

My boyfriend.

And then there's the question:

If you could go back in time to any year, what year would you go to?

Plotnick snaps back:

I'd go back to a time when people didn't end sentences with a preposition.

Ah, that prescriptive fiction Dryden's Rule, a.k.a. No Stranded Prepositions.  Particularly ridiculous here, since the fronted version of the interviewer's question is stunningly awkward:

If you could go back in time to any year, to what year would you go?

Whether by intention or accident, the Genre editors get their revenge by following the stranded-preposition exchange with this one:

How important are politics to you?  Be honest.

Very important.  We have to do everything we can to stop the radical right-wing bigots from destroying our country and everything it stands for.

There you have it: stranded prepositions AND a prejudice against them.  They are everywhere.

(In case you're curious, the answers to the other questions above are, in order: New York Sports Club, Gym Bar, RUB (Righteous Urban Barbecue), Whole Foods, and "on the grass by the pier on Christopher Street"; Absolut Peach and Tonic; Bjork "or maybe J.K. Rowling"; "I didn't really come out of the closet" because "the entire house kind of fell down around me instead"; and gay Republicans.)

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 11:57 AM

Linguistics in the funny papers

A few days ago, Sally Forth documented grammar as a source of intergenerational conflict:

This morning, Doonesbury notes the role of tense as a marker of restaurant status:

Posted by Mark Liberman at 07:57 AM

November 03, 2006

Madonna in Malawi: distinguished white lady?

It's a fair bet that most Americans were unaware of the existence of the poverty-stricken African nation of Malawi before Madonna decided to fund an orphanage there and adopt a Malawian child. Now that Madonna is making the media rounds to smooth over criticism of her adoption efforts, she's also bringing some unexpected attention to Chichewa, Malawi's national language (along with English). The Associated Press reports:

"People started to say my name and they had never heard of Madonna," the 48-year-old singer, talking about her recent visit to Malawi, told AP Television in an interview Tuesday.
"And, in Chichewa, the word 'madonna' means 'distinguished white lady,' so I think they got very confused."

At the very least Madonna's mention of Chichewa is an improvement over earlier reports that she was planning to learn "Bantu" so that her adopted son could remain in touch with his Malawian roots. "Bantu" of course refers to a language family rather than a particular language, encompassing Chichewa and hundreds of other distinct languages throughout the southern half of Africa. So if Madonna has now figured out the name of the language of her new son's homeland, how'd she do with the gloss of madonna as 'distinguished white lady'?

Turns out she wasn't too far off, though she should probably keep working on those Chichewa lessons. The word to which she refers is madona, which consists of dona 'lady' plus the plural prefix ma-. (Compare makaku, the plural of kaku 'mangabey' in Bantu languages of Gabon and Congo, which is the etymon for macaque and, possibly, George Allen's notorious epithet Macaca.) The word appears in the Chichewa equivalent of "ladies and gentlemen," mabwana ndi madona — as in this song in honor of Malawi's first president Dr. Hastings Banda (or more recently, this post on a Malawian message board). So for starters, it's 'ladies' rather than 'lady'. But what about the 'distinguished white' business?

The honorific dona has indeed been used to refer to white women in Malawi since colonial times. Presumably the word itself is a vestige of Portugal's early colonial ties to Africa, derived from Portuguese dona — cognate with Spanish don/doña and ultimately from Latin dominus/domina. (Note also the Old Italian cognate donna, which in the form ma donna 'my lady' provides the etymological source for Madonna Ciccone's first name.) According to an article on the Dutch Reformed Church Mission in Malawi (History of Education Quarterly, Autumn 1984), the DRCM's female missionaries were known as madona, and the girls' homes that they supervised in the late 19th century were called Ku Madona. The presence of European women in Malawi also gave rise to a style of ceremonial mask known as Dona (described here and here), emulating the women's foreign features. So (ma)dona has survived as a more generalized honorific for distinguished ladies as well as a term specifically used for women of European descent.

The racialized sense of (ma)dona did apparently have some resonance for Yohane Banda, the father of Madonna's adopted son David. At least that's what a Malawian blogger named Steve Sharra wrote, most likely based on local news reports:

For Mr. Yohane Banda, who had never heard of the pop diva Madonna until she visited Malawi last week to adopt his 13 month-old son David, the closest he could relate with the material girl was the word Dona, meaning rich white woman, in Malawian parlance. In a matter of days, he now knows her, and the rich guy Guy Ritchie, as the new parents of his son.

Perhaps that rich white lady Madonna will do for Chichewa what Mel Gibson (before his fall from grace) hoped to do for Yucatec Maya. Shouldn't every indigenous language have its own celebrity spokesperson?

Posted by Benjamin Zimmer at 01:52 PM

Elliott Bay in view! Oh! The Joy!

Elliott Bay is an inlet of Puget Sound that forms Seattle's harbor, and so the Elliott Bay Book Company is a terrific Seattle bookstore (150,000 titles) and cafe. If you live in the Seattle area, you probably already know that. But you might not know that Geoff Pullum will be there for a book reading and signing, at 2:00 p.m. on Sunday, November 5.

Geoff won't be handing out money, as he did back in June in order to encourage people to come to his reading at the MIT Coop. However, I hear that Tom Sumner will be there, giving out ("a limited supply of") free stuff. Anyhow, Geoff's readings are legendary, so you shouldn't need to be bribed. And as Tom points out on his blog, the Seahawks' game is Monday night, so what else are you going to do on a rainy Seattle Sunday afternoon?

The title of this post refers to the entry in William Clark's journal that recorded his first view of the Pacific Ocean, in November of 1805, just about 201 years ago. ("Ocian in View! Oh! The Joy!") To avoid geographical confusion, I should hasten to tell you that Lewis and Clark were following the Columbia River, and therefore reached the Pacific 100 miles or so to the south of Elliott Bay, which in any case wasn't named until 1841. Of course, William Clark was also a pioneer of plain spelling, so it's unlikely that he would have gotten the two l's, the two t's, the use of o for the reduced final vowel, etc., even if he had reached Elliott Bay, and known it by that name. He might have referred to "Eliot", "Elliot", "Eliott", "Elliatt" or some other kind of bay. I might have, too, except that I used the internet to check the spelling, and to find you a map link.

Since I'm an interested party, here's what Seattlest says:

>>>Elliott Bay, 2:00pm. It's inevitable that someone would come along to rip Strunk & White a new one. It's the good fortune of every user of a non-fossilized version of the English language that that someone is as eloquent as Geoffrey Pullum. Pullum's one of the prime movers behind the essential linguistics blog (seriously!) Language Log, and the co-author of Far from the Madding Gerund.

Posted by Mark Liberman at 11:41 AM

Plain spelling

Some entertaining foolishness from Simon Jenkins, "A million fingers are tapping out a challenge to the tyranny of spelling", The Guardian, 11/3/2006. The lede:

Thank you, Scotland. First John Knox, then the Enlightenment and now the Scottish Qualifications Authority. In a direct challenge to the English at their most reactionary, the authority has declared that it will accept text-messaging short forms in school examinations. The dark riders of archaism will protest and the backwoods will howl. No spell is cast as dire as spellcheck. But the champions of reason are massing north of the border and need our support.

Sample quotes:

I have no quarrel with grammatical authoritarianism. Grammar is a vehicle that needs a highway code of human communication. To parse is to prosper. [...]

In contrast, spelling has become a no-go area, an intellectual tundra. While plain writing is considered a stylistic virtue, plain spelling is a vice. English orthography is an edifice of unreason. Word endings are the last gasp of the Anglo-Saxon and Norman invasions, embedded in the cultural DNA of literary Brahmins. Not to spell properly is a sign of being common, as once was ignorance of Latin. Knowing your "ie" from "ei" or -ible from -able does not affect a word's meaning one jot. It is a caste mark, its distinction deriving from its very obscurity.

Most linguists think that this is backwards: syntax and word usage can take care of themselves, pretty well, but spelling does need standardization. The basic argument is that writing is artificial in a way that speaking is not, and orthography is the most artificial part of writing, so that the normal human process for creating and maintaining cultural norms is good enough for grammar, but not for spelling, which therefore needs to be established as "made order" rather than a "grown order".

The form and content of this argument are certainly valid, but it does have a bit of the smell of a rationalization. At least, it's certainly true that the Elizabethans got on fine with what Jenkins calls "plain spelling" (i.e. chaotic spelling) -- though this would have made search engines harder to implement, if they'd had them.

Posted by Mark Liberman at 10:05 AM

TLSX: reverse engineering the language module

I'm in Austin for TLSX ("Texas Linguistic Society, Ten"), an annual conference run by grad students at UT. This year's theme is "the application of techniques from computational linguistics to descriptive linguistics and the analysis of less-studied languages".

I'll blog from the conference site as time permits. I'm going to start with an entry that does have to do with linguistics and with blogging, but not with TLSX (though maybe there'll be some sort of connection, who knows). The subject is Ken Macleod's new novel, "Learning the world: A scientific romance", and I've had it on my to-blog list for a month or so.

From the cover blurb:

Humanity has spread to every star within five hundred light-years of its half-forgotten origin, coloring the sky with a haze of habitats. Societies rise and fall. Incautious experiments burn fast and fade. On the fringes, less modified humans get on with the job of settling a universe that has, so far, been empty of intelligent life.

Being at a conference run by grad students reminds me: does the appeal of space-colonization stories come partly from their success as a metaphor for the experience of young people starting out to find a place in the world?

More from the cover blurb:

The ancient starship But the Sky, My Lady! The Sky! is entering orbit around a promising new system after a four-hundred-year journey. For its long-lived inhabitants, the centuries have been busy. Now a younger generation is eager to settle the system. The ship is a seed-pod ready to burst.

Graduate school does seem to last for centuries, for some people, and UT does have an unusually large linguistics department. And perhaps the UT department is entering intellectual orbit around some new topics. But I think this joke has been pushed far enough.

One of Learning the World's narrative threads comes from a young girl's blog. The book opens with her first entries:

13 364:05:12 16:24

The world is four thousand years old. I was eight years old when I found that out for myself. My name is Atomic Discourse Gale and this is the first time I have written something that anyone in the world can read. It is strange and makes me feel a little self-conscious, but I reassure myself that not many people will read it anyway.

14 364:05:13 18:30

That was a joke. I see I have a few readers. J---- wants to know how I found out the age of the world. It was six years ago now but I remember it quite well. I was very young then and didn't understand everything that happened, but looking back I can see that it was a significant event in my life. That is why I mentioned it. So this is what happened.

OK, let's get to the point. The second planet of the new star system is inhabited by sentient bat-creatures, who have reached a roughly Victorian-era level of technology. Their economy is based on slave labor provided by trudges, members of a semi-sentient related species -- roughly as if humans had succeeded in domesticating and enslaving chimpanzees.

In their spread across the galaxy, humans have never encountered any life much above the level of slime moulds. But they've got ethical principles in place, all the same, which say that they should leave the bat-people alone, except for some discreet observation mediated by bioengineered beetles and the like. But what does non-interference mean, in this case? Would colonizing the system's other planets, the asteroid belt, etc., be OK? Maybe not, since the bat people are on a track to invent space flight and do it themselves. Still, a dissident faction plans to force the issue by breaking off part of the ship (it's designed to work that way) and starting the colonization process anyhow.

And is non-interference really the right policy, given the bat people's warring societies and their cruel treatment of the trudges? Some crew members don't think so.

In this context, a bit of genetic hackery goes badly wrong -- or maybe exactly right, depending on which level of whose plans you attend to. From p. 281 of the TOR paperback:

She reviewed what he had told her, replaying the words and sentences her anger had whited out and shouted down the first time.

The problem, the intellectual problem, was this. No Rosetta stone existed for the bat people's language. No amount of observation, no iteration of linguistic heuristics, could decode an unknown language from recordings alone. For mutual understanding, there had to be mutual interaction. One had to know directly what one side of the conversation was trying to say, and that meant one side of it had to be you. Faced with this impasse, the crew's scientists had, in all too characteristic a fashion, worked around it. Their solution had all the grubby fingerprints of a brute-force kludge.

The neural structure of the human brain's language-processing module, named in deep antiquity Chomsky's Conceit, had been known since the Caves. The genetic code of the Destiny II biosphere was known from aerial microorganisms returned to the stealth orbiter. The amount of information and genetic instruction that could be packed in a nanoassembler was vaster by far than even the vast amount stored in natural genomes and machinery, cluttered as they were with redundancy and junk. The information-processing hardware capacity of the ship was beyond all human conception, and the amount of information its science software could extract from the slenderest and most fragile of evidence was limited only by the ingenuity of the human inquiry that initiated it.

So . . . they'd had the means to install Chomsky's Conceit on any big enough brain down below. They had the means to generate radio transmitters within host bodies, as they'd done with the dung-beetles. And faced with the crash-and-burn and banning of that project, they'd skipped blithely ahead to a bolder one. They couldn't install Chomsky's Conceit on the brain of bat people -- the aliens' brains already had a language module of their own. That would have given rise to wetware conflicts and deep grammar errors, and anyway, ethically, that would never have done. Oh no. That would have been wrong. That would have interfered. What they had done, bless their reckless little souls, was to set up the machinery to install the module on the brains of the slaves, who had (they'd figured) no language module (and who were, therefore, not slaves but beasts). And once they'd received and filtered and processed and quantum-handwaved the information coming back from brains learning the bat people's languages, the translation protocols had been ---

Reverse-engineered from the language module!

Holy rocking shit.

From a narrow technical point of view, this is just clever intelligence work: getting the information needed to be able to understand what the bat people are saying to one another. But from another point of view:

"You realise what you've done?" she demanded. "Do you have the faintest conception of the harm this will cause?"

Constantine nodded. "The disruption will be immense. It'll destroy the entire slave economy."

"But they're not slaves!" Synchronic said. "If they had been, I could see why we might want to interfere. But you've taken what are by your admission mute brutes, and given them language. Deep grammar. Self-awareness. Human consciousness. You've made them slaves."

Caliban writ large. (No, not Taliban -- that's a completely different problem.) For a later post: why Macleod's future humans have misread Chomsky. If you're really curious about this, you could check out these earlier posts:

"Homo hemingwayensis" (1/9/2005)
"Chomsky testifies in Kansas" (5/6/2005)
" JP versus FHC+CHF versus PJ versus HCF" (8/25/2005)

Posted by Mark Liberman at 08:04 AM

Bad Boy Science

In the current American Prospect, a nice piece by Jaana Goodrich called "Where the Boys Are" (subscription required, unfortunately) takes on the conservatives' enthusiasm for same-sex schooling and the psychological "evidence" for the vast cognitive differences between the sexes that are held to justify separate education, as presented by the likes of Michael Gurian and Leonard Sax:

Too bad that the scientific evidence underlying these recommendations is unclear at best and nonexistent at worst. Mark Liberman, on the Web site Language Log, takes apart some of the bad science Sax uses in his popular book Why Gender Matters. He also points out that any average sex differences in learning styles are small and swamped by the individual variations within each sex.

Goodrich goes on to dismantle the idea of a "boy crisis," which she lays to sloppy research and discomfort with the idea of girls doing better than boys. She concludes:

None of this probably bothers the Republican Party's socially conservative base. Social conservatives already view gender roles as innately determined and single-sex schools fit admirably into their sexual abstinence agenda. Neither are conservative anti-feminists likely to be upset over these developments: Anything that pokes a finger in the eye of second-wave feminists with their claims of equal treatment for girls and boys is fun for this group.

Posted by Geoff Nunberg at 12:44 AM

November 02, 2006

Social science on the playground

On the recent Linguistics 001 midterm, one of the questions that most people missed was "What is the Machiavellian Intelligence Hypothesis?" Obviously my lecture on hypotheses about language evolution failed to get this point (or at least this term) across. For the record, "Machiavellian intelligence" (popularized as a term by a 1988 book of that title) refers to the hypothesis that "the driving force in the evolution of human intellect was social expertise--a force which enabled the manipulation of others within the social group, who themselves are seen as posing the most challenging problems faced by primitive humans".

In the interests of making this idea more vivid, here's a little story that someone recently sent me, about an interaction among elementary-school children (in a galaxy far away):

Dramatis personae: Elementary school kids A, B, C, X, Y, Z; teacher T.

There's this new kid, A, who, according to B, never smiles and is extremely hateful to other people. He's in C's group, with T for teacher.

At recess, C was complaining to B that T got mad at him and it wasn't fair. He said T turned to him just before recess -- because C was sitting closest to her -- and asked where A was. C said, "I don't know, and I bet nobody else does either, because 9 out of 10 people don't like him." That made T mad.

B told C the problem was that if you quote a statistic, most people will think you agree with it. So they took a survey among three kids on the playground -- X, Y, and Z -- asking, "If someone quotes a statistic to you, will you assume they agree with it?" X said yes, Y said yes, and Z said, "What's the right answer to that question?" indicating that she just wanted to be on the same side as B and C.

But B points out that while it really wasn't fair of T to get mad at C for merely quoting a statistic, in fact C does agree with it.

And of course, he did make up the statistic. Although B says he thinks C's estimate may be conservative, because they only know of one kid who likes A.

Posted by Mark Liberman at 06:20 AM

The cabinet of Dr. Birdwhistell

As promised, the story of "Political correctness, biology and culture" continues.

In the fictional 1880s, Sherlock Holmes closed the case of Silver Blaze by paying attention to the curious incident of the dog that did nothing in the night time. In the middle 1960s, Paul Ekman opened his study of the universality of facial expressions by paying attention to the curious incident of the file cabinet that didn't exist in an anthropologist's office.

Ekman came at the problem of facial expressions with a strange combination of Freudian and Skinnerian motivations:

As a freshly trained clinical psychologist, my therapeutic orientation was psychoanalytic, but as a researcher I had been trained as a behaviorist, a radical Skinnerian. Skinner said that psychology should examine only observable behavior; there were to be no inferences about what might be going on inside the head. ...

I was dissatisfied with the evidence for the effectiveness of psychoanalytic therapy, which rested on what the patient and a therapist said. I wanted to examine not words but real behavior (from a Skinnerian viewpoint) -- body movements and facial expressions. Examining the non-verbal behaviors of patient and therapist might reveal evidence of clinical improvement not shown in their words, and perhaps would suggest ways to improve therapeutic techniques.

Body movements and facial expressions are certainly real enough, but it seems odd to view them as "real behavior" in a sense that spoken words are not. In any case, Ekman spent "a few years studying hand and leg movements", and then moved on to the face.

I had not read Expression, but had heard about it and thought Darwin was probably wrong. As a Skinnerian, I thought it unlikely that expressions would be universal, and I was sure that inheritance could not play a role in emotional behavior. But it didn't really matter what I thought, as a Skinnerian, it was better not to have any forethought about what you were going to study. I would just get the facts.

As it turned out, Ekman's empirical methodology would come into conflict with his empiricist ideology. The first clue that there might be a problem was the curious incident of the missing data in Ray Birdwhistell's office:

Before starting my research on the face, I visited Birdwhistell. I expected to find file cabinets full of data, notebooks crammed with detailed observations, or racks of film documenting his position. Birdwhistell was surprised at my request to see his documentation, for what he had seen and observed was all in his head. We did not get along. He could not understand what I thought I might be able to prove by re-opening the question of whether facial expressions are universal, when he had found the answer was 'no'. He could not comprehend why I was dissatisfied with his conclusions with no documentation or data others could inspect or attempt to repeat.

Ekman also talked with Gregory Bateson, who was charming, and Margaret Mead, who was not.

I clearly remember that meeting, the appearance of her office, and her unfriendly, gruff manner. She had little patience for the quest I was about to begin. She knew that I had been to see Birdwhistell and that I disagreed with Birdwhistell's view that the question of universality was settled. I did not anticipate how angrily she would react later when my findings challenged Birdwhistell's claims.

Some of the animosity may have come from the fact that Birdwhistell was Mead's student. But much of it was a clash of cultures:

They believed in the value of the lone anthropologist and his or her fieldwork, trusting in his or her own intuitions and judgments. The idea of using multiple observers, of gathering quantitative data, of building in safeguards against the influence of the scientist's commitments, which are standard in experimental psychology, were foreign to them.

Next: people in 21 countries agree about pictures of facial expressions; Birdwhistell argues they've learned about facial expressions from John Wayne and Charlie Chaplin; Ekman studies the Fore people in the highlands of Papua New Guinea to escape the influence of Hollywood; Margaret Mead writes in the Journal of Communication that Ekman's work "is a continuing example of the appalling state of the human sciences".

Posted by Mark Liberman at 12:06 AM

November 01, 2006

Is there coded language in the House?

Ruth Marcus of the Washington Post (see here) believes that if or when the Democrats gain leadership of the House of Representatives, it would be a bad idea for the incoming Speaker, Nancy Pelosi, to appoint Rep. Alcee Hastings of Florida to head the House Intelligence Committee. Hastings has the seniority to take over that role but Marcus points out that he has a serious blot on his record, because in 1988 he was impeached and removed from his office as a Federal judge.

So what does this have to do with linguistics? And why comment on it on Language Log? Because linguistic analysis played a role in Hastings' impeachment process and Marcus, who covered this event for the Post, still recalls the salient factors, as follows:

"The evidence against Hastings is circumstantial, but it's too much to explain away: A suspicious pattern of telephone calls between Hastings and Borders at key moments in the case; Borders' apparent insider knowledge of developments in the criminal case; Hastings' appearance at a Miami hotel, as promised by Borders as a signal that the judge agreed to a payoff; a cryptic telephone conversation between the two men that appears to be a coded discussion of the bribe arrangement."

So what is this cryptic conversation that leads Marcus to say:

"I don't worry that as chairman he'd suddenly be for sale: If he could be entrusted with national security secrets as a committee member, why not as chairman? But this is no ordinary crime, and Intelligence is no ordinary committee."

It all began in 1981, when a DC lawyer, William Borders, and Judge Hastings were involved in a criminal extortion proceeding. Borders was convicted but Hastings was acquitted. Months later, the House Subcommittee on Criminal Justice was not convinced of Hastings' innocence and it started its own lengthy investigation of the matter. At issue were several intercepted telephone calls between Borders and Hastings. To the Subcommittee, the conversations looked like coded messages but they couldn't figure out how to prove this. In 1988 they called on me to analyze the conversations with the sole purpose of answering the question about whether or not they indeed contained some kind of code.

The main focus was on one very short conversation between the two men. The ostensible topic was the judge's plan to write support letters for Hemphill Pride, a South Carolina attorney who had run afoul of the law and was now trying to reverse his disbarment. The government believed that Hastings and Borders were involved in a plot to extort money from a man they believed to be Frank Romano but who was actually an undercover agent using the name, Rico. Borders assured Rico that he could get a judge to provide a favorable sentencing report if Rico would ante up $50,000. The government further believed that part of this money was to go to Judge Hastings. No linguistic analysis of the conversations was made at the trial.

My task was to discover whether the conversation was actually in code. But first, let's look at what they actually said to each other in this short, 19 line conversation:

Phone ringing:

(1) B: Yes, my brother.

(2) H: Hey, my man.

(3) B: Uh-huh.

(4) H: I've drafted all those, uh, uh, letters, uh, for Hemp.

(5) B: Uh-huh.

(6) H: And everything's okay. The only thing I was concerned with was, did you hear if, uh, did you hear from him after we talked?

(7) B: Yeah.

(8) H: Oh, okay.

(9) B: Uh-huh.

(10) H: Alright then.

(11) B: See, I had, I talked to him and he, he wrote some things down for me.

(12) H: I understand.

(13) B: And then I was supposed to go back and get some more things.

(14) H: Alright. I understand. Well, then, there's no great big problem at all. I'll, I'll see to it that, uh, I communicate with him. I'll send the stuff off to Columbia in the morning.

(15) B: Okay.

(16) H: Okay.

(17) B: Right.

(18) H: Bye bye.

(19) B: Bye.

If this was a code, it certainly wasn't a total, obvious code. That is, it wasn't the type that cryptologists deal with where the intent of the coding is to be so unclear that the message can't be deciphered by outsiders to the code. Such codes look like codes and strive to be impossible for outsiders to understand.

Nor was it the usual partial and obvious code, where ludicrous nouns and verbs substitute for the intended meaning, similar to the ones Oliver North described in his 1989 book, Taking the Stand: "If these conditions are acceptable to the banana, then oranges are ready to proceed." (p. 143).

If Hastings and Borders were using a code here, it was a partial and disguised code, one in which words were carefully selected to make it appear to anyone who should happen to intercept it that the participants are talking about one thing while, in reality, they are talking about something very different. In his Encyclopedia of Language David Crystal cites such a code used in a murder case in India: "Go clean the bowl" was used to mean "Prepare the grave."

In partial and disguised codes both participants must understand the code, which must be relevant to real life situations, be plausible, be specific, be consistent, and they generally require more confirmation of mutual understanding than we find in everyday conversation. So the question was whether this type of code was being used here.

Lines 1 and 2 appear to be a standard greeting routine between two friends but in line 3 Borders gives a feedback marker, "uh-huh," that suggests a willingness to give up his turn of talk immediately. Oddly, there is no request to Hastings about why he is calling such as "What's up?" or "What's on your mind?" Nor did Borders seize the opportunity to assert his own agenda, such as "I'm glad you called because..."

In line 4, Hastings explains that he's drafted those letters for Hemp, to which Borders says only "uh-huh," accomplishing no more than giving up his turn again. Note that Hastings used the pause filler, "uh," three times in line 4. Pause fillers can accomplish at least three things: to prevent interruption, to provide assurance that more is coming, or to struggle to find the right word to use. In hastily constructed codes, one expects speakers to struggle to find the word that accomplishes the code. These pause fillers tend to occur in exactly those places where the potential code word is to follow:

(4) uh, uh letters

(4) uh, for Hemp

(6) uh, you hear from him

(14) uh, I communicate with him
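
For readers who want to check the pattern themselves, here is a minimal Python sketch that lists the words following each "uh" in Hastings' turns. It is only an illustration, not the procedure used in the original analysis, which was done by hand; the names HASTINGS_TURNS and words_after_fillers, and the three-word window, are assumptions made up for this example. The script shows where the fillers fall; it cannot say whether what follows is code.

```python
import re

# Hastings' turns copied from the transcript above, keyed by line number.
HASTINGS_TURNS = {
    4: "I've drafted all those, uh, uh, letters, uh, for Hemp.",
    6: ("And everything's okay. The only thing I was concerned with was, "
        "did you hear if, uh, did you hear from him after we talked?"),
    14: ("Alright. I understand. Well, then, there's no great big problem "
         "at all. I'll, I'll see to it that, uh, I communicate with him. "
         "I'll send the stuff off to Columbia in the morning."),
}

def words_after_fillers(turn, filler="uh", window=3):
    """For each run of fillers, return up to `window` non-filler words that follow."""
    words = re.findall(r"[A-Za-z']+", turn.lower())
    contexts = []
    for i, word in enumerate(words):
        if word != filler:
            continue
        # Skip all but the last filler in a run such as "uh, uh".
        if i + 1 < len(words) and words[i + 1] == filler:
            continue
        following = [w for w in words[i + 1:] if w != filler][:window]
        if following:
            contexts.append(" ".join(following))
    return contexts

if __name__ == "__main__":
    for line_no, turn in HASTINGS_TURNS.items():
        print(line_no, words_after_fillers(turn))
```

The window is a free parameter: a wider one shows more of the phrase after each filler, a narrower one isolates the single word that follows it.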

In lines 6 through 10, Hastings asks if Borders has heard from "him" after they last talked. Since "him" is not specified, we can assume that they both understand who this is. Borders says "Yeah," but oddly does not report what he heard from "him." Nor does Hastings pursue this, offering only "Oh, okay," to which Borders says "Uh-huh," and Hastings says, "Alright then." This odd exchange appears to signal that complete information has been given when it has not--unless "did you hear from" is code for "did you get X," for which the rest of the exchange would have been appropriate.

Borders makes his first substantive contribution to the conversation in line 11: "See I had, I talked to him and he, he wrote some things down for me." Note the care with which he constructs this sentence, including a false start and pronoun repetition at the points where a code word could be expected. This gives the appearance of Borders' struggle to not slip into uncoded talk along with his effort to remember the code consistently. Hastings' response, "I understand," serves as a confirmation of truth or the existence of facts presented by Borders. It's curious why Borders' sentence would require confirmation of understanding unless it is a code for something else. Hastings' "I understand" can relate only to the "he wrote things down" part of the sentence, since it had already been established that Borders had heard from "him." Even so, writing some things down is hardly monumental enough to require confirmation of understanding. On the other hand, it is appropriate as a confirmation of the potential coded meaning of something else.

Lines 13 to 15 continue the odd sounding dialogue. Borders had already referred to "things" that someone had written down and now he elaborates a bit, saying that he was supposed to go back and get some more "things." If "more things" are things to say in the support letter for Hemp, one might expect it to be said differently, such as, "He couldn't think of everything you should say so he'll think about it more and get back to me." In any case, although it may be appropriate to write "things" down, it is odd to say that one will go back and get more things. Such wording can work nicely, however, for a different (coded) meaning of "things." Note also Hastings' false starts and pause filler in line 14, again in front of potential code words.

Other issues also point to the use of a partial, disguised and hastily constructed code here. For example, one would expect Borders to have said that Hemp wanted him to "come back" for more things rather than "go back." One also wonders why Hastings said that there was "no great problem" in a context where going back to get more things was allegedly benign. This suggests that there may have truly been a big problem about something else. And why, after Borders has said that he was supposed to go back and get more things (line 13), does Hastings change the plan so abruptly on line 14, saying that he will communicate with Hemp himself? Borders' response to this change in plan was only a mild, "Okay." Finally, Hastings' change in the procedure includes, "I'll send the stuff to Columbia in the morning," more consistent with sending something other than a support letter.

The crucial expressions used here appear to be "letters," "wrote some things down," "get some more things," "I'll communicate," and "send the stuff," all easily translatable to other meanings. Apparently "things" had morphed into "stuff" by the end of the conversation. This gives evidence of a hastily constructed, partially disguised code in which the participants intended their meaning to look like support letters for Hemphill Pride. Interestingly, Pride informed me personally that he never requested such letters and that as far as he knew, none existed. At his impeachment hearing, Hastings claimed that the style of speech he used in this conversation was his typical mode of talk. But comparison of this conversation with the others in evidence showed none of these features that look very much like code.

Rep. Hastings is a congenial man who is generally well liked and thought to be competent. He has served in the House of Representatives since 1992 and has enough seniority to head the House Intelligence Committee. Marcus doesn't think it's a good idea to appoint him to that post, largely because of this 19 line conversation. She may have a point.

Posted by Roger Shuy at 07:04 PM


My last posting on Garner's Rule -- which proscribes sentence-initial linking however -- ended with an unresolved issue: I observed that college student writers seem to be fond of the discourse connective however, and to prefer to put it in initial position (rather than sentence-internally), and wondered why.  That was Act I of "The Story of However".  Now, Act II, with some reasons.  The zero-tolerance policy ZT-1, "If they do it too much, they should be told not to do it at all", will return to the stage and play a prominent role in this act.

(Acknowledgment: I'm reporting on joint work with Douglas Kenter.)

It's fairly easy to see why writers like discourse connectives (in general, not just markers of contrast like however and but) in sentence-initial position: if you use a marker C to connect a sentence S with preceding discourse D according to the scheme

D  C+S

the marker comes between the things whose contents it relates; structure reflects function.  In addition, sentence-initial connectives are easy to produce and easy to process, while other schemes of connection are more demanding.  Sentence-internal connectives are interruptions within their sentence:

The test is demanding.

Most students, however, will get all the answers right.
Most students will, however, get all the answers right.

and sentence-final connectives hold off information about discourse connection until the last possible moment, where it may come as something of a surprise:

Most students will get all the answers right, however.

The other main option, expressing discourse connection via a subordinating conjunction on the sentence S' preceding S --

C+S'  S

Although/Though the test is demanding, most students will get all the answers right.

involves the complexity that is associated with subordination in general.

The point here is not that these other options are inferior -- there are occasions when they would be excellent choices -- but that a sentence-initial linker is the simplest way to connect a sentence to preceding discourse, so it's no surprise that students are inclined to go for that scheme a lot of the time.

Ok, a sentence-initial connective, but which one?  For expressing contrast, the main contenders are however and but.  These items differ in (at least) three relevant ways: in their prosodic properties, in their stylistic levels, and in their syntactic category.  (Actually, Kenter and I maintain that they also differ subtly in meaning and/or discourse function, and we aren't the first to make this claim.  But that's a matter for another day.)

First, prosody.  However has three syllables, has an accent of its own, and comes with a prosody that separates it from the sentence it modifies.  But has only one syllable, is usually unaccented, and is prosodically integrated with what follows.  Overall, however is a lot "weightier" prosodically than but.  Things follow from that.

Some people report that they like however just because it's more substantial, more prominent, than but.  They see however as a more emphatic marker of contrast, or at least a more noticeable one.

As I noted in my last posting, Bryan Garner sees things the other way; he finds initial however "unemphatic".  De gustibus and all that.  But there are reasons for a sensible person to like but: it's shorter and less ostentatious; however holds off the sentence that follows for an appreciable amount of time, and it shouts "Contrast!"  (As usual when we're comparing alternatives, the things that distinguish them cut both ways, functioning as either advantages or disadvantages, depending on the context and the writer's purposes.  That's why we should want both alternatives to be available to writers and speakers.)

On to stylistic level.  Here, people generally agree that however is more formal than but -- however groups with adverbials like moreover, furthermore, consequently, therefore, nevertheless, and nonetheless -- with the result that many people like it when they're doing formal writing.  College students seem to like it especially, probably because one of the things they're working at is to get the proper level of formality in their writing.  (They often overshoot, of course.)

(A little digression: complaints that initial however is weak, monotonous, etc. seem not to be extended to the other formal discourse adverbials in initial position.  The concentration on however puzzles me; furthermore is in competition with and, and consequently and therefore with so, in much the same way as however with but, yet however gets all the attention.  Maybe it's just intellectual fashion.  Maybe it's all Strunk's fault.)

Notice that I said that however is more formal than but, not that but is informal or colloquial.  My judgment here is that but is in fact stylistically neutral, usable at all levels, and this seems to be Garner's judgment as well.  In choosing between a neutral and a more formal alternative, Garner seems to aim for a "plain style" and recommends the neutral item, and in fact that's my practice too.  That's why I use so little sentence-initial however.  (Garner's preference for neutral items over more formal alternatives undoubtedly contributes to his enthusiasm for Fowler's Rule, insisting on restrictive relative that over the more formal which when both are available.)

Finally, syntactic category.  Here we approach the dramatic climax of "The Story of However".  However is an adverbial, but a coordinating conjunction, and this second fact introduces a conflict into our play's action.

A little story: whenever Kenter and I talk about our investigations into but and however, a significant number of people in our audiences are astounded to hear that there are authorities actually RECOMMENDING sentence-initial but.  Almost all of the students in the audiences respond this way.  (And now, after yesterday's posting, my mailbox is filling up with similarly surprised messages from all over the place.)  But, but, they clamor, we were taught NEVER to begin a sentence with but, or any other coordinating conjunction (and and so are the other usual offenders).

Taught where?  In grade school and high school.  No Initial Coordinators (NIC) is all over the place in those precincts.  Some Stanford undergraduates told us that their section instructors in PWR (Program in Writing and Rhetoric, the successor to Freshman Composition) insisted on NIC.  I happen to know that the main texts used in PWR do not advocate NIC, so these section instructors were rolling their own advice (well, probably just handing on things they themselves had been taught).  Still, NIC had some college presence.  And at Stanford.  I was appalled.

In any case, what were the kids taught in elementary and secondary school?  Don't use but to start a sentence; USE HOWEVER INSTEAD!  So of course college students very frequently opt for however; it's just what they were taught to do.  Now we see the dramatic conflict: NIC vs. Garner's Rule.  You can't obey them both.

I will soon speculate on the origins of NIC.  But first, some disavowals of NIC, beginning with Mark Liberman right here on Language Log:

There is nothing in the grammar of the English language to support a prescription against starting a sentence with and or but --- nothing in the norms of speaking and nothing in the usage of the best writers over the entire history of the literary language. Like all languages, English is full of mechanisms to promote coherence by linking a sentence with its discourse context, and on any sensible evaluation, this is a Good Thing. Whoever invented the rule against sentence-initial and and but, with its preposterous justification in terms of an alleged defect in sentential "completeness", must have had a tin ear and a dull mind. Nevertheless, this stupid made-up rule has infected the culture so thoroughly that 60% of the AHD's (sensible and well-educated) usage panel accepts it to some degree.

(And, sadly, Microsoft's Grammar Checker tries to enforce NIC.)

Mark notes that the AHD note for and rejects NIC out of hand, and he provides a smorgasbord of cites (and statistics) from reputable authors.  Similarly MWDEU.  Paul Brians, collector of common errors in English, labels sentence-initial coordinators a "non-error".  Bryan Garner denies, all over the place, that NIC has any validity.  Even the curmudgeonly Robert Hartwell Fiske tells his readers that there's absolutely nothing wrong with sentence-initial coordinators.  A point of usage and style on which Liberman and I and the AHD and the MWDEU stand together with Brians and Garner and Fiske (and dozens of other advice writers) is, truly, not a disputed point.  NIC is crap.

But still it lives on, as what I've called a zombie rule.  It's been lurking in the grammatical shadows for some time -- at least a hundred years, to judge from MWDEU.  Hardly any usage manual subscribes to it, but it is, apparently, widely taught in schools, at least in the U.S., with the result that educated people tend to be nagged by a feeling that there is something bad about sentence-initial and (and but and so).  (It might well be that this sense of unease rises with level of education.  Someone should look at this possibility.) 

I speculate now about two questions: how did the proscription arise, and why does it persist?

Grammatical proscriptions that are at odds with elite usage can arise in three ways, two of which were probably at work in the case of sentence-initial and/but/so: as an expression of individual taste; as a consequence of "theoretical" claims about grammar; and as a by-product of well-intentioned efforts to improve student writing and speech.

Most of the advice literature on English is the product of individual people -- essayists, poets, editors, journalists, literary scholars, lawyers, translators, and other people who deal in a practical way with language -- who see themselves as serving as arbiters of style as well as guardians of the formal standard written language.  There's plenty of room on matters of style for the arbiters to retail their personal likes and dislikes as instructions to others.  But as far as I can tell, the impulse to impose personal taste has played no significant role in the rise of the NIC zombie.

But "theoretical" considerations surely have.  There is a widespread belief that sentences -- in both writing and speaking -- should be "complete", not fragmentary, in fact that complete SENTENCES are signs of "complete", well-ordered THOUGHTS (and that incomplete, fragmentary sentences are signs of incomplete, disordered thoughts).  The underpinning belief is that the superficial syntactic form of sentences is a direct reflection of the structure of the thoughts these sentences convey.  This is a very silly idea, and when it's combined with an almost exclusive attention to single sentences, rather than organized discourses, it yields the claim that fragmentary sentences are very bad things.

(Animosity towards fragmentary sentences has had occasionally pernicious results -- perhaps, most famously, in claims by Bereiter and Engelmann back in the '60s that kids, or at least impoverished black kids, who answered wh-questions (Where is the monkey?) with fragments rather than full sentences (In the tree rather than The monkey is in the tree) were betraying an inability to think clearly.  The recommended treatment for their deficit in thinking was drilling on always producing complete sentences in answers to questions.)

NIC can be seen as just a special case of No Fragmentary Sentences.  The function of conjunctions like and, but, and so -- the only function of such conjunctions, it is claimed -- is the joining of phrases of like type, so that a sentence that begins with one of these words is missing the clause that is to be joined with the clause that follows the conjunction, and that sentence is therefore only fragmentary.  (Yes, I know, the clause is right there in the previous sentence, but we're supposed to be looking only at single sentences here.)  If you take all the beliefs and claims above literally, you are led to the conclusion that NIC is not only true, but necessarily true.

But few advice manuals are willing to go all the way with this "theoretical" argumentation.  For example, Diana Hacker's A Pocket Style Manual, 4th ed. (Boston: Bedford/St. Martin's, 2004), p. 48, tells the student, "As a rule, do not treat a piece of a sentence as if it were a sentence" and goes on to classify fragments into two types: "Some fragments are clauses that contain a subject and a verb but begin with a subordinating word.  Others are phrases that lack a subject, a verb, or both."

Notice: "a SUBordinating word".  Hacker's rule applies only to subordinate clauses.  And, indeed, in the text that follows there's a list of sample words that begin subordinate clauses.  And, but, and so are not on this list; by inference, they are allowed in sentence-initial position.

It's likely that the main justification for NIC comes instead from well-intentioned attempts to improve student writing and speaking.  Initial and is the first sentence connective acquired by most English-speaking children, and they use it heavily in their speech; of course they do, since for a while it's all they've got for indicating connection between sentences.  Heavy use of sentence-initial and and (logical/temporal) so continues through childhood and into adulthood, in both speaking and writing, with then and and then as additional variants in narratives.  Observe the discourse organization of the Coasters' rousing "Along Came Jones", from about 45 years ago:

And then he grabbed her (And then)
He tied her up (And then)
He turned on the bandsaw (And then, and then...!)

And then along came Jones
Tall thin Jones
Slow walkin' Jones
Slow talkin' Jones
Along came long, lean, lanky Jones

Teachers quite rightly view this system of sentence connection as insufficiently elaborated, and they seek ways of getting students to produce connectives that have more content than vague association or sequence in time.  At some point, I speculate, they applied ZT-1, "If they do it too much, they should be told not to do it at all", and NIC, a blanket proscription, was born.  Probably in elementary schools, from which it would have diffused to secondary schools and beyond.  And now the zombie lurches on, possibly inside your own computer; it's inside mine, thanks to Microsoft Word for Mac OS X.

Once NIC is out there, it will persist.  Any fool with a claim to authority and either students or a publisher can get a rule ON the books, but there is absolutely no mechanism for getting rules OFF.  People think that rules are important, and they are reluctant to abandon things they were taught as children, especially when those teachings were framed as matters of right and wrong.  They will pass those teachings on.  They will interpret denials of the validity of such rules -- even denials coming from people like Garner and Fiske, who are not at all shy about slinging rules around -- as threats to the moral order and will tend to reject them.  I've had some success convincing some students and friends that some of the rules they were taught are not good rules to live by -- but my success depends on their willingness to listen to me and their willingness to question their beliefs, two qualities that are not widespread in the general population.

So our little play goes: ZT-1 contributes significantly to the rise of NIC and then Garner's Rule, though these originally have different audiences.  Eventually, the two proscriptions clash, and, in my telling of the story, NIC is mortally wounded, but continues to wander the landscape as a zombie.  Garner's Rule survives, in a community of like-minded souls pugnaciously defending themselves against the opinions of linguists and the practices of many of the neighbors.  Nothing is ever resolved.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 01:49 PM

How to defend yourself from bad advice about writing

In one of the NaNoWriMo forums, Elrina asks for help on "How to Avoid Passive Tense?":

Okay, I admit it. I am a shameless user of passive tense. I've been involved with a power-struggle with one of my writing profs on campus (creative nonfiction class) about the tense, and I think it's finally time for me to concede. However, she seems to think that I should just inherently know other ways to word things. And, of course, there's the issue that I don't think I quite understand passive tense, because the things she's been marking as "wrong," are not passive tense as I was taught. I guess I tend to say things like "I was doing this," "there were these things," etc.

A specific sentence I've been playing with recently:

"Thomas was relieved when the car finally pulled onto the highway."

So, any thoughts would be awesome.

Well, Elrina, here are a few thoughts.

First, there's nothing wrong with "Thomas was relieved when the car finally pulled onto the highway". At least, there's no grammatical or stylistic reason for you to reword it.

Second, that sentence is probably not really an example of the passive voice, unless you mean that Thomas was relieved in the sense that his replacement arrived for duty. If you mean that Thomas was relieved in the sense that he felt a lessening of anxiety, then the construction is an example of what the Cambridge Grammar of the English Language calls an "adjectival passive". CGEL observes that "adjectival passives are passive only in a derivative sense". (More on this in a later post, or go read pp. 1436-1440 of CGEL if you're curious and impatient. A clue: you can say "Thomas was very relieved when the car finally pulled onto the highway", but not "Thomas was very run over when the car finally pulled onto the highway".)

Third, your other two examples -- "I was doing this" and "there were these things" -- are definitely not passives in any sense at all. If your writing prof is really telling you that things like this are wrong because they're in the passive voice, then she's certainly ignorant and probably incompetent.

Fourth, there's nothing intrinsically wrong with using the passive voice. All the best writers do it, some of the time. See the list of posts at the bottom of the page for some examples -- if you're eager, go here to find some examples of passives in the writing of E.B. White, who was the White in Strunk & White, or here, to examine the passive practices of professor Strunk himself.

Fifth, and least important, the traditional terminology is "passive voice", not "passive tense". The term tense deals with ways of expressing concepts of time, like present and past; the term voice deals with ways of arranging the arguments of a verb, as in "The ninja explained the concept of passive to the writing teacher" (which is an example of active voice), vs. "The concept of passive was explained to the writing teacher by the ninja" (which is an example of passive voice).

There's also aspect, which deals with ways of expressing action (or being) in respect of its inception, duration, or completion. Putting voice, tense and aspect together, we can create a little paradigm of some variations on one of your examples.

I do this. [active voice, present tense]
I am doing this. [active voice, present tense, progressive aspect]
I did this. [active voice, past tense]
I was doing this. [active voice, past tense, progressive aspect]
This is done (by me). [passive voice, present tense]
This is being done (by me). [passive voice, present tense, progressive aspect]
This was done (by me). [passive voice, past tense]
This was being done (by me). [passive voice, past tense, progressive aspect]

So your example, "I was doing this", was in the active voice.
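
If it helps to see that voice, tense and aspect are independent choices, here is a small Python sketch that builds the same eight-line paradigm by crossing the three. It is a toy under stated assumptions: the lookup tables BE and DO and the functions clause and paradigm are made up for this illustration, and they cover only the verb "do" with "I" and "this", not English grammar in general.

```python
from itertools import product

# Toy lookup tables for one verb ("do"); assumptions for illustration only.
BE = {
    "present": {"I": "am", "this": "is"},
    "past": {"I": "was", "this": "was"},
}
DO = {"present": "do", "past": "did"}

def clause(voice, tense, progressive):
    """Build one clause from a choice of voice, tense and aspect."""
    if voice == "active":
        if progressive:
            # progressive aspect: a form of "be" plus the -ing form
            return f"I {BE[tense]['I']} doing this."
        return f"I {DO[tense]} this."
    # passive voice: a form of "be" plus the past participle "done";
    # the progressive passive adds "being"
    be = BE[tense]["this"]
    if progressive:
        return f"This {be} being done (by me)."
    return f"This {be} done (by me)."

def paradigm():
    """Cross voice x tense x aspect, in the order used in the list above."""
    choices = product(("active", "passive"), ("present", "past"), (False, True))
    return [clause(voice, tense, progressive) for voice, tense, progressive in choices]

if __name__ == "__main__":
    for line in paradigm():
        print(line)
```

Note that "was" turns up in both the active progressive ("I was doing this") and the passive ("This was done"); what makes a clause passive is the participle and the rearranged arguments, not the presence of a form of "be".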

Speaking of ninjas and writing teachers, the ninja in this cartoon by Nic Bommarito seems to know her grammar:

Of course, we here at Language Log don't recommend or even condone the murder of English professors, though we do feel that some of them ought to sit in on a linguistics course or two, or maybe read a good student grammar. And if you want to be able to stand up to them, Elrina, you might invest in such a grammar yourself, and perhaps in a good usage guide while you're at it. What's in those books might even help your writing, but in any case it'll help you keep your writing teachers from wasting your time.

Following Elrina's question, the NaNoWriMo forum has three pages of interesting answers. These make it clear that most people believe that "passive" has something to do with whether or not the subject is an agent, and perhaps also something to do with overall dynamism, vividness or concreteness. For example, "Corvus" defends the use of passive in this way:

While I do not advocate a sudden embracing of the passive voice, I do advocate a less strident opposition to it. It's not always the wrong voice. For example:

"The horrific thought occured to me that I was on the wrong train, headed for Paris instead of Berlin."

I challenge the reader to reconstruct this idea in active voice and maintain the flavor.

That challenge is hard to meet, since the sentence is already in the active voice. The issue for Corvus seems to be that the subject of the main clause is "thought" rather than "I".

In another comment, "Cpt. Nemo" suggests switching Elrina's problem sentence from

Thomas was relieved when the car finally pulled onto the highway.

to one of

Thomas felt relieved when the car finally pulled onto the highway.
Thomas gave a sigh of relief when the car finally pulled onto the highway.

and "paintbyletters" responds that

The third is definitely the most active, because Thomas is acting. I, as the reader, sigh in relief right along with him. OTOH, to tell me that Thomas felt relief or was relieved distances me from Thomas. I don't care quite as much, and overuse of passive verbs will have a cumulative effect on your reader's interest in your characters.

I don't care at all, myself. I've given up, for the moment, on wishing that people would use grammatical terminology in a coherent way, and instead, I'm asking myself whether any of this writing advice makes any sense. Specifically, I wonder whether there's any evidence that a narrative is better if it has a higher proportion of verbs that denote actions, whose subjects are human agents.

Let's do a quick sanity check, by looking at the openings of a few successful novels, pulled (literally) off the shelf at random. Exercise for the reader: what proportion of the clauses have a verb denoting an action, with an agentive subject? Would these novels have been better if that proportion were higher?

It was August, and it shouldn't have been raining. Perhaps rain was too strong a word for the drizzle that blurred the landscape and kept my windshield wipers going. I was driving south, about halfway between Los Angeles and San Diego. [Ross Macdonald, The Far Side of the Dollar]

The Channel Club lay on a shelf of rock overlooking the sea, toward the southern end of the beach called Malibu. Above its long brown buildings, terraced gardens climbed like a richly carpeted stairway to the highway. The grounds were surrounded by a high wire fence topped with three barbed strands and masked with oleanders. [Ross Macdonald, The Barbarous Coast]

The law offices of Wellesley and Sable were over a savings bank on the main street of Santa Teresa. Their private elevator lifted you from a bare little lobby into an atmosphere of elegant simplicity. It created the impression that after years of struggle you were rising effortlessly to your natural level, one of the chosen. [Ross Macdonald, The Galton Case]

Moran's first impression of Nolen Tyner: He looked like a high risk, the kind of guy who falls asleep smoking in bed. No luggage except for a six-pack of beer on the counter and the Miami Herald folded under his arm. [Elmore Leonard, Cat Chaser]

A friend of Ryan's said to him one time, "Yeah, but at least you don't have to take any shit from anybody."
Ryan said to his friend, "I don't know, the way things've been going, maybe it's about time I started taking some." [Elmore Leonard, Unknown Man #89]

The marriage wasn't going well and I decided to leave my husband. I went to the bank to get cash for the trip. This was on a Wednesday, a rainy afternoon in March. The streets were nearly empty and the bank had just a few customers, none of them familiar to me. [Anne Tyler, Earthly Possessions]

He -- for there could be no doubt about his sex, though the fashion of the time did something to disguise it -- was in the act of slicing at the head of a Moor which swung from the rafters. It was the colour of an old football, and more or less the shape of one, save for the sunken cheeks and a strand or two of coarse, dry hair, like the hair of a cocoanut. [Virginia Woolf, Orlando]

I'll make my report as if I told a story, for I was taught as a child on my homeworld that Truth is a matter of the imagination. The soundest fact may fail or prevail in the style of its telling: like that singular organic jewel of our seas, which grows brighter as one woman wears it and, worn by another, dulls and goes to dust. Facts are no more solid, coherent, round, and real than pearls are. But both are sensitive. [Ursula K. Le Guin, The Left Hand of Darkness]

Early this morning, 1 January 2021, three minutes after midnight, the last human being to be born on earth was killed in a pub brawl in a suburb of Buenos Aires, aged twenty-five years, two months and twelve days. If the first reports are to be believed, Joseph Ricardo died as he had lived. [P.D. James, The Children of Men]

My quick count says that out of 39 tensed verbs, 7 (or about 18%) denote actions and have an agentive subject. Two of the seven are "said" -- hardly the most dynamic action around -- and if we discount those, we're down to 13%.
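
If you want to check the arithmetic, or redo it with your own count, here is a small sketch. The counts (39 tensed verbs, 7 agentive action verbs, 2 of them "said") are the ones given above. The helper name `percent` is just a label for this example, and the reading that the 13% figure keeps all 39 verbs in the denominator is my inference from the numbers, not something stated in the count itself.

```python
def percent(count, total):
    """Share of `count` in `total`, as a whole-number percentage."""
    return round(100 * count / total)


tensed_verbs = 39        # all tensed verbs in the sampled openings
agentive_actions = 7     # verbs denoting actions, with agentive subjects
said = 2                 # how many of those seven are "said"

# All seven agentive action verbs, out of 39.
with_said = percent(agentive_actions, tensed_verbs)

# Discount "said" from the numerator only (denominator stays at 39).
without_said = percent(agentive_actions - said, tensed_verbs)

# Alternative: drop the two "said" clauses from both counts.
without_said_both = percent(agentive_actions - said, tensed_verbs - said)

if __name__ == "__main__":
    print(with_said, without_said, without_said_both)
```

Dropping the two "said" clauses from both counts instead moves the figure by less than a percentage point, so the conclusion doesn't depend on that choice.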

Do you see any places where the text would be improved by substituting some active-voice transitive verbs denoting actions, with human agentive subjects? I don't. The next time someone tells you to "avoid passive" -- apparently meaning that you should use verbs denoting actions with human agents as subjects -- why not ask them to define their terms, and to back up their advice with some evidence?

Other LL posts on passive voice:

"Passive voice and bias in Reuters headlines about Israelis and Palestinians" (12/17/2003)
"The passivator" (4/6/2004)
"Two out of three on passives" (5/8/2004)
"Hey folks, 'passive voice' != 'vague about agency'" (5/31/2004)
"Tossing technical terms around" (8/5/2005)
"Voice confused with tense at the Economist" (3/13/2006)
"Diagnosing soup label syntax" (6/29/2006)
"Passive aggression" (7/18/2006)
"How long have we been avoiding the passive, and why?" (7/22/2006)
"The ancient roots of passive avoidance" (7/23/2006)
"When men were men, and verbs were passive" (8/4/2006)
"The direct and vigorous hyptic voice" (8/5/2006)
"Free verbs" (8/5/2006)
"Avoiding passive for dummies" (9/25/2006)
"School shootings and passive constructions" (10/10/2006)
"The passive in law" (10/16/2006)
"If they do it too much, they should be told not to do it at all" (10/31/2006)

Posted by Mark Liberman at 07:05 AM