December 31, 2004

Implicature in the service of a moral panic

There's a lot to be unhappy about in J. L. King's recent book, On the Down Low (Broadway Books, 2004), but most of it doesn't have to do with linguistic issues. However, one of King's aims in the book, to sound the alarm about "straight" black men who have sex with men and the danger they present to black women, is furthered by the way he phrases his warnings. Implicature is working in the service of a moral panic over DL men.

[Added 1/1/05: Linguistic and sociological background (perhaps more than you wanted to hear, but people have been asking)... "MSM" stands for "men who have sex with men", which does not refer, as you might have imagined from working through simple compositional semantics, to men who at least sometimes engage in gay sex, but is a technical term in the social services and sociological literature for one specific subset of such men, namely men who consider themselves to be "straight" but seek out and engage in gay sex with some frequency while concealing their form of sexuality from most of the world. MSMs -- don't fuss at me about this pluralization -- are thus to be distinguished from what I call "frankly" gay and bisexual men, who identify as either "gay/homosexual/queer" or "bisexual", can be either closeted or open about their sexuality, and can be either sexually active or not; MSMs are by definition both closeted and sexually active. The expression "the DL" stands for "the Down Low". Men "on the DL" (or "in the DL life(style)"), also known as "DL men", are the black-on-black variant of MSMs, where "black" is a synonym for "African American". So "DL" is a specifically U.S. expression, and it's a folk term. In contrast, I've seen "MSM" used in U.S., Canadian, U.K., and Australian contexts, usually as a technical term; white MSMs mostly have no term for their sexual activity (which they tend to view as something more akin to a hobby than a sexuality), though they do recognize the ordinary English expression "men who have sex with men" as applying to them, which is why the expression is useful in attempts at outreach to MSMs, of whatever race or ethnicity.]

Life on the DL was last discussed here back in July. Since then I've read King's book, with some dismay, though possibly not as much as Keith Boykin, as expressed in his article "Not just a black thing" in The Advocate of 1/18/05, pp. 31-3. (Boykin, an openly gay and politically very active black man, has a book, Beyond the Down Low, soon to be published.) Boykin is particularly outraged at King's claim that men on the DL are the vector for the spread of HIV/AIDS among black women; he notes that the phenomenon of "straight" men having sex with men is as widespread among whites as blacks, yet HIV/AIDS is much more prevalent among black women than among white women. So Boykin sees King as engaged in a campaign of blaming the gays.

My dismay, on the other hand, is over King's fomenting a moral panic -- in this book, in television interviews, and in public appearances -- about the prevalence of DL life. Now, King lived on the DL himself for years, and he's writing about what he knows, which includes a very large number of men on the DL, so in his world the Down Low is all over the place. This view of the world finds expression in the way he frames his warnings to black women (the bold-facing is mine):

The first thing I want to say to women who are seeking DL signs or behavior traits is that not every black man is on the DL. Not every black man in your life is on the DL. (p. 128)
I also want to make sure that I am clear when I say not all black men are living a double life, a double lie. (p. 129)

Well, that's generous. Not all gay men are child molesters. Not all straight men are gay-bashers. And so on. But all these assertions that not all Xs are Y implicate that a large percentage of Xs are Y. King might well believe, given his life history, that a large percentage of black men are on the DL, but this claim is almost surely false. Estimates are hard to come by, but many studies put the percentage of men who are gay in a broad sense (including bisexuals and, despite their self-identifications, MSMs) at around 5%. Let's opt instead for the larger figure of 10% that is often bandied about. Surely MSMs are no more than half of this population, probably a good deal less. So a very generous estimate of the incidence of guys on the DL is 5%. Not an insignificant number, but scarcely an omnipresent threat to women.
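The generous estimate is just the product of two fractions. Here is a minimal Python sketch of that arithmetic, using only the figures given in the post (5%, or the more generous 10%, for the broad population, and at most half of that population being MSMs); the function name is hypothetical:

```python
def dl_upper_bound(share_gay_broad, max_msm_fraction):
    """Upper bound on the share of men on the DL:
    the broad gay share times the largest plausible MSM fraction of it."""
    return share_gay_broad * max_msm_fraction


# The post's two candidate figures for the broad population,
# each combined with the assumption that MSMs are at most half of it.
for share in (0.05, 0.10):
    bound = dl_upper_bound(share, 0.5)
    print(f"{share:.0%} gay in a broad sense -> at most {bound:.1%} on the DL")
# 5% gay in a broad sense -> at most 2.5% on the DL
# 10% gay in a broad sense -> at most 5.0% on the DL
```

Starting from the more commonly cited 5% figure instead of 10% halves the bound again.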

King's recommendations to women are in line with the sense of omnipresent threat and moral panic that he projects. Although he concedes that

You cannot be with him twenty-four hours a day. You don't have access to his e-mail accounts or his phone calls, and you don't always know if he is where he tells you he is. (p. 128)

this very phrasing suggests that it might be a good thing if you (the typical black woman) could do all these things. He counsels vigilance ("A woman should... keep up with her man's comings and goings" (p. 131), "A woman should know her man's schedule" (p. 132), "Come home early from work one day; surprise him" (p. 132)) for pages, offering detailed accounts of how guys on the DL cheat and lie, how they hook up with one another, and so on, and even suggests drastic action:

I don't have a "sure list of signs" that will give women the answers they seek. Women can hire a private investigation company or get a very masculine-acting and -looking gay man and have him approach their man.

No, I'm not making this wild-eyed paranoia up. The irony here is that men in general, and MSMs especially, complain that their women are too controlling, interfering, demanding, full of "drama and hassles" (p. 142), so that they feel the need to escape the ol' ball and chain for undemanding companionship with other men (which for MSMs comes with easy uncommitted sex).

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 01:45 PM

Waves, contagion, and the spread of linguistic changes

Reporting on site from the MLA meetings, Geoff Pullum tells us that

Labov has found a spreading sound shift in inland Northern cities (Buffalo, Cleveland, Toledo, Chicago, but not Columbus or Indianapolis) stopped dead by a line where the prevailing ideology of Northern Yankees (anti death penalty; pro gun control) ceases to be a typical feature of local political opinion — a sharp and rather unexpected intrusion of ideology on the course of linguistic change.

The image Geoff uses here is like that of a spreading wave, which is halted by natural barriers. But wave trains are not a particularly good metaphor for the spread of linguistic changes. Contagion is another attractive metaphor, but it too is unsatisfactory in most cases. When we look at the micromechanisms at work in the spread of variants, though, the "barrier" effect that Labov observes is not at all surprising (though it's nice to have it illustrated so dramatically).

Quite possibly Labov said this in his MLA talk, but let me say it here anyway.

For a wave to propagate, all it takes is for molecules to be in contact. The molecules that have energy transmitted to them don't have to be receptive to it; they just have to be there. The spread of linguistic variants doesn't have this automatic quality. Merely hearing some variant doesn't cause you to adopt it; you have to be receptive to it.

The contagion metaphor is a bit better, since it incorporates some notion of susceptibility to the disease, which translates in the linguistic case into some kind of receptivity to a new variant. And in some cases -- notably the spread of many lexical items -- this seems to be an adequate picture of the events in question.

But the spread of syntactic, morphological, and phonological variants requires more than contact and simple receptivity to new variants. You have to view yourself as like the people who use the variant, so that you'll be willing to accommodate to their speech to some extent. These variants spread locally, by interaction among companions. Insofar as you don't identify with people -- say, because you hold to very different ideologies -- even if you're in regular contact with them, you're unlikely to adopt their behaviors.

So there is a barrier, a social and psychological one.
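To make the contrast between the wave picture and the identification picture concrete, here is a toy simulation in Python. It is an illustration under stated assumptions, not anything from Labov's work: speakers sit in a chain, a variant starts with the first speaker, and a speaker next to an adopter may pick it up. Under the "wave" rule, contact alone is enough. Under the identification rule, the speaker must also share a (hypothetical) ideological group label with that neighbor. The group labels and chain layout are invented for illustration.

```python
def spread(groups, requires_identification):
    """Spread a variant from speaker 0 along a chain of speakers.

    groups[i] is a hypothetical ideological label for speaker i.
    Returns a list of booleans: True if speaker i ends up using the variant.
    """
    n = len(groups)
    adopted = [False] * n
    adopted[0] = True  # the variant originates with the first speaker
    changed = True
    while changed:  # keep sweeping until nobody new adopts
        changed = False
        for i in range(n):
            if adopted[i]:
                continue
            for j in (i - 1, i + 1):  # neighbors in the chain
                if 0 <= j < n and adopted[j]:
                    # Wave rule: contact is enough.
                    # Identification rule: contact plus a shared group label.
                    if not requires_identification or groups[j] == groups[i]:
                        adopted[i] = True
                        changed = True
                        break
    return adopted


groups = ["A"] * 5 + ["B"] * 5  # two hypothetical ideological communities
print(spread(groups, requires_identification=False))  # reaches all 10 speakers
print(spread(groups, requires_identification=True))   # stops where group A ends
```

Under the wave rule every speaker ends up with the variant; under the identification rule it halts at the boundary between the two groups, even though speakers on either side of that boundary are in contact.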


Posted by Arnold Zwicky at 12:42 PM

booger blogger anaphora

ABC News has named bloggers as "People of the Year", for "providing unique insight into people's thoughts." The ABC article starts this way:

A blog — short for "web log" — is an online personal journal that covers topics ranging from daily life to technology to culture to the arts. Blogs have made such an impact this year that Merriam-Webster named it the word of the year.

By "it" ABC of course means "the word blog". This is a good example of a pronoun referring to an entity that is evoked but not named, a phenomenon that has recently come to be known as booger anaphora.

And by the way, Merriam-Webster's choice was not based on someone's subjective evaluation of lexical impact, but rather on a simple count of word lookups on their free internet site.

Posted by Mark Liberman at 11:41 AM

December 30, 2004

Wild Linguists

Much-maligned author Dan Brown is one of the few lay people who show a proper appreciation for linguists, but he is not the only one. I just turned on the TV (purely for research purposes, of course) and came upon an episode of Friends. Ross was explaining why he wouldn't cancel his date for the evening:

She's an Assistant Professor in the Linguistics Department. They're WILD!
So there.

Posted by Bill Poser at 11:16 PM

CASS corpus

According to a recent wire story from Xinhua

Chinese linguists are going to complete China's largest database of spoken Chinese, on the basis of which they will compile the country's first modern spoken Chinese dictionary and grammar book.

Shen Jiaxuan, director of the Chinese Academy of Social Sciences (CASS) Institute of Linguistics, said the database include three sub bases such as a live Chinese conversation base whose data were collected in Beijing, a base consisting of six dialects of Shanghai, Xi'an, Guangzhou, Beijing, Chongqing and Xiamen, and a base of phonetic symbols of modern spoken Chinese.

The live conversation base now has 650 hours of live conversations recorded in Beijing, which were transferred to 8.9 million words in transcript.

The English-language web page for the CASS Institute of Linguistics is here.

I wasn't able to find out whether the recordings and transcripts will be published. I hope so -- the corpus-based dictionary and grammar will be even more valuable if the base materials are also available to scholars, as the British National Corpus and many other large linguistic corpora are.

The cited numbers for the Beijing conversational recordings and their transcripts (650 hours of conversations, 8.9 million "words") work out to about 228 "words" per minute. This leaves me uncertain whether "words" here should be taken in the lexicographic sense, or as "characters" as they would be used in transcribing the conversations into normal Chinese orthography. The Chinese writing system doesn't separate "words" by spaces or any other marks, but the project aims at producing a dictionary, which will certainly be organized in terms of multi-character words, just as (for instance) CEDICT or the ABC Chinese Dictionary is. Thus, e.g., dian4 shi4 ji1 电视机, meaning "television (set)", is three syllables, three characters, but one (lexicographic) word.

228 whatevers per minute seems too fast for the units to be words, which I think should average about two syllables each in Beijing conversation, but it's kind of slow for syllables in conversational speech. It's probably syllables (=characters), though, with some pausing accounting for the slower rate.
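The rate arithmetic is easy to check. This Python sketch uses the two figures from the Xinhua report and the post's own guess of about two syllables per lexicographic word; nothing else is assumed:

```python
HOURS = 650     # hours of Beijing conversation (Xinhua report)
UNITS = 8.9e6   # transcribed "words" (Xinhua report), whatever they turn out to be

units_per_minute = UNITS / (HOURS * 60)
print(f"{units_per_minute:.0f} units per minute")  # 228 units per minute

# If the units are lexicographic words averaging ~2 syllables (the post's guess):
syll_per_sec_if_words = units_per_minute * 2 / 60
# If the units are characters, i.e. one syllable each:
syll_per_sec_if_chars = units_per_minute / 60

print(f"as words: {syll_per_sec_if_words:.1f} syllables/second")       # 7.6
print(f"as characters: {syll_per_sec_if_chars:.1f} syllables/second")  # 3.8
```

Read as words, the transcripts imply roughly twice the syllable rate that they imply when read as characters, which is the contrast the argument turns on.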

Anyhow, it's terrific to see this development.

[via Victor Mair]


Posted by Mark Liberman at 11:20 AM

McCloskey on the rebounding of Irish

When Bill Poser recently pointed out to us an article about the revival of Irish, it occurred to me to ask a colleague about the subject. Jim McCloskey, Professor of Linguistics at the University of California, Santa Cruz, is one of the foremost experts in the world on the modern Irish language, and certainly the most prominent of those taking an interest in theoretical linguistics. His brilliantly worked-out and impeccably detailed theoretical works on Irish have been appearing since the late 1970s, along with philological notes (published not just about but in Irish) and a couple of popular works for an Irish audience. He regularly visits the Irish-speaking areas of Ireland to do fieldwork, and is just back from there. Here are his reactions, presented (in green, naturally) as a guest post. —GKP

Although I haven't seen the original article that Bill Poser reports on, I'll try to say something in response to his report of it. I've just come back from three and a half months in Ireland, much of that time spent in discussing these issues with a range of people (academics, teachers, broadcasters, writers, friends ...).

I think that talk of a ‘rebound’ for the language is misplaced, but I do not equate that position with pessimism. The situation is a complex and fluid one, but largely it seems to me that things are on the same trajectory that they have been on for several decades (with a couple of interesting changes). By which I mean that the traditional Irish-using communities (the Gaeltachtaí) continue to shrink and the language continues to retreat in those communities. Nobody that I know who is involved in those communities is optimistic about their future as Irish-speaking communities (though lots of other good things are happening to them and in them).

The observers I trust most (friends and colleagues engaged in intense fieldwork in Gaeltacht communities) maintain that the process of normal acquisition (for Irish) ceased in most areas in the mid-1970s, and it is now increasingly difficult to find people younger than about 30 who control traditional Gaeltacht Irish. If you walk along a road in a Gaeltacht area and try to listen for the language being used by groups of teenagers and children by themselves, it is always (in my recent experience) English. Someone I know who is the principal of a primary school in the Donegal Gaeltacht reported that of the 22 children who entered his school at the beginning of the current year, only two had, in his judgment, sufficient Irish.

So traditional Gaeltacht Irish will almost certainly cease to exist in the next 30 years or so.

But what is unique in the Irish situation, I think, has been the creation of a second language community now many times larger than the traditional Gaeltacht communities (I think that 100,000 is a reasonable estimate for the size of this community). And being a part of that community is a lively and engaging business. A friend of mine who produces a weekly current affairs program in Irish on TV reports that it is always possible to do a report on whatever topic they like in any part of the country and find people who are willing and able to do the business in Irish. And it is true that certain recent developments have boosted this community and its self-confidence: the success of some poets (Celia de Fréine) and musicians (Liam Ó Maonlaí, John Spillane, Larry Mullen), the availability of an Irish TV channel, a vigorous presence on the net, and the opening of two trendy coffee-shops in the center of Dublin.

There is a great range of varieties called ‘Irish’ in use in this community. People like me speak a close approximation of traditional Gaeltacht Irish and there are people who speak new urban calques, heavily influenced by English in every way. For the communities of children growing up around Irish-medium schools in urban centres it may be right to speak of pidginization and creolization (along with a lot of clever inter-language play like the recent ‘cad-ever’). Many teenagers are thoroughly bidialectal, switching easily from the version of Gaeltacht Irish they have from their parents to the new urban varieties in use among their peers.

It will be interesting to see what happens to these varieties when the model of Gaeltacht Irish becomes a memory, but one thing that is clear is that this community is not going to fade away just because the Gaeltacht fades away.

And maybe that is what a half-successful language maintenance effort is going to look like (maybe that is the best that can be hoped for). It seems to be very difficult to work against the historical processes that lead to language-shift. But what the Irish experience teaches us is that it is far from impossible to create a new community of second-language users with all the usual and lively trappings (literature, music, radio, TV, journalism, schools, politics).

Of course, what is ‘maintained’ or ‘revived’ in this process is very different indeed from the language which was the original focus of revivalist efforts. But in this context, as in most, purism is surely misplaced.

Posted by Geoffrey K. Pullum at 07:00 AM

And a right guid willie waught to you, too, pal

We like the incantations we recite on ritual occasions to be linguistically opaque, from the unparsable "Star-Spangled Banner" (not many people can tell you what the object of watch is in the first verse) to the Pledge of Allegiance, with its orotund diction and its vague (and historically misanalyzed) "under God." But for sheer unfathomability, "Auld Lang Syne" is in a class by itself. Not that anybody can sing any of it beyond the first verse and the chorus, before the lyrics descend inscrutably into gowans, pint-stowps, willie-waughts and other items that would already have sounded pretty retro to Burns's contemporaries. But it's my guess that most people take the first two clauses of the song as the protases of a conditional, rather than as rhetorical questions. True, most versions of the lyrics end the lines with question marks (this is the most familiar version, a little different from Burns's):

Should auld acquaintance be forgot
And never brought to mind?
Should auld acquaintance be forgot
And days o' auld lang syne?

But a fair number of people leave out the question marks (for example here, here, and here), which suggests that the interrogative force isn't obvious. It may make no sense that way -- "if old friends should be forgotten, we'll drink to bygone days anyway." But incomprehensibility only adds to the sense of immemorial tradition, even if this happens to be a borrowed one, grafted onto American culture in 1929 by a Canadian of Italian ancestry.

As Eric Hobsbawm has observed, after all, the point of invented traditions like the kilt or the Pledge is to provide "emotionally charged signs of club membership rather than the statutes and objects of the club." And what could be more evocative than a New Year's song that's sodden with quaintly impenetrable phraseology? "Is not the Scotch phrase Auld Lang syne exceedingly expressive?" Burns wrote to a friend in 1793. Well, it works for me, anyway.

Posted by Geoff Nunberg at 02:12 AM

December 29, 2004

Fast times in the primate brain: no special providence

An article published in Cell today compared the apparent rate of genetic evolution in four cases: nervous-system genes vs. "housekeeping" genes in primates vs. rodents (here, if you've got a subscription). The abstract:

Human evolution is characterized by a dramatic increase in brain size and complexity. To probe its genetic basis, we examined the evolution of genes involved in diverse aspects of nervous system biology. We found that these genes display significantly higher rates of protein evolution in primates than in rodents. Importantly, this trend is most pronounced for the subset of genes implicated in nervous system development. Moreover, within primates, the acceleration of protein evolution is most prominent in the lineage leading from ancestral primates to humans. Thus, the remarkable phenotypic evolution of the human nervous system has a salient molecular correlate, i.e., accelerated evolution of the underlying genes, particularly those linked to nervous system development. In addition to uncovering broad evolutionary trends, our study also identified many candidate genes—most of which are implicated in regulating brain size and behavior—that might have played important roles in the evolution of the human brain.

Steve Dorus, Eric J. Vallender, Patrick D. Evans, Jeffrey R. Anderson, Sandra L. Gilbert, Michael Mahowald, Gerald J. Wyckoff, Christine M. Malcom, and Bruce T. Lahn. "Accelerated Evolution of Nervous System Genes in the Origin of Homo sapiens". Cell, Vol 119, 1027-1040, 29 December 2004

The lede of the 12/29/2004 Guardian report by Alok Jha has a more breathless, not to say awestruck, take on these results:

The sophistication of the human brain is not simply the result of steady evolution, according to new research. Instead, humans are truly privileged animals with brains that have developed in a type of extraordinarily fast evolution that is unique to the species.

An article by Ronald Kotulak in the Chicago Tribune takes a similar tack, saying that "brain-building genes mutated at a tremendously rapid rate in humans, compared with the brains of chimpanzees, macaque monkeys, rats and mice".

But there's nothing in the Cell article to support the view that humans are "truly privileged" as a result of "extraordinarily fast evolution", or that the rate of genetic evolution responsible for the human brain has been "tremendously rapid". The article provides strong evidence that brain size and complexity have been subject to greater selective pressure among primates than among rodents, and that this is especially true of the human lineage among the primates. But the calculated rates of brain evolution at the molecular level are small relative to (say) snake venom or virus coat proteins.

At first I thought that the Guardian's science writer had interpreted the research overenthusiastically, as journalists sometimes do, but both the Guardian and Tribune articles include quotes suggesting that (some of) the researchers -- at least Bruce Lahn -- are complicit in this (what I take to be) misrepresentation, and perhaps even primarily responsible for it. From the Tribune article:

The findings, Lahn said, disprove the contention of other scientists who say the evolutionary process leading to the bigger human brain was simple adaptation to change--like growing bigger antlers, longer tusks or gaily colored feathers.

"We've proven that there is a big distinction," he said. "To accomplish so much in so little evolutionary time--a few tens of millions of years--requires a selective process that is categorically different from the typical processes of acquiring new biological traits."

And from the Guardian, again quoting Lahn:

"The making of the large human brain is not just the neurological equivalent of making a large antler. Rather, it required a level of selection that's unprecedented."

Researchers have recognized for a long time that evolution has focused on brain size and complexity in the primate and especially hominid lineages, based on measurements of brain size relative to body size -- the numbers on the right in the picture above (from the Dorus et al. article) are (ratios of) encephalization quotients (here based on the equation $\mathrm{EQ} = \mathrm{BrainWeight}/\mathrm{BodyWeight}^{0.28}$), indicating that macaques have about 6 times as much brain relative to their body size as rats and mice do, while humans have about 9 times as much. (You'd get somewhat different numbers by using a different exponent in the EQ equation, but under any reasonable interpretation, primates are more encephalized than rodents, apes are more encephalized than other primates, and humans are more encephalized than other apes.) What the Cell article adds to this quantitative evidence from gross anatomy is quantitative evidence based on molecular changes in DNA sequences, showing that nervous-system-related genes have evolved faster in primates than in rodents. There's no attempt to show that the rate of primate brain evolution is "extraordinarily fast", just that it's faster than brain evolution among rats and mice.
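The point about the exponent can be seen directly. This Python sketch computes EQ as brain weight divided by body weight raised to 0.28 and compares two species. The function names are hypothetical, the brain and body weights are made-up round numbers rather than measurements from Dorus et al., and 0.33 is simply an arbitrary alternative exponent:

```python
def eq(brain_weight, body_weight, exponent=0.28):
    """Encephalization quotient in the form used here: brain / body**exponent."""
    return brain_weight / body_weight ** exponent


def eq_ratio(species_a, species_b, exponent=0.28):
    """Ratio of EQs for two (brain_weight, body_weight) pairs given in the same units."""
    return eq(*species_a, exponent) / eq(*species_b, exponent)


# Hypothetical species: A has 10x the brain and 10x the body of B.
species_a = (10.0, 1000.0)
species_b = (1.0, 100.0)

for exponent in (0.28, 0.33):
    print(f"exponent {exponent}: EQ ratio {eq_ratio(species_a, species_b, exponent):.2f}")
# exponent 0.28: EQ ratio 5.25
# exponent 0.33: EQ ratio 4.68
```

Because species A is the heavier one, a larger exponent penalizes its body size more and shrinks the ratio; the ranking stays the same, which matches the claim that any reasonable exponent gives the same ordering.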

The researchers looked at 214 genes "demonstrated to play important roles in the nervous system", "expressed exclusively or predominantly in the brain", or "implicated in various diseases of the nervous system, such as brain malformations, mental retardation, and neurodegeneration". They compared these nervous-system-related genes with a set of 95 "housekeeping" genes, which are "involved in the most basic cellular functions such as metabolism and protein synthesis" and "exhibit ubiquitous expression" (i.e. in all tissues and situations). The rate of protein evolution was estimated in terms of the ratio between nonsynonymous (Ka) and synonymous (Ks) substitution rates in the DNA sequence. ("Synonymous" substitutions don't change the amino acid that's coded for, while "nonsynonymous" substitutions do.) In the primate line, humans and macaques were compared; in the rodent line, mice and rats were compared. As graph A below shows, the Ka/Ks ratio was about 37% higher for the nervous-system genes in primates vs. in rodents, while the ratio for housekeeping genes was about the same for both groups. Graph B shows that about 55% of the nervous-system-related genes had higher Ka/Ks ratios in primates, as opposed to about 36% with higher ratios in rodents, while the balance was nearly even for housekeeping genes.
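The two comparisons described above (how much higher the average Ka/Ks is in one lineage, and what fraction of genes is higher in each) can be sketched in a few lines of Python. This is a simplified illustration, not the Dorus et al. pipeline: it averages per-gene ratios, which may not match how the paper aggregated them, the function names are hypothetical, and the four genes' values are invented toy numbers:

```python
def ka_ks(ka, ks):
    """Ratio of nonsynonymous to synonymous substitution rates for one gene."""
    if ks == 0:
        raise ValueError("Ks is zero; the ratio is undefined")
    return ka / ks


def compare_lineages(primate, rodent):
    """Compare per-gene Ka/Ks values listed in the same gene order for two lineages.

    Returns (percent by which the primate mean exceeds the rodent mean,
             fraction of genes higher in primates,
             fraction of genes higher in rodents).
    Genes with equal values count toward neither fraction.
    """
    n = len(primate)
    mean_p = sum(primate) / n
    mean_r = sum(rodent) / n
    pct_higher = 100 * (mean_p / mean_r - 1)
    frac_p = sum(p > r for p, r in zip(primate, rodent)) / n
    frac_r = sum(r > p for p, r in zip(primate, rodent)) / n
    return pct_higher, frac_p, frac_r


# Invented toy values for four genes (not data from the paper).
primate = [0.20, 0.10, 0.05, 0.30]
rodent = [0.10, 0.10, 0.08, 0.20]
pct, frac_p, frac_r = compare_lineages(primate, rodent)
print(f"primate mean {pct:.1f}% higher; "
      f"{frac_p:.0%} of genes higher in primates, {frac_r:.0%} in rodents")
```

Because ties count toward neither lineage, the two fractions need not sum to 100%, as with the 55% and 36% figures for the nervous-system genes.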

As the Cell article explains, these results could have two explanations:

One is stronger positive selection on nervous system genes in primates than rodents. The other is weaker functional constraint on these genes in primates. We argue that the possibility of weaker constraint seems unlikely, on the basis that the primate nervous system is far more complex (and therefore likely demanding greater precision in gene function) relative to the rodent nervous system.

In order to add to the plausibility of their favored explanation, the authors further subdivide the nervous-system-related genes into three groups:

The developmentally biased subgroup contained 53 genes that included patterning signals of the developing nervous system, downstream components of such signals, transcription factors that specify neuronal phenotypes, and regulators of neural precursor proliferation, apoptosis, differentiation, migration, and morphogenesis. The physiologically biased subgroup had 95 genes, comprised predominantly of neurotransmitters, their synthesis enzymes and receptors, neurohormones, voltage-gated ion channels, synaptic vesicle components, factors involved in synaptic vesicle release, metabolic enzymes specific to neurons or glia, and structural components of the nervous system. The unclassified subgroup contained the remaining 66 genes.

The "developmentally biased" subgroup shows the greatest effect, suggesting that the key thing is brain size, structure and organization, not nervous-system physiology.

There's more interesting stuff in the article, including evidence from comparisons including chimps and squirrel monkeys that the rate of nervous-system evolution at the molecular level has been greater in the hominin lineage than in the rest of primate evolution, and a (tentative) argument that human brain evolution has involved "a very large number of mutations in many genes" rather than "a small number of key mutations in a few genes".

But there's no support in the Cell article for humans being the "privileged" beneficiaries of "a type of extraordinarily fast evolution that is unique to the species", as Alok Jha tries to tell us in the Guardian. On the contrary, the authors write

Might genes involved in tissues other than the nervous system also display accelerated evolution in primates? We argue that this is a distinct possibility given the precedent found in nervous system genes. In particular, accelerated evolution of genes might be found in tissue systems that are especially relevant to the adaptation of primates, such as the immune system, the digestive system, the reproductive system, the integumentary system, and the skeletal system.

They don't ask whether genes under selective pressure in other species might show equally fast or faster changes -- but that's because they already know that the answer is "yes". In fact, the original perspective on this is that Ka/Ks ratios need to be greater than 1 to demonstrate that adaptive evolution is occurring at all, and you can see from the graphs above that the average ratio for nervous-system-related genes in the Cell paper was about 0.12. For lists of higher ratios, you can consult The Adaptive Evolution Database (TAED) here. TAED "is designed to provide, in raw form, evolutionary episodes in specific chordate and embryophyte (flowering plants, conifers, ferns, mosses and liverworts) protein families that might be candidates for adaptive evolution" and "contains a collection of protein families where at least one branch in the reconstructed molecular record has a KA/KS value greater than unity, or greater than 0.6."

For example, the sample page linked in the TAED database documentation includes the gene for the (snake venom) protein phospholipase A2 in the Malayan krait, Bungarus candidus, with Ka/Ks of 2.27. Browsing for a minute or two, I found a cell adhesion protein in the zebrafish with Ka/Ks of 4.55. And on the lower links of the Great Chain of Being, the Ka/Ks ratio for sequenced genes of SARS virus during the epidemic in 2002-2003 was as high as 1.98.

Some individual genes among the "nervous-system related" set studied in the Cell paper had Ka/Ks as high as 0.833 in the primate comparison (that was MCPH1, "implicated in the control of brain size"), and the average of 0.12 was significantly higher than the rodent average of 0.09 for the same genes, but this is not some unprecedented, biologically unique level of extraordinarily fast evolution.
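The TAED inclusion criteria quoted above (Ka/Ks greater than 1, or greater than 0.6) can be applied mechanically to the values cited in this post. This Python sketch uses only those figures; the dictionary and function names are hypothetical, and it looks at nothing but the point estimates:

```python
# Ka/Ks values quoted in this post, labeled by source.
CITED_KA_KS = {
    "phospholipase A2, Malayan krait (TAED sample page)": 2.27,
    "cell adhesion protein, zebrafish (TAED)": 4.55,
    "SARS virus genes, 2002-2003 (highest reported)": 1.98,
    "MCPH1, primate comparison (highest in Dorus et al.)": 0.833,
    "nervous-system genes, primate average (Dorus et al.)": 0.12,
    "nervous-system genes, rodent average (Dorus et al.)": 0.09,
}


def classify(ka_ks, strict=1.0, lenient=0.6):
    """Bucket a Ka/Ks value by the two TAED inclusion thresholds."""
    if ka_ks > strict:
        return "above 1"
    if ka_ks > lenient:
        return "above 0.6 only"
    return "below 0.6"


# Print from highest to lowest ratio.
for name, value in sorted(CITED_KA_KS.items(), key=lambda kv: -kv[1]):
    print(f"{value:5.3f}  {classify(value):<15} {name}")
```

Only MCPH1 clears even the lenient 0.6 threshold among the nervous-system figures. The averages sit far below either cutoff, while the venom, adhesion, and viral genes all exceed 1.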

Given all of this, it's hard to understand why one of the paper's authors, Bruce Lahn, told the Chicago Tribune that "mutations in genes that build the brain exploded in the human line when humans split from monkeys 20 to 25 million years ago" (emphasis added). What's going on here?

There's a clue in another quote from the Tribune article, this one from Art Caplan at Penn's Bioethics Center:

"There is this desire for humans to have a privileged position in terms of other animals," he said. "We try to find in intelligence or language or something that seems to distinguish us, because we want to be more like the angels than like the animals.

"But, unfortunately, the animals keep talking and being social and using tools," Caplan said. "Every time we come up with something, they do it too."

The difference is that while some animals may have one or two of these attributes in rudimentary form, humans have all of them in abundance, he said.

"The new findings look like the human brain and its hyper-evolutionary development might give us that special status relative to all the other living things around us," Caplan said.

But there's no "hyper-evolutionary development" here, unless I'm missing something. We're just seeing the molecular correlates of the long-recognized increased encephalization of the human lineage, with rates of change that are not spectacular at all relative to thousands of other well-documented cases elsewhere in nature. The resulting degree of encephalization is unique, and clearly causes (and is probably also caused by) some qualitative differences in language, tools and social organization. But what's special about us is what we see and do every day, not the Ka/Ks ratios of the brain-related genes in our lineage.

[Update 12/30/2004: As expected, this story has spread widely in the press. The Independent writes about "evolutionary overdrive"; SciTech today features "unprecedented natural selection pressure" and an "explosion of genetic mutations in the brain"; and so on through more than 50 other stories so far indexed by Google News.]


Posted by Mark Liberman at 11:24 AM

Don't worry

The tsunami and its ever spiraling death toll may have been getting you down. But don't worry. has had the following comforting headline up all day:

Swimsuit model survives tsunami

It links to Sports Illustrated, which pictures the survivor herself. Hard to tell from the picture that she broke her pelvis, although she's obligingly undone the string that might otherwise have slightly obscured that part of her anatomy. [Yes, I know the picture is not actually meant to display her in her post-tsunamic state. Perhaps I should have used a smiley here.]

Is it just me, or is this headline, with or without the implicature that swimsuit modeling helped her survive, bizarre?

Posted by David Beaver at 01:25 AM

December 28, 2004

Unredacted discussion

A little while back I sent the American Dialect Society list a link to Mark Liberman's posting on redact(ed), and we were off and running. Here I reproduce most of this discussion, unredacted ('not redacted'; see below for un-redacted 'with redaction undone'). As far as I now know, the verb redact (along with the derived noun redaction) began as a learnèd synonym for edit; developed a specialized sense in legal contexts; extended its usage in legal contexts; and then spread into more general usage as a (euphemistic) synonym for censor 'remove, black out', while preserving specialized uses in some contexts.

Mark's examples are of REDACTED being used as a reference to blacked-out bits of text that are "classified" or "sensitive" -- effectively, as a replacement for the unpleasant participle CENSORED. But the ADS-L discussion begins with John Baker's observation that in his world the verb redact is more specific than edit, scarcely overlaps with censor, and is genuinely useful:

(12/22/04) I think it's just a case of an obscure word being tapped to fill a need. I have to redact documents on a regular basis (i.e., edit them to remove identifying, privileged, or irrelevant information). If that were to be described as just editing them, it would not be clear without additional explanation, and I cannot offhand think of any other words that would make sense.

Redaction, in this sense, is what we do when we remove or disguise identifying information in corpora. Redaction, in this sense, is the imperative of our Institutional Review Boards and Human Subjects Committees.

I was up to bat next. I didn't quite get Baker's point (I preserve the typographical conventions of the original):

(12/22/04) i think the question here is: when and in what circumstances did "redact" develop from its general 'edit' sense (reported by NSOED from the mid-19th century) -- essentially, a fancy or technical *synonym* of "edit" -- to this more specific sense? the development is natural enough, but it wasn't inevitable (though, like all linguistic changes, from the point of view of the users of the innovative form it might seem so).
in any case, the long-established verb for this sort of activity was "censor" (and "black out" could easily have been specialized for this purpose; it describes well the particular method used for censoring, and is appropriately restricted to written or printed material [1]). at some point someone decided that "censor" needed replacement (and fixed on the learned verb "redact") -- undoubtedly because censorship is so, well, *nasty*. the development looks to me like linguistic laundering of vocabulary.
the development is recent enough that it's not in AHD4, which has only the older, more general, sense. i'm away from my dictionary trove at the moment, so i can't speak about other dictionaries. a lot of the google hits are for the older sense, but then there's:
Pixel-counting can un-redact government docs: A Luxembourgian/Irish security research team have presented a paper on a technique for identifying words that have been blacked out of documents, as when government docs are published with big strikethroughs over the bits that are sensitive to national security. (
Delta Dental Plan will redact all but the last four digits of the SSNs on electronically submitted documents and on ID cards. ... ( pdfs/Fall%202003%20Check%20Up.pdf)
"redaction" has a parallel sense in some contexts, not surprisingly.
[1] is "redact" ever used to describe the censoring of audio material, that is to describe bleeping (out)?

Then Ben Zimmer chimed in with an actual early citation:

(12/22/04) The earliest relevant cite on the Nexis database suggests that US government officials began using "redacted" as a synonym for "censored" in the '70s:
(Washington Post, Dec 19, 1978, A2) Prosecutors in the FBI break-ins case mistakenly circulated to defense lawyers highly classified material that is only supposed to be seen or discussed in a spy-proof vault.
Attorneys for three former top FBI officials charged in the case made the disclosure yesterday in a lively pretrial hearing where they protested Justice Department attempts to get the documents back for censoring as part of a proposal to place strict limits on collecting new information.
[...] The lawyers voiced special opposition yesterday to a government request that they return their clients' grand jury testimony to be "redacted" - censored - of material containing "sensitive compartmented information (SCI)."
A bit more from this case, from an AP wire story that appeared in the New York Times:
(New York Times, Dec 19, 1978, p. A12) Alan I. Baron, who represents the former acting FBI director, L. Patrick Gray 3d, said, "We are being denied the right to conduct a defense. The Government wants an unlimited right to redact information as to intelligence techniques, and that's what this case is all about."
The term "redact" has been adopted by the Government and in this context means censorship of classified material. Barnett D. Skolnik, a Justice Department lawyer, said, "We are redacting in good faith."
Seems clear that the lawyers representing the Government needed a euphemism for "censor" in this case -- it would be difficult for them to say, "We are *censoring* in good faith."

Next, John Baker went back into the legal literature and took things into the '50s:

(12/22/04) It may be that the term began as a legal term, which in large part it continues to be. It certainly predates the '70s. Here's an early use from 1957:
<Justice Bastow and I agree that feasible means should have been adopted to redact DeGennaro's confession and admissions,--before their introduction into evidence,--so as to restrict their contents to his own inculpations, and thus have avoided any possible prejudice to Lombard.> (People v. Lombard, 4 A.D.2d 666, 669 n.2, 168 N.Y.S.2d 419, 423 n.2 (N.Y. App. Div. Dec 10, 1957).)
Here's what the leading legal dictionary, Black's Law Dictionary (8th ed. 2004), has to say:
<redaction (ri-dak-shən), n. 1. The careful editing of a document, esp. to remove confidential references or offensive material. [Cases: Criminal Law 663; Federal Civil Procedure 2011; Trial 39. C.J.S. Criminal Law §§ 1210-1211; Trial §§ 148-153.] 2. A revised or edited document. -- redactional, adj. -- redact, vb.>
I don't think this is the same as censoring, although in some cases both terms might apply. Here's what Black's says about censor:
<censor (sen-sər), vb. To officially inspect (esp. a book or film) and delete material considered offensive.>

At this point, I tried to tie the whole thing together:

(12/22/04) this definition of "censor" takes us pretty far afield. the relevant sort of censoring in our context is removal of material because of its possible information value to outsiders (not because it is confidential to the source or because it is offensive). think censoring of wartime letters.
it looks like "redact(ion)" started as a legal term with a specialized meaning (in particular, editing to remove references confidential to sources) and then extended its usage, still in legal contexts, to such editing done for other purposes; the word then encroaches considerably on "censor(ship)" in its restricting-information sense.

And then today Bethany Dumas returned us to Mark's original data:

(12/28/04) I recently received a lengthy legal document from a lawyer. A great deal of information had been removed from the document. The removal was accomplished in each instance by omitting a section of the document. For each instance of removal, this item appeared:

[Late-breaking ADS-L addition, 12/29/04: Doug Wilson observes that redact has shifted in the kind of direct objects it takes:]

It seems to me that the most pronounced novelty in recent use of "redact" is being ignored to some degree.
"Redact" historically means virtually exactly "edit" AFAIK. So if an editor alters a paper by deleting the entirety of its fourteenth paragraph it is conventional to say that the paper was edited, or that the paper was redacted. But I don't think it's conventional in such a case (until recently) to say that the fourteenth paragraph was redacted [or edited]; the fourteenth paragraph would conventionally be said to be deleted, removed, expunged, etc., even edited *out* ... but not just edited or redacted or altered (it's gone!).
Nowadays one sees "redact" applied specifically to the deleted material itself, so that "redact" not only has become specialized to "edit by deletions" (and after all most editing is more by deletions than otherwise) but has drifted away to the extent that it has come to mean "delete entirely", which is generally not within the range of unadorned "edit" or of traditional "redact": "I edited/redacted your paper" would not traditionally be used for "I deleted your whole paper", and "I edited the fourteenth paragraph" would not be the way to express "I removed the fourteenth paragraph entirely".

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 11:41 PM

It's your choice at the MLA

John Strausbaugh's story about the Modern Language Association meeting currently taking place in Philadelphia (here if you register with The New York Times) is aptly described by Arnold Zwicky as pissy and snarky. Strausbaugh would have you believe that the meeting is all loopy feminists and rampant queers clamoring to be more outrageous than each other. It isn't. I'm at the MLA too, reporting for Language Log (a news source you can trust, one that has not had to dismiss a reporter and two editors for running dozens of faked stories).

Let me tell you a bit about the meeting. There are a staggering 774 separate events on the program. Yes, there is feminist research, and studies of gay literature, some serious, some less so. There are dull titles and provocative ones (abstracts are not published, so people try to create long and eye-catching titles; if you were trying to get your session noticed among 773 others, you would do the same). The question is whether you go looking for gaudy titles to make moronic jokes about or whether you have an interest in language and literature. If the latter, you can choose from a vast range of serious stuff.

I chose a presentation by sociolinguist William Labov this morning. Strikingly original, gripping in its import, compelling in its presentation. Labov has found a spreading sound shift in inland Northern cities (Buffalo, Cleveland, Toledo, Chicago, but not Columbus or Indianapolis) stopped dead by a line where the prevailing ideology of Northern Yankees (anti-death-penalty; pro-gun-control) ceases to be a typical feature of local political opinion: a sharp and rather unexpected intrusion of ideology on the course of linguistic change. Labov is a giant of the field with a constantly progressing research program (he is reportedly 75, but looks about 50 and works like a man of 30, so this age statistic is not very useful). The session was worth the price of registration all on its own, in the opinion of this reporter. And I could be wrong, but I don't think I saw Strausbaugh in the session. He may have been off somewhere being pissy and snarky and trivial.

Posted by Geoffrey K. Pullum at 09:37 PM

Here comes the accusative

Seth Kantner, author of Ordinary Wolves, on NPR's Morning Edition, 12/28/04:

People are used to these stories of Alaska that are romantic and beautiful, and flowing wilderness, and here comes me with, y'know, an assault rifle and a jug of R&R.

Note the bold accusative. And note that a nominative, as in here come I, just won't cut it.

The English construction with fronted motional adverbial -- Along came Jones, There goes the neighborhood, Into the valley of death rode the four hundred -- has been studied for a long time. One of its little peculiarities is that along with front placement of the adverbial goes inversion of main verb and subject. Uninverted examples are possible but usually marked: Jones came along 'Jones arrived', There the neighborhood goes, Into the valley of death the four hundred rode. The preference for inversion is at least in part metrical: inversion yields an alternating accent pattern on constituents, with the main accent on the final constituent, which is focused.

When the subject is a personal pronoun, however, the uninverted pattern is hugely preferred: Along he came, There it goes, Into the valley of death they rode. Again, the preference is at least in part metrical: personal pronouns are normally unaccented, so that the uninverted pattern has an alternating accent pattern on its constituents. With unaccented personal pronouns, the inverted pattern is unacceptable: *Along came hĕ, *There goes ĭt, *Into the valley of death rode thĕy.

Now, you'd think that this could be easily fixed, just by putting an accent on the inverted personal pronoun. Accented it is generally iffy, so I'll put it aside, but even the other pronouns don't sound great: ??Along came hé, ??Into the valley of death rode théy. What to do, what to do? Especially if you want the pronoun in final, focus position.

Well, this is where we came in, with the use of an accusative pronoun in the inverted construction: Here comes mé. Similarly, Along came hím, Into the valley of death rode thém. The third-person examples are much improved if the pronouns are clearly deictic rather than anaphoric; the first-person examples are already deictic, of course.

An incidental point: once we have accusative subjects, the third-person singular verb form comes in here comes me is just what we'd expect. English verbs in finite clauses agree with nominative subjects, but default to third-person singular otherwise; this sort of defaulting is very well known in other languages, and can be seen elsewhere in English (either it's Poor me is going to suffer for this or you can't say it at all; but certainly *Poor me am going to suffer for this is just out, as, for that matter, is *Poor I am going to suffer for this).

Ok, but what licenses accusative subjects? Putting aside some well-known complexities like coordinate subjects and also putting aside a slew of normative prescriptions, the basic rule for nominative/accusative choice in English is: nominative for subjects of finite clauses, accusative otherwise. This rule has to be understood literally: only subjects of finite clauses; things understood, or interpreted, as subjects of such clauses don't count. So free-standing pronouns are accusative, even when they're interpreted as subjects: Who did that? Me. On the other hand, the subjects of "present subjunctive" clauses, which are finite but nevertheless have base-form, rather than finite, verbs, are still nominative: I demand that she be chair.

This rule would, however, predict nominative case in the inverted motion examples, and agreeing (rather than default) verb forms would go along with that: *Here come I. Oops.

(Notice the contrast between this inversion construction and the celebrated Subject-Auxiliary Inversion (SAI), which always has nominative subjects (and they are accented): Kim would object, as would I, *Kim would object, as would me.)

I can see two ways of describing what's going on here. One way is just to say that the inverted motion construction (or constructions) has accusative subjects. Accusative case is a stipulation, as it apparently is in the construction that poor me is an exemplar of. Stipulations happen, after all.

Another way is to analyze the inverted motion construction as having two parts, in a kind of setup/payoff paratactic arrangement also seen in some other constructions: Here's the problem: the frammis and The issue is: the virus and even What bothers me (is): their passivity. That is, the inverted motion construction has two immediate constituents, a setup consisting of a motional adverbial followed by a motion verb, and a phrase serving as the payoff. So long as the payoff phrase is not actually a subject (even though it's interpreted as the subject), the basic case rule would predict accusative case.

Yes, it's speculative, and it needs some filling out (just what is the grammatical function of the payoff?). But it's not entirely crazy.
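The two analyses differ only in whether the post-verbal pronoun counts as a subject. To make the consequences of the literal case rule concrete, here is a toy Python sketch. It is an illustration under stated assumptions, not a grammar formalism: the PronounSlot record, function labels like "payoff", and the tiny pronoun tables are hypothetical inventions. The code applies only the rule stated above (nominative for subjects of finite clauses, accusative otherwise) plus default third-person singular agreement for verbs without a nominative subject.

```python
from dataclasses import dataclass

# Hypothetical lookup tables covering only the pronouns used in the examples above.
NOMINATIVE = {"1sg": "I", "3sg": "he", "3pl": "they"}
ACCUSATIVE = {"1sg": "me", "3sg": "him", "3pl": "them"}


@dataclass
class PronounSlot:
    person: str          # "1sg", "3sg" or "3pl"
    function: str        # hypothetical grammatical-function label: "subject", "payoff", "fragment", ...
    clause_finite: bool  # whether the clause containing the slot is finite


def case_for(slot: PronounSlot) -> str:
    """The basic rule, read literally: nominative only for subjects of finite
    clauses; accusative everywhere else, whatever the slot is interpreted as."""
    if slot.function == "subject" and slot.clause_finite:
        return "nominative"
    return "accusative"


def pronoun_form(slot: PronounSlot) -> str:
    table = NOMINATIVE if case_for(slot) == "nominative" else ACCUSATIVE
    return table[slot.person]


def verb_form(slot: PronounSlot, plain: str = "come", s_form: str = "comes") -> str:
    """Finite verbs agree with nominative subjects; otherwise they default
    to the third-person singular form."""
    if case_for(slot) == "nominative" and slot.person != "3sg":
        return plain
    return s_form


def here_comes(slot: PronounSlot) -> str:
    return f"Here {verb_form(slot)} {pronoun_form(slot)}"


# The pronoun treated as an ordinary finite-clause subject (the rule's unwanted prediction).
as_subject = PronounSlot("1sg", "subject", clause_finite=True)
# The setup/payoff analysis: interpreted as the subject, but not actually a subject.
as_payoff = PronounSlot("1sg", "payoff", clause_finite=True)

print(here_comes(as_subject))  # prints "Here come I"
print(here_comes(as_payoff))   # prints "Here comes me"
```

Treating the pronoun as a subject yields the starred *Here come I; treating it as a non-subject payoff yields Here comes me, default verb form included. The stipulation analysis would instead need a construction-specific override inside case_for.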

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 06:01 PM

Mountain Tidal Waves

An indication of the salience of 津波 tsunami in Japanese life is the fact that one term for landslide is 山津波 yamatsunami "mountain tidal wave".

Posted by Bill Poser at 12:34 AM

MLA Story

My favorite story about the Modern Language Association convention was in an article in the New York Times many years ago. I no longer have the citation, but I copied down the relevant bit:

Posted on a convention announcement board is an agitated adolescent scrawl:

Did your parents force you to come to MLA? Are you going stir-crazy, like me? Call if you are age 18 and just want someone normal to go sightseeing with!!
Red-inked below, in a firm professorial hand:
Watch punctuation. Vary sentence structure. Try not to end sentences in prepositions.

Posted by Bill Poser at 12:13 AM

December 27, 2004

Eggheads' Naughty Word Games

So goes the New York Times headline (p. B1, 12/27/04) for this year's pissy snarky story (by John Strausbaugh) on the Modern Language Association meetings. While admitting, "The convention has become a holiday ritual for journalists", the piece goes on to revel anyway in the wackiness of those "postmodernists, multiculturalists, feminists and queer-theory advocates", piling on astounding titles, especially the sexy ones, and summarizing things as follows: "The association has come to resemble a hyperactive child who, having interrupted the grownups' conversation by dancing on the coffee table, can't be made to stop." (This is not a quotation; this is the journalist's own voice. It's not labeled as opinion, but, hey, this is a feature story, so I guess anything goes.)

Concluding sentence, of a story with 24.25 column-inches of text: "And yes, many believe that the press is encouraging them [the freak show contingent of the MLA] by continuing to pay attention."

I know, I know, the editor made him do it. (But I think that he secretly enjoyed it, and lord knows it's an easy story to write.)

By the way, other papers picked up the story; the pared-down Palo Alto Daily News version ran (on page 1) under the nutty head "Language purists, rebels face off".

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 09:24 PM

Meaning and Necessity

Heard today on NPR, one for the "contingent fatalities" file: "Tsunami is a Japanese word for a good reason."

Posted by Geoff Nunberg at 04:48 PM


James Fallows has an article in yesterday's NYT business section under the headline "At I.B.M., That Google Thing Is So Yesterday". He's talking about UIMA, which I've heard pronounced as "weema", and which stands for Unstructured Information Management Architecture.

If you're interested in more, there's a whole recent issue of IBM Systems Journal, 43(3), entitled Unstructured Information Management:

Unstructured information represents the vast majority of the data collected and accessible to enterprises. This data may be in various formats and may lack the organization of traditional sources such as database records. Exploiting this information requires systems for managing and extracting knowledge from large collections of unstructured data and applications for discovering patterns and relationships. This issue presents eight papers on the tools, methods, and architectures which are evolving for managing unstructured information in areas such as life science and market research.

Here's an IBM diagram that lays out what this is all supposed to do:

On 12/16/2004, IBM posted to alphaWorks its Unstructured Information Management SDK ("Software Development Kit"), from whose User's Guide the previous picture came:

Unstructured information management (UIM) applications are software systems that analyze unstructured information (text, audio, video, images, etc.) to discover, organize, and deliver relevant knowledge to the user. In analyzing unstructured information, UIM applications make use of a variety of analysis technologies, including statistical and rule-based Natural Language Processing (NLP), Information Retrieval (IR), machine learning, and ontologies. IBM's UIMA is an architectural and software framework that supports creation, discovery, composition, and deployment of a broad range of analysis capabilities and the linking of them to structured information services, such as databases or search engines. The UIMA framework provides a run-time environment in which developers can plug in and run their UIMA component implementations, along with other independently-developed components, and with which they can build and deploy UIM applications.

More specifically:

UIMA is an architecture in which basic building blocks called Analysis Engines (AEs) are composed in order to analyze a document. At the heart of AEs are the analysis algorithms that do all the work to analyze documents and record analysis results (for example, detecting person names). These algorithms are packaged within components that are called Annotators. AEs are the stackable containers for annotators and other analysis engines.

Unfortunately, the downloadable stuff is just a development framework -- no interesting Analysis Engines or Annotators are supplied. IBM's framework would be more likely to be widely adopted, instead of various emerging (partial) alternatives, if at least a basic set of analysis methods (and procedures for training new ones) were provided.
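The User's Guide describes a composite arrangement: analysis algorithms packaged as Annotators, and Analysis Engines as stackable containers holding annotators and other engines. Here is a minimal Python sketch of that composition idea only. It assumes nothing about the SDK's real interfaces; every class and method name below (Document, Annotation, Component, process, spans) is hypothetical, and the person-name annotator is a deliberately crude stand-in of exactly the kind the SDK doesn't supply.

```python
import re
from dataclasses import dataclass, field
from typing import List, Protocol


@dataclass
class Annotation:
    """One analysis result: a labeled span of the document text."""
    label: str
    begin: int
    end: int


@dataclass
class Document:
    """The text under analysis plus the annotations recorded on it so far."""
    text: str
    annotations: List[Annotation] = field(default_factory=list)


class Component(Protocol):
    """Anything that can analyze a Document and record results on it."""
    def process(self, doc: Document) -> None: ...


class TokenAnnotator:
    """Toy algorithm: marks each whitespace-delimited token."""
    def process(self, doc: Document) -> None:
        for m in re.finditer(r"\S+", doc.text):
            doc.annotations.append(Annotation("Token", m.start(), m.end()))


class NaivePersonNameAnnotator:
    """Toy algorithm: marks two adjacent capitalized words as a person name."""
    PATTERN = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")

    def process(self, doc: Document) -> None:
        for m in self.PATTERN.finditer(doc.text):
            doc.annotations.append(Annotation("PersonName", m.start(), m.end()))


class AnalysisEngine:
    """Stackable container: runs its components (annotators or other engines)
    in order on the same Document. It has process() itself, so engines nest."""
    def __init__(self, components: List[Component]):
        self.components = components

    def process(self, doc: Document) -> None:
        for component in self.components:
            component.process(doc)


def spans(doc: Document, label: str) -> List[str]:
    """Return the text covered by every annotation with the given label."""
    return [doc.text[a.begin:a.end] for a in doc.annotations if a.label == label]


# An engine nested inside another engine, alongside a plain annotator.
names_engine = AnalysisEngine([NaivePersonNameAnnotator()])
pipeline = AnalysisEngine([TokenAnnotator(), names_engine])
doc = Document("Mark Liberman wrote about IBM today.")
pipeline.process(doc)
print(spans(doc, "PersonName"))  # prints ['Mark Liberman']
```

Because an engine exposes the same process() method as an annotator, the outer engine doesn't need to know whether a component is a single algorithm or a whole sub-pipeline, which is the point of "stackable" in the quoted description.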

Another IBM banner recently raised is Autonomic Computing, featured in another recent issue of the IBM technical journal:

The development of autonomic computing will make systems capable of self-configuring, self-healing, self-optimizing, and self-protecting, analogous to the abilities of living organisms with autonomic nervous systems. In this issue, an overview, 15 papers, and the Technical Forum present concepts, directions, and current work in the evolving research on autonomic computing for such areas as systems architecture, server infrastructure, systems management, security, service, applications, and the effect on users. This issue is an initial contribution to the creation of a body of literature on autonomic computing.


Posted by Mark Liberman at 02:48 PM

Irish on the Rebound?

Tom Hundley of the Chicago Tribune reports some good news about Irish, a language that has long been giving way to English. Although Irish is a required subject in Irish schools and since 1922 has been the national language of Ireland, outside of the small area known as the Gaeltacht, whose population is only 83,000 (1991 census), few people actually use Irish in their daily lives. Even in the Gaeltacht over two-thirds of Irish speakers use the language less than once a week, according to the Central Statistics Office. However, the number of people reporting themselves as Irish speakers rose 9.8% from 1.43 million to 1.57 million between 1996 and 2002, and over the last twenty years, the number of Irish-medium schools has increased tenfold. According to Padhraic O Ciarda, an executive at Irish-language television station TG4, over the past decade it has become cool to speak Irish. Irish has gone from being the language of the poor, rural, and backward, to being a symbol of the new, modern, prosperous Ireland. Indeed, here's a website promoting cultural tourism in the Gaeltacht. In addition to the usual sorts of tourist activity, they offer Irish language instruction.

I've learned to be skeptical of reports of good news about endangered languages, but Irish may be recovering.

Posted by Bill Poser at 12:14 AM

December 26, 2004

Ginger Snap

In my post on gingerly, I suggested that one curiosity of the adverbial use is people's reluctance to back-form an adjective ginger. Not so fast, emails Charles Belov, who reports that a Google search on ginger steps -- which of course I should have done myself before issuing my pronouncement -- turns up a number of instances of an adjectival use:

In the Night Kitchen... marks Santen's first solo foray, taking a few ginger steps away from his longtime collaborative project Birddog.

Lack of walls and throngs of gawking passengers attracted a host of children, initially in curiosity and later with hesitant, ginger steps into participation.

The songwriting is studied and careful, guitars taking ginger steps through the melodies.

Since this use of ginger is considered obsolete by the OED, these instances suggest a re-invention via back-formation rather than a survival of the old word. And for these writers, gingerly is pretty clearly just a common-or-garden adverb formed in -ly. But the adjective is still relatively rare. A Nexis major papers search on "gingerly way," "gingerly fashion," or "gingerly manner" turns up 124 hits; a search on the same phrases with ginger turns up just one, from the Washington Post (2/15/1988):

In a very ginger way on Iowa caucus night, just before a commercial, Rather told viewers, "Vice President Bush declined our request to be interviewed on this broadcast."

It seems fair to conclude that for most people who use gingerly as an adverb, it has the distinction of being the only manner adverb formed in -ly (well, make that the only one I can think of) that isn't formed on an adjectival stem -- that is, unless you assume as I do that the word is really a haplologized version of gingerlily. Not a reason to condemn it out of hand, but I'm still going to give it a pass.

Posted by Geoff Nunberg at 01:32 PM

About half

According to an AP Newswire story on the recent midwest storm,

'They're about half-scared to drive fast today,' Kentucky state trooper Barry Meadows said.

The American Heritage Dictionary says that half as an adverb means

1. To the extent of exactly or nearly 50 percent: The tank is half empty. 2. Not completely or sufficiently; partly: only half right.

So did trooper Meadows mean that drivers were scared to the extent of 50%? Or that they were not completely or sufficiently scared? We'd have to ask him, but my bet is that he meant to let us know that drivers were like, really scared. In support of this view, the same AP story mentions 5-foot snowdrifts, "more than 100 stranded travelers ... rescued from their snowbound vehicles", hundreds of abandoned cars, sections of interstate being closed, and so forth.

There's another case of a modifier based on half used as an intensifier. The OED notes the use of not half to mean "extremely, violently" as well as "to a very slight extent":

3. not half: a long way from the due amount; to a very slight extent; in mod. slang and colloq. use = not at all, the reverse of, as ‘not half bad’ = not at all bad, rather good; ‘not half a bad fellow’ = a good fellow; ‘not half long enough’ = not nearly long enough; also (slang), extremely, violently, as ‘he didn't half swear’.

I guess that "didn't half" became an intensifier by (partial?) conventionalization of ironic understatement. In the case of "about half", something else seems to be added to irony and/or modesty. Maybe there's a bit of uncertainty about whether the modified phrase is exactly the right way to put it, comparable to Muffy Siegel's analysis of like as "used to express a possible unspecified minor nonequivalence of what is said and what is meant". And the basic meaning "partly" is always still lurking defensively in the shadows, ready to be called forward when socially stigmatized characteristics are being confessed or attributed to others.

In the Charlie Daniels song "Good Ole Boy", some negative self-evaluations are qualified with "little" and "about half"

I'm a little wild and a little bit breezy
Rollin 'em high and ridin' 'em easy
Hog wild and woman crazy
About half mean and about half lazy
But I know what I am
And I don't give a damn
Cause I'm a good ole boy

even though the verses make it clear that the singer is proud of having pretty well pegged the meter on wild, breezy, mean and lazy. Well, maybe you've got to refer to another song to get the details on lazy. And there I'd say that Daniels wants us to interpret layin'-around-in-the-shade as a sort of countrified appreciation of the Second Noble Truth:

Poor girl wants to marry and the rich girl wants to flirt
Rich man goes to college and the poor man goes to work
A drunkard wants another drink of wine and the politician wants a vote
I don't want much of nothing at all but I will take another toke

In Dan Jenkins's 1972 novel "Semi-Tough", a couple of football players from Texas and their friends use semi as a jokey substitute for colloquial half. The book was made into a semi-popular 1977 movie, which spread semi (used to mean "very") around for a while.

This may remind you of an old controversy over another scalar predicate: if a poor performance is half-assed, is a good performance no-assed or full-assed?


Posted by Mark Liberman at 09:50 AM

December 25, 2004

Gingerly We Roll Along

"It was a gingerly first step," The New York Times' Erik Ekholm and Eric Schmidt wrote on December 24, in a page one article about the return of some residents to Falluja (or "war-ravaged Falluja," to give the city its official name). The sentence caught my attention merely because it used gingerly in what I always assumed to be the correct way, as an adjective. That's so rare as to be newsworthy -- if you do a Nexis search of the previous 50 instances of gingerly in the Times, going back to July 1, 2004, you find it used as an adverb every single time:

... a part of its anatomy that, as Mr. Benepe put it gingerly, ''separates the bull from the steer.'' (12/21/04)

...his chubby right foot pressing gingerly into her wrist... (12/20/04)

A couple of cleaning women in white uniforms stepped gingerly around the large potted palms... (12/19/04)

Everyone slurped, adding the sauces or not, gingerly tossing in more bean sprouts for crackle. (12/19/04)

...behavioral economics, which is gingerly stepping away from the economists' orthodoxy that humans are eternally rational... (12/19/04)

Maybe I should throw in the towel on this one, I thought, but then began to wonder whether there was ever actually a towel for me to be holding in the first place.

In defense of the usage, gingerly began its life as an adverb. It was formed from the adjective ginger, "dainty or delicate," and the OED gives citations of its use as an adverb right up to the end of the 19th century -- the adjectival use appeared in the 16th century. And unlike most other adjectives in -ly, like friendly or portly, gingerly has an adverbial meaning, so that it can only apply to nominals denoting actions (like "step" in Ekholm and Schmidt's article); otherwise it requires a clumsy periphrasis like "in a gingerly way." Moreover, Merriam-Webster's exhaustive Dictionary of English Usage gives no indication that anybody has ever objected to the use of the word as an adverb.

But the adjective ginger has been obsolete for a long time, and it's notable that nobody is tempted to back-form it anew, as in "his ginger handling of the question," which is what you'd expect if the adverbial gingerly were really analyzed as composed of the root ginger plus the derivational suffix -ly.

What we seem to have here, rather, is a haplology (or "haplogy," as some linguists can't resist calling it), the process which gave us Latin nutrix in place of the predicted *nutritrix and which leads people to say Missippi instead of Mississippi. Gingerly is just the way the mental lexicon's gingerlyly comes out on the tongue or the page. That's natural enough, but there's something to be said for insisting that the word be used as an adjective, as one of the small obeisances we make to the capriciousness of grammar. So, kudos to Ekholm and Schmidt, one each.
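Haplology is a matter of sounds and syllables, but a crude spelling-based analogue makes the idea concrete. The sketch below is our own illustration, not a standard algorithm: the hypothetical function haplologize collapses each immediately repeated letter sequence (two or more letters long) into a single copy, scanning left to right.

```python
import re

# Hypothetical, spelling-based stand-in for haplology: wherever a sequence of
# two or more word characters is immediately repeated, keep only one copy.
# Real haplology operates on sounds and syllables, not on letters.
REPEATED_SEQUENCE = re.compile(r"(\w{2,}?)\1")


def haplologize(form: str) -> str:
    """Collapse immediately repeated letter sequences, scanning left to right."""
    return REPEATED_SEQUENCE.sub(r"\1", form)


for form in ["gingerlyly", "nutritrix", "Mississippi"]:
    print(form, "->", haplologize(form))
# gingerlyly -> gingerly
# nutritrix -> nutrix
# Mississippi -> Missippi
```

The rule overapplies, as any purely formal version would: it also turns banana into bana, which no speaker does. Real haplology is sporadic rather than automatic.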

Posted by Geoff Nunberg at 03:32 PM

The law of the excluded bowling alley

Thomas Shannon said that "[y]esterday was one of the worst days of my life". That's because the Bloomberg wire service reported that Shannon's bowling alley chain was partly owned by Yasir Arafat, or at least by "Palestine Commercial Services Co., a Ramallah, West Bank-based holding company" controlled by Arafat through his financial advisor Mohamed Rachid.

The AP wire service story quotes Shannon as saying "We don't choose to be affiliated with any political-based organization, especially one that may or may not have ties to things we find absolutely abhorrent."

There's an interesting point of interpretation here. What Shannon means, I guess, is that the question of whether the PA has ties to abhorrent things is a serious one, worth thinking about; or maybe he means to assert that such ties exist, while weakening the assertion so as to avoid offense or lawsuits. Weasel words like "may or may not [have property P]" are often used to flag uncertainty while still getting the idea out there. This is a perfect way to spread a rumor: "Madonna may or may not be pregnant".

Logically, this phrase offers perfect deniability while transmitting no information whatever. That's because it's necessarily true, no matter what P is. At least, "it's possible that P or it's possible that not P" is a theorem in just about anything worth calling a modal logic (given the "law of the excluded middle", which says that "P or not P" is always true, and the fact that if P is true, then P is possible, at least for alethic modality). So strictly speaking, I myself may or may not have ties to things Mr. Shannon finds absolutely abhorrent, and so may you. Not only Madonna, but also Geoff Pullum may or may not be pregnant. In order to conclude that these statements are literally true, we don't need to learn anything more about my ties, your ties, Mr. Shannon's moral sentiments, or Geoff Pullum's physiological state.
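The argument in the paragraph above can be spelled out as a four-line derivation. This is a sketch assuming a modal logic that includes the principle that whatever is true is possible (the T principle, suited to alethic modality), with ◇ read as "it is possible that":

$$
\begin{aligned}
&1.\quad P \lor \neg P && \text{excluded middle}\\
&2.\quad P \to \Diamond P && \text{T principle: what is true is possible}\\
&3.\quad \neg P \to \Diamond \neg P && \text{instance of 2, with } \neg P \text{ for } P\\
&4.\quad \Diamond P \lor \Diamond \neg P && \text{from 1--3, by cases}
\end{aligned}
$$

Nothing in the derivation depends on what P says, which is the formal sense in which "X may or may not have property P" tells us nothing about X.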

There's a principle of relevance here. If you ask me why I don't trust X, and I say "well, I'm always uneasy about someone who may or may not be an embezzler", I'm communicating a fairly specific suspicion. Literally, it's true of every single one of us that we may or may not be an embezzler -- or a drug dealer or a saint -- but the fact that I choose to raise the particular issue of embezzlement by X communicates something in and of itself.


Posted by Mark Liberman at 11:57 AM

December 24, 2004

Living, staying, resting, standing, being

The stay/live thing that Geoff Pullum started has morphed into Bill Poser's allusion to "J'y suis, j'y reste" as uttered by the commander of the French forces in the Crimean War, and that's caused me to think of Martin Luther's famously recalcitrant "Hier stehe ich" at the Diet of Worms.

Now I'm trying to find out just what Luther said. No question that the man was as stubbornly unmoving as a hunk of granite, but there's some question about what words he used.

Look, I had a Lutheran childhood, before I traded up to the Anglicans and then trucked on out of church affiliation entirely, and what I was taught as a kid was that Luther locked his legs, gritted his teeth, and defiantly intoned: "Hier stehe ich, ich kann nicht anders" 'Here I stand/stay, I cannot do otherwise' (or something like that in English). But when I check sources, some of them quote it this way and some of them have Luther saying "Hier stehe ich und kann nicht anders" 'Here I stand/stay and cannot do otherwise' -- a small difference, one of stylistic choice, but not literally identical to the version I was taught. Everyone seems to agree that he went on to finish with the dramatically pious "Gott helfe mir, Amen" 'God help me, Amen'.

Lord knows what the man actually said. There were no council stenographers taking down the proceedings. What we have now is, apparently, recollections from years afterwards. You have to suspect that the text got polished up some in the intervening years.

In any case, there's that stand/stay business with the German verb stehen.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 09:44 PM

Talking animals: miracle or curse?

There's a widespread European superstition that at midnight on Christmas Eve, animals are able to speak in human voices. There are some heartwarming treatments of this story, in which the gift of speech is God's thanks to the animals in the manger in Bethlehem, and the animals use this gift to help humans. However, in most of the versions of the story, overhearing the animals' midnight speech is very bad luck.

In some cases, the animals predict misfortune.

"Once upon a time there was a woman who starved her cat and dog. At midnight on Christmas Eve she heard the dog say to the cat, `It is quite time we lost our mistress; she is a regular miser. To-night burglars are coming to steal her money; and if she cries out they will break her head.' `Twill be a good deed,' the cat replied. The woman in terror got up to go to a neighbor's house; as she went out the burglars opened the door, and when she shouted for help they broke her head."

Again a story is told of a farm servant in the German Alps who did not believe that the beasts could speak, and hid in a stable on Christmas Eve to learn what went on. At midnight he heard surprising things. "We shall have hard work to do this day week," said one horse. "Yes, the farmer's servant is heavy," answered the other. "And the way to the churchyard is long and steep," said the first. The servant was buried that day week.

In these stories, occult knowledge is bad, and research beyond normal bounds creates not only unwelcome knowledge of misfortune, but also misfortune itself.

A story by Saki, Bertie's Christmas Eve, takes a different route to a similar end. Listening for the miracle of animal speech brings misfortune, but this time it's because stereotypical properties are transferred across species in the opposite direction. Bertie, it seems, can be beastly.


Posted by Mark Liberman at 08:55 AM

Staying and living

Another quick follow-up on Geoff's note regarding stay vs. live: I think stay is used instead of (or at least synonymously with) live in at least some varieties of English. While riding BART from Oakland to Concord once (maybe 10 years ago), a 40-something African American man and I struck up a conversation. At one point the man asked me, "Where do you stay?" (Maybe it was "Where you stay?" -- I was too struck by the use of stay to remember that detail.) "I live in Walnut Creek," I replied, with contrastive stress on live to emphasize that I wasn't just passing through. "What about you?" The man replied, with no particular emphasis, "I stay in Oakland."

From the rest of our conversation, I gathered the man had lived in Oakland all his life.


Posted by Eric Bakovic at 06:43 AM

J'y suis, j'y reste.

Geoff's discussion of the usage of stay with an address as complement reminded me of a joke. During the Crimean War, when General Patrice de MacMahon, the commander of the French forces, was asked by Lord Raglan, the commander of the British forces, whether he could hold the Malakoff fortress, he gave the famous response:

J'y suis, j'y reste.
"Here I am; here I stay." (The story is told in more detail here.) The joke is the mistranslation:
I'm Swiss and I'm spending the night.
This is probably the only funny thing about the Crimean War.
Posted by Bill Poser at 01:41 AM

More threats to language

As I noted here recently, advice givers often view slang as a threat to their languages. And again and again they finger another class of innovations as an agent of decay and decline: borrowings, especially from languages that they view as culturally intrusive (or invasive, or imperialistic, as the case might be). Case in point: official French hostility to English borrowings since World War II.

Sometimes the two types of threatening innovations come in a single package. That most dangerous of creatures, the street-speech borrowing.

Luis Casillas has sent me a particularly nice example of panicked response to such borrowings:

Spanglish, the composite language of Spanish and English that has crossed over from the street to Hispanic talk shows and advertising campaigns, poses a grave danger to Hispanic culture and to the advancement of Hispanics in mainstream America. Those who condone and even promote it as a harmless commingling do not realize that this is hardly a relationship based on equality. Spanglish is an invasion of Spanish by English. ["Is 'Spanglish' a Language?", Roberto González Echevarría, New York Times, March 28, 1997]

Casillas comments:

The nice thing about that quote is that the "threat" theme is developed in a more concrete manner than it often is, because the use of "invasion" casts it in nationalistic terms, which makes the cultural issue at stake quite clear. Further down:
If, as with so many of the trends of American Hispanics, Spanglish were to spread to Latin America, it would constitute the ultimate imperialistic takeover, the final imposition of a way of life that is economically dominant but not culturally superior in any sense.


Posted by Arnold Zwicky at 12:30 AM

December 23, 2004

You're only staying if you're not

In conversation last night my friend Jatin described an area in Philadelphia as "near where Professor Katz stays." I pointed out that he had apparently used the wrong verb: English does not use stay that way. The apartment in which Professor Katz lives cannot be referred to as the place where he stays. (Linguists are allowed to make this kind of remark about syntax and semantics whenever they like. Don't worry, it's not rude. It's like when you're watching birds with an ornithologist.) Jatin agreed with me immediately about the semantics: he knew how live and stay are used, but had made the slip anyway (it's an extremely common lexical error among non-native speakers of English). But it then struck me as we talked about it that the situation with this verb is very strange. The root meaning of stay is something like "remain". But you can only say you are staying at an address if you are not going to remain.

If you are going to remain there for the indefinite future, so it's your actual domicile, you can say you plan to live there. To say that you're going to be there for a fixed period but it won't become your regular domicile, you would say that you'll be staying there. You're only staying if you're not staying.

(This is the Standard English usage I'm describing, of course. Both Susannah Kirby and Eric Bakovic have pointed out that there are African American Vernacular varieties of English in which stay does mean "permanently reside", and at least three people have pointed out that the same is true of Scottish English.)

Curiously, the actual case at hand turned out to be a problematic one that hadn't occurred to me. The Professor Katz involved was Elihu Katz. It turns out that he currently spends exactly six months of the year in Israel and six months in Philadelphia, so it is completely unclear which place should be regarded as his permanent domicile (doubtless, his home is where his heart is). So that would be the one case in which it would be unclear whether to say he stays there or that he lives there.

Posted by Geoffrey K. Pullum at 05:51 PM

Blending in

The briefest of footnotes to Mark Liberman's observation that Geoff Nunberg's example "page-burner" is not just a common, or garden, malaprop, but an idiom blend:

Surely the current hot number in the world of blends involving noun-noun compounds is "X BE NOT rocket surgery", as in "Look, sentence diagramming isn't rocket surgery". A few thousand Google web hits. It's a bit more complicated than "page-burner", since the contributing compounds, "rocket science" and "brain surgery", are not themselves idioms (while "page-turner" and "barn-burner" are). Nevertheless, formulaic expressions figure in "rocket surgery" examples, since what's blended are the clichés "X BE NOT rocket science" and "X BE NOT brain surgery", both conveying 'X really isn't difficult'.


Posted by Arnold Zwicky at 12:40 PM

The Meaning of Kemosabe Jokes

Mark's discussion of Kemosabe jokes raises an interesting point. If one doesn't know the background and takes the jokes literally, they seem to support the notion that Kemosabe is offensive. But the very basis for the humor is the fact that one starts off with the assumption that Kemosabe is not offensive; the punchline is the (false) discovery that it is offensive after all.

Posted by Bill Poser at 11:38 AM

What kemosabe really means

The Lone Ranger was on the radio every week when I was a kid. The cry "A cloud of dust and a hearty 'Hi-Yo, Silver!'" rang out often on the playground. And in third grade, I remember a rash of jokes about what kemosabe really means. The gloss in the punchline was never respectful or even friendly, so I sympathize with the notion that the term can be an offensive one.

In response to a Reuters story about the same Nova Scotia court case that Bill Poser discussed earlier today, a site called "The Jokester" has recently posted several Lone Ranger jokes. They are all suitably bad, but none offers a hypothesis about the meaning of kemosabe. My favorite Lone Ranger joke, back in third grade, was this one (also lacking a gloss for kemosabe). I think that others must have agreed, because its punch line "what do you mean 'we', white man?" seems to have become a proverbial tag line.

Gary Larson immortalized the genre of kemosabe jokes with a Far Side cartoon in which kemosabe turns out to be "an Apache expression for a horse's rear end." And a few of the other "meaning of kemosabe" jokes that I remember from third grade are still rattling around the internets, for example here.


Posted by Mark Liberman at 09:44 AM


According to this CBC News article, the Supreme Court of Canada is being asked to decide on a complaint originally submitted to the Nova Scotia Human Rights Commission in 1999.

Dorothy Kateri Moore, a Mi'kmaq woman working at a sports store in Sydney, N.S., had complained that her boss, Trevor Miller, referred to her and other workers as "kemosabe" – the term used by the 1950s TV character Tonto, the Lone Ranger's sidekick, to describe the masked cowboy.
Moore said Miller told her the word meant "friend." But she claimed it was a racial slur and that its repeated use led to a poisoned work environment.
The Human Rights Commission and the Nova Scotia Court of Appeal have taken the position that kemosabe is not a slur and that Ms. Moore had no basis for complaint.

If the criterion is how the term is used by most people, it is pretty clear that they are right. The idea that the term is offensive is new to me. Tonto, the Indian, would hardly have used a pejorative term for Indians to refer to the Lone Ranger, his white partner. Nor is there any hint in the show that Tonto means the term in any way that is disrespectful. If anyone was slurring anybody in The Lone Ranger, it was the other way around: in Spanish tonto means "silly, foolish".

Nobody seems to know for certain what, if anything, Kemosabe was intended to mean. There's a good review of the possibilities that have been suggested here. It isn't offensive on any of the plausible suggestions. Both the script writer and the director thought that it meant "trusty scout".

Some people will argue that it doesn't matter whether a word is generally considered offensive or if the speaker intended it to be offensive: so long as the person hearing it was offended, that makes it offensive. The problem with this approach is that you never know what will bother someone, so a speaker is always at risk of being accused of harassment. Even if culpability is limited to persistent use of the offending word after learning that it is offensive, problems remain. For instance, when dealing with a single individual it is possible to cater to his or her preferences, but what happens when one speaks to a group? A case in point is how to refer to black people. Everyone agrees that nigger and coon are offensive, but there is no such consensus about the terms black, African-American, Afro-American, and Negro. Many people have strong preferences, but in the absence of a consensus, there is no choice of terms that will please everyone. As far as I know, no satisfactory analysis of this question exists. That's probably a good reason for courts and human rights commissions to stay away from all but the clearest and most egregious cases.

Posted by Bill Poser at 02:10 AM

December 22, 2004

Do hybrid whales sing hybrid songs?

Yesterday's NYT had an article by Andrew Revkin about "a solitary whale, species unknown, that has been tracked since 1992 in the North Pacific by a classified array of hydrophones used by the Navy to monitor enemy submarines". Apparently experts are convinced that the sounds are made by a whale, but the pattern is different from that of any known species, and tracking information indicates that there is only one individual calling out this way. The article indicates that the "most likely" explanation is that the whale is "a hybrid of a blue whale and another species".

You can hear the "52 Hz 'whale-like' signals" for yourself at the Vents Program Acoustic Monitoring site.

The idea that a hybrid whale should sing hybrid songs may surprise some people, though it shouldn't. We don't really know why hybrid animals show a mixture of complex physical characteristics inherited from their parents, and it's no harder in principle to explain how a hybrid animal might show a mixture of complex behavioral characteristics such as species-specific vocalizations. And in the specific case of hybrid vocalizations, there's a well-documented precedent: gibbons.

Gibbons are arboreal apes who live in the tropical rain forests of southeast Asia. There are 12 species in four subgenera, and the subgenera are apparently more different from one another genetically than humans are from chimps.

Here's the "family tree" showing the relationship of gibbons to us and the other primates. The gibbons are the Hylobatidae.

Gibbons travel mainly by swinging through the treetops ("brachiation"). Like 90% of bird species but only 3% of mammal species, they're monogamous, pair-bonded animals.

According to a discussion of gibbon singing at Thomas Geissmann's Gibbon Research Lab:

All species of gibbons are known to produce elaborate, species-specific and sex-specific patterns of vocalisation often referred to as "songs" (Haimoff, 1984; Marshall & Marshall, 1976). Songs are loud and complex and are mainly uttered at specifically established times of day. In most species, mated pairs may characteristically combine their songs in a relatively rigid pattern to produce coordinated duet songs. Several functions have been attributed to gibbon songs, most of which emphasise a role in territorial advertisement, mate attraction and maintenance of pair and family bonds (Geissmann, 1999; Geissmann & Orgeldinger in press; Haimoff, 1984; Leighton, 1987).

Among the apes, only gibbons and (ambiguously) humans have pair bonding. Also, only gibbons and humans sing. According to Charles Darwin's theory that human language developed from love songs, this is not a coincidence.

Apparently, gibbon duetting is initiated and dominated by the female's contributions:

The most prominent song contribution of female gibbons consists of a loud, stereotyped phrase, the great call. Depending on species, great calls typically comprise between 6-100 notes, have a duration of 6-30 s. The shape of individual great call notes and the intervals between the notes follow a species-specific pattern.

A female song bout is usually introduced by a variable but simple series of notes termed the introductory sequence; it is produced only once in a song bout. Thereafter, great calls are produced with an interval of about 2 min. In the intervals, [are] so-called interlude sequences consisting of shorter, more variable phrases … The typical female song bout hence follows the sequential course ABCBCBCBC…,

Guy gibbons are by nature laconic, but open up a bit as a duet proceeds:

As a rule, adult males do not produce great calls, but "male short phrases" only. Whereas female great calls remain essentially unchanged throughout a song bout, males gradually build up their phrases, beginning with single, simple notes. As less simple notes are introduced, these notes are combined to increasingly complex phrases, reaching the fully developed form only after several minutes of singing …

During duet songs, mated males and females combine their song contributions to produce complex, but relatively stereotyped vocal interactions… Both pair partners contribute to an introductory sequence at the beginning of the song bout (A). Thereafter, interlude sequences (B) and great call sequences (C) are produced in successive alternation…

During great call sequences the male becomes silent and does not resume calling until near or shortly after the end of the female's great call, when he will produce a coda.
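The "sequential course" in the quoted description is regular enough to write down as a pattern. Here is a minimal sketch that codes each phrase type as a single letter, following the quoted labels (A = introductory sequence, B = interlude sequence, C = great call sequence). The function name is hypothetical, and the requirement that a bout contain at least one great call is our assumption, not Geissmann's.

```python
import re

# A = introductory sequence (produced once), then interlude (B) and
# great-call (C) sequences alternate. Requiring at least one BC pair is an
# assumption made for this sketch.
SONG_BOUT = re.compile(r"A(?:BC)+")


def is_typical_bout(phrases: str) -> bool:
    """True if the whole phrase string fits the A, BC, BC, ... course."""
    return SONG_BOUT.fullmatch(phrases) is not None


for bout in ["ABCBCBCBC", "ACBCB", "ABCABC"]:
    print(bout, is_typical_bout(bout))
# ABCBCBCBC True
# ACBCB False
# ABCABC False
```

A second introductory sequence (ABCABC) is rejected, in line with the statement that A "is produced only once in a song bout".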

Gibbon species are (sometimes?) able to cross-breed, and often do so in zoos. The hybrid gibbon offspring produce hybrid songs, which are a predictable amalgam of the songs of the parents' species. The picture below shows six sound spectrograms, top to bottom. Each spectrogram shows time on the x axis (for twenty seconds), and frequency (which here corresponds to perceived pitch height) on the y axis.

The topmost spectrogram (a) is a typical female "great call" of the white-handed gibbon, Hylobates lar. The bottom spectrogram (f) is a typical female "great call" of the gray gibbon, Hylobates muelleri. The intervening spectrograms are hybrid great calls from hybrid female gibbons, (b) and (c) from H. muelleri x H. lar crosses, and (d) and (e) from H. lar x H. muelleri crosses. The two versions of each type of hybrid call were produced by unrelated hybrid gibbons at different zoos:


You can learn more by reading the paper from which I took this illustration (Thomas Geissmann, "Gibbon Songs and Human Music from an Evolutionary Perspective", fig. 6), and you can listen to many sound samples at Thomas Geissmann's site.

Is there any analogy to human language here? If so, it must be a very abstract one, because there do not seem to be any connections at all between human genetic differences and human linguistic differences.


Posted by Mark Liberman at 10:47 AM

A threat to the English language

Advice about English usage -- don't use X; use Y instead -- is often couched in terms of the threat to the language presented by X. The threatening Xs are usually innovative, non-standard, or informal in style, so slang (which is informal, and might also be innovative or non-standard or both) is especially likely to be seen as the ruination of English.

We're used to hearing such cries of panic from people who make some kind of living out of giving advice about English usage. But, probably as a result of the advice-givers' alarms, ordinary people express concern too.

Now it turns out that this concern has spread around the world, as far as Vietnam, or so it would appear from the January 2005 issue of Harper's Magazine, in which a column ("Good Question, Vietnam", p. 26) offers selected "questions submitted by Vietnamese people to the U.S.-Indochina Educational Foundation for its 'FAQ About America' project." Remarkably, the threat of slang finds its way into the list.

The questions cover some basic information ("How many people in the U.S.A. like to drink Coke?" and "What is Hollywood?"), cultural matters ("When did your culture form?" and "Why do many Americans like to be single nowadays?"), and many political issues ("Why are American presidents so bellicose?" -- by the way, I assume the questions are translated from Vietnamese into English -- and "What do Americans think about Communists?"). And, in question 14 of the 20 listed, a touching concern for our language:

Do you think using an excessive amount of slang will gradually destroy the beauty of the English language?

Note the suggestion that English speakers use an excessive amount of slang. Where might some Vietnamese have picked up that idea?


Posted by Arnold Zwicky at 02:40 AM

December 21, 2004

All's fair in love and REDACTED

The documents that the ACLU's FOIA suit brought out on Monday, discussed in a NYT article today by Neil Lewis and David Johnson, are full of sections that have been omitted for national security (or in some cases perhaps privacy?) reasons:

“Of concern, DOD interrogators impersonating Supervisory Special Agents of the FBI told a detainee that REDACTED. These same interrogation teams then REDACTED. The detainee was also told by this interrogation team REDACTED. These tactics have produced no intelligence of a threat neutralization nature to date and CITF believes that techniques have destroyed any chance of prosecuting this detainee. If this detainee is ever released or his story made public in any way, DOD interrogators will not be held accountable because these torture techniques were done the “FBI” interrogators. The FBI will be left holding the bag before the public.”

I suppose that the documents were supplied in paper form, with the omitted sections blacked out. At least, that's what has been shown in past facsimiles of similarly released documents. But in the version on the web site, the missing stuff is replaced by all-caps REDACTED, as in the paragraph above.

This has become standard usage, I think. However, oddly enough, dictionaries don't seem to recognize this usage in any specific way.

The American Heritage says that redact means

1. To draw up or frame (a proclamation, for example). 2. To make ready for publication; edit or revise.

Merriam-Webster's 3rd Unabridged has

1 obsolete : to lower in condition or quality : REDUCE *being a little prodigal in his spending, redacted his estate to a weak point— Robert Monro*
2 [back-formation from redaction] a : to put in writing : make a draft of : COMPOSE, FRAME *council of ministers ... engaged in redacting the two proclamations— W.G.Clark* b : to select or adapt for publication : EDIT, REVISE *historical accounts redacted for the modern reader*

The Oxford English Dictionary gives two senses "in modern use":

a. To draw up, frame (a statement, decree, etc.).
b. To put (matter) into proper literary form; to work up, arrange, or edit.

These various glosses are not inconsistent with the usage to mean DELETED ON PURPOSE, but they don't give the reader any reason to expect that redacted would be used in such cases, as opposed to (say) edited or deleted or removed.


Posted by Mark Liberman at 11:38 PM

Higher-order whammying

From a 12/21/2004 NYT article by Cornelia Dean on heat-tolerant algae in coral reefs:

About half the reefs that were left badly damaged after the 1997-98 El Niño event have bounced back, Dr. Baker said. "So even from a major event there can be recovery," he said, if overfishing, habitat destruction and other threats are mitigated. "But there can't be recovery if we triple and quadruple whammy these reefs."

Whammy started life as a noun meaning "an evil influence or hex", apparently derived from wham "blow". The OED's earliest citation for whammy is about baseball:

1940 J. R. TUNIS Kid from Tomkinsville x. 151 Interest round the field now centered in the Kid's chances for a no-hit game... On the bench everyone realized it too, but everyone kept discreetly quiet on account of the Whammy. Mustn't put the Whammy on him!

and the second one is this from Al Capp's Li'l Abner in July of 1951:

Evil-Eye Fleegle is th' name, an' th' ‘whammy’ is my game. Mudder Nature endowed me wit' eyes which can putrefy citizens t' th' spot!.. There is th' ‘single whammy’! That, friend, is th' full, pure power o' one o' my evil eyes! It's dynamite, friend, an' I do not t'row it around lightly!.. And, lastly--th' ‘double whammy’--namely, th' full power o' both eyes--which I hopes I never hafta use.

Evil-Eye Fleegle seems to have been the source of most Americans' experience of whammies, and especially quantified whammies. Apparently he later advanced to the quadruple whammy, which "could melt a battleship but almost kill Fleegle himself", though it's not clear to me what the anatomical correlates were for whammies above order 2.

There's clearly been some whammy inflation since 1951, since the double whammy has become by far the commonest whammy, according to Google:

Phrase / Web hits on Google
"single whammy"
"double whammy"
"triple whammy"
"quadruple whammy"
"quintuple whammy"

The verbal form used in the NYT quote ("if we triple and quadruple whammy these reefs") doesn't make it into any of the dictionaries that I've checked, but it's definitely out there:

The main thrust of the Commission's proposals seek to 'double whammy' current Tier 2 assisted areas such as Wigan.
We're getting double-whammied by a policy producing a trade deficit approaching $50 billion per month, or 6% to 7% of gross domestic product.
Long term investors look for these opportunities and they relieve other investors of their investment that has double-whammied them.
There is a pattern emerging that I'm betting against all the teams I bet the under on so if I'm wrong it will double whammy me.
"His daughter was almost 'double-whammied' last week by a jealous neighbour" was another rumour.
As one of their reps once double whammied "you can tell it's a Linn playing before you even go in the room".

Some of these uses are a little strange -- "as one of their reps once double whammied"? -- but for the most part, you can read past them without even noticing. That's certainly the case for the phrase "if we triple and quadruple whammy these reefs" in the cited NYT article. It's interesting how different faith is in this regard. Verbal uses of faith remain quite striking, at least to me.


Posted by Mark Liberman at 09:27 AM

December 20, 2004

The way the cookie bounces

Neal Whitman at Literal Minded points out that "page-burner" is not a typical old-fashioned malapropism, since it doesn't involve simple substitution of one word for another. Instead, there's a sort of exchange of parts between two similar phrases, "barn burner" and "page turner", where "A B-er" and "C D-er" are hybridized to form "C B-er" (and perhaps "A D-er").
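Whitman's schema is mechanical enough to sketch in a few lines. This is a toy, resting on our own simplifying assumption that each idiom is just a (modifier, head) pair of strings; the function blend is hypothetical, and the experiments described next suggest that real idiom blends also depend on syntactic structure and literal meaning. The same swap yields the "rocket surgery" blend from "brain surgery" and "rocket science".

```python
def blend(first: tuple[str, str], second: tuple[str, str]) -> tuple[str, str]:
    """Exchange heads between two modifier-head compounds.

    Given "A B-er" and "C D-er", return ("C B-er", "A D-er").
    """
    (a, b), (c, d) = first, second
    return f"{c} {b}", f"{a} {d}"


print(blend(("barn", "burner"), ("page", "turner")))
# ('page burner', 'barn turner')
print(blend(("brain", "surgery"), ("rocket", "science")))
# ('rocket surgery', 'brain science')
```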

This exchange process has been studied experimentally by J. Cooper Cutting and Kay Bock, who used the naturally evocative term idiom blend rather than malapropism in referring to it: "That's the way the cookie bounces: syntactic and semantic components of experimentally elicited idiom blends", Mem Cognit. 1997 Jan;25(1):57-71. Neal cites a 1/2004 post by Justin Busch at Semantic Compositions, asking for the right terminology for similar cases, and (independently of Cutting and Bock) Neal suggests the term idiom blending.

Here's the Cutting & Bock abstract:

Idioms are sometimes viewed as unitized phrases with interpretations that are independent of the literal meanings of their individual words. Three experiments explored the nature of idiomatic representation with a speech-error elicitation task. In the task, speakers briefly viewed paired idioms. After a short delay they were probed to produce one of the two idioms, and their production latencies and blend errors were assessed. The first experiment showed greater interference between idioms with the same syntactic structure, demonstrating that idiomatic representations contain syntactic information. The second experiment indicated that the literal meaning of an idiom is active during production. These syntactic and literal-semantic effects on idiomatic errors argue against a representation of idioms as noncomponential lexicalized phrases. The final experiment found no differences between decomposable and nondecomposable idioms, suggesting that the lexical representation of these two types of idioms is the same.

Here's a reference that explains what they mean by decomposable vs. non-decomposable. I suspect that their use of the term "blend" derives from what Dwight Bolinger called syntactic blends ("Syntactic Blends and Other Matters," Language, 37 (1961), 366-381), though Bolinger doesn't discuss idiom blends as such in the cited article. I dimly recall that he has written about idiom blending somewhere else, but a brief search didn't find it, and I might be wrong. Other relevant examples can be found in the list of Farberisms on the Icon site, many of which are idiom blends ("he flipped his cork", "that's a different cup of fish"), while others are just malaprops ("this report reads like a bleached whale", "he's as ugly as godzilla the hun"), or other sorts of modifications of fixed expressions ("let's solve two problems with one bird").

Unfortunately, issues of Memory and Cognition from before 2001 are not yet available online.


Posted by Mark Liberman at 07:09 PM

Microsoft dodges a booger bullet

There's been a lot of discussion about the recent decision by a WIPO arbitration panel to deny Microsoft's claim to the domain name The MSM articles that I've seen (such as this one in The Register) focus on the difference in sound between micro and moco, but there's a much greater gap in meaning. Is this a fight that Microsoft really should have wanted to win? Trevor at kalebeul has a relevant song, with a midi file so that you can sing along.

Posted by Mark Liberman at 09:05 AM

December 19, 2004

"... people who love me (be)cause of I've been able to bring them some joy ..."

In the commentary on today's Partially Clips, Rob Balder writes:

To my readers from Language Log, sorry for no apostrophe on "CAUSE" in the last panel. It was more of a graphic design issue. Don't stop linking me 'cause of that, 'kay?

No fear -- we'll even warn you if Lynn Truss is on the prowl!

Reading Rob's comment, I realized something about my own speech: I think that I've lexicalized cause of as a preposition, with a different pattern of usage from because of. If that's true, then perhaps the apostrophe should be omitted for linguistic rather than graphic design reasons, at least some of the time.

I have the expected pattern with clauses after because and noun phrases after because of, and not vice versa:

Leslie left because Kim arrived.
*Leslie left because of Kim arrived.
*Leslie left because Kim.
Leslie left because of Kim.

But for me, 'cause of works like since or after, introducing either a clause or a noun phrase:

Leslie left 'cause of Kim.
Leslie left 'cause of Kim arrived.

(This is in informal speech, of course.)

Google confirms one end of this -- the use of 'cause of as a clause-introducer (examples found by searching for "cause of I"):

Now I am sad, cause of I can't help fight child molestors anymore.
The reason why I wrote KLIRA above caption is, cause of I had seen the Klira guitar with same headstock pattern and fretboard design.
Note that just cause of I have posted the programs here you have absolutely no right to pass them on.
Ahh, it ONLY has one coxaial digital input next to the RF input that I cant use cause of I dont have a ld/dvd player.

Unfortunately for my theory, similar examples are common with because of:

Regarding to the painting, I can't contribute with too much secrets because of I don´t use a specific standard or scheme of colours ...
I’m world famous, everywhere I go there are people who love me because of I’ve been able to bring them some joy from the movies I’ve made.
This is more because of I have found less and less time to devote to the numerous responsibilities I have ...
I think it will help [getting noticed at Pitt], because of I've been playing for a winning team.

The Oxford English Dictionary doesn't recognize the (be)cause of SENTENCE construction, but it gives both spellings cause and 'cause, and indicates that the apostrophe is a sort of new-fangled fashion in this case:

Since c 1600 often written 'cause; now only dial., or vulgar; also spelt cos, coz, cuz, case, etc.


Posted by Mark Liberman at 11:43 PM

Diagram this

Take a look at the complexity of this sentence from an article in The New Yorker (11/22/04, p. 62 of the print edition):

"We are world champions at lawmaking," Christine Ockrent, who has anchored the evening news on two channels, run the weekly L'Express, and, as she says, "seen everything," told me a few days after the law was signed.

That's a preposed direct quote ("We are world champions at lawmaking") followed by the rest of a clause headed by the verb tell (Christine Ockrent told me ___). The clause has an additional adjunct at the end (a few days after the law was signed): a preposition phrase headed by after, containing a pre-head measure adjunct noun phrase (a few days) and a post-head passive clause complement (the law was signed). Attached to the subject of the tell clause (Christine Ockrent) is a supplementary relative clause (beginning who has...), and the predicate in that clause is a three-part coordination of past-participial verb phrases, the three head verbs being anchored, run, and seen. Within the third of the coordinate verb phrases is another supplement (supplements used to be called ‘parentheticals’), an as-phrase with clause complement (as she says). And the third of the three coordinate verb phrases is itself a direct quote, semantically within the scope of the verb say.

Just 37 words (counting L'Express as one, since this is English), but enough complexity to keep a syntactician busy for a quiet hour or two. I'm not entirely sure I could justify a complete structure for this sentence at all; it would certainly take half an hour to explain all the details. It is a mystery to me how we are able to read such sentences and understand them. Yet I doubt that any other reader of The New Yorker (other than perhaps Chris Potts) even noticed the sentence.

Posted by Geoffrey K. Pullum at 02:27 PM

December 18, 2004

Prescribe Away

It's always a little disappointing to have to say that I agree with Geoff P., not because he isn't much more often right than wrong (or for that matter than I am), but because the linguistic discourse is invariably more engaging when people leave a scrap of red meat outside his office door and quickly move out of the way. But Geoff and Mark are right to say that there's a difference between saying, say, "relationship is a verb" to suggest metaphorically that relationships require constant work and saying "baptism is a verb" under the belief that verbs are "words that denote things that happen or are done," as Geoff puts it. Not that it isn't prescriptivism to condemn such usages, but it's prescriptivism of an entirely appropriate kind, the same way it's appropriate for economists to get on the case of conservatives who say that social security will become "insolvent" in 2019, the year the program begins to run a deficit. Linguists own the word verb; others are only borrowing it.

And while there's no easy way to know how often statements like "Baptism is a verb" are offered in a spirit of figuration rather than out of ignorance, it's clear that these metagrammatical howlers often testify to a deplorable level of general linguistic knowledge, which can sometimes cause real mischief in the world.

In a piece I wrote a few years ago for American Lawyer, I mentioned a decision by a Florida district court in a patent infringement case that turned crucially on the claim that the decoder key to a cable TV subscriber box was "not subject to revision or change." The court concluded that subject was used in the claim "as a verb (in the passive tense)," and identified the relevant dictionary sense as "to cause to undergo," as in "He wouldn't subject himself to any inconvenience." And on that basis, the court ruled that "not subject to change" meant that the decoder key could be changed but would not be changed. (See TV/COM International v. MediaOne of Greater Florida, No. 3:00-cv-1045-J-21HTS (M.D. Fla. Aug. 1, 2001)).

Judicial incompetence doesn't come much grosser than that: it's fair to say that someone who doesn't know how to read a dictionary entry has no business adjudicating cases that call for interpretation of language -- which is to say, damn near all of them. But courts are full of judges who have no more knowledge of grammar and meaning than the half-remembered dicta they learned at the end of Sister Petra's ruler. Let's by all means continue to flog these things, even at the risk of sounding like pedants.

Posted by Geoff Nunberg at 02:18 PM

December 17, 2004

Prescriptivism and folk linguistics

Readers will have noted that a battle has broken out among the normally collegial Language Log staff. I found Liberman and Nunberg the other day just about ready to dash glasses of chardonnay in each other's faces. Cries of "Scoundrel!" and "Nay, sir!" echoed down the hall in the gleaming office block we all recently moved into, Language Log Plaza in downtown Philadelphia. I had to pull them apart; they were frightening the secretaries. Basically Liberman is saying that the Christians who claim faith is a verb are utterly and preposterously wrong, and Nunberg in reply comes pretty close to calling Liberman a prescriptivist (them's fightin' words) for not just accepting their cliché as an honest piece of vernacular usage. Here's my take on the dispute about whether we should criticize such people (and Nunberg is right that there are a lot of them out there).

Let me begin by speaking in terms of a parable. There would be two possible criticisms of someone who claims that (for example) seeing is believing. To claim that they were using the language incorrectly, since is implies identity and the denotations of see and believe are in fact distinct, would be ludicrous prescriptivism of the worst kind. People have the right to use a familiar idiom with the interpretation they place on it (in this case, the intention is to express something like the claim that in ordinary situations one's visual apparatus supplies information that translates directly, effortlessly, and reliably into belief formation).

But suppose a philosophical view was starting to gain ground to the effect that there was in fact no difference between sensory perception and belief: that seeing something and believing it were literally the same thing, as a matter of metaphysics. That (or so I would claim) would be an insane metaphysical view; speaking up against it would be one's moral duty as an intellectual.

Everything turns on whether people really do think they are making claims about language use — whether a kind of folk linguistics has arisen that makes the gross error of equating nouns with things and verbs with actions. You can look at the cases Nunberg specifically points to and make up your own mind. But I think there are signs of people really being confused about the difference between a noun and the thing it purportedly names, between a verb and the action it purportedly names, and so on. In a world where people cannot tell active from passive clauses even where it is important to their argument, I think one could be forgiven for worrying that people really are blundering on elementary linguistic concepts. You be the judge. Here is another quote from Bible Food for Hungry Christians:

The Bible word "baptism" can be a noun or verb:

An excellent Bible example, and one that has actually spawned religious denominations, is in 1 Pet 3:21, where the Greek word "baptisma" is translated "baptism". Some religious denominations believe this verse teaches "baptismal regeneration", that the "ACT" of water baptism itself regenerates or makes a person a born again child of God.

The Greek word "baptisma", in 1 Pet 3:21 is a NOUN, meaning the THINGS SIGNIFIED BY BAPTISM, it is NOT A VERB as the English reader would naturally assume! Peter is saying that "baptism doth save us (is presently saving) ", meaning that the "things", or "Bible teachings", or "doctrines" CONCERNING baptism are now saving us. What are those things, or teachings that baptism signifies? We are buried with Christ, sins washed away, raised in newness of life, the great doctrines of soteriology, or salvation, these are the "things" now saving us, not the verb, the ACT of baptism! The ACT of water baptism is a beautiful ritual that outwardly PORTRAYS what God HAS ALREADY DONE for us. The REALITY is what God does, the RITUAL is what we do to publicly acknowledge what God has done.

Here technical issues of translation from Greek are mentioned, and the suggestion is made that a word denoting the act of baptism must be a verb in English. I see real conceptual chaos here: I get a real sense that the writer doesn't understand that baptism in English is never a verb lexeme, regardless of whether it denotes something that God does or the doctrines concerning what God does. I think it is reasonable to worry that people are not just using a familiar cliché when they say this sort of thing; they are spreading a myth (that verbs in English can be identified by determining whether or not they denote things that happen or are done, a hopeless view of how to diagnose verbhood), and amplifying the confusions that have evolved out of it.

Pointing out the extreme degree to which even the educated public in the USA tends to be ignorant of even elementary technical facts about phonology and grammar and semantics is precisely what Language Log is all about (in its rare serious moments in between discussions of dude and critiques of Dan Brown's writing style and collecting of eggcorns and the other things we do for the sheer unadulterated fun of messing around with linguistic material). It is our métier; it is our raison d'être; it's what the enormous endowment of the Language Log Foundation is devoted to, the reason we pay nearly $17 a year to rent the domain names that bring this great enterprise to you the public, the reason for our annual pledge drives.

Language Log contributors are almost uniformly of the opinion that judgments about what is a linguistic error have to be based on inference from actual evidence about linguistic behavior. What distinguishes prescriptivists from typical professional linguists is the utter contempt prescriptivists show for that principle. But that doesn't mean that Language Log has no business critiquing gross abuse of elementary linguistic terms. Faith and baptism are essentially never verbs (except insofar as you can sort of use almost any word as a nonce verb if you push it in the way that Mark alludes to: to add to his examples, you can Photoshop a picture, you can Toyota around, you can Tabasco your sushi up a bit if you want it spicier). And terror is indeed a noun (Jon Stewart is wrong); and London is not an adjective in London fog; and so on. These points are important. For Language Log, they are of the essence. We can't just say "Oh, people just talk that way" and leave it at that. So I'm with Liberman on this one.

Posted by Geoffrey K. Pullum at 10:29 PM

The economics of grading

A column by Ailee Slater, in the Oregon Daily Emerald of 12/6/2004, presents a striking argument against the method of grading now used in our educational system:

Personally, I have come to the conclusion that the University system makes absolutely no sense. Students pay teachers to educate us, yet they are then allowed to tell us how much we're learning. The whole situation seems akin to a boss paying her employee to clean toilets and the employee turning around and telling the employer how much she is or isn't happy with the cleaning job. If I'm paying someone to do my housekeeping, I'll be the one to tell the receiver of my hard-earned money exactly how well they did. Shouldn't it be the same with education?

Comments from various perspectives have been echoing around the blogosphere (here, here, here, here, here, here, here, among others). My favorite, though, is this Partially Clips cartoon, created independently of Ms. Slater's essay:

What seems to bother my colleagues most about Ms. Slater's column is the analogy between teaching college courses and cleaning toilets. But she isn't the first to see knowledge as the result of cleansing:

The ancient tradition that the world will be consumed in fire at the end of six thousand years is true. as I have heard from Hell.

For the cherub with his flaming sword is hereby commanded to leave his guard at the tree of life, and when he does, the whole creation will be consumed, and appear infinite. and holy whereas it now appears finite & corrupt.

This will come to pass by an improvement of sensual enjoyment.

But first the notion that man has a body distinct from his soul, is to be expunged; this I shall do, by printing in the infernal method, by corrosives, which in Hell are salutary and medicinal, melting apparent surfaces away, and displaying the infinite which was hid.

If the doors of perception were cleansed every thing would appear to man as it is: infinite. For man has closed himself up, till he sees all things thro' narrow chinks of his cavern.

However, Blake failed to specify any technique for evaluating the degree of cleanliness of the doors of perception. Nor did he specify who should be responsible for the evaluation. His main prescription for educational advancement is clear, though, and must work to some extent, since it has been followed by students since the first universities were established about a thousand years ago.

I like the idea of a system in which there is only one formal grade -- call it "success" -- but students vary in how long it takes them to get it. For scheduling reasons, this is completely impractical in most settings, but it's essentially how the post-graduate system works. We don't formally evaluate the relative quality of PhDs -- you either get one, or you don't. Evaluation, comparative or otherwise, is informal, and works in both directions. Institutions and individual faculty are evaluated in terms of the quality of the students they turn out, just as new PhDs are evaluated in the job market in terms of the perceived qualities of their dissertation research and its presentation.

[Cartoon reference due to Geoff Pullum]


Posted by Mark Liberman at 10:00 PM

Mocosoft Saved from Microsoft

We've previously mentioned Microsoft's rather expansive notions of phonetic similarity, such as their claim that lindash will be confused with windows. They've just had a setback. An arbitration panel has denied Microsoft's demand that Mocosoft, a Spanish company, hand over the domain names and on the grounds that Mocosoft sounds so much like Microsoft that people are likely to be confused.

Posted by Bill Poser at 09:40 PM

O ye of little faithing

The idea that faith is a verb starts as a metaphorical turn of phrase, as Geoff Nunberg suggests, though a few folks take it literally:

(link) Blue faithed his way through his life ...
(link) Another way David faithed was to refuse the kings' armor and sword.
(link) Families need to learn how to “faith” problems as Nehemiah did.
(link) Thank you, Holy Spirit, for allowing us to experience and know what we have been faithing - THAT LIVING IS CHRIST!

A larger number don't fully understand the metaphor, I suspect, and just think of it as a way to express their belief that faith is an action or process rather than a static condition. However, in looking around the web, I noticed many examples that embody in linguistic form the idea of faith as a process, without involving verbal forms at all. A page on Faith Development has four of these:

Faithing is as much a verb or action as it is a noun.
Fowler states that a person comes to the activity of faithing through one’s community.
One of the problems with Fowler’s approach to faith development is that it describes a generic developmental faithing process regardless of the object of one’s faith.
An atheist goes through similar stages of “faithing” as a Christian.

The trick here is that -ing can be used to make process nouns out of verbs, but it is also often used to make process nouns out of nouns.

The OED explains that the original function of -ing1

was to form nouns of action ... These substantives were originally abstract; but even in OE. they often came to express a completed action, a process, habit, or art ... and then admitted a plural; sometimes they became concrete ... By later extension, formations of the same kind have been analogically made from substantives ... and, by ellipsis, from adverbs ... ; while nonce-words in -ing are formed freely on words or phrases of many kinds, e.g. oh-ing, hear-hearing, hoo-hooing, pshawing, yo-hoing (calling oh!, hear! hear!, etc.), how-d'ye-doing (saying ‘how do you do?’); ‘I do not believe in all this pinting’ (having pints of beer).

Thus pinting is a noun for a process associated with pints (i.e. drinking them in company), not a noun derived from some hypothetical verb to pint; likewise faithing seems generally to be a noun for a process associated with faith, not really a noun formed from the possible verb to faith.

As evidence for this analysis, I'll note that the cited page on "Faith Development" includes 64 instances of faith used as a noun, along with 4 instances of faithing used as a noun, and no instances of faithing as a verbal participle or faith as a verb. Similarly, the great majority of examples of faithing on the web seem to be process nouns of this same kind.

The OED goes on to explain:

The notion of simple action passes insensibly into that of a process, practice, habit, or art, which may or may not be regarded as in actual exercise; e.g. ‘reading and writing are now common acquirements’; so drawing, engraving, fencing, smoking, swimming. Words of this kind are also formed directly from ns. which are the names of things used, or persons engaged, in the action: such are ballooning, blackberrying, canalling, chambering, cocking (cock-fighting), fowling, gardening, hopping (hop-picking), hurting (gathering hurts), nooning, nutting, sniping, buccaneering, costering, soldiering, and the like. [emphasis added]

Faith as "process, practice, habit or art" is the notion that the theorists of "faithing" are aiming for, rather than what the OED describes as

...formations in -ing from substantives without a corresponding verb; esp. in industrial and commercial language, with the sense of a collection or indefinite mass of the thing or of its material; as ashlaring, coping, cornicing, costering, girdering, piping, scaffolding, tubing; bagging, quilting, sacking, sheeting, shirting, ticking, trousering.

In any case, the grammatical category of verb need not be involved in faithing. Nor is there necessarily any verbal faith in faithed -- as the OED explains, -ed has a long tradition of being added to nouns to mean "possessing, provided with, characterized by (something)":

-ed2 is appended to ns. in order to form adjs. connoting the possession or the presence of the attribute or thing expressed by the n. ... In mod.Eng., and even in ME., the form affords no means of distinguishing between the genuine examples of this suffix and those ppl. adjs. in -ed1 which are ultimately f. ns. through unrecorded vbs. Examples that have come down from OE. are ringed:--OE. hringede, hooked:--OE. hócede, etc. The suffix is now added without restriction to any n. from which it is desired to form an adj. with the sense ‘possessing, provided with, characterized by’ (something); e.g. in toothed, booted, wooded, moneyed, cultured, diseased, jaundiced, etc., and in parasynthetic derivatives, as dark-eyed, seven-hilled, leather-aproned, etc. In bigoted, crabbed, dogged, the suffix has a vaguer meaning.

Although such derivations have been normal since Beowulf, there was a brief attempt in the 18th and 19th centuries to ban them:

1779 JOHNSON Gray Wks. IV. 302 There has of late arisen a practice of giving to adjectives derived from substantives, the termination of participles: such as the ‘cultured’ plain..but I was sorry to see in the lines of a scholar like Gray, the ‘honied’ spring.
1832 COLERIDGE Table-T. (1836) 171, I regret to see that vile and barbarous vocable talented..The formation of a participle passive from a noun is a licence that nothing but a very peculiar felicity can excuse.

I like the idea of calling talented a "vile and barbarous vocable". I wonder why this bit of ignorant grammatical pontificating never caught on, while the equally ill-founded prescription against splitting infinitives did?


[Update 12/21/2004: John Cowan emails

I myself have used an "is a verb" sentence that I think is rather better than any of the ones you cite: "Viking is a verb". Now this is literally false, in that there is no verb *vike for viking to be the present participle of. Nor was it so even in Old Norse: the meaning of the root *vik-* is a little unclear, but the *-ingr* ending is clearly the PGmc *-ing* suffix meaning something like "one out of a group" and still surviving in sweeting, darling (and revived by Tolkien in Beornings, the descendants of Beorn).

Nevertheless, what I mean to convey by this sentence is that there were no such people as "the Vikings"; a man might go in viking (to calque a little Old Norse) one year, and the next year return to the same location as a peaceful trader. The implication, then, is that "viking" is the name of an activity, even if not literally derived qua gerund from a verb.

OK -- but in that case, why not just say that "Viking is the name of an activity"?

John replies: "'Viking is a verb' is much more colorful and therefore memorable than 'Viking is the name of an activity', and neither one is literally true of English as she is spoke." ]

Posted by Mark Liberman at 09:36 PM

Begin arming Israel

Ours is a rather serious and academic household, Barbara's and mine, with interests in language, linguistics, and philosophy. It isn't all that often that one will find Barbara and me lying on the floor laughing till the tears roll down our cheeks. At least, not from reading about lexical differences between Biblical and Modern Hebrew in The New York Review of Books. But Amos Elon recently had us in this unusual state. Don't read on, though, if you are easily offended, or if doctors have specifically warned you against falling on the floor laughing.

When writer Amos Oz was a 12-year-old boy, we are told in Elon's review (NYRB 12/16/04, 22-24), he once sat with his father and his grandfather, along with other right-wing Israelis, in the front row at an event where a speech was given by Menachem Begin. Like most right-wing politicians of the time, Begin spoke a rather classical Hebrew, reminiscent of the Bible, not of the street. The front three rows were mainly intellectuals, but the people behind them, the great majority of the audience, were working-class immigrants to Israel from Middle Eastern countries, and they spoke the colloquial "street" Hebrew of the Jerusalem area. Now, it turns out that in biblical Hebrew, though not in the Jerusalem vernacular, the same word was used at the time for "weapon" and the male sexual organ. And in the vernacular, though not in Biblical Hebrew, the verb "to arm" (to slip someone your weapon, as it were) had acquired a new meaning: it was used to mean "to fuck". Says Elon:

Begin, a great orator, was attacking the readiness of the great powers to arm the Arabs.

In rising, melodic cadences Begin was, for most of those present, complaining that Eisenhower and Anthony Eden were "fucking" Nasser day and night. "But who is fucking us?" he asked in an outraged voice. "Nobody! Absolutely nobody!" A stunned silence filled the hall. Begin did not notice. He went on to predict that if he were to become prime minister everyone would be fucking Israel.

A pitter-patter of applause came from the Zionist scholars in the front three rows. Most of the audience, though, maintained a stunned and horrified silence. Only the 12-year-old Amos Oz was apparently unable to contain himself, and burst out in helpless laughter.

Same for Barbara and me, I'm afraid. Sorry. Normally we're serious people, but... (Can't write any more. Got the giggles just imagining it.)

Posted by Geoffrey K. Pullum at 09:06 PM

December 16, 2004

Arboreal hors d'oeuvres

One good malaprop deserves another. From "New flashlights are bright idea", Home & Design column in the Palo Alto Daily News, 12/15/04, p. 49:

We were on vacation and went on a moonlight walk with our wives at a nearby botanical garden... At the end of trail there was a beautiful opening in the canape and the sky was clear and overflowing with stars.

Hmm... Just what was in those hors d'oeuvres?

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 07:05 PM

I seem to be a literalist

Boy, what literalists we linguists can be. No, faith isn't really a verb, as Mark notes. But can we cut people a little slack here? Or does descriptivism stop at grammatical metalanguage's edge? In the first page of Google results on the string "is a verb," I find people saying "News is a verb," "A program error is a verb," "Friendship is a verb," "Relationship is a verb," and, several times, "God is a verb." Condemn those as cliches, but not as category errors -- it's the same device that allowed Dwight Bolinger to say, in one of my favorite linguistic epigrams, "Words are not things, but activities."

Posted by Geoff Nunberg at 02:33 AM

As opposed to a barn-turner

Not an eggcorn, really, but just an old-fashioned malaprop -- a blurb for a recent All Things Considered review reads:

In his new novel about a global-warming information conspiracy, Michael Crichton gives us a 600-page "page-burner" bolstered by footnotes, charts and graphs. Reviewer Alan Cheuse reviews State of Fear.

I suspect they may have Michael Crichton confused with Salman Rushdie.

Posted by Geoff Nunberg at 01:58 AM

December 15, 2004

OFAC Relents

Last May we discussed the US Treasury Department's view that publication in the United States of material originating in countries embargoed by the United States is illegal and the effect that this has had on scientific publication. I am pleased to report that the Office of Foreign Asset Control has relented and has issued proposed new regulations [pdf] which explicitly license the publication of material originating in Cuba, Iran, and the Sudan provided that the author not be the government of the embargoed country.

Posted by Bill Poser at 06:36 PM

Faith, hope and charity -- all verbs?

When I claimed that faith is one of the few monosyllabic English nouns that have resisted being verbed, I should have hedged. In response to my observation that the OED's published citations for verbal faith end in 1605, Jesse Sheidlower checked Oxford's archives and emailed four unpublished citations from the 20th century:

1928 DJUNA BARNES Ryder (1990) xxxiv. 150, I spaded not because I faithed. I faithed me that Wendell was a large account, lost in a small and trifling balance.
1990 Los Angeles Times 19 Oct. 14 Says Charles, the vicar who has been de-faithed but not defrocked: `You didn't destroy my faith. It was already destroyed. You simply described the vacuum.'
1991 ARTERBURN & FELTON Toxic Faith iii. 57 You may have the most incredible, powerful, mature faith in the world, but if God has a different plan, you will not be healed. You can't `faith it' into a divine intervention that God knows might lead you away from Him rather than toward Him.
1992 Washington Post 18 Apr. B6 This week he received a phone call from a woman dying of cancer. She was filled with hope, not despair. `Instead of freaking out, she faithed out.'

Jesse added:

No clue what Barnes is talking about. This is apart from _faithed_ adj. 'having some (usually specified or evaluated) faith'. And, of course, this doesn't change the fact that OF COURSE _FAITH_ ISN'T A VERB.

I was wrong to be so categorical about the lexical category of faith, but I also agree with Jesse's implication that these citations don't really give us much new evidence on the subject.

Out of context at least, Djuna Barnes' "I faithed me" is certainly mysterious. Online reviews describe her novel Ryder as "[imitating] the style of Shakespeare, Chaucer and the Bible", and "written in arcane language that mimics the diction of Chaucerian and Restoration bawdy". The following quote from the book sounds this same note:

"Let thy lips choose no prayer that is not on the lips of thy congregation, for though it is not given to all men to pray alike, nor blame alike, it is not shown thee to know the difference in these matters. Therefore when thou dost ask for the mercy of God, do thou ask it as thy neighbour seems to ask it. And when thou art pitiful, be pitiful like thy sister and thy brother."

As Calvin put it, "verbing weirds language", and perhaps Barnes verbed faith in search of some Calvinist weirding. That's Calvin the 20th-century cartoon child, not Calvin the 16th-century resident of Geneva. Or maybe it's both: until I've read Ryder, I can't be sure.

I'd argue that the second example, "de-faithed", is not reliable evidence for verbal faith. Expressions of the form de+NOUN+ed, meaning "deprived or relieved of (one's) NOUN(s)", can be freely created for just about any noun at all. If someone has had his headphones confiscated, he can be described as "de-headphoned" without giving us significant evidence that headphone is starting to be used as a verb in any general sense. No one happens to have used "de-headphoned" within the ken of Google, but that's just an accident of history; it's Out There waiting to be born. Picking a few nouns at random, I find credible web citations for "de-trousered", "de-coffeed", "de-oranged", "de-cookied", "de-computered", "de-virused" and so on. This doesn't mean that trouser, coffee, orange, cookie, computer and virus have become verbs -- they might be, but this isn't good evidence.
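
The productivity of the pattern can be made concrete with a small sketch. The Python below is illustrative only: the function name de_noun_ed is hypothetical, and it assumes a deliberately simplified spelling rule (add -d after a final e, otherwise -ed) that ignores consonant doubling and y-to-i changes. It turns any noun into a de+NOUN+ed form without consulting any list of verbs, which is the point: the construction needs nothing but a noun as input.

```python
def de_noun_ed(noun: str) -> str:
    """Form a de+NOUN+ed word, e.g. 'headphone' -> 'de-headphoned'.

    Simplified orthographic rule (an assumption, not a full model of English
    spelling): a noun ending in 'e' takes only '-d'; any other noun takes '-ed'.
    Consonant doubling (hat -> hatted) and y -> i changes are not handled.
    """
    suffix = "d" if noun.endswith("e") else "ed"
    return f"de-{noun}{suffix}"


# The rule applies blindly to any noun; it never checks whether the noun is also a verb.
for noun in ["headphone", "trouser", "coffee", "orange", "cookie", "computer", "virus", "faith"]:
    print(de_noun_ed(noun))
```

Run as is, the loop prints the forms discussed above, from the unattested "de-headphoned" to "de-faithed"; nothing in the rule distinguishes nouns that have become verbs from nouns that have not.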

The third example, "you can't 'faith it' into a divine intervention", is a genuinely verbal use of faith. But the authors' quotation marks flag this as a concocted usage that needs some special attention from the reader. And "faith it" has not caught on among religious writers to any significant extent -- the word string occurs frequently on the web, but all of the first 100 examples that I checked were things like

Finding my faith: it came in an envelope ...
Faith it is.
But without faith it is impossible to please him ...
What is the definition of faith? It's in the Bible ...
Knowing that while reason is somehow at the basis of faith, it is not the whole basis ...

The fourth example, "instead of freaking out, she faithed out", is as equivocal as "de-faithed". Intensive constructions of the form X out, X-ed out and X-ing out (with various meanings) are fairly common with words that are not otherwise used as verbs. Google has 25 hits for "charitied out", compared to 3 for "faithed out". This shouldn't lead us to conclude that charity is eight times more verbal than faith, but rather to doubt that either word is really very verbal at all.

Some other X out examples:

All truffled out.
The band launches into "The Passenger," the venue's giant disco ball flares up, and she dorks out completely.
Or will they fluke out completely and win, just to vex the odds-makers?
My comprehension sucks when it comes to nerded out geek language ...
He called when he got into town and we duded out to the max for a few hours.
I enjoy swimming the oceans, riding my BMX, travelling around, listening to jazz, and duding out with my buddies.
I think the people I know and read might be all a little electioned out at this point.
Clear Channel's all computered out. They call it progress.
We tried many a beer (although by the time we got to Redbones, Rose and I were feeling pretty beered out, and had pecan pie and vanilla custard instead),

In some cases (like spazz out?) the X out construction may result in the development of a verbal use. But the mere fact that a word is used occasionally in the X out construction doesn't mean much by itself -- and especially not when it's used in a parallel pattern like "... instead of freaking out she faithed out".

So in conclusion, any English noun can be verbed, but some are more resistant than others. Syntactically speaking, faith is just about as purely a noun as an English word can be. Theologically, you're on your own: this is Language Log.


Posted by Mark Liberman at 09:45 AM

December 14, 2004

Encore une fois, dude

Among all the things said and written over the past few days about dude, the one that interested me most was an observation by Céline at Naked Translations (French version here):

I've come across dude when subtitling American films/TV series, and I often chose to not translate it. I just don't think there is an appropriate equivalent in French. The best solutions I can think of, in some cases, are mon pote or mon vieux, but these two expressions just aren't as flexible in their meaning as dude, which can be used to express a variety of emotions. For a start, you couldn't use them on their own, they generally end a sentence : "Ça va, mon pote?" (are you ok, dude?), "Tu veux aller boire un coup, mon vieux ?" (fancy going for a drink, dude?).

Once a proofreader inserted a mec (bloke, dude, guy) in my translation. I just don't agree with that. I know that's what the dictionary says, but it just doesn't sound natural, I've never heard a French person use mec in the same sense as dude. As this word has only ever cropped up in my subtitling work, I have no qualms in not translating it when it is part of a sentence, as it is difficult enough to convey crucial information in the limited space allowed for subtitles. Besides, dude often indicates a certain level of familiarity between people, and this is conveyed on the screen. When it is on its own and used to express an emotion, I chose an equivalent interjection in French : for example, Ça alors ! (my goodness !) to express surprise, Tu plaisantes ? (you're joking ?) to express incredulity, etc.

One note that doesn't ring true: is there any speech community in which both fancy (for "feel like") and dude are current? And again, "my goodness" doesn't seem like a preferred expression of surprise in the dude demographic. And while I'm being fussy, it seems over-formal to me to retain are in "are you ok, dude?"

I'm not a speaker of dudish, so I could be wrong about all of these, which are really quibbles about the details in any case. The details matter, of course, but what I liked about Céline's post is the principle of coming at the problem of what words mean from the angle of how to translate them into another language. This is often enlightening, but never more so than when the word is mainly indexical of group membership and interpersonal attitudes.

In the end, though, it's a little bit surprising that dude hasn't been borrowed tel quel, especially in light of this other anecdote from Céline's blog.

[Update 12/15/2004: Lal Zimman emails that "There are indeed now speakers using both fancy and dude! Dude has made its way across the pond ..."

Lal adds that "As a native speaker of dudish (ou bien dudais,) I would confirm your intuitions about dude-users not also using phrases like "my goodness!", but I thought Céline's point was that rather than using dude, because French has no word that conveys the pragmatic meaning of cool solidarity, she uses words that convey the same secondary meanings, e.g. surprise, and then her parenthetic comments were a translation. So, I didn't think that she was implying that a dude speaker might have said the thing in the parentheses, but rather that the pragmatic meaning was partially the same. "

Well, my sociolinguistic judgments about French are next to non-existent, but FWIW ça alors doesn't strike me as a very dudish way to express surprise. Sites like this and this don't make this expression seem very promising as an index of cool youthful solidarity. ]


Posted by Mark Liberman at 08:44 PM

She's they until you acknowledge her

I found this very beautiful and subtle example illustrating the use of singular-antecedent they in a passage (here) written (or more likely dictated — this sounds like speech) by the distinguished BBC Africa correspondent Ofeibea Quist-Arcton (photograph here) for the BBC, which asked a number of foreign correspondents to write something about awful travel experiences in Africa (I take the liberty of adding underlining on the anaphoric pronouns I want to talk about):

One of my favourites is when you are sitting on the aircraft and you just happen to have a free seat next to you - and you think, "my goodness, I can lie down and sleep after one of those heavy assignments".

And then as you begin to relax, you see one of those huge West African traders, she could be from Ghana, Nigeria or Senegal or Togo, you name it.

Then you hear the footsteps coming down the plane boom, boom, boom and then you hear move over! Their 25 kg luggage is hurled onto your lap, their boom box is pressed against your shoulder.

You all but carry their clothes on your head, whilst of course this very, very determined woman, who is going to sit right there, tries to shove her stuff into the overhead compartment - but there is no room left.

So you end up carrying her stuff on your lap - and that's how your two-hour trip is going to end.

The first she has the antecedent one of those huge West African traders. At that point the trader in the remembered anecdote is a woman visible down at the front of the plane, waiting to get down the aisle to find her seat. The appropriate pronoun for a woman you can see is she. But our narrator still thinks she will be lucky and spend the flight beside an empty seat. Perhaps she closes her eyes to rest as the other passengers board.

Then footsteps are heard ("boom, boom, boom"), and a voice ("move over!") is heard, and at this point (imagine you still have your eyes closed) the producer of the pounding footsteps and stentorian voice is an indefinite individual suitable to be referred to with singular they. The genitive form of this pronoun (their) is used three times (you can almost see Ms Quist-Arcton struggling to keep her eyes closed, to pretend that she's asleep and this isn't happening).

But the moment the phrase this very, very determined woman has been used, we are back in a situation where the grammatical demands of English call for the feminine pronoun: although I am always prepared to be surprised, my assumption is that you simply cannot say anything like *The woman said they were unhappy with they referring to the woman. (Compare if your partner says they are unhappy, which plenty of people would use to allow for partners of either sex.) So the final two pronoun references to the monstrous trader woman are forms of the feminine singular she (the genitive form, her).

This back-and-forth alternation between forms of she and forms of they and back again is unusual, and I'm not recommending it as the perfect style for carefully prepared serious prose. But it offers a beautiful glimpse of the dynamics of pronoun choice. The use of singular-antecedent they is extremely subtle, and I'm not going to offer a hard generalization. But we see here that it is used when it is possible to imagine being mistaken about the sex of the referent or when the sex of the referent is indeterminate, for example, when neither details of the actual referent nor details of the noun that occurs as the linguistic antecedent make feminine or masculine pronoun gender a necessity.

So when a woman appears at the front of the plane and you can see she is a woman, you have to refer to her with she for pragmatic reasons. When you hear an unknown person approaching, they can be referred to using they, even if they get close enough to press their boom box against your shoulder. But the moment you acknowledge that you are going to be seated next to this person, and she is a woman, and you're going to help her by carrying fifty pounds of her cabin baggage on your lap, and you acknowledge the situation linguistically by referring to her with the word woman, the pronoun she has to be used from then on for grammatical reasons.

There is a subtle and beautiful system here. It is not to be dismissed with the idiotic sexist authoritarianism of Strunk and White's The Elements of Style (p. 60: "Do not use they... Use the singular pronoun... he"), which so many Americans believe is gospel.

Quist-Arcton, by the way, has an extraordinarily clear, refined, impeccable, BBC British accent. As Mary Macfarlane of the University of York reminds me, it is often the case that British speakers are more comfortable with the English language as it is, and hence less prone to believe ill-informed prescriptive nonsense about what's "bad" or "wrong" in usage. Scores of literary citations can be given to show that singular-antecedent they is common in good writing as well as speech; but most of the literary sources are British; it is American writers who are more inclined to live in terror of the usage fascists.

Posted by Geoffrey K. Pullum at 12:54 PM

So which is it?

Earlier today, Mark quoted the author of Bible Food for Hungry Christians, who wrote:

[T]he English language is one of the least precise languages on planet Earth.

Earlier this year, I quoted the author of The Miracle of Language, who wrote:

[T]he essential reasons for the ascendancy of English lie in the internationality of its words and the relative simplicity of its grammar and syntax.

So which is it? Are these points of view contradictory, or simply about different aspects of English? To settle the question (or at least for my entertainment), I think that Richard Lederer (responsible for the second quote) should have Robert T. Jones III (responsible for the first quote) as a guest on Lederer's public radio show, A Way With Words. The show now has a new co-host, Martha Barnette, who could be the moderator of this discussion. Wouldn't that be fun?

(Unfortunately, the only question this forum is even remotely likely to settle is which of these two "authors" is the bigger d-ck.)


Posted by Eric Bakovic at 12:33 PM

Precision, expressivity and ambiguity

The author of Bible Food for Hungry Christians, whose arguments about the lexical category of faith I discussed in another post, was inspired in passing to insult the English language:

I have come to see that the English language is one of the least precise and expressive languages on planet Earth. Our English dictionary often contains 10 or more definitions for the same word. Just for example, what do I mean when I say "bark"; do I mean the noise from a dog? Is it a boat? To hurt your shin? The covering on a tree? To speak sharply and loudly? To verbally advertise?

I have to admit that bark is guilty as charged -- and don't forget that it can also mean to remove the outer covering from a tree, and a few more obscure things besides. But is English really "one of the least precise and expressive languages on planet Earth?" Boasian relativists and English chauvinists can unite in rejecting this idea.

There seem to be four different issues here:

  • homophony -- how often are different words or phrases pronounced the same way?
  • word-sense ambiguity -- how many significantly different senses does a word have?
  • lexical category ambiguity -- how many different syntactic functions can a word-form have? and
  • what is the impact of local ambiguity (of whatever kind) on precision and expressivity?

The last question is the most important one, and in my opinion the answer is "no impact, or perhaps a positive one". Local ambiguity has no necessary impact on precision because a skillful writer or speaker resolves ambiguity in context as needed. Local ambiguity may even have a positive impact on expressivity, because a skillful writer or speaker can choose to take advantage of alternative interpretations, as Margaret Atwood does in this short poem:

you fit into me
like a hook into an eye

a fish hook
an open eye

The invocation of alternative meanings is more often implicit, as in the political frame wars.

All languages have a certain amount of homophony. I've never seen a cross-linguistic quantitative study -- and it wouldn't be easy to frame such a study fairly -- but I'm skeptical that English is very far from the norm in this respect. I've done some quantitative studies of the degree of homophony in the English lexicon, and I'll come back to this in another post.

Chinese is by reputation much more (locally) homophonous than English. For example, one of the translations for bark in Mandarin is 吠, which is pronounced fei4, i.e. fei with 4th tone. Fei4 in turn has 59 matches in the Unihan database, and 44 in CEDICT, including single-syllable words glossed as "anger", "boil", "lung", "small", "coarse, sandals", "to cost, to spend, fee, wasteful, expenses", "prickly heat", "abolish, crippled", "hamadryad baboon", "dam up water with rocks", and "fermium".

In terms of word-sense ambiguity, I'm also skeptical that English is much out of the ordinary. Bark's various senses and lexical categories have different translations into French, but each of those in turn has multiple different senses, generally not shared with English. For example, écorce has senses that come out in English as tree bark, fruit rind, the earth's crust, and the cerebral cortex. And I guess I should also point out that Greek pistis (πίστις), which is where the whole thing started, gets variously glossed by Liddell & Scott as trust, faith, persuasion, confidence, (commercial) credit, guarantee, assurance, argument, proof, that which is entrusted, and political protection or suzerainty.

In terms of lexical category ambiguity, languages do differ in the extent to which their word-forms are specialized for syntactic function. English has relatively little inflectional marking, and many English word-forms are ambiguous as to lexical category, though faith is not one of them. But I've never seen a convincing argument that this dimension really means anything in terms of either precision or expressivity.

Note that I'm not denying that languages can and do differ in their relative amounts of homophony, word-sense ambiguity and lexical category ambiguity. However, I doubt that English text is unusually underspecified either by its pronunciation or by its standard spelling. I'm not convinced that local measures of ambiguity have significant overall implications for the precision or the expressiveness of texts as a whole. And both common sense and everyday observation make me doubt that English is really "one of the least precise and expressive languages on planet Earth."


Posted by Mark Liberman at 09:58 AM

Religious syntax

A Google search for {"faith is a verb"} returns 777 results, including a 1989 book by Arthur Stokes, a 1998 sermon by the Rev. Roberta Nelson, and another from 2001 by Pastor Bill Pevlor.

Situated among 14 "Insights from New Testament Greek", a page entitled "Bible Food for Hungry Christians" takes a contrary view:


Both sides of this argument are preposterous.

Let's look at a few examples (emphasis added):

But to him that worketh not, but believeth on him that justifieth the ungodly, his faith is counted for righteousness. [Romans 4]
Him that is weak in the faith receive ye, but not to doubtful disputations. [Romans 14]
To them that have obtained like precious faith with us through the righteousness of God and our Saviour Jesus Christ [2 Peter 1]
And though I have the gift of prophecy, and understand all mysteries, and all knowledge; and though I have all faith, so that I could remove mountains, and have not charity, I am nothing. [1 Corinthians 13]

Trust me, if you understand English, you're interpreting all these instances of faith as nouns. However, it's understandable that you might not realize this, if you learned about English grammar in the American educational system in recent decades, and were told that a verb is an "action word", while a noun is a name for a "person, place or thing". From this point of view, the question of whether faith is a noun or a verb becomes a theological one. Thus the Rev. Roberta Nelson sermonized that

Faith is not an estate to be attained or a stage to be realized. It is a way of being and moving, a way of being on a pilgrimage.

and therefore

It is something we do; it is an action word--a verb; it changes and expands; it involves action with others.

I won't comment on the theology of this position, but as linguistics, it's nonsense. Geoff Pullum discussed this same misunderstanding about what terms like noun and verb mean, when he tried to explain how Jon Stewart could possibly assert that "terror [is] not even a noun". If you think that a noun is a word for a "person, place or thing", then you might come to the conclusion that terror is not a noun; if you think that a verb is an "action word", and your theology tells you that faith is an active process rather than a stable state, then you'll conclude that faith is a verb.

However, if faith were a verb, you should be able to faith God, or go out faithing, or step forward to get faithed. But no such phrases occur in English versions of the Bible, or in any other English text. Instead, we read about his faith, the faith, precious faith, and so on. The reason is simple: faith in English is always a noun, and never a verb. English readers may or may not grasp this consciously, but they know it in their linguistic hearts.

The author of Bible Food for Hungry Christians, Bob Jones III, is not as linguistically naive as the Rev. Nelson. He observes that

Many English words can be nouns or verbs, with the exact same English spelling. This can cause confusion, because the reader must decide from the context whether the writer is using the word as a noun or verb.

One of the many beauties of the Greek language of the New Testament is that the ENDING on the Greek word tells us the part of speech. Whether a word is a noun or verb is not up for grabs in the New Testament Greek, as it is in our English. You can look up any New Testament Greek word in The Analytical Greek Lexicon, by Zondervan, to see if it is a noun or verb.

and specifically points out that

Bark - The noise a dog makes, can be a NOUN OR a VERB! When we tell the dog to "bark", it is a verb which tells the dog to ACT, and when we describe the dog's "bark", it is a noun describing the "thing" that the dog did.

The examples are correctly construed here. And what Rev. Jones says about the Greek word translated as English "faith" also seems sensible:

The common word for "faith" in the New Testament is the Greek word "pistis". This word is used 244 times in the New Testament, and it is a NOUN, not a verb!

But then he jumps to the conclusion that "the English reader tends to read [faith] as a verb", and concludes that this is because "the reader must decide from the context whether the writer is using the word as a noun or verb."

This is wrong. Context is not necessary in this case. There are not very many monosyllabic English nouns that have successfully resisted being verbed, but faith is one of them.

[Update: I have to confess a sin. I failed to consult the OED in writing this post; this is always a mistake, one for which I've criticized others. The OED does have an entry for faith as a verb, glossed as "a. intr. To place or rest one's faith on. b. trans. To provide with a creed or standard of faith. c. To utter upon one's word of honour. d. To give credit to, believe, trust."

In my own defense, I'll claim that none of the citations will work in modern English (for example, 1430 LYDG. Chron. Troy I. vi, By whose example women may well lere How they shuld faith or trusten on any man, or 1553 N. GRIMALDE Cicero's Offices I. (1558) 10 It is called faithfulnes because it is fulfilled which was faithed [quia fiat quod dictum est]), and I'll note that the most recent citation is from Shakespeare's Lear in 1605: "Would the reposal of any trust, virtue, or worth in thee Make thy words faith'd?"

In any case, I don't believe that any of the 100-odd instances of faith in the King James Version are plausibly construable as verbs. Nor do the word forms "faithed", "faithing" or "faiths" occur in that document. ]


Posted by Mark Liberman at 09:50 AM

December 13, 2004


In response to yesterday's post on typed citation links, Stefano Bertolo emailed to draw my attention to Bibster. It's a Semantic Web project that combines the widely-used BibTeX format for bibliographical records with the Napster-associated idea of peer-to-peer searching.

The project is documented in Haase et al., "Bibster - A Semantics-Based Bibliographic Peer-to-Peer System", whose abstract reads

This paper describes the design and implementation of Bibster, a Peer-to-Peer system for exchanging bibliographic data among researchers. Bibster exploits ontologies in data-storage, query formulation, query-routing and answer presentation: When bibliographic entries are made available for use in Bibster, they are structured and classified according to two different ontologies. This ontological structure is then exploited to help users formulate their queries. Subsequently, the ontologies are used to improve query routing across the Peer-to-Peer network. Finally, the ontologies are used to post-process the returned answers in order to do duplicate detection. The paper describes each of these ontology-based aspects of Bibster. Bibster is a fully implemented open source solution built on top of the JXTA platform.

(Another version of the same paper seems to be available here).

The two ontologies in question are SWRC (the "Semantic Web Research Community Ontology"), which "models the semantic web research community (its researchers, topics, publications, tools, etc. and relations between them)", and the ACM Computing Classification System (often called "the ACM topic hierarchy").
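
For readers curious what "duplicate detection" over answers gathered from many peers involves, here is a minimal Python sketch of the general idea. It is not Bibster's method: the paper describes ontology-based processing, whereas this swaps in plain string normalization, and the function names and sample records are hypothetical. Two entries count as duplicates if their titles agree after lowercasing and stripping punctuation, and their years agree.

```python
import re
from collections import defaultdict


def record_key(record: dict) -> tuple:
    """Hypothetical duplicate key: normalized title plus year.

    Lowercases the title and turns every run of non-alphanumeric characters
    into a single space, so that punctuation and capitalization differences
    between peers' copies of an entry do not block a match.
    """
    title = re.sub(r"[^a-z0-9]+", " ", record.get("title", "").lower()).strip()
    return (title, str(record.get("year", "")))


def merge_duplicates(records: list) -> list:
    """Group returned answers that share a key; each group is one logical entry."""
    groups = defaultdict(list)
    for rec in records:
        groups[record_key(rec)].append(rec)
    return list(groups.values())


# Hypothetical answers from three peers; the first two differ only in formatting.
answers = [
    {"title": "A Study of Peer-to-Peer Search", "year": 2004, "peer": "A"},
    {"title": "A study of peer-to-peer search.", "year": "2004", "peer": "B"},
    {"title": "Ontologies for Query Routing", "year": 2004, "peer": "C"},
]

for group in merge_duplicates(answers):
    print([rec["peer"] for rec in group])
```

A normalized key like this absorbs formatting variation between peers but misses genuinely different renderings of the same work (abbreviated titles, typos), which is presumably part of what richer, ontology-based matching aims to handle.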

I downloaded the Bibster application, installed it, and tried an initial search for papers by Haase, the first author of the Bibster documentation cited above. The search took about three minutes and returned nothing. I'm not sure whether the application actually succeeded in finding any peers to query -- maybe now that the "case study" is done and the paper has been published, the project is inactive? The schedule on the Bibster web site says that it should continue

29.03.2004 Phase I will start
SWAP developers will work with the system. The network will consist of about 20 peers.
07.04.2004 Phase II will start
The Bibster partner team will join the case study. Additionally about 30 peers will join the Bibster test team. The Phase II partners will be completely supported by the Bibster team.
09.07.2004 Phase III will start
Bibster will be announced on several mailing lists and will become a public system.
30.09.2004 The official case study will be stopped. The logging will be cancelled and the collected data will be evaluated. Of course, the Bibster system itself will continue its work.

but this depends on the user base keeping it going. If Bibster peers are out there, the application running on my machine here doesn't seem to be able to find them.

I'll also say that I don't think that the ACM topic hierarchy is very helpful as a framework for bibliographic search. My hope of checking this by using it to search the computer science literature has been frustrated by my failure to find any Bibster peers to search over. However, I took three topics that I happen to be interested in right now, and tried to see where they would fit into the ACM CCS. In each case, my experience was the same -- the topic of interest to me seemed to fit, more or less uneasily, into several categories at once, leaving me skeptical that choosing such categories would do me much good in searching.

All in all, I was disappointed. There are many interesting ideas in the Bibster documentation, but it's not clear to me that (even if there were lots of peers out there) it would work better as a way to navigate in bibliographical space than Google Scholar, CiteSeer, Scirus or similar centralized search tools, which use the bibliographies of the indexed literature as a proxy for the distributed bibliographical databases that Bibster aims to take advantage of. I did not have a chance to experience the claimed benefit of the ontologies in query routing and duplicate detection, but my experience with trying to use the ACM topic hierarchy in query formulation was not an inspiring one.

So it looks to me like the Semantic Web is still a set of possible solutions looking for a problem -- see here, here, here, here, and here for prior discussions. And there's an interesting article by Dan Brickley entitled Nodes and Arcs 1989-1999, which puts the Semantic Web in a recent historical perspective, and in particular displays a figure from the original 1989 WWW proposal. That figure reminds us that the web has always been about to be about semantic networks, but somehow it keeps turning out to be about (communication by means of text and pictures embedded in) structures with functions but no intrinsic semantics.

[Update 12/16/2004: Steffen Staab sent these comments by email:

There is an important trivial comment:
in fact the two rendevous servers have been down because of simple administration problems.
As Bibster is no longer a core task, we do not look for it every day....
(and actually the rendevous server is all about JXTA and not about Semantic Web at all)

The more interesting comment is about what is the use of ACM topic hierarchy?

Well, it is as bad as such a hierarchy goes.

a. It is too general for the expert.
b. It does not closely mirror new developments
c. It is not possible to unambigously assign bib items.

It is also as good as such a hierarchy goes.

a. The less proficient user (not one that does not know the domain) gets an idea how the domain could be structured.
b. The hierarchy allows for easier search through larger parts
(this only becomes relevant for more complex searches; actually this is a point where ontoprise earns money with!)
c. It is better to have a weak agreement than none.


a. you can still do keyword search! So, you are still as good as plain information retrieval.
b. you can use the social network structure. Knowing that a particular entry on semantic annotation is relevant for your fellow researcher is an implicit evaluation!
c. you can use Bibster to manage your bib entries!

Most importantly:
It really is a complete shift of the paradigm that some central server stores all the resources and raises many more potential than even Bibster already has.

Granted, all this needs much more software engineering (e.g. giving feedback about responsiveness of servers, leaving less CPU and memory imprint etc.).

Granted, this also must be able to attract enough attention in order to solve the chicken and egg problem that users go, where other users are (e.g. there were many marketplaces but ebay virtually hooks all auctioneers now).

Hence, I think the arguments you make in your blog don't hit the nail on its head.

The next couple of days are kind of busy for me, but I'll try Bibster again over the weekend and report back when I've had a chance to try it.

There are serious issues here about centrally-controlled vs. peer-to-peer search, the role of ontologies in search and in navigation of search results, etc., which deserve more thought and discussion. ]


Posted by Mark Liberman at 06:15 PM

Opening for a copy editor at the Associated Press

Emma Ross had an article on the AP wire Saturday about the nature and effects of dioxins. One of her editors did her a disservice by sending it out under the headline "What Are Dioxins and What Is Their Affect?" It ran that way in the Miami Herald, the Columbus Ledger-Inquirer, Newsday, the Seattle Post-Intelligencer, and for all I know in many other papers.

That should have been "What Is Their Effect?" As written, the headline seems to ask about dioxins' emotional state rather than about their medical consequence.

At least that's what the norms of contemporary standard English say, according to the American Heritage Book of English Usage:

Affect and effect are sometimes confused, but before you can sort them out, you must sort out the two words spelled affect. One means “to put on a false show of,” as in She affected a British accent. The other can be both a noun and a verb. The noun meaning “emotion” is a technical term from psychology that sometimes shows up in general writing, as in this quote from a Norman Mailer piece about the Gulf War: “Of course, the soldiers seen on television had been carefully chosen for blandness of affect.” In its far more common role as a verb, affect usually means “to influence,” as in The Surgeon General’s report outlined how smoking affects health.

Effect can also serve as a noun or a verb. The noun means “a result.” Thus if you affect something, you are likely to see an effect of some kind, and from this may arise some of the confusion. As a verb, effect means “to bring about or execute.” Thus, using effect in the sentence The measures have been designed to effect savings implies that the measures will cause new savings to come about. But using affect in the very similar sentence These measures may affect savings could just as easily imply that the measures may reduce savings that have already been realized.

So the AP headline tries to use affect as a noun to mean "result" or "consequence". This is simultaneously a spelling error and a malapropism. It's an easy mistake to make, because the noun effect is pronounced with final stress and a reduced initial vowel, homophonous for many speakers with the verb affect (though not with the noun affect, which is pronounced with initial stress). The headline writer may have been further confused by the fact that the second sentence of Ross' article reads "This is what dioxins are and how they affect human beings".

The mistake is not only an easy one, but also a common one. The sequence "their affect" now has 398,000 web hits on Google, and when I check the first 40 current hits, I find that 39 of them are mistaken usages that should have been effect. So is this really a mistake? Hasn't Norma Loquendi spoken, and told the American Heritage Book of English Usage to sit down and shut up?

Well, it's a little more complicated than that. The phrase "their effect" has 1,390,000 ghits, so really what we've learned is that writers on the web use affect for effect about 1/3 of the time in the context "their ___", and that the "true" noun affect is so rare that it's outnumbered roughly 30 to 1 by these "mistakes". And in some contexts, the rate of "mistaken" substitution is much lower than one in three: "the affects" is about 79 times rarer than "the effects" (252K to 20.6M ghits).

The standard usages in this area are difficult to keep straight, and I suspect that the norms will not be maintained much longer, given that headline writers at the AP are abandoning ship, and are not being corrected by editors at papers like Newsday and the Miami Herald. All the same, violations of these norms will continue to be scorned by those who understand the old standards and try to maintain them. So as a public service, I'll try to explain those standards in terms of their history.

First, let's recap what's supposed to be correct, and why.

The verb affect most commonly means "to have an influence on or effect a change in". But there's no standard nominal form of affect with a corresponding meaning. Instead, affect as a noun means "feeling or emotion, especially as manifested by facial expression or body language", corresponding to the verbal sense "to act on the emotions of; touch or move".

On the other side, effect as a verb means "to bring into existence" or "to produce as a result", and it does have a corresponding noun form meaning "something brought about by a cause or agent; a result".

Is that perfectly clear? I didn't think so.

How did this mess come into existence? Well, it starts with a simple spatial distinction, basically to and fro, towards vs. away from. More exactly, Latin ad "to, towards" vs. Latin ex "out from the interior of". These Latin prepositions combined with the verb facere "do, make", to create the compound verbs afficere and efficere.

The Romans took afficere to mean "to do something to one, i. e. to exert an influence on body or mind, so that it is brought into such or such a state". So to + do = "do to". Fair enough. Lewis and Short note that the "influence" in question is "of the body rarely" and "more freq. of the mind".

The Romans took efficere to mean "to make out, work out; hence, to bring to pass, to effect, execute, complete, accomplish, make, form". So out (of) + make = "make out (of)", with "out" here having something like the sense it has in English outcome. Again, fair enough.

The past participles affectus and effectus got borrowed into English (perhaps with passage through French) as affect and effect. And the core meanings of the verbal forms are still close to the Latin originals: affect is "to have an influence on or effect a change in" -- still basically "do to"; effect is "to bring into existence" or "to produce as a result" -- still recognizably "make out (of)".

However, while the English verb affect has lost the Romans' preference for influences on mind rather than body, the noun affect has largely kept it, from Chaucer's time to the present:

c1374 CHAUCER Troylus III. 1342 And therto dronken had as hotte and stronge As Cresus did, for his affectes wronge.
1533 TINDALE Supper of the Lord Wks. III. 266 God is searcher of heart and reins, thoughts and affects.
1626 BACON Sylva §97 The affects and Passions of the Heart and Spirits, are notably disclosed by the Pulse.
1894 W. JAMES Coll. Ess. & Rev. (1920) 358 We may also feel a general seizure of excitement, which Wundt, Lehmann, and other German writers call an Affect, and which is what I have all along meant by an emotion.
1923 Wkly. Westm. Gaz. 24 Mar. 181 Their psychic lives are overfull of complexes, levels and affects.

In fact, it seems that this preference has gradually strengthened into an exclusivity. The OED gives as senses 5. "[T]he way in which a thing is physically affected or disposed; especially the actual state or disposition of the body" and 6. "esp. [a] state of body opposed to the normal; indisposition, distemper, malady, disease; ‘affection’", but the most recent cited examples of these senses are from 1679:

1563 T. GALE Antidot. II. 9 Very precious in burnings and scaldings and lyke affectes.
1616 SURFLET & MARKH. Countrey Farme 245 It is of great vse for the affects of the lungs.
1679 tr. Willis's Pharm. Ration. in Blount Nat. Hist. (1693) 112 Who presently after drinking Coffee became worse as to those Affects.

And even these (in my opinion obsolete) uses don't cover the offending AP headline "What Are Dioxins and What Is Their Affect?", where affect obviously means something like "consequence(s)" or "result", i.e. effect.

In the body of the AP dioxin article, Ross uses affect-the-verb or effect-the-noun four times, all of them correct according to the standards we've described:

This is what dioxins are and how they affect human beings:

Most of what's known about the health effects of acute doses comes from studies on animals.

Other effects from high dioxin doses in humans include decreased liver function, an enlarged liver and slight increase in blood fats, though both effects tend to be mild and short-lived.

So I bet that Ross knows the standard norms in this case, and is feeling mortified by the headlines under which her piece appeared.


Posted by Mark Liberman at 12:23 AM

December 12, 2004

A misattribution no longer to be put up with

[Guest post by Benjamin G. Zimmer]

Introduction: Ben Zimmer writes to me to point out that the old Churchill story about an editorial correction being dismissed as "nonsense up with which I will not put" is almost certainly a case of fake attribution. Famous people (especially famous men) tend to get notable sayings retrospectively misattributed to them. He makes a strong case that this is one such case. (I always thought the lack of documentation for this story in any serious works about Sir Winston was suspicious.) I decided to quote Ben's very interesting research (originally seen on alt.usage.english) in full for Language Log readers, as a guest post. Notice, as he goes on, the changing wording of the purported quotation. —Geoff Pullum

The earliest citation of the story that I've found so far in newspaper databases is from 1942, without any reference to Churchill:

The Wall Street Journal, 30 Sep 1942 ("Pepper and Salt"): When a memorandum passed round a certain Government department, one young pedant scribbled a postscript drawing attention to the fact that the sentence ended with a preposition, which caused the original writer to circulate another memorandum complaining that the anonymous postscript was "offensive impertinence, up with which I will not put." —The Strand Magazine.

Churchill often contributed to London's Strand Magazine, so it seems unlikely that the magazine would fail to identify the unnamed writer as Churchill if he were indeed the source of the story. Attributions to Churchill only began to surface well after the war's end. The usual source of the Churchill attribution is Sir Ernest Gowers' Plain Words (1948):

It is said that Mr. Winston Churchill once made this marginal comment against a sentence that clumsily avoided a prepositional ending: "This is the sort of English up with which I will not put".

Though Gowers is typically the only source cited for the attribution (as in The Oxford Dictionary of Quotations and The Oxford Companion to the English Language), the Churchill story was circulating in 1948 in various forms. Here is the earliest reference I've found:

Portland (Maine) Press Herald, 20 Mar 1948 ("'Up With Which I Will Not Put' Is Latest Winston Churchillism"): London, March 19 (UP) -- Another Churchillism has been read into the record -- "up with which I will not put." Thursday night in the House of Commons, Glenvil Hall, financial secretary to the treasury, made a plea for clearer English. He cited as an example of Winston Churchill's "forceful if not always grammatical English" this marginal notation that the wartime Prime Minister scribbled on a document: "This is nonsense up with which I will not put."

This same wire story appeared later in March '48 in another newspaper — the Daily Gleaner of Kingston, Jamaica — so clearly the anecdote was traveling far and wide. By December of that year, a more embellished version was circulating:

The Wall Street Journal, 9 Dec 1948 ("Pepper and Salt") The carping critic who can criticize the inartistic angle of the firemen's hose while they are attempting to put out the fire, has his counterpart in a nameless individual in the British Foreign Office who once found fault with a projected speech by Winston Churchill. It was in the most tragic days of World War II, when the life of Britain, nay, of all Europe, hung in the balance. Churchill prepared a highly important speech to deliver in Parliament, and, as a matter of custom, submitted an advanced draft to the Foreign Office for comment. Back came the speech with no word save a notation that one of the sentences ended with a preposition, and an indication where the error should be eliminated. To this suggestion, the Prime Minister replied with the following note: "This is the type of arrant pedantry up with which I will not put."

Over the following years, other variations circulated in the newspapers, all featuring Churchill. (By the time a reader inquired after the Churchill anecdote in The New York Times's "Queries and Answers" section in 1951, "countless readers" sent in versions of the story, but none had an authoritative citation.) Some later versions feature an officious book editor rather than a Foreign Office clerk. (A review of the variations can be found at:

Further research into the Churchill attribution would require searching the House of Commons archives to track down exactly what Glenvil Hall said in March 1948. I'm guessing he embellished the story along the lines of later attested versions. It appears, however, that the anecdote emerged during WWII featuring a generic memorandum writer, and only after the war did the story get attached to Churchill (as so many other anecdotes have).

—Ben Zimmer

Posted by Geoffrey K. Pullum at 03:39 PM

Typed citation links

James Tauber at journeyman of some, reacting to Google Scholar, asked

...what if citation indices were annotated with the relationship between the newer publication and what it was citing? You could have relationships like "quotes", "summarises", "provides further evidence for", "argues against", "answers question posed by", and so on.

I agree that having typed document links is a neat idea. A system that already does something like this is CiteSeer, which provides (as in this example) separate lists of related documents along the seven different typed links "Cited by", "Similar documents (at the sentence level)", "Active bibliography (related documents)", "Similar documents based on text", "Related documents from co-citation", "Citations", and "Documents on the same site".

Some of the kinds of links that James suggests ("provides further evidence for", "answers question posed by") are likely to be hard even for human readers to agree about. James notes one of the reasons for this:

The granularity of many articles might not be right for this to really work given that one might argue for one part of an article and argue against another.

There are also other reasons why it might be hard. People don't always see the same evidential connections, nor do they always agree about them when they're pointed out. Automatic procedures for finding such connections will disagree as well. Textual similarity relations of the type that CiteSeer uses for (some of) its links are similarly fuzzy, but most users recognize that fact, I think, and are easily able to use such relations for what they may be worth. Logical-sounding relations like "provides evidence for" might be (mis)taken more seriously. Still, I'd like to be able to explore a link graph that included such relations.
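To make the idea concrete, here is a minimal sketch in Python of a citation graph whose edges carry a relation type, using relation labels borrowed from James's list and invented paper identifiers. Each node is a whole document, which is exactly the granularity problem James raises: a paper that supports one part of another and attacks a different part needs two differently typed edges to the same node. Nothing here reflects how CiteSeer actually stores its links.

```python
from collections import defaultdict

# Relation labels taken from James Tauber's examples; the set is
# illustrative, not an agreed standard.
RELATIONS = {
    "quotes",
    "summarises",
    "provides further evidence for",
    "argues against",
    "answers question posed by",
}


class TypedCitationGraph:
    """A toy citation graph whose edges carry a relation type."""

    def __init__(self):
        # citing document -> list of (relation, cited document)
        self.out_links = defaultdict(list)

    def add(self, citing, relation, cited):
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation!r}")
        self.out_links[citing].append((relation, cited))

    def cited_by(self, cited, relation=None):
        """Documents citing `cited`, optionally restricted to one relation."""
        return sorted({
            citing
            for citing, links in self.out_links.items()
            for rel, target in links
            if target == cited and (relation is None or rel == relation)
        })


if __name__ == "__main__":
    g = TypedCitationGraph()
    g.add("paper-B", "argues against", "paper-A")
    g.add("paper-C", "provides further evidence for", "paper-A")
    g.add("paper-C", "quotes", "paper-B")
    print(g.cited_by("paper-A"))                    # ['paper-B', 'paper-C']
    print(g.cited_by("paper-A", "argues against"))  # ['paper-B']
```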

James muses that bloggers could start using typed hyperlinks as part of the process of composing entries:

I wonder if it might be more practical in blogs. People could link to this entry with annotations like "agree", "agree with additional ideas", "agree with caveats", "seen something like this already", "really dumb idea with reasons stated".

As James seems to intend, I guess this could be done using a sort of souped-up version of the rel attribute on a or link elements. But this seems to require an agreed-on controlled vocabulary for such link types. If you let people add arbitrary comment-like annotations to such links, you'd get a large space of variants, both roughly equivalent ones like "really dumb idea with reasons stated", "silly idea, here's why", "follow this link to learn about the problems with this foolishness", "nonsense, for elaborately documented reasons", and also clines shading off in various directions like "apparently obvious idea that turns out to be false for interesting reasons", "idea that seems dumb at first but is actually profoundly true", and so on. Clustering techniques could be used to establish some sort of structure over these annotations, but you could do that with the original text around the link, without the writer-supplied metadata.
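Here's a sketch of what reading such links might look like, using Python's standard html.parser. HTML's rel attribute really does take a space-separated list of tokens, but the vocabulary in LINK_TYPES below is hypothetical, invented for the example. Anything outside the agreed list lands in the "unknown" bucket, which is where all the free-form variants would pile up.

```python
from html.parser import HTMLParser

# Hypothetical controlled vocabulary of writer-supplied link types.
LINK_TYPES = {"agree", "agree-with-caveats", "seen-before", "disagree"}


class RelCollector(HTMLParser):
    """Collect (href, rel tokens) pairs from <a> and <link> elements."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attrs = dict(attrs)
        rel = attrs.get("rel") or ""  # a bare `rel` attribute yields None
        self.links.append((attrs.get("href"), rel.split()))


def classify(html):
    """Split each link's rel tokens into known and unknown link types."""
    parser = RelCollector()
    parser.feed(html)
    result = []
    for href, tokens in parser.links:
        known = [t for t in tokens if t in LINK_TYPES]
        unknown = [t for t in tokens if t not in LINK_TYPES]
        result.append((href, known, unknown))
    return result


if __name__ == "__main__":
    page = ('<p>See <a href="/post/1" rel="agree-with-caveats">this</a> and '
            '<a href="/post/2" rel="silly-idea-heres-why">that</a>.</p>')
    print(classify(page))
    # [('/post/1', ['agree-with-caveats'], []), ('/post/2', [], ['silly-idea-heres-why'])]
```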

When link types are treated as information supplied by writers rather than information created by an indexing process, the game changes. Indexers can structure a set of documents in lots of different ways for lots of different purposes, and still invent new types of links to try tomorrow. Users of such indices can pick and choose as they please in this evolving garden of relationships. But a set of link types to be used by writers can't be used by readers unless all the writers they care about have used roughly the same ones, in roughly the same way.

The trick would be to create a taxonomy of link-types -- or as James implies, link-dimensions -- that's close enough to orthogonal to span an interesting space of relationships, expressive and flexible enough to be useful to different sorts of people doing different sorts of things, and simple enough for most people to be willing to learn it and use it.

You could argue -- probably someone already has -- that html has been so successful because it's at a sweet spot of semantic incoherence, which allows writers to adopt vague and variable theories about what it all means, and readers to reconstruct some approximate analog of the writers' intentions. Just like language, maybe, but that's another story. I like the idea of adding some link types to the system, but it won't be easy to do it in a way that works.


Posted by Mark Liberman at 06:04 AM

December 11, 2004

Hawaiian spelling contest answers

The two Hawaiian spelling mistakes in Rich Monastersky's article on language revival in Hawaii were (1) *Mā'noa for Mānoa and (2) *'okino for ‘okina. And let the record show that Rich knows his stuff: on being told of the existence of the two errors he immediately identified them himself; only the merest prompt from Language Log was needed for him to spot them. (It's not clear whether the webmaster at the Chronicle will fix the web page.) A few more detailed notes follow, for fans of this deceptively difficult language.

1. In the incorrect spelling for the place name that is often casually spelled Manoa by people using just the unembellished roman alphabet, namely *Mā'noa, the macron on the a is correct (it's a long vowel), but the apostrophe is not correct, and couldn't be. The apostrophe is a common substitute for the symbol ‘, which represents a sound traditionally known in Hawaiian philology as the ‘okina and known to phoneticians as the glottal stop. Although the ‘okina is often left out by English speakers and others writing Hawaiian words, and was often omitted (with wholesale resultant ambiguity) by missionaries transcribing the language, the ‘okina is not a minor orthographic detail (like the apostrophe in English, which, though important in the standard written language, never represents any sound); it's a letter of the alphabet representing a full-fledged consonant. The consonants of the eastern dialects of the Hawaiian language (e.g. for O‘ahu) are: {p, k, ‘, m, n, l, w, h}. The vowels are {a, e, i, o, u}.

Now, it is a crucial fact about the phonology of Hawaiian that there are no consonant clusters. None whatever. That means that every glottal stop must be followed by a vowel. Any occurrence of a sequence like ‘n in what purports to be a Hawaiian word must be a spelling mistake. (Not in every language, of course. In English, lots of people would pronounce witness with a glottal stop right before the [n]. It just can't happen in Hawaiian.) So, the name of the place on which the University of Hawaii built its main campus is Mānoa. It has two stress groups, [ma:] and [noa], and thus gets two stresses, on [ma:] and [no] (there is some controversy about which one of these is the secondary stress, so Keola Donaghy informs me). The final [a] is unstressed.
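The constraint is simple enough to check mechanically. The Python sketch below (the function name is my own) treats the three common ways of typing the ‘okina as the same consonant and flags any consonant that isn't immediately followed by a vowel. That goes slightly beyond "no clusters": it assumes, as is usual for Hawaiian, that syllables are open, so a word-final consonant is also flagged. It catches *Mā'noa, but it would happily accept *'okino, which breaks no phonotactic rule and simply isn't a word.

```python
import unicodedata

# Eastern-dialect consonants as listed above. The ‘okina may be typed as
# U+02BB, as a left single quote (U+2018), or as a plain apostrophe.
OKINA_FORMS = {"\u02bb", "\u2018", "'"}
CONSONANTS = set("pkmnlwh") | OKINA_FORMS
VOWELS = set("aeiou\u0101\u0113\u012b\u014d\u016b")  # plain and macron vowels


def phonotactic_errors(word):
    """Return (index, char, problem) for each violation found in `word`.

    Assumes every syllable is open, so each consonant must be followed
    immediately by a vowel (no clusters and no word-final consonants).
    """
    # NFC turns a + combining macron into a single code point like ā.
    letters = unicodedata.normalize("NFC", word).lower()
    errors = []
    for i, ch in enumerate(letters):
        if ch not in CONSONANTS and ch not in VOWELS:
            errors.append((i, ch, "not a Hawaiian letter"))
        elif ch in CONSONANTS:
            nxt = letters[i + 1] if i + 1 < len(letters) else None
            if nxt not in VOWELS:
                errors.append((i, ch, "consonant not followed by a vowel"))
    return errors


if __name__ == "__main__":
    print(phonotactic_errors("Mānoa"))   # []
    print(phonotactic_errors("Mā'noa"))  # [(2, "'", 'consonant not followed by a vowel')]
```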

2. The spelling *'okino is a different matter: it's just a case of Rich (or someone) typing the wrong last letter. There could have been a word of Hawaiian spelled like this, but there just happens not to be. The word for a break or cessation that was used to name the glottal stop is spelled (and pronounced) ‘okina. It is stressed on the penultimate syllable, [ki].

Posted by Geoffrey K. Pullum at 04:56 PM

Scholarly suit

According to John Battelle's Searchblog, the American Chemical Society has just sued Google on the grounds that Google Scholar (established 11/19/2004) infringes on the trademark of SciFinder Scholar (registered with USPTO 5/25/2001). Yet to be heard from: WebElements Scholar, Rhodes Scholar, Fulbright Scholar, the Scholar's Bookshelf, and the Ten O'Clock Scholar.
[Update 12/12/2004: Arnold Zwicky wonders about Scholar Press, the American Scholar, and (his favorite) Road Scholar (which is a registered trademark, but seems willing to share its name with an unrelated activity). ]
And I'm waiting to see whether the American Philosophical Society (founded 1743) will wake up and take legal action against the American Chemical Society (founded 1876), for transparently ripping off its name in a way likely to confuse the public.

Meanwhile, I'm hard at work on a patent application for my new invention, "Method for post-modifying English language product names to communicate their usefulness in scholarship". And rumor has it that Geoff Pullum is preparing to back up his existing copyright claim to snugglebunny with patent protection.

Posted by Mark Liberman at 02:01 PM

Gift ideas for hir and hir

John Lawler emailed a pointer to US patent application 20040249626, "Method for modifying English language compositions to remove and replace objectionable sexist word forms", filed 6/3/2003 and published 12/9/2004. The abstract:

A method for removing objectionable sexist word forms from English language text and substituting new non-sexist word forms for the objectionable sexist word forms provides ten new blended word forms. Each new blended word form provides a non-sexist substitute for a word from ten associated sexist word pairs. If the gender of the being under consideration by the word from the sexist word pair is unknown, an appropriate new non-sexist word form is substituted for the objectionable sexist word form.

After reading the text of this document, I had to check the URL to be sure that I was really looking at the site of the US Patent and Trademark Office, rather than a spoof on our out-of-control system for evaluating algorithm patents. A check on the USPTO site suggests that this application is still pending review. As I'll explain below, I hope that it turns out to be in the 30% of patent applications that the USPTO does not allow.

The inventor -- Richard S. Neal of Edmond, OK -- cites 68 claims, all of which are of the same form as claim 1:

1. A method for removing objectionable sexist word forms from a portion of English language text and substituting new non-sexist word forms for the removed sexist word forms, said method comprising the steps of: a. Providing a non-sexist word HIR; b. Searching the portion of English language text to locate each instance wherein the sexist word HIM is used in third person objective case; c. Determining in each instance from the context whether the gender of the being referred to by the sexist word HIM is known or unknown; d. Substituting said non-sexist word HIR for the sexist word HIM in each instance wherein the sexist word HIM is used in the third person objective case and, further, wherein the gender of the being referred to by the sexist word HIM is unknown; e. Searching the English language text to locate each instance wherein the sexist word HER is used in third person objective case; f. Determining in each instance from the context whether the gender of the being referred to by the sexist word HER is known or unknown; and g. Substituting said non-sexist word HIR for the sexist word HER in each instance wherein the sexist word HER is used in third person objective case, and, further, wherein the gender of the being referred to by the sexist word HER is unknown.

The essential idea in this claim is to substitute the word hir for the standard English words him or her, just in case these are objective-case pronouns referring to a "being" whose "gender" is unknown. This requires four steps:

  1. Finding instances of him and her as independent words in text. This is trivial.
  2. Determining whether each instance is an objective-case pronoun. This is hard to do with 100% accuracy (consider "...gave her dog food"), and no methods are specified.
  3. Determining what each objective-case pronoun refers to. This is very hard to do and no methods are specified.
  4. Determining whether the "gender" of the referent is "known or unknown". This is a completely undefined step, which would probably be hard to do if it were defined precisely enough to make it possible to do it. "Gender" is not defined in the text of the patent, as far as I can find -- does Mr. Neal mean grammatical gender? does he mean gender as a euphemism for biological sex, or a term for culturally-defined sex-related roles? Nor does he ever define to whom the "gender" is "known or unknown" -- the author of the text? a typical reader of the text? a person or machine implementing the algorithm (that would make it easy, since the machine could simply plead ignorance in every case)?
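Step 1 really is trivial, and a few lines of Python show it. They also show where the trouble starts: the same pattern fires on the ambiguous "...gave her dog food" with no way of telling whether her is objective ("gave her some dog food") or possessive ("gave her dog some food"). This is a sketch of the first step only; the function name is mine, not the patent's.

```python
import re

# Whole-word "him" or "her"; the \b boundaries keep "there" and "whim" out.
HIM_HER = re.compile(r"\b(him|her)\b", re.IGNORECASE)


def find_candidates(text):
    """Step 1: return (offset, word) for every whole-word him/her.

    Steps 2-4 (case, reference, "gender" status) are not attempted.
    """
    return [(m.start(), m.group(1)) for m in HIM_HER.finditer(text)]


if __name__ == "__main__":
    print(find_candidates("gift ideas for him or her"))  # [(15, 'him'), (22, 'her')]
    print(find_candidates("...gave her dog food"))       # [(8, 'her')]
```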

The other 67 claims are all exactly of the same form, except that they deal with nine other proposed lexical substitutions, multiplied by a variety of grammatical and morphological distinctions (oddly analyzed in some cases, but never mind that): hirs as a substitute for his and hers, hir as a substitute for his and her, hesh as a substitute for he and she, fother in place of mother or father, mir in place of sir or madam, hirself for himself or herself, birl in place of boy or girl, wan as a substitute for man or woman, and wen as a substitute for men or women.
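Laid out as a lookup table, the ten substitutions show why step 2 can't be skipped: her and his each map to different replacements depending on grammatical role, so the table has to be keyed by word and role, not by word alone. The role labels in this Python sketch are my own hypothetical shorthand, not the application's terminology.

```python
# The ten proposed substitutions, keyed by (standard word, grammatical role).
# Role labels are illustrative shorthand, not the patent's own terms.
SUBSTITUTIONS = {
    ("him", "objective"): "hir",
    ("her", "objective"): "hir",
    ("his", "possessive pronoun"): "hirs",
    ("hers", "possessive pronoun"): "hirs",
    ("his", "possessive determiner"): "hir",
    ("her", "possessive determiner"): "hir",
    ("he", "subjective"): "hesh",
    ("she", "subjective"): "hesh",
    ("himself", "reflexive"): "hirself",
    ("herself", "reflexive"): "hirself",
    ("mother", "noun"): "fother",
    ("father", "noun"): "fother",
    ("sir", "noun"): "mir",
    ("madam", "noun"): "mir",
    ("boy", "noun"): "birl",
    ("girl", "noun"): "birl",
    ("man", "noun"): "wan",
    ("woman", "noun"): "wan",
    ("men", "noun"): "wen",
    ("women", "noun"): "wen",
}


def roles_for(word):
    """List the roles under which a standard word has a replacement."""
    return sorted(role for (w, role) in SUBSTITUTIONS if w == word)


if __name__ == "__main__":
    print(roles_for("her"))  # ['objective', 'possessive determiner']
    print(roles_for("his"))  # ['possessive determiner', 'possessive pronoun']
```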

The patent description (following the patent claims) suggests that the author is really thinking of a technique of editing for use by humans, rather than an algorithm for use by machines, and sees his innovation as the provision of a set of words to use in defined circumstances.

Objectionable sexist word forms (especially pronouns, nouns, and the possessive adjectives his and her) have plagued the English language for generations [...] writers have used such phrases as "him or her," "her or his," or the awkward "their" in largely isolated attempts to avoid the problem of objectionable sexist language. [... ]

Because of a general lack of a suitable substitute, writers are both reluctant to employ the objectionable sexist language and also reluctant to fashion a remedy. [..]

Entry rules for a contest may use "contestant" or "participant" repeatedly in an effort to avoid using "him or her" or "he or she", etc. As a manager was heard to say, "If somebody decides not to participate then tell that somebody that that somebody doesn't have to." [...]

All occurrences of "him" or "her" in English language composition are sexist, of course, but not all sexist occurrences are objectionable. If the being under consideration is clearly male (for example, George Washington), it would be completely appropriate (and non-sexist) to refer to him in a subsequent sentence. Likewise, it would be appropriate to refer to his horse or to his presidency. Similarly, a second reference to Emily Dickinson might refer to her poetry.

When gender is unknown, however, the use of multi-compounded expressions, the adoption of word forms from other contexts, and the interposition and repetition of needlessly larger word forms deny the English language (and writers of the English language) what is required--word forms which are simple, accurate, and easily expressed. After a century and a half, no set of words adequately solves the problem of sexist language. [...]

I'm not going to comment on the improbability of the idea that these innovations might become widely used. The patent office has always been open to silly inventions, and as far as I'm concerned it should be. And no significant harm would be done by this particular patent, since it seems unlikely to have any real impact. But this case strikes me as symptomatic of wider flaws in the patent system, which lead to a proliferation of inappropriate patents that sometimes do have a big impact.

First, there's no reference to the considerable "prior art". Many of the specific "non-sexist" words in this application have been around for a while. A few minutes of googling turns up a reference to hesh and hir in this document dated 1995, and the Wikipedia article on sie and hir mentions the first recorded use of hir on usenet in 1981, and possible roots as far back as the 1930s. Birl is common and/or obvious enough that birls is the name of the livejournal community for "boyish girls" -- though I don't know how far back the term can be documented, I suspect that it precedes 6/2/2003. I haven't checked the other new (?) words in the patent. There's good reason to believe that consideration of "prior art" is generally inadequate in the case of software patents. I can only imagine what would happen if the patent office had to investigate prior art in the (hypothetical) case of lexical patents.

Second, (one interpretation of) the scope of the application seems entirely inappropriate. I'm no kind of expert in patent law, but I don't believe that you should be able to patent a word. What makes this document look like a patent are the claims of (pseudo-) methods for substituting words in texts. But if these are interpreted as instructions to writers or editors, you could use this approach to patent any proposed new word or word usage -- just list a bunch of claims of the form "...find all words or phrases referring to the concept X and replace them with the (patented) word Y...", suitably fleshed out with restrictions on syntactic and semantic categories and structures.

Third, if these methods are interpreted as an algorithm to be implemented by a machine, they pose problems in automatic text understanding that can't now be solved, like accurately determining the intended referents of pronouns, and evaluating the epistemological status of the sex or gender of those referents. I happen to think that progress is to be expected, at least in the long term, but there are some contrary views, and if anyone wants to place a bet about the performance of relevant reference-resolution algorithms over the next 17 years, I might be willing to take the "under". Patents that involve the solution of genuinely impossible problems cause no problems, other than a waste of patent-office resources. However, if you let people patent any process they can imagine, even if they have no glimmer of an idea how to implement it, the mass of resulting fantasies is sure to include a fraction of processes that others might later be able to create -- if they weren't forestalled by a pre-existing fantasy patent. Thus this kind of patent would actually discourage innovation instead of encouraging it.

Finally, the algorithms as specified in the application don't do what the author wants them to. They're buggy. If we fill in all the vagueness in the most sympathetic way, and supply skilled human oracles to solve all the unsolved problems, the result is still junk.

Let's take a specific case. Remember that the patent is aiming to solve problems like this one:

[0004] Objectionable sexist word forms (especially pronouns, nouns, and the possessive adjectives his and her) have plagued the English language for generations. In countless papers and documents written over the last 150 writers have used such phrases as "him or her," "her or his," or the awkward "their" in largely isolated attempts to avoid the problem of objectionable sexist language.

What's the proposed method for fixing up "him or her", say in the phrase "gift ideas for him or her"?

Well, the first step is to "locate each instance wherein the sexist word ... is used". Check -- we just found two.

Are these uses "in third person objective case"? Check.

Now we need to be "[d]etermining from the context whether the gender of the being referred to ... is known or unknown". Well, when I look at the context on the page where the phrase appears, I'd have to say "unknown". The page talks about "boys and girls", "bride and groom", "Mom and Dad", "boyfriend and girlfriend". You can't get much more gender-role inclusive than that.

OK, the pattern has matched, and so we take the recommended action, which is:

Substituting said non-sexist word HIR for the sexist word HIM in each instance wherein the sexist word HIM is used in the third person objective case and, further, wherein the gender of the being referred to by the sexist word HIM is unknown

and also

Substituting said non-sexist word HIR for the sexist word HER in each instance wherein the sexist word HER is used in third person objective case, and, further, wherein the gender of the being referred to by the sexist word HER is unknown.

This part I can even write a computer program to do. For the input "gift ideas for him or her", the output is "gift ideas for hir or hir".
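Here is roughly what such a program might look like: a minimal Python sketch, not anything taken from the application. It assumes the two judgments the patent leaves to "context" have already been made by a human oracle: every matched "him" or "her" is taken to be in the third person objective case, and the known/unknown gender question is passed in as a single flag. The function name `apply_hir_rule` and the `gender_known` parameter are hypothetical.

```python
import re

# Whole-word, lowercase "him" or "her" only, so "there" and "himself" are untouched.
HIM_OR_HER = re.compile(r"\b(?:him|her)\b")


def apply_hir_rule(text: str, gender_known: bool) -> str:
    """Hypothetical rendering of the application's two HIR substitution steps.

    Assumes every matched word is in the third person objective case, and that
    a human oracle has already decided whether the referent's gender is known.
    """
    if gender_known:
        # Known gender: neither substitution step applies, so nothing changes.
        return text
    # Unknown gender: HIM -> HIR and HER -> HIR, applied to each instance.
    return HIM_OR_HER.sub("hir", text)


print(apply_hir_rule("gift ideas for him or her", gender_known=False))
# gift ideas for hir or hir
```

The word-boundary anchors (`\b`) are what keep the rule from firing inside longer words; nothing in the sketch looks at the disjunction "him or her" as a unit, which is exactly what the specified steps fail to do.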

I guess there's another alternative here, which is that the "gender of the being referred to" is actually "known". But in that case, no substitution will take place, and the output is the same as the input: "gift ideas for him or her". The problem is still not solved. Probably what the patent author wants to recommend in this case is the phrase "gift ideas for hir" -- but although he features such disjunctions of gendered words as the problem, his specified methods fail to produce the appropriate solution when applied to them.

Although I may seem to be beating up on the patent applicant here, I'm really concerned with the patent examiners. If this application as written were allowed, it would mean that some patent examiners are so careless in evaluating algorithmic applications that they don't notice that the specified methods produce wrong results (or no results) when applied to the sample problems that are listed in the application. More seriously, it would extend the patent system's notion of "invention" in a major way, by allowing what looks like a patent on a software method -- albeit a buggy one -- but is actually a patent on the use of some particular words.

The 1972 Gottschalk v. Benson decision "held that a patent cannot cover all possible uses of a mathematical procedure or equation", and for a while had the effect of preventing software patents. Then Diamond v. Diehr in 1981, though dealing with software control of a physical manufacturing process, opened the floodgates to software patents, and the 1994 Federal Circuit Court decision In re Alappat seems to have systematically validated the idea of software patents by ruling that software turns a general-purpose machine into a special-purpose one that can constitute a patentable invention. Since then, this line of development has been extended further to encompass business method patents, although (as far as I know) this took place purely as a matter of USPTO practice, without any new legal foundation. Allowing a patent like the one under discussion here would open the door to patents on modes of linguistic expression, by making them seem as if they are algorithms for transforming text, just as Diamond v. Diehr opened the door to software patents by treating them as methods for controlling machinery.

Looking on the bright side, this patent application does suggest some exam problems for a semantics course. The question of whether the "gender of the being referred to ... is unknown", in a phrase like "gifts for him or her", raises all sorts of interesting issues. Do the pronouns in "gift ideas for him or her" refer at all? If so, do they refer as individual words or only as a disjunction? If "the sexist word HIM" and "the sexist word HER" don't refer to any particular being at all in this context, is it true or false that "the gender of the being referred to ... is unknown"?

[If you like reading about patents that never should have been granted, Jason Schultz's LawGeek site has a good collection.]

[Today's IEEE Spectrum Careers has an article by Adam B. Jaffe & Josh Lerner on "A Radical Cure for the Ailing U.S. Patent System."]


Posted by Mark Liberman at 10:41 AM

December 10, 2004

Dudes, It's John and Marsha

The current dude flap features (invented) exchanges consisting entirely of occurrences of the word dude, with varying prosodies and accompanying gestures, the whole thing telling a story. These lexically minimalist dialogues are fables, of course -- tales conveying either the poverty of the expressive resources of the young, or (more positively) their creativity in using language in context.

The lexically minimalist dialogue is not a new literary form. Over fifty years ago (February 1951, to be exact), Stan Freberg issued a recording of his most famous comic routine, "John and Marsha", the words to which are, well, John and Marsha and nothing else except discourse particles like um hm and oh and various kinds of laughter. No visuals at all.

Sites that offer you lyrics for songs are pretty much stumped by this one. One site gives you only the first three (of, by my count, seventeen) exchanges, and then lapses into ellipsis dots: "John... Marsha... John... Marsha... John... Marsha..." Not only does this fail to suggest the prosodic characteristics and voice qualities of the performance, it totally misses the climax of the piece: "Oh Marsha Marsha Marsha"... "John John"... "Marsha Marsha Marsha"... "John John John"... "Marsha Marsha Marsha Marsha"... "John John John John John"... [long pause before the denouement] "Marsha"... "Hm John"... "Marsha" [the end]. I chose the word "climax" intentionally; apparently, some people found this section obscene and thought it should be censored. It wasn't, and Freberg sold tons of copies.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 07:19 PM

Saving Hawaiian

A nice article by Richard Monastersky on the attempted revival of the highly threatened Hawaiian language can be found in the December 10 issue of The Chronicle of Higher Education. [They are meanies: the link to the article will only last about five days, so read it while you can.] It has various audio links — chants, some basics of pronunciation, and part of a Hawaiian language class at the Manoa campus of the University of Hawaii. There are some linguistic pointers on the language at the end of the article, which, in the web version at least (I haven't seen the print edition), include one glaring spelling mistake in Hawaiian, a spelling that couldn't possibly be right for a Hawaiian word for phonological reasons. The first person who sends in the correct answer (email to pullum at the ucsc site in the edu domain) will win our prize: they will be named on Language Log and they will be given some otherwise closely guarded information: the actual location of the next secret cabal at which the nation's linguists will gather in early January.

Sorry, correction: there are two spelling mistakes in Monastersky's tutorial. In fact he cites seven words and misspells two of them. That's a success rate of under 72%, which gets him a grade of D according to the scale I use. For more details, read on . . .

Melissa Fox of Oxford, U.K., was the first to mail me with the spelling mistake I had in mind. The existence of the second spelling error (a straightforward misspelling, but not an impossible phonological form) was pointed out to me by Sandra Fan of the Computer Science department at the University of Washington. It is not clear who should get the prize any more, so they both get their names on Language Log, and here is the scoop on where the next secret cabal of the linguists of this country will be: contrary to incorrect rumors that were put about to fool our enemies, it will be at the Oakland Marriott City Center, from January 6th to 9th. Be there! Sneak in and try to catch sight of famous linguists like George Lakoff who will be giving talks there! Visit the LSA meeting web page and make an honest person of yourself by registering for the meeting!

I'm now looking for someone to send me both spelling errors in the Monastersky article without getting them from Melissa or Sandra. That person will win a free cup of coffee at the book exhibit at the LSA meeting from me personally.

Sorry, correction, I'm not waiting for anything. Lal Zimman of San Francisco State University has spotted both of the typos, and wins the free coffee. Naturally, you'll be wanting to know what the errors were. But not half as much as Rich Monastersky will be wanting to know, assuming that he reads Language Log.

Posted by Geoffrey K. Pullum at 02:23 PM

From dude to duuuuuuuuuuuuuuuude

As today's final (?) contribution to dude science, we take a look at the relative frequency of dude as spelled with different numbers of u's, advancing an area of inquiry pioneered here.

Here's the table (as of yesterday morning's Google counts):

u count
1 dude
2 duude
3 duuude
4 duuuude
5 duuuuude
6 duuuuuude
7 duuuuuuude
8 duuuuuuuude
9 duuuuuuuuude
10 duuuuuuuuuude
11 duuuuuuuuuuude
12 duuuuuuuuuuuude
13 duuuuuuuuuuuuude
14 duuuuuuuuuuuuuude
15 duuuuuuuuuuuuuuude
16 duuuuuuuuuuuuuuuude

I'll spare you the spectral analysis of the apparent resonance near <7 u>.
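The Google counts behind that table can't be regenerated here, but the same kind of tally is easy to run on a local corpus such as a script transcript. A minimal Python sketch, assuming tokens have already been split on whitespace; `tally_u_counts` is a hypothetical helper, and the sample tokens are taken from the BASEketball script (plus one non-match).

```python
import re
from collections import Counter

# A dude spelling: "d", one or more u's, then "de". Case-insensitive so that
# "DUDE" and "Duuude" are counted too.
DUDE = re.compile(r"d(u+)de", re.IGNORECASE)


def tally_u_counts(tokens):
    """Return a Counter mapping number-of-u's -> how many tokens had that many."""
    counts = Counter()
    for token in tokens:
        word = re.sub(r"[^A-Za-z]", "", token)  # drop "!", "?", "." and the like
        match = DUDE.fullmatch(word)
        if match:
            counts[len(match.group(1))] += 1
    return counts


print(tally_u_counts(["Dude!", "Duuude??", "Dude!!", "DUDE!", "Duude!!", "dud"]))
```

Stripping punctuation before matching means "Duuude??" and "Duuude" land in the same row, which is the same conflation a search engine makes.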

Mindful of the broader vistas of linguistic science, I'll end with a two-dimensional table of variant spellings of wassup: thus the entry of 465 in row 3, column 2 means that "wasssuup" (with 3 s's and 2 u's) got 465 hits.

  1 u 2 u 3 u 4 u 5 u 6 u 7 u 8 u
1 s
2 s
3 s
4 s
5 s
6 s
7 s
8 s

The resonance near <4s 3u> awaits explanation. Could this be a sign of the hip quark?

[Update: Vardibidian emailed to say

I'm afraid your two-dimensional depiction of wassup is woefully inadequate. For instance, you list 463 ghits for 'wasssuuup' but the more usual spelling, at 538 ghits, is 'waaasssuuup'. On the other hand, where there are 199 ghits for 'wassssuuuup' there are only 19 for 'waaaassssuuuup'. There are 3,050 for 'waassuup' to 10,400 for 'wassuup'. So there is some sort of resonance at 3/3/3.

There are also 317 ghits for 'waassuupp' and 339 for 'wasssuuuppp', so even a 3-d model would not fully express the resonances.

Well sure, it's really a five-dimensional space, I agree. If someone will volunteer to determine the values in all (say) 20x20x20x20x20 cells (all 3.2M of them), I'll gladly link to a post on the subject :-). Even 6x6x6x6x6 (7,776 cells, which could be harvested in 8 days via the Google API) might be interesting... especially if accompanied by a model of the underlying process that fits the data well. ]
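The cell counts above can be checked with a short Python sketch that enumerates the space. Two assumptions, neither stated in the post: the five dimensions are the repeat counts of w, a, s, u and p (Vardibidian's 'waassuupp' shows p varying; treating w as the fifth is my guess), and the query quota is 1,000 per day, the figure that turns 7,776 cells into about 8 days. The helper names are hypothetical, and the sketch only builds the query strings; it fetches no counts.

```python
from itertools import product
from math import ceil

LETTERS = "wasup"  # assumed: the five repeatable letters of w-a-s-u-p


def wassup_variants(max_repeat):
    """Yield every spelling with each letter repeated 1..max_repeat times."""
    for counts in product(range(1, max_repeat + 1), repeat=len(LETTERS)):
        yield "".join(letter * n for letter, n in zip(LETTERS, counts))


def days_needed(cells, queries_per_day=1000):
    """Days to issue one query per cell under an assumed daily quota."""
    return ceil(cells / queries_per_day)


for size in (6, 20):
    cells = size ** len(LETTERS)
    print(f"{size}^5 = {cells:,} cells, about {days_needed(cells):,} days")
# 6^5 = 7,776 cells, about 8 days
# 20^5 = 3,200,000 cells, about 3,200 days
```

Generating the spellings with `itertools.product` keeps every cell in a fixed order, so harvested counts could later be reshaped into the five-dimensional grid.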


Posted by Mark Liberman at 08:37 AM

Duding out

My mailbox contains evidence of growing interest in the expressive possibilities of the word dude, and growing puzzlement in the face of this interest. One example of each follows.

David Wald sent a pointer to Ruben Bolling's meditation on "DudeSpeak Translated":

This shows that cartoons are subject to a sort of variant of the second law of thermodynamics: in all isolated cultural exchanges, irony increases.

John McCall wrote

I must confess a little confusion over the Log's recent "dude" activity; the complexity must escape me. Grammatically, it's a token of address like any other: these all-dude conversations could easily have used any other form of address, and the only superficial surprise here arises because both speakers are using the same form. Imagine, for example, two speakers, John and Martha, holding exactly the BASEketball conversation, but each saying the other's name in place of "dude" -- even if it doesn't seem fully natural (for reasons I'll describe), it's just as sensible as the original (which is to say, very).

Semantically, it acts just as you describe (i.e. hip, informal), except it's stripped of some of the baggage of other forms of address. For example, the John/Martha substitution seems a little unnatural: normally, we only address people by their names when we're specifically calling for their attention. "dude" doesn't have this quality, probably because it isn't as obvious of a reference to a specific person; that may explain its popularity, since English doesn't have many similarly unloaded forms of address (occupational titles being awkward and "milord" and "milady" normally being inappropriate).

And I might as well throw in how much I appreciate the blog; it is one of the most consistent forms of amusement in my internet-reading life, and I'm grateful for all the work you and the others put into it. Thank you very much; I hope you continue the effort for as long as you still enjoy it.

Let me try to put this another way. The same word(s) can always be used with a range of different (illocutionary and perlocutionary) meanings, which may be encoded to a limited extent in prosody and voice quality and so on, but mostly arise just from the normal dynamics of communication. This makes it easy to have conversational sequences in which the same word is used to mean a sequence of different things. The repeated word doesn't have to be slang, or a greeting, or anything else in particular. For example:

A: When should we do it?
B: January? = [how about January?]
A: January!?! = [how can you suggest that? It's just a month away, and you know what the start of the semester is like]
B: January. = [Yes, I know, but the alternatives are worse, and we can do it.]
A: January... = [Hmm, you're seriously suggesting January, aren't you...]
B: January! = [Yes, you dope, let's decide this and move on]

and so on.

People seem to be especially fond of these single-word conversations with newly-discovered slang like dude. One reason for this was featured in Scott Kiesling's American Speech article -- such jokes fit the always-popular view that youth culture has degenerated to a linguistic level barely above grunts and squeals. I think there's another side to it as well, seen from the other side of the fence: incoming slang is a sort of secret language, expressing exquisitely shaded meanings that are shared among the in-group but are baffling to outsiders. But what the outsiders are missing is not so much the lexical items as the shared cultural context, and so it's not so easy as learning a word definition. The all-dude cartoons are a way of making that point, and I suspect that's why natives of dudespeak seem to like them even more than the members of pre-dude generations do.

On the other hand, it's perfectly reasonable for John McCall to be puzzled by the general fascination with this topic. Dude gustibus non est disputandum.


Posted by Mark Liberman at 07:57 AM

December 09, 2004

Da da da

A little gem brightened grading of our Linguistics 1 final today.

Background: the McGurk effect involves integration of visual information in speech comprehension, as in this video (and see Sally Thomason's and Bill Poser's LL posts too), where the sound ba ba, ba ba, ba ba is overlaid on a video of a man who appears visually to be saying ga ga, ga ga, ga ga. If you listen and watch at the same time, you might well perceive da da, da da, da da. Pretty damn cool. It's surely no coincidence that the place of articulation (i.e. where the air gets blocked) for "d" is midway between "b" and "g". Looks to me like your fantastic mother of a brain is interpolating multimodal evidence as to the physical air-blockage position in realtime. For full appreciation of coolness, try the video with eyes closed or no sound. Warning: I personally have played the video so many times that I even perceive da with the sound and visuals both turned off. da da da. It just goes on and on.

One unfortunate student taking Linguistics 1 was sick during the quarter. She missed the class where I explained the stuff above, and all she had to go on was her memory of the video from the class website. So question 9 on her final, worth 8 points, ended up looking like this:

Question: 9. What is the McGurk effect and what does it show? (1 brief paragraph)
Answer:       In the video the man seemed to be repeating [daa] over and over again. Aphasia?

How many points should I give her? For full marks she would have had to have specified Broca's (aka Production) Aphasia, right?

Tangentially, you seem like the generous type, and I have to let someone know about my new Christmas wish: I'd do anything for a McGurk remake of this bilingual Trio classic. This is what you need to know. Aha.

da da da (repeat to fade)

Posted by David Beaver at 02:15 AM

Dude, no way

As a native speaker of Dude (originally Northern Californian, influenced recently by Southern Californian speech), I feel the need to address Mark's "injured finger" interpretation of the Zits cartoon. Bottom line: I think it's just not possible to interpret "Dude?" as "Do you believe this?" or anything of the sort.

(Other native Dude speakers should please speak up if they disagree with either of these reasons; I'm relying entirely on my own intuitions here.)

Let's repeat the cartoon. The relevant panel is the second one.

The intonation indicated by the question mark in "Dude?" allows a very small range of interpretations for me, and they all seem to be contextualizable variants of the following two possibilities.

  1. Want some?
  2. Ready?

(I have to admit at this point that I haven't given this a great deal of thought. Note that there are 25 examples of dude followed by a question mark in Scott Kiesling's Dude corpus of 520+ examples; in all of these cases it is a tag on a separate sentence, not an utterance on its own.)

(1) is most likely the right interpretation, though (2) is not incompatible with the cartoon -- maybe the exchange of gum is a ritual between these two characters, and one is asking the other if he's ready for class. If there really was an injured finger involved, any other (non-rising) intonation ("Dude." or "Dude!") would work to mean "Hey, check this out." But, for whatever reason, the question (rising) intonation doesn't work to mean something like "Do you believe this?" or "What do you think?" or anything of the sort. In fact, just trying to imagine a person saying "Dude?" and meaning something other than (1) or (2) really does sound like that person just has a limited vocabulary, not Dude competence.

On a related point, I have seen BASEketball. It wasn't nearly as good as Parker & Stone's various other efforts, but I do remember the all-dude bit that Mark discusses here.

Remer: ... Dude, quit thinking about yourself for a change!
Coop: Dude, I'm not gonna cave in! End of story, dude!
Remer: Duuude??
Coop: Dude!
Remer: Dude!!
Coop: Dude.
Remer: Dude!
Coop: DUDE!
Remer: Duude!! [Coop opens his mouth but says nothing. Remer continues firmly] Dude.
Coop: [speechless, mouths around for something to say] I guess you got a point there. All right all right, look. Maybe I was wrong. From now on... we're full partners.
Remer: Really?

Here's an audio clip I found of just the all-dude bit. If you listen closely to Remer's first dude (transcribed "Duuude??" above), you'll notice that the intonation is emphatic (falling-rising), meaning something like "What the hell do you mean?". Again, a simple rising-intonation "Dude?" here would not fit the context.

(Note that the page where I found the link to the audio clip transcribes the passage this way:

Dude. Dude. Dude. Dude! Dude! DUDE! DUDE! Dude.

That is, with no question marks anywhere at all.)

I agree with Mark that a video clip would be good, just to see how much of what Remer and Coop are saying to each other might be recoverable from context, body language, etc. My memory of the scene, however, was that the all-dude bit was purposely over the top. The point is that Remer and Coop are such close friends that they can communicate almost telepathically; if they had used Dude-speak completely "correctly" during this exchange -- that is, such that native Dude speakers could more or less follow it -- that point would be lost on that part (probably 99.9%!) of their audience. (Parker & Stone do get the Dude-speak right throughout the movie in general, of course -- they, and probably the entire cast, are doubtlessly native Dude speakers.)

Update: James Thompson (another native Dude speaker and Parker & Stone watcher) tells me that he recalls understanding the all-dude exchange perfectly, and notes that the VHS copy he owns has other versions of the scene that are also understandable. I could not, however, get James to tell me exactly what point it was that Coop acknowledges that Remer has.

Remer and Coop share another (shorter) all-dude moment later in the film, ending in a long screen kiss:

Remer: Oh, shit, Coop, I'm sorry. I guess the money did go to my head.
Coop: No I, I'm sorry, Remer. I ...think I've got a lot to learn about sharing.
Remer: Look at me. I've become everything I used to hate.
Coop: Yeah. Maybe we ...just grew up too fast.
Remer: Our worst enemy turned out to be
Coop: [they look lovingly at each other] Dude.
Remer: Dude.
Coop: Dude?
Remer: Dude. [gives Coop a French kiss. They go at it for several seconds before a fireman pops up next to them]

Coop's final "Dude?" here could have something along the lines of the "Ready?" meaning, but audio/video would help to settle that.

As long as I'm admitting that I appreciate Parker & Stone's work ...

Watch/listen to the "Buddy" skit on Adam Sandler's comedy album They're all gonna laugh at you!. [Click here, then click on the sepia-toned album cover (left-hand side of the screen, the last of five album covers). You can see three skits from the album, one of which is "Buddy".]

There are six male characters in this skit: two of them constantly say "buddy", two others constantly say "dude", and two others constantly say "homie". (In the skit these are all assumed to work just like "dude"; I wonder how accurate that is.) All three of these words are used as utterances on their own several times -- especially once all six characters are on the scene -- but there are only three occasions on which I would transcribe such utterances with a question mark (or two). Two involve "buddy", one involves "dude".

  • Buddy #2 drinks too much of Buddy #1's drink. Buddy #1 says: "Buddy??", with a falling-rising intonation very similar to Remer's "Duuude??" in the BASEketball scene above.
  • Buddy #1 has his foot on Dude #1's chest. Buddy #1 says: "Buddy?" with rising intonation, offering Dude #1 a beer. (This is again the "Want some?" meaning in the Zits cartoon above.)
  • Train crashes. One of the Dudes (hard to tell which) says: "Dude?", with rising intonation. As Alexander Koller points out, this Dude appears to be addressing the other Dude (or anyone): "Dude, are you there?". So much for (1) and (2) ...

Update #2: Here's another film (appropriately called "Dude"), courtesy of James Thompson.


Posted by Eric Bakovic at 12:54 AM

December 08, 2004

You got a point there

OK, one more. Rich Alderson emailed "to point out the partnership-breakup scene in Stone and Parker's [1998] movie BASEketball [sic], in which the entire conversation consists of the word dude in various intonations".

I've never seen BASEketball, but I located the script, and found the cited passage, which seems to be the emotional climax of the movie:

Remer: ... Dude, quit thinking about yourself for a change!
Coop: Dude, I'm not gonna cave in! End of story, dude!
Remer: Duuude??
Coop: Dude!
Remer: Dude!!
Coop: Dude.
Remer: Dude!
Coop: DUDE!
Remer: Duude!! [Coop opens his mouth but says nothing. Remer continues firmly] Dude.
Coop: [speechless, mouths around for something to say] I guess you got a point there. All right all right, look. Maybe I was wrong. From now on... we're full partners.
Remer: Really?

This is a case where an audio clip would be helpful, I think. Video, too.

Anyhow, a quick scan suggests that the movie overall has more than 100 examples to add to the Dude Corpus, at least the [nonexistent] scripted portion of it.


Posted by Mark Liberman at 07:31 PM

Dude cartoon decoded

Several people have written to clue me in to the meaning of the second panel of the dude cartoon. Combining various of the messages, the correct interpretation is something like:

Panel 1
A: "Dude!" == "Hey!" or "Hello!"
B: "Dude." == "Hey there" or "Hi."
Panel 2
A: "Dude?" == "You want some gum?" or "Want a piece?"
B: "Dude!" == "Thanks!"
Panel 3
A: "Dude." == "So long."
B: "Dude." == "See ya."
Panel 4
[Girl is scornful of male monosyllabic communication, but seriously, dude, *as if* she didn't understand every word of it.]

My problem with panel 2 was visual, not linguistic -- it looked to me like the kid was showing off an injured finger, not offering a stick of gum. I think the injured finger interpretation would work too, dude-wise:

A: "Dude?" == "Do you believe this?"
B: "Dude!" == "You poor guy."

But it didn't really look enough like a cartoonish injury, and... Anyhow, thanks to Eric Bakovic, Melissa Fox, and John Bell for enlightenment.

Eric also sent the following example of transgenerational usage during Dave Perlmutter's office hours:

Student: "Dude, Dr. Perlmutter, that final exam you gave us? Dude!"

John Bell sent the following reminiscence:

I recall the first time I heard "Dude!" used as a general mark of approbation. About 23 or 24 years ago, I was playing a baseball video game in an ice cream parlor, and two kids aged about 13 were watching. When one of them realized that this game had some new features on it that previous versions didn't, he cried, "Dude!" It was clearly not directed either at me or his friend (nor, for that matter, at the action on the field, since I was an incompetent player).

(Correction. Since I'm writing this in mail(1), I'll just add it here: that was the first time I'd heard "Dude!" used that way in northern California. I had been hearing it in SoCal for at least a couple of years before that.)

And Melissa Fox added this:

I've been observing this term that repetition of the single word "dude" can constitute whole conversations, and have even taught a British friend of mine that it is entirely correct to answer "Dude!" with "Dude, I know!" (I even sent an e-mail to a fellow of my own college this afternoon pointing out a minor typo -- the sort of thing caused by what, in the US, we used to call a "brain fart" -- and finishing with "Dude!", which in that context meant "Surely you can't have meant to use this word instead of that word!")

Finally, I'll point out that although the history of dude must be generally similar to that of boy, man and so on, they're quite different in effect at this point, as (what I take to be) the non-equivalence of the following shows:

1. Oh man! Dude! [overheard the other day at Penn]
2. Oh dude! Man!


Posted by Mark Liberman at 04:03 PM

A Churchill story up with which I will no longer put

An old, old story about Winston Churchill (almost certainly misattributed) is retold one more time by Joe Carter at The Evangelical Outpost:

After an overzealous editor attempted to rearrange one of Winston Churchill's sentences to avoid ending it in a preposition, the Prime Minister scribbled a single sentence in reply: "This is the sort of bloody nonsense up with which I will not put."

Joe notes correctly that in The Cambridge Grammar of the English Language (see page 627, footnote 11) it is mentioned that "The ‘rule’ was apparently created ex nihilo in 1672 by the essayist John Dryden." (See the article "Preposition at end" in Merriam-Webster's Dictionary of English Usage for more discussion.) However, there is one thing he doesn't point out, and hardly anybody ever has, except in footnote 12 on page 629 of The Cambridge Grammar, and briefly on Language Log in a post that Mark did a while back: Churchill (or whoever it may have been) was cheating, in two separate ways. I think perhaps the point may bear repeating and elaborating a bit (you don't have to read on if you already know this stuff).

The strategy was to construct a case in which leaving a preposition at the end of the clause would be decisively the preferred style (for other such cases, see The Cambridge Grammar, pp. 628-630), and then to front the preposition to show the ignorant editor what a stupid rule he was trying to enforce. But the example involves cheating. Twice.

First, the example is one in which the preferred form of the sentence ended in two prepositions, the second with an object and the first without, and he fronted both of them. That's never allowed. So no wonder it sounds ungrammatical. The ungrammaticality shows nothing about whether or not preposition stranding ordinarily sounds ungrammatical.

To see clearly that it is illicit, it is useful to steer round the second point (which I'll come to later), and start with a different case of a sentence ending in a preposition sequence, one that does not involve an idiom or fixed phrase (my invented examples in what follows will be in blue):

The restaurant got a complaint from the people that the woman was staring in at.

To make this not end in a preposition, should you feel for some reason you want to avoid the normal construction, you would simply do this:

The restaurant got a complaint from the people at which the woman was staring in.

That's much more formal, and not at all an improvement (one is almost inclined to put a ‘?’ in front of it to signal lowered acceptability), but it is English. However, you might ask, doesn't it still end with a preposition? Well, yes and no. It ends in a word that is classed as a preposition by The Cambridge Grammar, which takes what I consider the right view. But it's a preposition that does not take an object. For that reason it is irrelevant. In fact the traditional view (which has a somewhat fetishistic attachment to the Latin meaning of pre-) refuses to call it a preposition because it is not before a noun phrase.

All current dictionaries follow the traditional view: they would classify in as an adverb in a case like She stared in. And in cases of that sort, everyone has always agreed that such words can end a sentence. Otherwise you'd be saying that sentences like I'm afraid Mr Threadcroft is not in, or It's cold, so we should go in, are ungrammatical. That would be even more crazy than banning the cases where a preposition is stranded. (By calling a preposition stranded I mean roughly that it's not followed by its complement because it's in a clause like a relative or an interrogative that permits the complement to be at the beginning of the clause, as in the people that the woman was staring at, or to be understood as having an earlier noun phrase as its antecedent, as in the people the woman was staring at. That isn't a totally watertight definition of stranding, but it will perhaps do for present purposes.)

Now, the key thing, which is independent of the terminological conflict, is this: you certainly can't front one of these prepositions that traditional grammar would call an adverb, in addition to fronting a preposition that has an object:

*The restaurant got a complaint from the people in at which the woman was staring.

Yet that's what Churchill (if it was he) did in the famous up with which I will not put.

But there's another dishonesty in the example. It uses an idiom that doesn't like to be broken up at all by any kind of reordering. When you use the idiomatic verb phrase put up with X, you have to keep the sequence put up with as is. Almost nobody, however formal, thinks that it would be a style improvement to take this interrogative sentence

How many interruptions am I supposed to put up with?

and re-phrase it this way:

??With how many interruptions am I supposed to put up?

It's decidedly awkward, possibly even ungrammatical.

So in the first place, up with which I will not put illicitly preposes not one but two prepositions (the extra one, up, being a preposition that traditional analyses of his time would have called an adverb), and that's never permissible. And in the second place, it does it to an idiom which resists preposition fronting anyway, so even fronting just the with would have sounded bad.

The mythical rule about preposition stranding being a grammatical fault is indeed nonsense, and it's not something you should put up with. But the tricky little piece of cheating attributed to Churchill does not show that.

Posted by Geoffrey K. Pullum at 03:06 PM


Mike Crissey has a story today on the AP wire about Scott Kiesling's recent paper Dude, published in the fall issue of American Speech (supplementary materials here). The AP also provides a sort of quick dude reference guide. The cultural context is set by this cartoon

Kiesling observes that

Older adults, baffled by the new forms of language that regularly appear in youth cultures, frequently characterize young people’s language as “inarticulate” ... For American teenagers, these examples usually include the discourse marker like, rising final intonation on declaratives, and the address term dude, which is cited as an example of the inarticulateness of young men in particular.

He feels that the comic strip exhibits a stereotype that "views the use of dude as unconstrained – a sign of inexpressiveness in which one word is used for any and all utterances". Instead, Kiesling argues that

Dude is developing into a discourse marker that need not identify an addressee, but more generally encodes the speaker’s stance to his or her current addressee(s). The term is used mainly in situations in which a speaker takes a stance of solidarity or camaraderie, but crucially in a nonchalant, not-too-enthusiastic manner. Dude indexes a stance of effortlessness (or laziness, depending on the perspective of the hearer), largely because of its origins in the “surfer” and “druggie” subcultures in which such stances are valued. The reason young men use this term is precisely that dude indexes this stance of cool solidarity. Such a stance is especially valuable for young men as they navigate cultural Discourses of young masculinity, which simultaneously demand masculine solidarity, strict heterosexuality, and non-conformity.

He draws these conclusions not only from his own intuitions as a native of dudespeak, but also from the Dude Corpus, which presents 520 real-life examples in the form of an Excel spreadsheet, and the Dude Survey, also available as a spreadsheet.

A few examples from the corpus:

45. Candy hearts are for third grade, dude.
53. Discrete structures sucks, dude.
61. [responding to a woman who has been "telling a story about a guy trying to hit on her"]: Dude (with a tone of disbelief and disgust).
139. Dude, bandwidth is better than sex.
371. It's like, dude, I can't even take care of myself let alone some squirmy thing.

As Kiesling points out, dude is also now used by and to young women -- two of the examples quoted above were spoken by women, and in three of them, the addressee was female.

It looks to me like the sequence described in the cartoon is consistent with Kiesling's gloss for dude, though I can't quite make out what is happening in the second panel.

Kiesling also suggests class assignments based on extending and analyzing his corpus and survey.

I grew up in pre-dude days, but the patterns of usage that Kiesling describes seem pretty natural and easy to assimilate to me. What I have more trouble with is the current fashion for Significant Capitalization: what's with capital-D Discourse? Kiesling has "the discourse marker like", but "Such a stance is especially valuable for young men as they navigate cultural Discourses of young masculinity"; he has "This indexicality also explains where dude appears in discourse structure", in contrast to "how cultural Discourses of gender are recreated in interaction with the help of dude".

You might think that he's going back to 18th-century ideas about capitalizing nouns, except that this is the only noun affected. I infer that "Discourse" may be meant to index Michel Foucault or maybe Michael Silverstein, whereas "discourse" would index Wally Chafe. In that case, I guess the thing about Discourses is that they're not actually discourses. But I'm out of my cultural depth here.

[AP story via email from Fernando Pereira]

[Update: Jonathan Mayhew pokes a little gentle fun by linking to this post with the quote: "I'm like, Dude... let's navigate the Discourses of juvenile masculinity, and he's like, dude, what are [they] talking about?" I'm not quite sure whether the joke is on Kiesling or on me, but it's a good one in any case. ]

[Update #2: several people wrote in to point out that Kiesling explains his capitalization explicitly in a footnote:

"I use the term cultural Discourse in the sense of poststructuralists, following Foucault (1980). Cultural Discourses are similar to ideologies, yet leave open the possibility of contradiction, challenge, and change, and describe more than idea systems, including social practices and structures. For a review of the term, and its relevance to masculinities, see Whitehead (2002). I will always use a capital D with cultural Discourses to distinguish them from the linguistic notion of discourse, or talk-in-interaction."

So I understood him correctly, but read him carelessly. Thanks to Lauren Marie Squires and Tom Ace for the tip. ]


Posted by Mark Liberman at 10:18 AM

Prairie dog talk

Tenser, said the Tensor discusses an AP story by Tania Soussan about Con Slobodchikoff's research on prairie dog communication. TstT mentions that Slobodchikoff's web page doesn't have online versions of the relevant research, so I thought I'd try out Google Scholar and Elsevier's Scirus. Since Slobodchikoff is an unusual name, it should make a good probe.

Scirus first. It turns up 317 total results, of which 64 are journal articles. There are several interesting things on the first couple of pages, some of which are relevant:

J. Placer and C. N. Slobodchikoff, "A method for identifying sounds used in the classification of alarm calls", Behavioural Processes, Volume 67, Issue 1, 30 July 2004, Pages 87-98.
R. K. Bangert and C. N. Slobodchikoff, "Prairie dog engineering indirectly affects beetle movement behavior", Journal of Arid Environments, Volume 56, Issue 1, January 2004, Pages 83-94.
Judith Kiriazis and Con N. Slobodchikoff, "Anthropocentrism and the Study of Animal Language". Chapter 26 in Robert Mitchell and Nicholas Thompson, Eds., Anthropomorphism, Anecdotes, and Animals, SUNY Press, 1996. [only the blurb is online]
Robert Cook's listing of links to Courses in Animal Cognition, Learning, and Behavior, which includes a dead link to Con Slobodchikoff's course entitled "Behavior of Animals".
Bianca S. Perla and C. N. Slobodchikoff, "Habitat structure and alarm call dialects in Gunnison's prairie dog (Cynomys gunnisoni)", Behavioral Ecology Vol. 13 No. 6: 844-850.

Now Google Scholar. The same probe ("Slobodchikoff") yields 438 results. Because of publisher restrictions (especially by Elsevier!), many of the papers that it finds are not given active links. Partly making up for this, GS offers citation information. For example, one of the first papers in the list is

CN Slobodchikoff, J Kiriazis, C Fischer, E Creef, "Semantic information distinguishing individual predators in the alarm calls of Gunnison’s prairie …", Anim. Behav, 1991 [cited by 24]

and following the "cited by 24" link yields

Steven H. Ackers & C. N. Slobodchikoff, "Communication of Stimulus Size and Shape in Alarm Calls of Gunnison's Prairie Dogs, Cynomys gunnisoni", Volume 105 Issue 2 Page 149 - February 1999.

which does have an active link to the abstract (and text) because it's published by Blackwell. When it's possible to navigate around in a dense network of "cited by" links, it can be a very good way to see what's happening in a new subdiscipline. Unfortunately, many of the interesting nodes in this case are dead links, e.g.

DT Blumstein, KB Armitage, "Alarm calling in yellow-bellied marmots: I. The meaning of situationally variable alarm calls", Anim. Behav., 1997, 53, 143–171.

Many of the things found by Google Scholar are [citation]-type items, which seems to mean a "hit" which has not been scanned itself, but whose existence is inferred from a reference in a scanned work.

Of course, all of the Slobodchikoff papers found in both searches are listed on his web page, without hyperlinks but with references that could be used to track down the papers using more troublesome techniques, either online or in the library stacks. There are two kinds of value added: convenience in finding the links (more of which are available in this case through Scirus), and information about citations offered by Google Scholar's "cited by" feature -- though you could also use ISI's citation index -- if you have a subscription, or your institution does.

Among other search tools, regular old Google turns up Slobodchikoff's home page, of course, but also page 1 (!) of

C. N. Slobodchikoff, "Cognition and communication in prairie dogs." In Marc Bekoff, Colin Allen and Gordon M. Burghardt, eds., The Cognitive Animal: Empirical and Theoretical Perspectives on Animal Cognition. The MIT Press, 2002.

Technorati turns up some 17 weblog posts, as of early this morning, all of which seem to be references to the AP article. I was disappointed, hoping to find that someone like Cosma Shalizi had researched Slobodchikoff's work. Though of course if he had, the Google and perhaps Scirus searches would already have told me.

CiteSeer doesn't turn up anything, in this case, which was also disappointing.

I got very good overall results from PsycINFO, in the version which is a specialization of the Cambridge Scientific Abstracts service. I accessed it through the Penn library site. It offers a variety of ways to get at the texts, including library call numbers and interlibrary loan where needed, and also citation information in some cases. It finds nine of Slobodchikoff's most relevant articles and book chapters.

Finally, I was disappointed to find that the MLA International bibliography, which is usually quite helpful, found nothing at all with "Slobodchikoff" as a probe. I guess Prairie Dog is not considered a Modern Language. That's fair enough, since I imagine that it hasn't changed much since the towers of Ilium were topped.

All in all, it's nice to be able to do so much so quickly -- the searching reported here took less than 15 minutes -- and all without leaving home. And a lot of the material can now also be found on line, though at least half of it still requires real-life activity (visiting libraries or buying books) to access.

The content of Slobodchikoff's papers is of course interesting in itself. From what I've read, it looks very much like the pattern familiar from Seyfarth and Cheney's classic work on vervet alarm calls, with additional results on the encoding of more abstract size and shape information in call variation, and especially a focus on the use of "variation in the internal structure of a vocalization to define possible information structures", as Placer and Slobodchikoff put it in their 2004 paper. The work is definitely worth further study, and I'll have more to say about it at some point in the future.


Posted by Mark Liberman at 07:21 AM

Dander: froth or scurf?

Francis Heaney notes another popular eggcorn: get one's gander up, which has 1,070 Google hits, compared to 30,100 for "dander up". Some of the gander examples are witting wordplay -- suburbanites upset about goose droppings, and the like -- but most of them seem to come from people who've misunderstood the idiom, probably because their image of hot anger is better matched by a cranky goose than by cat scurf.

The OED gives angry dander its own lemma: dander, n.4, glossed as "Ruffled or angry temper; in phr. to get one's dander up, etc." It's said to be "colloq. (orig. U.S.) and dial." The etymology is "[Conjectured by some to be a fig. use of DANDER3, dandruff, scurf; but possibly fig. of DANDER2, ferment.]"

The fermentation-version of dander is apparently a variant of dunder, and is given only one citation:

?c1796 SIR J. DALRYMPLE Observ. Yeast-cake 1 The season for working molasses lasts five months, of which three weeks are lost in making up the dander, that is, the ferment.

Dunder in turn is glossed as "The lees or dregs of cane-juice, used in the West Indies in the fermentation of rum", with the etymology "[Corrupted from Sp. redundar to overflow.]"

The American Heritage Dictionary agrees, explaining angry dander as

Perhaps alteration of dunder, fermented cane juice used in rum-making, fermentation, possibly alteration of Spanish redundar, to overflow, from Latin redundāre. See redundant.

So maybe the original image for someone getting his dander up is the frothy overflowing ferment of yeasty cane-juice. This would be the same metaphor as in the OED's figurative gloss for ferment as "3. agitation, excitement, tumult":

1681 DRYDEN Abs. & Achit. 140 Several Factions from this first Ferment, Work up to Foam, and threat the Government.

The picture shows fermenting grape juice, not cane juice, but the idea is the same. It doesn't look like cats are in the picture at all, however much they may shed when stressed.


Posted by Mark Liberman at 12:19 AM

December 07, 2004

A fine, or imprisonment... and both

The New York Times reports an instance in which a criminal case was overturned on appeal because of a single word that was changed in a statute: In between consideration by a congressional conference committee and the preparation of the bill for signing by President Clinton, an or was changed to an and in a statute regarding the sentencing of people guilty of distributing child pornography, the result being the following surprising wording:

"Any individual who violates ... this section, shall be fined under this title or imprisoned not less than 10 years nor more than 20 years, and both."

The question at issue was whether Jorge L. Pabon-Cruz, a young man of 18 with no criminal record, should be imprisoned for ten years for chat-room pornography distribution. The defense said that the jury should have been told what the sentencing consequence of their guilty verdict would be. But the Second Circuit decided on its own initiative to ask for a briefing on the question of whether the statute Clinton signed was even coherent.

The position taken by the defense was that, read literally, it made no sense: obviously Congress meant "or both", the "and both" being just a slip; hence it would have been legal to sentence the young man to either a fine or a term of imprisonment. The government, amazingly, defended the language of the final statute as signed, and maintained that even though it was ungrammatical, its intent was clear: that the punishment was to be both a fine and a term of imprisonment, with no judicial discretion. The United States Court of Appeals for the Second Circuit disagreed with the government. Linguistically, the court had it right.

The court reasoned:

As a grammatical matter, one cannot choose between "A, or B, and both." Rather, it seems obvious that Congress intended the provision to mean either "A, or B, or both," or "A and B."

And that is entirely correct. A coordinate construction of the form "A, or B, and both" is neither grammatical nor clearly interpretable. Imagine a restaurant offering a choice of "asparagus or beetroot and both". It's not even clear what your choices are. The phrase has no interpretation at all.

Of course, an and-coordination can have a first part that is itself an or-coordination of the form "A or B": you could be offered asparagus or beetroot (you can't have both) and cauliflower. But there, C is a third thing that can co-occur with either of the first two. "(1) A or B, and (2) C" is coherent. What is not coherent is what we have here: "(1) A or B, and (2) A and B". You can't make that interpretation relevant in the case at hand, and the court did not even consider it. It would have meant that judges had to sentence people to either a fine or a term of imprisonment, and also to both a fine and a term of imprisonment. But in that case the part after the and makes the part before it redundant.
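The redundancy point can be checked mechanically with a truth table. What follows is a minimal sketch, assuming we may model the two penalties as independent yes/no outcomes (fined or not, imprisoned or not), a simplification that ignores the actual ranges of fines and prison terms. The function names are hypothetical labels for the readings discussed above, not anything taken from the statute or the court's decision.

```python
from itertools import product

# Simplified model (an assumption): "fined" (a) and "imprisoned" (b)
# are independent yes/no outcomes.
def or_both(a: bool, b: bool) -> bool:
    """'A, or B, or both': at least one penalty (inclusive or)."""
    return a or b

def a_and_b(a: bool, b: bool) -> bool:
    """'A and B': both penalties required."""
    return a and b

def or_then_and_both(a: bool, b: bool) -> bool:
    """The '(1) A or B, and (2) A and B' reading of the statute as signed."""
    return (a or b) and (a and b)

# All four combinations of (fined, imprisoned).
rows = list(product([False, True], repeat=2))

# The combined reading permits exactly the outcomes that 'A and B' permits,
# so its 'A or B' part contributes nothing.
assert all(or_then_and_both(a, b) == a_and_b(a, b) for a, b in rows)

permitted_or_both = [(a, b) for a, b in rows if or_both(a, b)]
permitted_and = [(a, b) for a, b in rows if a_and_b(a, b)]
print(permitted_or_both)  # [(False, True), (True, False), (True, True)]
print(permitted_and)      # [(True, True)]
```

The two coherent readings the court identified differ only in whether a fine alone or prison alone is permitted, which is precisely the question of judicial discretion that was at stake.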

Strangely, the prosecution actually agreed that it was "simply illogical" and "essentially [a] scrivener's error." That was quite a concession, given that they wanted the intent of the erroneous scrivener to prevail in interpretation.

The court looked around for other textual evidence, and found it was only sinking deeper into the morass. For example,

Confusingly, the Senate Judiciary Committee Report on the Child Pornography Prevention Act employs the "and both" language when it sets forth the terms of the bill, S. Rep. No. 104-358, at 4 (1996), and the "or both" language in its analysis of the bill's provisions. Id. at 23.

It explored the Congressional history a bit, and looked at some precedents for what courts have done when faced with laws that seemed eccentric (like a racketeering law that proposed either a fine or life imprisonment or death, but nothing in the middle like a few years of prison), and eventually concluded (you can read the decision here, or view it as HTML here) that the sentencing should be done again.

Perhaps the most worrying thing is that both Congress and the President were either asleep or uninformed about the content of what they were approving, despite the grave implications for the lives of people sentenced under the law they were putting on the books. The Second Circuit apparently figured that if Congress is going to require that a young man with no criminal past living with his mentally impaired mother should be put in the penitentiary for ten years if he sends dirty pictures to people on the Internet who ask for them, he at least has the right to be sentenced under a law of which the grammar and meaning are clearly understood by the people doing the sentencing.

Posted by Geoffrey K. Pullum at 06:16 PM

Second opinions on "hobbits"

There seems to be increasing controversy about whether the skeleton(s) of small hominins found on the Indonesian island of Flores really represent a new species. The Athena Review sums up some of the issues as discussed over the past month. Science (subscribers only, alas) published a review of some skeptical positions by Michael Balter ("Skeptics Question Whether Flores Hominid Is a New Species", Science, Vol 306, Issue 5699, p. 1116).

Balter writes that

Now a small but vocal group of scientists argues that the skeleton dubbed Homo floresiensis is actually a modern human afflicted with microcephaly, a deformity characterized by a very small brain and head. Meanwhile, an Indonesian scientist who also challenges the skeleton's status has removed the skull to his own lab for study. But members of the original team of Australian and Indonesian scientists staunchly defend their analysis, and outside experts familiar with the discovery are unmoved by the critique.

The main challenge comes from paleopathologist Maciej Henneberg of the University of Adelaide in Australia and anthropologist Alan Thorne of the Australian National University in Canberra. Neither has seen the specimen itself, and as Science went to press, they had yet to publish their criticisms in a peer-reviewed journal. But Henneberg published a letter in the 31 October Adelaide Sunday Mail arguing that the skull of the Flores hominid is very similar to a 4000-year-old microcephalic modern human skull found on the island of Crete. And at a press conference on 5 November, Indonesian paleoanthropologist Teuku Jacob of Gadjah Mada University in Jakarta claimed that the specimen was a diminutive modern human.

One of the crucial issues is that there is only one reasonably complete skeleton, which makes it harder to argue against the view that it is a single deformed individual. According to the Athena Review,

Besides the female skeleton called LB1 ... only two other human fossils were reported in the Nature article proposing the new species. These include a single premolar, and a forearm bone (radius) found deeper down in Liang Bua cave. The forearm, based on its dimensions was claimed by Brown et al. (2004) as further proof that a local population existed of diminutive Homo floresiensis. But, as Henneberg also points out, the reported length of the radius of 210 mm actually corresponds to human stature of 151-162 cm, within the range of many modern women and some men.

For those of us who are metrically challenged, that's between 59.4 and 63.8 inches, or between 4' 11.4" and 5' 3.8".
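For anyone who wants to check the conversion, here is a small Python sketch. It assumes only the exact definition of the inch as 2.54 cm and rounds to one decimal place; the function name is made up for this illustration.

```python
CM_PER_INCH = 2.54  # exact, by definition

def cm_to_feet_inches(cm: float) -> tuple[int, float]:
    """Split a length in centimetres into whole feet plus the remaining inches."""
    total_inches = cm / CM_PER_INCH
    feet = int(total_inches // 12)
    return feet, total_inches - 12 * feet

# The stature range Henneberg derives from the 210 mm radius.
for cm in (151, 162):
    feet, inches = cm_to_feet_inches(cm)
    print(f"{cm} cm = {cm / CM_PER_INCH:.1f} in = {feet}' {inches:.1f}\"")
# 151 cm = 59.4 in = 4' 11.4"
# 162 cm = 63.8 in = 5' 3.8"
```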

I should also mention that the originally-reported quantitative data from the teeth of the Flores finds seem very consistent with the view that they are sapiens remains. According to the legend in the Brown et al. article, these are "Mean buccolingual tooth crown breadths for mandibular teeth in A. afarensis (filled circles), A. africanus (open circles), early Homo sp. (open squares), modern H. sapiens (filled squares), LB1 (filled stars) and LB2 (open stars)":

In my opinion, this is all a bit of a black eye for physical anthropology. If paleontologists (and Science magazine!) can't agree about such a basic question, something is wrong. I recognize that science progresses by controversy, and I suppose that the field will reach a consensus eventually, but (despite some previous instruction by the likes of Alan Mann) I remain surprised by how subjective this stuff seems to be. Of course, this might be a case like the Forster and Toth paper on IE phylogeny, where an apparent controversy in fact masks a genuine consensus among competent people that I'm too ill-informed to see.

At least this underlines how desperately speculative the reported "expert" discussion of this putative species' linguistic abilities was. On the other hand, the villagers of Boawae on Flores claim that if the right rituals are followed, perhaps all the uncertainty will be cleared up.

Meanwhile there seems to be a bit of a squabble over physical possession of the bones. There hasn't been much M(ain)S(tream)M(edia) coverage of the doubts about whether the find represents a new species or not, except in the context of a few stories about this argument over custody of the evidence.


Posted by Mark Liberman at 11:50 AM

Another comedian for singular their

According to a 12/6/2004 BBC News article, Rowan Atkinson "has launched a comedians' campaign against a government bill to outlaw inciting religious hatred", arguing that "parts of the Serious Organised Crime and Police Bill are 'wholly inappropriate' and could stifle freedom of speech".

Mr Atkinson told a meeting at the House of Commons on Monday night there are "quite a few sketches" he has performed which would come into conflict with the proposed law.

He added: "To criticise a person for their race is manifestly irrational and ridiculous but to criticise their religion, that is a right. That is a freedom.

"The freedom to criticise ideas, any ideas - even if they are sincerely held beliefs - is one of the fundamental freedoms of society.

"A law which attempts to say you can criticise and ridicule ideas as long as they are not religious ideas is a very peculiar law indeed." [emphasis added]

Mr. Atkinson's argument strikes me as an eminently sensible one. However, the language-related point is his straightforward and unselfconscious use of singular their, in a well-written and serious speech delivered in a formal setting. [via email from Stefano Taschini]

[Some earlier posts on this topic are here and here. ]


Posted by Mark Liberman at 07:29 AM

Baffling award of the year, again

What's wrong with these people? Last year, the Plain English Campaign gave its Foot in Mouth award to Donald Rumsfeld, for a remark that was sensible and even eloquent. This year, the award went to Boris Johnson, a British Member of Parliament, for a comment made on December 12, 2003: "I could not fail to disagree with you less". This is a stock phrase that I first heard when I was twelve or so.

A quick check on Google informs me that it has been Chris Cane's mother's favorite "show stopper" for the past half century or more. In the form "I couldn't fail to disagree with you less", it's listed in an online collection of amusing sayings, along with "a closed mouth gathers no feet" and "I refuse to engage in a battle of wits with an unarmed person".

"I could not fail to disagree with you less" is an unoriginal and slightly childish play on the problems of overnegation. As a choice for the most "truly baffling comment" of the year, it's pathetic.

Posted by Mark Liberman at 02:17 AM

December 06, 2004

Lexemes and word forms

Language Log readers who are sharp of eye and typographically on the ball (the sort of readers who can tell one font from another, and thus tend to refer to Dan Rather's embarrassing Microsoft Word-processed Texas Air National Guard memos as "forged" rather than "of disputed authenticity") will have noticed that I sometimes cite words that I mention in a post by putting them in italics (like this), but then sometimes I put them in bold italics (like this). I should have explained this notational convention long ago. I did actually touch on it accidentally in another context once (in this post), but you could be forgiven for having overlooked it, since that post was primarily about trademark law. But anyway, my usage does not display random variation in font style selection. There is a semantics to it. The needed explanation follows.

Let's look at a typical case, from my post "Ray Charles, America, and the subjunctive":

. . . when you hear crown you have your crucial piece of evidence. The preterite of crown is crowned, so the line And crown thy good with brotherhood cannot be a preterite.

Why the first occurrence of "crown" in italics, the second in bold italics, and then "crowned" in italics again? The answer is that the font style distinction is systematically used to reflect a conceptual distinction: word forms are being distinguished from lexemes.

A lexeme is a word in roughly the sense that would correspond to a dictionary entry. Lexeme names are given in bold italics. The point about "crown", for example, is that as a transitive verb it would get one entry despite the existence of four different shapes in which it appears: crown, crowns, crowned, crowning. These different shapes spell out word forms that belong to the verb lexeme crown. In a big and detailed dictionary they would all be listed in the single entry for crown. (In shorter dictionaries you would just be expected to know that the word forms for a regular verb like crown would be crown, crowns, crowned, crowning, the word forms for a regular verb like walk would be walk, walks, walked, walking, and so on: they list the lexemes, you are meant to know the grammar.)

There would be another lexeme in the dictionary for "crown", of course: a noun lexeme crown. Its word forms would be the plain singular crown, the plain plural crowns, the genitive singular crown's, and the genitive plural crowns'.

This notational convention emerged first in the work of Rodney Huddleston and is used systematically throughout The Cambridge Grammar. Occasionally it is suppressed when drawing the distinction would distract rather than clarify. So, for example, in my "Those who take the adjectives from the table" I say this:

How could "one of the few points on which the sages of writing agree" possibly be that "it is good to avoid them" when to utter the very thought you need the adjective good? How could William Zinsser possibly be serious in saying that most adjectives are "unnecessary" when he couldn't finish his sentence without the adjective unnecessary?

Here I actually mean the adjective lexeme good, which has the word forms good, better, and best. Using better would count as using the adjective good, though in its comparative inflectional form. But the next adjective mentioned was unnecessary, which does not inflect for comparison: there is no *unnecessarier or *unnecessariest. I thought it would look distractingly odd to put just good in bold italics. I therefore didn't add that pedantic detail. Nothing about its inflectional forms was relevant to what I was saying. But in general, whenever there could be confusion about whether I meant a word form or a lexeme, I will use the distinction in font styles, and always in the same way.

For words that have only one shape the distinction between lexemes and word forms makes no sense (for a language that truly has no inflection at all, one wouldn't draw the distinction), so the minimum number of word forms for a lexeme would be two. That minimum is represented in English by verbs such as must and ought, which are modal verbs with no preterite (inflected past tense). The shapes of the two word forms of must are must (present tense neutral) and mustn't (present tense negative).

Which English lexeme holds the record for most word forms? The answer is be. The absolute minimum number of separate word forms it has (assuming no distinct word forms that have the same shape, but counting the informal-style negative variants as word forms) is 12: am, are, aren't, been, be, being, is, isn't, was, wasn't, were, weren't.
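The lexeme/word-form distinction maps naturally onto a one-to-many table. The following is a toy Python sketch, not a description of how The Cambridge Grammar or any dictionary actually stores entries: the (lemma, part of speech) keys, the LEXICON table, and the lexemes_for helper are all hypothetical. It also identifies each word form with its spelling, which is a simplification, since it could not represent two distinct word forms that share a shape (the count for be above deliberately assumes there are none).

```python
# Hypothetical toy lexicon: each lexeme (one dictionary entry) maps to its word forms.
LEXICON = {
    ("crown", "verb"): ["crown", "crowns", "crowned", "crowning"],
    ("crown", "noun"): ["crown", "crowns", "crown's", "crowns'"],
    ("good", "adjective"): ["good", "better", "best"],
    ("must", "verb"): ["must", "mustn't"],
    ("be", "verb"): ["am", "are", "aren't", "been", "be", "being",
                     "is", "isn't", "was", "wasn't", "were", "weren't"],
}

def lexemes_for(form: str) -> list[tuple[str, str]]:
    """Return every lexeme that has the given shape among its word forms."""
    return [lexeme for lexeme, forms in LEXICON.items() if form in forms]

# One shape can spell word forms of different lexemes...
print(lexemes_for("crowns"))  # [('crown', 'verb'), ('crown', 'noun')]

# ...and in this toy table be has the most word forms.
most = max(LEXICON, key=lambda lexeme: len(LEXICON[lexeme]))
print(most, len(LEXICON[most]))  # ('be', 'verb') 12
```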

In some languages (Sanskrit, for example) the number of word forms for a verb lexeme is in the high hundreds, and for some others (Turkish, for example) it is certainly in the thousands.

Posted by Geoffrey K. Pullum at 07:42 PM


From a 12/5/2004 Vic Carucci column at

"I don't think any defense can stop us," Westbrook said. "The thing that is going to stop us is ourselves."

If that's what Brian Westbrook actually said, I hope (as an Eagles fan) that it's not what he meant. It would have been OK if he had said "the only thing that can stop us is ourselves", or something like that, but the remark as quoted seems to predict self-inflicted failure.

Then again, the way Ben Roethlisberger has been hitting the iceberg lately, maybe remarks like this are a good omen.

Posted by Mark Liberman at 06:23 PM

Whose knees?

A 12/5/2004 item on the English-language Maidan site reads:

Legal action must immediately be brought against ORT, a Russian TV channel.

Eye witnesses to Viktor Yushchenko’s speech at Maidan have reported that the Russian ORT TV network has just broadcast video footage of the speech edited to have Yushchenko say that he will "bring the miners to their knees."

The witnesses unanimously confirm that Yushchenko said, in fact, that he is ready to kneel before the miners.

Such actions as undertaken by ORT, considered a "public" broadcaster and therefore not a private one, exceed all known limits of permissible international communications rules, and decency.

I was not able to find the corresponding item on the Ukrainian site. I'm curious about what the (alleged) linguistic trick was here: are the two Ukrainian expressions for "kneel before the miners" and "bring the miners to their knees" close enough that simple waveform editing turns one into the other? Or was some more complicated form of audio processing required? Or is something altogether different going on (like an unsubstantiated rumor about what ORT broadcast, or what Yushchenko said)?

There have been a number of recent scandals about news photos being photoshopped to convey a different message from what the participants intended. And of course the idea of non-persons being airbrushed out of photographs from Stalinist times is well known. And there are a number of standard journalistic tricks for getting interview subjects to say things that can be quoted out of context to give a misleading impression. But I think this is the first time that I've ever heard it alleged that a major news outlet actually falsified the words of a political figure by transforming an audio recording, not just by taking a clip out of context. It's perfectly possible to do this, and I'm sure that it's been done in other cases, at least as a joke and probably sometimes for fraudulent purposes; the question is whether a major Russian news outlet has done it to a public speech by Viktor Yushchenko, in an attempt to stir up sentiment against him in the eastern oblasts. So I'm interested to get some more detailed documentation. Write me if you know something more about this.

[For those who haven't been following events in Ukraine, some context is here, here and here. A quotation from the last source:

At the state-owned Oktyabrskaya (October) coalmine on the edge of Donetsk city 1,500 miners labour in filthy, freezing and often dangerous conditions for 1,200 hryvna per month -- about £160. As prime minister in 1999-2000, Yushchenko closed several mines as part of a restructuring policy and, unforgivably in the minds of the people, failed to pay salaries and pensions for as long as six months at a stretch.

Yuri Mikhailovich, a manager at the mine, said, "Yushchenko's 'restructuring' brought the mining industry and many towns to the verge of extinction. There were hunger riots. Yanukovich reversed these criminal policies. Now the workers are paid promptly, they received a 50 per cent pay increase, and we have new British and German machinery." This alone goes some way to explaining why exhausted miners are willing to stand in the falling snow on Lenin square after a day underground to show their support for Yanukovich.

So you can see why it makes a difference, in the current political context, just whose knees Yushchenko was talking about. ]

[Update: the corresponding Ukrainian item is here -- I guess I was in too much of a hurry to see it before, right there in plain sight with almost the same time stamp, and a headline that even I can read by reference to my almost-evaporated college Russian. The phrases corresponding to "bring the miners to their knees" (a direct quote whose word-by-word gloss seems to be "put miners on knees") and "kneel before the miners" (an indirect quote whose word-by-word gloss seems to be "ready to stand/become on knees before miners") don't offer any obvious enlightenment as to how the audio editing could have been done any more simply in Ukrainian than in English. ]

[By the way, this Ukrainian dictionary site is quite helpful.]

[Update #2: ironically, the most famous recent Ukrainian complaint about allegedly doctored recordings came from the other side. A bit more than four years ago, the journalist Heorhiy Gongadze was abducted, tortured and murdered. Mykola Melnychenko, one of President Leonid Kuchma's former bodyguards, came forward with tape recordings of Kuchma (allegedly) telling associates to "throw Gongadze to the Chechens". Apparently "the government eventually acknowledged that it was Mr Kuchma's voice on the tapes. It insists, however, that the recordings were doctored in such a way as to put words into the president's mouth."

Melnychenko also released a recording on which "a voice similar to Mr Kuchma's gives a green light to the sale to Saddam Hussein of Ukrainian radar capable of detecting Stealth planes". This accusation raised quite a bit of interest, since it was during the run-up to the invasion of Iraq, and experts such as Peter French, chair of the International Association for Forensic Phonetics and Acoustics, were involved in analyzing the Melnychenko recordings.]


Posted by Mark Liberman at 12:22 PM

Overpermissive quotatives: grammar change or thesaurusizing?

A few days ago, Geoff Pullum took Dan Brown to task for writing

"Terrorism," the professor had lectured, "has a singular goal."

Geoff pointed out that this not only violates Elmore Leonard's third rule of writing, it's flat-out ungrammatical in standard English. But Dan Brown is not alone. From Phil Sheridan's column in today's Philadelphia Inquirer:

"We caught them on the wrong day," Reese understated.

As Geoff observed, quotative tags like "Kim said" normally require a verb that can take a direct quotation complement:

Reese said: "We caught them on the wrong day".

is fine, but

*Reese understated: "We caught them on the wrong day".

doesn't work. At least it doesn't work for me, and dictionaries like the OED and the AHD don't give any similar uses for the verb understate.

However, such (mis)uses in quotative tags are fairly common. Searching Google for "he understated", I find that most of the examples are of the kind countenanced by the dictionaries and by my own intuitions:

In three-year period, he understated sales by $600000 on tax returns ...
Applicant falsified a National Agency Questionnaire ... when he understated the full extent of his drug use ...
Perhaps he understated his doubt when he talked about putting you in touch with the Shadowed Ones organization.

However, among the first 30 returns, I found 4 quotative tags:

(link) But once he got going, "I know the turns pretty well," he understated, Carpentier started passing everyone, sometimes a laughable six cars in one turn.
(link) "The idea that we could connect all our networks together and create one global super carrier is dead, buried, and rubbish -- it’s completely impossible," he understated.
(link) "It's a cartoonist's dream come true," he understated.
(link) James was quite frank about it, "We were trying at the time" he understated, but the car flew over a crest and just failed to keep all four wheels on the tar when it landed.

This is a high enough rate to give me pause. What's going on?

It's possible that these writers have a different syntactic frame for the verb understate. We do find some examples like these:

Not wanting to say the par-72 layout owed him one, he understated that there has been chances and had never been able to pull it off.
37 of the 139 patients (27%) were ineligible to participate as they did not meet the criteria of the test protocol (he understated that this percentage is "atypical").
She understated that with Ronie is almost over,but every time they meet ,it ends in making love because"between us there is much chemistry".

But these examples seem to be rarer than the quotative tag uses of understate (though I haven't tried to count either category), and most of them seem to be written by people who have significant problems with English, well beyond whatever is going on with the professional writers who produced the "understated" tags.

It is also possible that some people have a more permissive idea about what verbs to use in quotative tags -- not just verbs that normally allow direct speech complements (like say, explain, insist, whisper, shout, etc.), but also other verbs of information-transmission, like lecture and understate. Unfortunately, it's hard to distinguish this from a less naturalistic explanation: thesaurusizing.

That's a word I just made up -- though as usual in such cases, I learn from Google that several people have been there before me:

(link) Don’t try to impress someone by thesaurusizing your email with terms you wouldn’t use in person– it sounds diaphanous, limpid, and transpicuous.

(Robert Merton once wrote that "Anticipatory plagiarism occurs when someone steals your original idea and publishes it a hundred years before you were born". One of the less widely recognized consequences of internet search is a significant increase in the rate of (a shorter-term variety of) anticipatory plagiarism.)

Anyhow, I first learned about thesaurusizing from a friend in junior high. He would write a composition assignment in his normal way, and then use a thesaurus to replace a few words or phrases in every paragraph with fancier "equivalents". Since he usually didn't know the meaning of the substitutions, or at least didn't think carefully about the consequences of sticking them in, the results were often incoherent as well as pretentious.

I strongly suspect that a variant of this practice remains common among some published writers. And this could explain how writers like Dan Brown and Phil Sheridan come to misuse verbs of information-transmission in quotative tags. They want to spice up their writing, by conscious and systematic violation of Elmore Leonard's third rule. This often puts them on the spot to think up yet another appropriately nuanced "synonym" for one of the common and natural quotative verbs, like say, insist, explain. They might refer to a thesaurus, or they might just rely on their own sense of word associations. When there's another communication word in the context, it's natural for them to thesaurusize in that direction -- the professor is lecturing his audience, or the linebacker is understating the extent of a mismatch, so why not tag the quote with "the professor lectured" or "Reese understated"?

If a word is inserted while editing rather than while composing, its lameness in context is easier to miss. This might be because the writer never re-reads the result, or it might be because the grammatical soundness of the original masks the problems with the revised version. People who have worked on speech synthesis or speech coding know from painful experience that it's a bad idea to listen to a higher-quality version of a phrase before listening to a lower-quality version -- the low-quality version always sounds much better than it should in that context, to the point where a phrase that is unintelligible when presented in isolation can sound pretty good after the listener has been primed.

For now, I'm satisfied with this explanation. But there's a problem in the other direction that still puzzles me -- there are many increasingly-common ways of introducing direct quotation that have no counterparts as quotative tags. As far as I can tell, no one is tempted to write:

"Oh my God", she was like.
"Whoa", he was all.
"Wow, dude", we were all like.

Why? Maybe John Rickford's project at Stanford has the answers?

[Update: Ray Girvan points out that said-substitution (at least when grammatical) is called a "said" bookism in the Turkey City Lexicon. ]

[Update #2: Jonathan Lundell offered an alternative explanation by email:

I read '"We caught them on the wrong day," Reese understated.' as a modest essay at humor, modestly successful. Sheridan is committing a sports column here, after all, not reportage.

Maybe so. Obviously the joke, if there was one, went right over my head. ]

[Update #3: Eli Bishop sent in a case of on-purpose "said" bookisms:

The funniest short story by the British horror writer Ramsey Campbell, "Next Time You'll Know Me", manages to use said-bookisms ("'Take no notice of them,' my mother countermanded") exclusively for 13 pages straight - "say" is used in a quotative sense only once, in the first paragraph. The narrator is a homicidally insane writer who has the psychic ability to know the plots of other people's novels before they're published, but has still failed to achieve literary success for reasons that are inexplicable to him but obvious to the reader. Campbell writes that when the story was being anthologized, a humorless copyeditor tried to do him a favor by changing the verbs to "said" throughout. ]



Posted by Mark Liberman at 10:03 AM

December 05, 2004

A "boxing day election" -- or not?

Some British media have written about the 12/26 Ukrainian election re-run as taking place on boxing day. For example, The Scotsman's headline was "Ukraine Supreme Court Orders New Boxing Day Election". I knew that "boxing day" is December 26 -- or I thought I knew that -- but I didn't know the origin of the phrase. Though I'd never given it much thought, I guess I connected it instinctively with fisticuffs, dimly imagining some traditional festival of the manly art.

Wrong. The Oxford English Dictionary glosses box, v. as

1.b. To give a Christmas-box (colloq.); whence boxing-day.

and Boxing-day, in turn, is glossed as

The first week-day after Christmas-day, observed as a holiday on which post-men, errand-boys, and servants of various kinds expect to receive a Christmas-box.

The American Heritage Dictionary agrees:

The first weekday after Christmas, celebrated as a holiday in parts of the British Commonwealth, when Christmas gifts are traditionally given to service workers.

But this year, December 26 falls on a Sunday. So is boxing day really December 27 this year? Or is the OED's definition out of date? Inquiring minds want to know...
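Taken literally, the dictionary definitions can be checked mechanically. Here is a minimal Python sketch (the helper name is hypothetical, not from either dictionary) that finds "the first week-day after Christmas-day" for a given year. It assumes by default that a weekday means Monday through Friday; the older British sense of "week-day", any day except Sunday, is available as an option. For 2004 both readings land on Monday, December 27.

```python
from datetime import date, timedelta


def first_weekday_after_christmas(year: int, saturday_is_weekday: bool = False) -> date:
    """Return the first 'week-day' after 25 December of the given year.

    Assumption: by default a weekday is Monday-Friday (the modern sense);
    pass saturday_is_weekday=True for the older sense of any day but Sunday.
    """
    # date.weekday() numbers Monday as 0 and Sunday as 6.
    last_weekday = 5 if saturday_is_weekday else 4
    day = date(year, 12, 25) + timedelta(days=1)
    while day.weekday() > last_weekday:
        day += timedelta(days=1)
    return day


for year in (2003, 2004, 2005, 2009):
    print(year, first_weekday_after_christmas(year).isoformat())
```

The two readings only diverge when Christmas falls on a Friday (as in 2009), where the modern sense skips to Monday the 28th but the older sense picks Saturday the 26th.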

A bit of googling adds information without entirely clearing this up:

  • The highest-ranked site is a Canadian Heritage page that offers up some other theories: "The term may come from the opening of church poor boxes that day; maybe from the earthenware boxes with which boy apprentices collected money at the doors of their masters' clients."
  • Second is a Snopes page that debunks another false etymology:"boxing day" does not refer to "the need to rid the house of empty boxes the day after Christmas".
  • A page at is third; it gives several theories about the etymology of the term and concludes unhelpfully that "the actual origin of this holiday is debatable and has been debated, one idea being more popular than the other at a given time". With respect to scheduling, it says that "Some places observe Boxing Day on December 26th and some celebrate it on the first weekday following Christmas."
  • Elaine's Boxing Day Page, in fourth place, puts in the first sentence the information that boxing day is "celebrated in Britain, Australia, New Zealand, and Canada", and goes on to claim that "[t]he holiday may date from as early as the Middle Ages".

I'm skeptical of the speculations about ancient origins -- surely the most parsimonious explanation for the fact that there's no trace of the term (or the concept) in U.S. culture would be that it's a 19th-century innovation? And indeed the earliest citations in the OED are from the 1830s:

1833 in A. MATHEWS Mem. C. Mathews (1839) IV. viii. 173 To the completion of his dismay, he arrives in London on boxing-day.
1837 DICKENS Pickw. xxxii. 343 No man ever talked in poetry 'cept a beadle on boxin' day.

It's true that these feel like uses of a long-established word, sure to be understood by the reader. But that would be consistent with an origin at some time during the 50-odd years since 1776.

[Update 12/6/2004: Des von Bladet explains via email that

It doesn't get more official than the Department of Trade and Industry, which gives (in a browser for which they have failed to pessimise) the 27th December as a Bank Holiday in lieu of the 26th:

So I think Boxning Day is still where it should be. (And, impressively, its day in lieu is _followed_ by the day in lieu of Twinkletree herself.)

doesn't guarantee to observe them in that order

OK, is that clear to everyone? This year, Boxing Day is December 27, and Christmas Day is December 28, according to the UK Department of Trade and Industry. ]


Posted by Mark Liberman at 01:31 PM

Semen, green rice and the rate of internet decay

Hanzi Smatter is a blog "[d]edicated to the misuse of Chinese characters (Hanzi or Kanji) in Western culture". For those of us who are ignorant of Chinese characters in all their forms, it's especially nice that characters cited are identified with links to the Unihan database. For example, an entry from December 1 pictures someone who meant to tattoo (jing1) "essence, semen, spirit" on his elbows, but by splitting the character into the two radicals (mi3) "uncooked rice" and (qing1 or jing1) "blue, green, black; young", managed instead to display "green rice".

You can find the same character discussed on the website here, where the semantic part (mǐ , i.e. mi3, mi with third tone) is glossed as "rice" or "kernel", and the phonetic part (qīng, i.e. qing1, qing with first tone) is glossed as "color of lush growth that burns red", "green, blue", or "young".
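The two romanization conventions used above (tone numbers like mi3, tone marks like mǐ) carry the same information. A minimal Python sketch of the conversion, assuming lowercase single syllables and the placement rule as it is usually taught (a or e takes the mark, the o takes it in "ou", otherwise the last vowel does); the function name is illustrative only:

```python
# Tone-marked vowels, indexed by tone number 1-4.
TONE_MARKS = {
    "a": "āáǎà", "e": "ēéěè", "i": "īíǐì",
    "o": "ōóǒò", "u": "ūúǔù", "ü": "ǖǘǚǜ",
}


def numbered_to_marked(syllable: str) -> str:
    """Convert one lowercase tone-numbered syllable (e.g. 'mi3') to tone-mark form.

    Assumes the syllable ends in a digit; 5 (or 0) is the neutral tone and
    gets no mark. 'v' is accepted as a keyboard stand-in for 'ü'.
    """
    base, tone = syllable[:-1].replace("v", "ü"), int(syllable[-1])
    if tone in (0, 5):
        return base
    if not 1 <= tone <= 4:
        raise ValueError(f"bad tone number in {syllable!r}")
    # Placement rule: 'a' or 'e' wins; in 'ou' the 'o'; else the last vowel.
    if "a" in base:
        idx = base.index("a")
    elif "e" in base:
        idx = base.index("e")
    elif "ou" in base:
        idx = base.index("o")
    else:
        vowels = [i for i, ch in enumerate(base) if ch in TONE_MARKS]
        if not vowels:
            raise ValueError(f"no vowel in {syllable!r}")
        idx = vowels[-1]
    return base[:idx] + TONE_MARKS[base[idx]][tone - 1] + base[idx + 1:]


for s in ("jing1", "mi3", "qing1"):
    print(s, "->", numbered_to_marked(s))
```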

The site also provides a list of relevant cross-reference links for each character, e.g. here for the original jing1 "essence", but unfortunately most of the links are broken. Of the 16 links given as cross-references for jing1, only 4 worked for me: an animated display of the character being drawn, a corresponding Cantonese entry, the entry in an etymological database, and an AltaVista search for the character.

The Foreword to Web Version says that "was created in the fall of 1996". If 25% of links are still active after 8 years, a simple model of internet decay would say that the average rate of link preservation per year is .25^(1/8) = 0.84. That's quite a bit better than the rate of link rot that I generally see when I update the links in my on-line course lecture notes each year (say for the intro linguistics course ling 001), but the links are mostly to big lexicographical reference sites, which are likely to be more stable. I guess it's also likely that the author of the site, Rick Harbaugh, has updated some of the references since 1996 -- the site's copyright notice says 1995-2003 -- which would bring the yearly retention rate back down towards the .6 or so that I'm used to seeing.

All the same, even a link retention rate as high as .84 means that internet cross-references become useless on a time scale that's small compared to the traditional life cycle of scholarship. After 10 years, only 18% of references would still be valid. After 16 years -- that's how long it's been since the 2nd edition of the OED was published in 1989 -- only 6 percent of the links would still work. After 76 years -- the time elapsed since the first edition of the OED in 1928 -- only about 2 links in a million would be valid.
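The arithmetic in the last two paragraphs rests on a simple model: each link independently survives each year with the same probability r, so a fraction r^t is left after t years, and r can be estimated from one observed survival fraction. A short Python sketch of that model (the function names are illustrative, not from any library):

```python
def yearly_retention(surviving_fraction: float, years: float) -> float:
    """Constant per-year survival rate r such that r ** years == surviving_fraction."""
    return surviving_fraction ** (1 / years)


def surviving_after(rate: float, years: float) -> float:
    """Fraction of links still valid after `years` under constant, independent decay."""
    return rate ** years


# 4 of 16 cross-reference links still working, about 8 years after fall 1996.
r = yearly_retention(4 / 16, 8)
print(f"yearly retention: {r:.3f}")
for years in (10, 16, 76):
    print(f"{years:>2} years: {surviving_after(r, years):.3g} of links still valid")
```

With the unrounded rate, the three horizons come out at roughly 18%, 6%, and 2 in a million, matching the figures above; using the rounded 0.84 changes only the third significant digit.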

In my opinion, it's past time for the creators of serious content on the web to start using something like the DOI system to establish stable links. This would solve a portion of the problem in a way that doesn't require authors of sites with cross-reference links to run fast just to stay in one place. Some of the dead links at are to content that is still on the web, but can't be accessed at the old URLs because sites have been moved or internally reorganized or both. For example, the links to the CEDICT and Unicode database entries are of this type. There's still another piece of the problem, though -- sometimes content just goes dark, because the provider moves on, in some sense or another of that phrase. We don't evaporate all copies of a book when the author retires or dies, and the Internet Archive offers one model of how to retain web content in a library-like fashion. However, it's far from clear (at least to me) how to integrate one or more systems of stable links with one or more systems of archival storage. I also worry about other problems, for example the trend towards dynamically-composed rather than static content, where effective archiving may require access to a compatible version of time-varying programs of various sorts.

So the optimistic side of things is that a site like Hanzi Smatter is now easy to set up, and can easily display not only photographs but also Chinese characters in any reasonably compliant browsing environment, and can link to marvelously informative external pages on each of the characters discussed. was obviously harder to set up -- for example, the author had to use gifs instead of character codes because browser and OS technology was not reliably able to deal with Chinese characters as text rather than as images. Nevertheless he did it, and also provided an extraordinary range of systematic external as well as within-site links. The worm in the apple: most of the external links are now dead, just a few years later.


Posted by Mark Liberman at 11:58 AM

December 03, 2004

Linux in Marathi

A while back we had some discussion of how important it is to make software available in languages other than the big international languages such as English and French. Mark was a bit skeptical of the need for this in Africa, whereas I saw more of a need. I just came across an interesting comment on Slashdot on the value of making Linux available in Marathi, as Red Hat has recently announced its plans to do. The poster says:

As an example of how useful this would be, I used to be a technical consultant and trainer to the Mumbai Cyber Crime Lab. Most of the officers I trained there speak only Marathi, a language spoken in Maharashtra. Their acceptance of information technology would be much better if the operating system was in their own language.
[Mumbai is what used to be called Bombay.]

Posted by Bill Poser at 04:08 PM

It's like a glimmer on the horizon

It's amazing. After years of exaggerating the snow-vocabulary of arctic peoples, suddenly journalists everywhere are obsessed with the allegedly gaping holes in northland lexicons. Now Graeme Smith in the Globe and Mail of 12/2/2004 is going on about light on the horizon at night:

Inuit have no word for twilight. For centuries, people felt no need for such a word in the Arctic communities where the sun stays in the sky all summer and disappears below the horizon all winter.

New vocabulary became necessary in the past few years, however, as hazy light started lingering on the southern horizon deep into the months when the sky normally would contain nothing but stars.

Residents of Nunavut struggle for words when asked to describe the phenomenon.

"It's like dawn," said Marty Kuluguqtuq, a municipal worker in Grise Fiord. "It's like a glimmer on the horizon."

That sounds like a pretty successful struggle to me. The next time I'm out in a public place after dark, I'll ask some Philadelphia locals what they call the sky-glow on the horizon, and see if they do as well.

[Update 12/4/2004: Ray Girvan at the Apothecary's Drawer weblog points out that the rest of the reporting in Smith's article is even more confused than the linguistics is. First, as Ray put it in email, "Within inhabited regions, ... arctic winter night *is* only a twilight by nautical and astronomical definitions." Second, the effect under discussion ("Extremely High Horizon Refraction") is nothing at all like twilight, as experienced at whatever latitudes. Finally, the effect has always been around, but may have become more common recently because of global warming. At least that's the view of Wayne Davidson, the Environment Canada station operator in Resolute Bay, and the focus of Smith's article. Wayne Davidson has other interesting ideas about the effects of climate change on celestial observation. He argues that Stonehenge was a "mirage machine", built at a time when the climate of England was radically different from what it's been like in historical times. He writes:

It was simply colder then or for some unknown reason, the Gulf Stream ceased to flow towards Northern Europe, in this case, UK weather would have turned to be very much sub-arctic in nature. Cold dense air caused extreme refraction effects, particularly near the horizon. At sunset or sunrise, the sun disk transformed itself to several fascinating geometric structures, causing it to briefly look like the lintels on top of the Sarsen pillars. Thus, the very existence of Stonehenge was inspired from the sky, copied by large megaliths in order to capture the magic of a round sun transforming itself into a rectangle. Dedication for permanency, ode to the sun, drove the builders to use large stones. ]



Posted by Mark Liberman at 03:42 PM

Writing style and dementia

Bill Poser cites a study of Iris Murdoch's last novel which "raises the possibility that changes in a person's vocabulary could be used to diagnose Alzheimer's disease while it is still in its early stages". In this case, the text analyzed was written a year or two before the disease was diagnosed, and four or five years before the author died. Some studies published over the past decade suggest an even more intriguing connection between writing style and Alzheimer's. It's claimed that low scores on a simple index of stylistic complexity, measured in short texts written at about age 20, are correlated with low cognitive test scores and with neuropathologically confirmed Alzheimer's disease 50 to 70 years later.

It's not clear how to assign cause and effect here: maybe a condition that predisposes people to Alzheimer's also affects writing; or maybe certain life-long habits of writing actually tend to ward off the disease. As we'll see, there's an intriguing connection to popular stylistic nostrums. Geoff Pullum will be happy to learn that Strunk and White's stylistic advice may actually rot the brain. Well, at least it's correlated with neurodegenerative pathology, and might even cause it.

In 1986, David Snowdon started a broad epidemiological study among 678 nuns from the School Sisters of Notre Dame. The study looked at many influencing factors and many measures of health, as well as age and cause of death. Along the way, he discovered that some of the nuns, as young women, had written short texts on a consistent topic and in a consistent context:

In September 1930, the leader of the School Sisters of Notre Dame religious congregation in North America requested that each sister write a short sketch of her life and include parentage, interesting and edifying childhood events, schools attended, and influences that led her to the convent.

These biographical sketches were preserved in the order's files, and Snowdon was given access to them. Some of the early results are described in this 1996 abstract:

OBJECTIVE--To determine if linguistic ability in early life is associated with cognitive function and Alzheimer's disease in late life. DESIGN--Two measures of linguistic ability in early life, idea density and grammatical complexity, were derived from autobiographies written at a mean age of 22 years. Approximately 58 years later, the women who wrote these autobiographies participated in an assessment of cognitive function, and those who subsequently died were evaluated neuropathologically. SETTING--Convents in the United States participating in the Nun Study; primarily convents in the Milwaukee, Wis, area. PARTICIPANTS--Cognitive function was investigated in 93 participants who were aged 75 to 95 years at the time of their assessments, and Alzheimer's disease was investigated in the 14 participants who died at 79 to 96 years of age. MAIN OUTCOME MEASURES--Seven neuropsychological tests and neuropathologically confirmed Alzheimer's disease. RESULTS--Low idea density and low grammatical complexity in autobiographies written in early life were associated with low cognitive test scores in late life. Low idea density in early life had stronger and more consistent associations with poor cognitive function than did low grammatical complexity. Among the 14 sisters who died, neuropathologically confirmed Alzheimer's disease was present in all of those with low idea density in early life and in none of those with high idea density. CONCLUSIONS--Low linguistic ability in early life was a strong predictor of poor cognitive function and Alzheimer's disease in late life.

[D. A. Snowdon, S. J. Kemper, J. A. Mortimer, L. H. Greiner, D. R. Wekstein and W. R. Markesbery. Linguistic ability in early life and cognitive function and Alzheimer's disease in late life. JAMA Vol. 275 No. 7, February 21, 1996.]

The measure of "idea density" is footnoted to a 1973 paper influenced by early transformational grammar: Kintsch, W. & J. Keenan. Reading rate and retention as a function of the number of propositions in the base structure of sentences. Cognit. Psychol. 5: 257-274 (1973). It's defined in Snowdon et al. 2000 as follows:

Idea density was defined as the average number of ideas expressed per ten words for the last ten sentences of each autobiography. Ideas corresponded to elementary propositions, typically a verb, adjective, adverb, or prepositional phrase. Complex propositions that stated or inferred causal, temporal, or other relationships between ideas also were counted. Without the linguistic coder's knowledge of the age or cognitive function of each sister during late life, each autobiography was scored for idea density. The following sentence from an autobiography illustrates the method used to compute idea density: "I was born in Eau Claire, Wis., on May 24, 1913 and was baptized in St. James Church." The ideas (propositions) expressed in this sentence were (1) I was born, (2) born in Eau Claire, Wis., (3) born on May 24, 1913, (4) I was baptized, (5) was baptized in church, (6) was baptized in St. James Church, and (7) I was born...and was baptized. There were 18 words or utterances in that sentence. The idea density for that sentence was 3.9 (i.e., 7 ideas divided by 18 words and multiplied by 10, resulting in 3.9 ideas per 10 words).
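The formula itself is trivial; the hard, manual part is deciding what counts as a proposition. A minimal Python sketch that reproduces the quoted worked example, with the seven hand-coded propositions simply listed. Counting words by splitting on whitespace is an assumption that happens to give the paper's 18 here; the study's own rule for "words or utterances" may differ on other texts.

```python
def idea_density(n_ideas: int, n_words: int) -> float:
    """Ideas (elementary propositions) per ten words, as in the quoted definition."""
    if n_words <= 0:
        raise ValueError("need at least one word")
    return 10 * n_ideas / n_words


sentence = ("I was born in Eau Claire, Wis., on May 24, 1913 "
            "and was baptized in St. James Church.")
# Propositions coded by hand, copied from the quoted example.
propositions = [
    "I was born", "born in Eau Claire, Wis.", "born on May 24, 1913",
    "I was baptized", "was baptized in church",
    "was baptized in St. James Church", "I was born...and was baptized",
]
n_words = len(sentence.split())  # whitespace tokens; an assumed counting rule
print(n_words, round(idea_density(len(propositions), n_words), 1))
```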

Note that 3.9 is a very low measure of "idea density", in the context of the study. According to the study's summary table, the mean "idea density" in early life autobiographies for nuns whose autopsied brains "met neuropathologic criteria for Alzheimer's disease" was 4.9 (95% confidence interval 4.6-5.3), while for nuns whose brains did not meet those criteria, the mean "idea density" was 6.1 (95% confidence interval 5.6-6.6).

No further details are given, but my guess is that under this definition, "idea density" will depend strongly on "density of adjectives and other modifiers". Here "idea" means something like "elementary predication", in a certain way of thinking about the meaning of sentences, so every time you add a modifier, you add an "idea". It's going to go something like this:

Buck Mulligan came from the stairhead.
[6 words; 2 "ideas" (B.M. came; B.M. came from the stairhead); "idea density" = 10*2/6 = 3.3]

Plump Buck Mulligan came from the stairhead.
[7 words; 3 "ideas" (as before, plus B.M. is plump); "idea density" = 10*3/7 = 4.3]

Stately, plump Buck Mulligan came from the stairhead.
[8 words; 4 "ideas" (as before, plus B.M. is stately); "idea density" = 10*4/8 = 5]

There are several other obvious ways to increase "idea density", most of them deprecated by Strunk and White. For example, connecting two simple sentences with and adds one word and one "idea", as the metric is defined, and brings the text asymptotically closer to the maximum "idea density" of 10, just as adding an adjective does.
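The asymptote follows directly from the definition. If each added word brings exactly one new "idea", a text with i ideas in w words has density 10(i+k)/(w+k) after k additions, which equals 10 - 10(w-i)/(w+k): it climbs toward 10 but never reaches it as long as w > i. A small Python sketch, starting from the Buck Mulligan sentence above and assuming every added modifier contributes one idea:

```python
from fractions import Fraction


def idea_density(n_ideas: int, n_words: int) -> Fraction:
    """Ideas per ten words, kept as an exact fraction."""
    return Fraction(10 * n_ideas, n_words)


# "Buck Mulligan came from the stairhead." : 2 ideas in 6 words.
ideas, words = 2, 6
for k in (0, 1, 2, 10, 100, 1000):
    # k one-word modifiers, each assumed to add exactly one idea.
    d = idea_density(ideas + k, words + k)
    print(f"{k:>4} added: {float(d):.2f}")
```

The first three rows reproduce the 3.3, 4.3 and 5 computed for the Joyce examples; later rows creep toward, but stay below, 10.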

The 2000 Snowdon et al. paper surveys stylistic predictors of neuropathology for a larger number of patients:

Findings from the Nun Study indicate that low linguistic ability in early life has a strong association with dementia and premature death in late life. In the present study, we investigated the relationship of linguistic ability in early life to the neuropathology of Alzheimer's disease and cerebrovascular disease. The analyses were done on a subset of 74 participants in the Nun Study for whom we had handwritten autobiographies completed some time between the ages of 19 and 37 (mean = 23 years). An average of 62 years after writing the autobiographies, when the participants were 78 to 97 years old, they died and their brains were removed for our neuropathologic studies. Linguistic ability in early life was measured by the idea (proposition) density of the autobiographies, i.e., a standard measure of the content of ideas in text samples. Idea density scores from early life had strong inverse correlations with the severity of Alzheimer's disease pathology in the neocortex: Correlations between idea density scores and neurofibrillary tangle counts were -0.59 for the frontal lobe, -0.48 for the temporal lobe, and -0.49 for the parietal lobe (all p values < 0.0001). Idea density scores were unrelated to the severity of atherosclerosis of the major arteries at the base of the brain and to the presence of lacunar and large brain infarcts. Low linguistic ability in early life may reflect suboptimal neurological and cognitive development, which might increase susceptibility to the development of Alzheimer's disease pathology in late life.

Snowdon DA, Greiner LH, Markesbery WR. "Linguistic ability in early life and the neuropathology of Alzheimer's disease and cerebrovascular disease." Ann N Y Acad Sci. 2000 Apr;903:34-8.

There have been some other interesting outcomes from the study of these old biographical sketches:

Handwritten autobiographies from 180 Catholic nuns, composed when participants were a mean age of 22 years, were scored for emotional content and related to survival during ages 75 to 95. A strong inverse association was found between positive emotional content in these writings and risk of mortality in late life (p < .001). As the quartile ranking of positive emotion in early life increased, there was a stepwise decrease in risk of mortality resulting in a 2.5-fold difference between the lowest and highest quartiles. Positive emotional content in early-life autobiographies was strongly associated with longevity 6 decades later. Underlying mechanisms of balanced emotional states are discussed.

Danner DD, Snowdon DA, Friesen WV. "Positive emotions in early life and longevity: findings from the nun study." J Pers Soc Psychol. 2001 May;80(5):804-13.

So my advice is: "use adjectives, be happy, avoid dementia."

This may not be a joke. As I said in the beginning, the causal relations are not clear here. There are some pieces of research that point to underlying physiological differences that might contribute to cognitive deficits long before there is any overt neurodegeneration. But there is also research suggesting that "exercise" may affect neurodegenerative disorders, and some of this literature cites Snowdon et al. as evidence:

...exercise can extend somewhat the survival of transgenic mice that express SOD1 enzyme mutations that are also found in some cases of human familial ALS (Kirkinezos et al. 2003). In Parkinson's Disease, there is evidence that physical activity may be beneficial, and that lack of activity may be detrimental (Tillerson et al. 2002), and cognitive activity is thought to reduce the risk of Alzheimer's disease (Snowdon et al. 2000).

Carrasco, Rich, Wang, Cope and Pinter. "Activity-Driven Synaptic and Axonal Degeneration in Canine Motor Neuron Disease." J Neurophysiol 92: 1175-1181, 2004.

Posted by Mark Liberman at 08:30 AM

December 02, 2004

Jackson's Dilemma and Alzheimer's

The BBC has a report on an interesting new study of changes in the writing of the author Iris Murdoch that appear to be associated with Alzheimer's disease. The study, "The effects of very early Alzheimer's disease on the characteristics of writing by a renowned author" by Peter Garrard, Lisa M. Maloney, John R. Hodges, and Karalyn Patterson, appears in the electronic edition of Brain [doi:10.1093/brain/awh341], for which you'll need a subscription. Here's the abstract:

Iris Murdoch (I.M.) was among the most celebrated British writers of the post-war era. Her final novel [Jackson's Dilemma - WJP], however, received a less than enthusiastic critical response on its publication in 1995. Not long afterwards, I.M. began to show signs of insidious cognitive decline, and received a diagnosis of Alzheimer's disease, which was confirmed histologically after her death in 1999. Anecdotal evidence, as well as the natural history of the condition, would suggest that the changes of Alzheimer's disease were already established in I.M. while she was writing her final work. The end product was unlikely, however, to have been influenced by the compensatory use of dictionaries or thesauri, let alone by later editorial interference.
These facts present a unique opportunity to examine the effects of the early stages of Alzheimer's disease on spontaneous written output from an individual with exceptional expertise in this area. Techniques of automated textual analysis were used to obtain detailed comparisons among three of her novels: her first published work, a work written during the prime of her creative life and the final novel. Whilst there were few disparities at the levels of overall structure and syntax, measures of lexical diversity and the lexical characteristics of these three texts varied markedly and in a consistent fashion. This unique set of findings is discussed in the context of the debate as to whether syntax and semantics decline separately or in parallel in patients with Alzheimer's disease.
The paper raises the possibility that changes in a person's vocabulary could be used to diagnose Alzheimer's disease while it is still in its early stages. Who would have thought that "literary computing" might prove to have a medical use?
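The abstract doesn't say exactly which measures of lexical diversity the authors used, so purely as an illustration, here is a minimal sketch of one common and very simple measure, the type-token ratio (distinct word forms divided by total words). The function names, the crude tokenizer, and the 100-word window are all my own assumptions, not the paper's method. Because the raw ratio falls as texts get longer, the second function averages it over fixed-size windows so that texts of different lengths can be compared.

```python
import re


def tokenize(text: str) -> list[str]:
    # Crude tokenizer (an assumption): lowercase, keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(text: str) -> float:
    """Distinct word forms (types) divided by total words (tokens)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def mean_segmental_ttr(text: str, window: int = 100) -> float:
    """Average the ratio over consecutive non-overlapping windows of `window` tokens.

    A leftover partial window at the end is ignored; texts shorter than one
    window fall back to the plain type-token ratio.
    """
    tokens = tokenize(text)
    segments = [tokens[i:i + window]
                for i in range(0, len(tokens) - window + 1, window)]
    if not segments:
        return type_token_ratio(text)
    return sum(len(set(seg)) / len(seg) for seg in segments) / len(segments)
```

For example, "the cat saw the dog" has five tokens but only four types, giving a ratio of 0.8. A shrinking vocabulary of the sort the paper reports would show up as lower windowed ratios across a novel.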

Posted by Bill Poser at 07:53 PM

Thank God for film: Dan Brown without the writing

It had to happen: they are going to film The Da Vinci Code. Tom Hanks will play Robert Langdon, Harvard "professor of symbology". As long as they don't use author Dan Brown's hopelessly inept writing in putting together the screenplay, I'm sure the film will be a blockbuster success.

However, I have to say that I think the action of Brown's earlier (and even more badly written) Robert Langdon adventure Angels and Demons would be more suited to making into an action film.

In The Da Vinci Code you'll only get Langdon and a female co-star jumping out of a window of the Louvre onto a moving truck on the road below. In Angels and Demons, you get Langdon jumping out of a helicopter with no parachute, his sedentary academic lifestyle notwithstanding. (Though I should mention that we grammarians are also capable of staggering feats of agility and strength when roused. Just the other day I threw a copy of The Cambridge Grammar of the English Language a full twelve feet. And I nearly hit that rat, too.)

More importantly, in Angels and Demons, you get Langdon racing across Rome trying unsuccessfully to prevent anti-religious terrorists from perpetrating bizarre and ghastly murders of important cardinals at landmark churches (there are always clues to the upcoming murder, but Langdon never manages to decipher them quickly enough), and there are big special effects at the end. In The Da Vinci Code Langdon just rushes around France and England with a cryptographer babe, tracking down coded clues to where a secret society associated with the bloodline of Jesus might or might not have buried something of spiritual significance, and it all gets a bit cerebral and biblical.

I'm still watching the mailbox daily as I wait for delivery of my copy of Secrets of Angels and Demons, a collection of essays about the factual background to Angels and Demons, to which I contributed a piece I called ‘Adverbs and demons’. Working through Brown's wretched prose looking for interesting cases of botched clauses and other linguistic train wrecks was actually very satisfying. I came up with all sorts of observations that there wasn't room for in the essay.

Here's one that didn't make the cut, for example. At one point Langdon is recollecting his quieter life at Harvard, and a seminar on terrorism he once attended, and Dan Brown writes this:

"Terrorism," the professor had lectured, "has a singular goal."

But the verb lecture is, despite its meaning, not a verb of saying, in the sense of taking direct quotation ("direct speech") complements. That is, although lecturing involves saying things, you can't use the verb lecture in what the fiction writers call a dialogue tag. Strings such as these are ungrammatical:

"But this," she lectured, "is not the only reason."
"And thus we have Fermat's Last Theorem as a corollary," lectured Wiles smugly.
Leaning closer to the microphone, the exobiologist lectured solemnly: "We are probably not alone in the cosmos."

I'm so confident that such sentences are ungrammatical that I would be prepared to lecture it to a hostile audience. Dan clearly wanted to avoid using "say" too much in dialogue tags, and looked (perhaps in his thesaurus) for a synonym without checking whether it had the appropriate syntactic properties to be allowed in the relevant context.

The great thing about filming Dan Brown's novels will be that it will get rid of his execrable expository prose. With a bit of improvement on the dialogue from some professional scriptwriters in Hollywood, we'll be able to just sit back and enjoy the action on the screen instead of trying to picture what Dan is attempting to describe.

Posted by Geoffrey K. Pullum at 03:13 PM

Even in their own language

A story on NPR's Morning Edition this morning reports on a public debate over "[p]roposed recreational changes along the Colorado River and through the Grand Canyon". Active participants in this debate include members of the Hualapai tribe of Native Americans who live "on the middle course of the Colorado River", about whom NPR's Ted Robbins says (emphasis added):

Tribal members don't have kind words for the proposed plan, even in their own language.

What does "even in their own language" mean here? Are words in Hualapai expected to be kinder, in general, than words in English? (And what does that mean?)


Posted by Eric Bakovic at 12:04 PM

Folk etymologies and eggcorns in Riddley Walker

A couple of months ago, Ray Girvan introduced the term eggcorn to the collaborative Riddley Walker annotation site that Eli Bishop maintains:

(67:31) "I bes put the red cord strait"
Put the record straight. This is a particularly good example of Hoban's deftness at puns that accurately reflect how idioms really change: someone mis-hears a phrase in a way that seems to make a little more sense than the real phrase, after the original meaning has become unclear. (That is, even if you have no idea what the "red cord" might be, it's easy to imagine pulling a cord straight; and in Riddley's world, rope is a lot more common than written records.) An example of a phrase that has changed in a similar way is "spitting image," which used to be "spit and image."

This is referred to as folk etymology if it becomes widespread; for newer instances that have not yet passed the test of time, linguist Geoff Pullum has coined the term eggcorn. [RG]

In the world of Russell Hoban's novel Riddley Walker, it's not clear which phrasal re-analyses like "put the red cord strait" are part of everyone's English, and which are sporadic or particular to the 12-year-old narrator.

Ray posted a terrific list of real-world examples of folk etymologies on his Apothecary's Drawer Weblog: place names ("Richborough" for Rutupiae), ship nicknames ("Billy Ruffian" for Bellerophon), soldiers' slang ("Alleyman" for Allemagne), public-house names ("Bag o' Nails" for Bacchanals), and so on.

As Dave Awl put it, Riddley Walker

[...] is set in an unspecified, post-apocalyptic era in the future, when dogs have become humanity's enemies, and history is a rubble of allegory. It's told in a language that recalls the "smashed mess of mottage" of Finnegan's Wake [...]

although unlike Finnegans Wake, Riddley Walker won the John W. Campbell Award. In addition to maintaining a Russell Hoban site called The Head of Orpheus, Dave Awl is a former member of the Neo-Futurists, who are responsible for one of my favorite pieces of speech act analysis synthesis.

Among the many other folk etymologies (or eggcorns) in Riddley Walker are arper sitting, axel rating, comping station, deacon terminations, farring seakert tryer, inner acting, inner G, pry mincer, some poasyum, and spare the mending. There are also blends like Plomercy (from diplomacy and mercy), some new ideophones like arga warga, some phonological reanalyses like "nindicator", and some evocative re-spellings like "addom" and "Chaynjis".

I usually find tricksome ways of writing English troublesome. For example, I had a hard time making it through Iain Banks' Feersum Endjin, though I like his other works a great deal. And I've never been able to do more than dip here and there into Finnegans Wake. However, Riddley Walker is an exception, where the wordplay drew me into the story rather than distracting me from it. Reminded by Ray's post, I recently read it again, and enjoyed it even more than I did when it first came out. If you haven't read it, you should.

[By the way, the 9/26/2004 Language Log post that Ray cites claims that "{eggcorn|eggcorns} gets 3,680 hits on Google", as indeed it did when I wrote it. But when I click on the link now, I get only 2,230. Whether this is because some cache of eggcorn-rich pages has meanwhile drifted off the web, or because of some change in Google's algorithms, I don't know.]


Posted by Mark Liberman at 11:15 AM

December 01, 2004


In competition with Google Scholar, there's Elsevier's Scirus, which calls itself "the most comprehensive science-specific search engine on the Internet". Their Advisory Board includes linguists Tony Aristar and Helen Dry, the folks who brought you the Linguist List. Scirus' list of sources is impressive but not all openly accessible: several key resources, like Elsevier's Science Direct and the American Institute of Physics' Scitation, will only be available to those with subscriptions. (The same is true of some of the content indexed by Google Scholar.)

Scirus has been around since 2001, as this 10/15/2004 article from Library Journal explains. I think it's fair to view it as an attempt by Elsevier to lessen Open Access pressures, or at least the OA pressures that are due to the desire for more sophisticated and integrated searching. Of course, it increases as well as protects the value of Elsevier's extensive inventory of scientific journals. Whatever the motives behind it, my experience is that it's an excellent search tool.

Their web site explains that the name Scirus was taken from a passage in Pausanias' Description of Greece:

"To the Eleusinians who were warring against Erechtheus, came a man, Scirus by name, who was a seer from Dodona, and who also established at Phalerum the ancient temple of Athena Sciras. After he had fallen in the battle, the Eleusinians buried him near a winter-flowing river and the name of the region and the river is from that of the hero."

This strikes me as one of those examples where additional information weakens rather than strengthens a case. Scirus is a perfectly fine English brand name in phonetic terms, with an appropriate echo of science and citation, and a certain amount of neo-classical dignity. It doesn't improve my opinion of the service to learn that Scirus was the name of a minor religious figure who made his living by hearing the words of Zeus in the rustling of oak leaves, and who got himself killed fighting the Athenians on the side of the Eleusinians in a minor battle during the 5th century BC. Is this the spirit that I want tracking down scientific citations on my behalf? I don't think so.


Posted by Mark Liberman at 08:24 PM

"Blog" wins

For the year 2002, the American Dialect Society chose blog as "the word most likely to succeed". Now, according to Reuters and others, Merriam-Webster has determined that the "number one word of the year" for 2004 was -- you guessed it -- blog. In this case, as I understand it, the "voting" is simply the count of requests on M-W's open web site.

More from the AP, the BBC, the Philippine Inquirer, Kerala Next, Internet Week, and CBS News, among many others.

Meanwhile, Microsoft has announced its own weblog service, called MSN Spaces.

"This is for the masses," said Blake Irving, an MSN corporate vice president. He predicted that the new MSN Service will expand the blogging category "at a pace that has not been seen before."

Amazing, if true, since blogging has recently undergone a period of exponential growth. Livejournal alone claims 5,316,446 users, of whom 2,360,747 are "active in some way"; then there's (Google's) blogger, Radio, typepad, xanga, and many other services, not to speak of the many individual and group or organizational sites. None of these publish statistics that I can find, but several of them seem to be pretty large operations -- blogger in particular may be bigger than livejournal. Technorati claims to index 4,830,726 weblogs. Anyhow, if blogging is now going to "expand at a pace that has not been seen before", that can only mean that the exponential rate will increase, and soon every literate person will have several weblogs. Perhaps our culture is really entering an unprecedented era of self-reflection and journal writing. Or maybe we'll see the development of autoblogging software, whereby a digital record of one's experiences and attitudes is created with little or no conscious effort. Say, a system that automatically transcribes cell phone conversations (and associated photos), adds in IM logs and email, compiles a thematically organized sampler, and lets the user select passages for deletion or friends-only access.

Well, the underlying speech and language technology is not quite good enough for that yet. Until it is, I'll be pleasantly surprised if there are really an order of magnitude more people who are interested in actively maintaining an online journal, so that the total would go to 100 million rather than 10 million American bloggers. More likely, the whole MSN Spaces thing is a "me too" marketing effort, jazzed up with some inflated rhetoric by a big company that's late to the party, and it'll settle down as just another competitor to livejournal and blogger and radio and typepad and all. (But of course with some special anti-competitive aspects -- apparently the MSN Spaces blogging interface only works in IE6, and you need a Microsoft .NET passport in order to be able to post a comment.)

Microsoft introduced some weblogging software called "SharePoint" back in 2002 or so, which doesn't seem to have gone anywhere. I spent a few minutes poking around on MSN Spaces and didn't find much that seemed inspiring. Some people will be excited by the ability to "upload photos from your mobile phone to your blog" and "create and manage your space from a mobile phone"; but moblogging is an idea that's been in the air for a year and a half or so, in a variety of forms. Most of the linked-to blogs on MSN Spaces so far seem to be "let's see what this is like" experiments, many of them from people who are obviously already experts. But we'll see what happens.


Posted by Mark Liberman at 02:28 PM

More tall tales

Maryellen MacDonald has emailed some additions to our knowledge of coffee size name lore.

Madison has many alternatives to Starbucks, each with different size-naming conventions.  A good local chain is Ancora, which has a vaguely nautical theme.  Their three sizes are Regular, Tall (as in tall ships), and Clipper.  Note that Tall here equals Medium, 16 oz, rather than Small, as in the Starbucks dialect.  The locals tend to cope with this conflict, but there are occasional amusing exchanges between uninitiated customers and Ancora staff, such as

Customer: "I'll take a grande latte."
Staff (calling to barista): "Tall Latte!"
Customer: "No, I said Grande."
Staff: "A Tall here is a Grande."
Customer: "But I don't want a small one, I want a medium one."

Then there's Indie Coffee, which names its three cup sizes for the three lakes of Madison: Wingra, Monona, and Mendota (the smallest lake is for small, etc.)

[Update: and there's more to the Starbucks story -- they have five sizes, not four, according to an email from Dana Watson: "short, medium, tall, grande, venti". Dana writes "when I was back in the US for a week of vacation from teaching in Japan, I was absolutely shocked by the size of my 'medium' coffee in the airport, which was about 3 times the size of a Japanese medium."

Now I'm really puzzled, since I'm pretty sure that what I've gotten by asking for a "short coffee" at Starbucks in the U.S. is an 8-ounce cup, and this doesn't leave much space before the 12-ounce "tall" size that's the smallest one advertised on their posted price lists. I'll do some research in a local outlet at a slack time, and report back. And perhaps Dana, or another reader in Japan, will supply measured capacities for Japanese Starbucks cup sizes. ]


Posted by Mark Liberman at 01:30 PM

A mixter o aal an new

The SCOTS project ("Scottish Corpus of Texts and Speech") has opened up an internet search facility. The Scotsman ran a story about it on 11/30/2004. Abnu at Wordlab, from whom I learned about this, offers this sample:

Faar I wis brocht up, e only seabirds we'd see wis e seamaas. In my time we caad em seagulls, bit aaler fowk wid say seamaas, makin't soon like 'simaaze'. Ere's ay change goin on in e dialect, an ye get a mixter o aal an new, bit it's e life o language tae be aye adaptin tae different generations an different times. It's naething tae greet aboot. Naething staans still, bit gin a wye o spikkin's richt hannlet, fa's tae say bit fit it michna leave its mark tee on fit ey caa e standard language? - for ere's nae doot at e standard language sair needs a bit o revitalisation noo an aan. Bit I'm on aboot seagulls, nae hobbyhorses.

The current inventory is 531K words in 385 documents, with some audio (seven conversations, four interviews) and video (two lectures).

Posted by Mark Liberman at 05:33 AM