January 31, 2004

Kennedyism of the day

William Saletan's 1/27/2004 Slate column on John Kerry says in passing:

I couldn't decide whether to laugh or wince as Kennedy, the lifelong legislator, exalted Kerry's "two terms" in Vietnam—then corrected his description, incorrectly, to "two sessions." (Pssst, Senator … the word is tours.)

I can see that English time period words would be confusing to those who are not native speakers, and sometimes an occasion of error for those who are. At least in recent U.S. armed forces lingo, a period of service in a foreign country is a tour (of duty), while in American political parlance a period of elected service in a legislature is a term, and a period of activity for the legislature as a whole is a session, whereas in American academia the quantum of instruction for college students is a semester (except when it's a quarter), sometimes informally called a term, and a period of practical instruction for a medical student at a certain stage is a (clinical) rotation, and so on.

Like Saletan, I'm not too surprised that Kennedy assimilated Kerry's Vietnam tour to the legislative term. Unlike Saletan, I'll also point out that his colleague Jacob Weisberg is not collecting Kennedyisms, just as he's not collecting Kosisms.

By the way, I think we can assume that Saletan's half-laugh, half-wince was a symptom of igryness (otherwise known as vergüenza ajena or plaatsvervangende schaamte).

Posted by Mark Liberman at 11:16 AM

Retrospective: the semantics of harpers.org

Last November, I wrote a piece called Ontologies and Arguments, discussing the prospects for digital text to be "given well-defined meaning" via the Semantic Web and similar ideas. I cited an anti-semantic web piece by Clay Shirky and a reasoned response by Paul Ford.

Ford promised that "on December 1, on this site, I'll describe a site I've built for a major national magazine of literature, politics, and culture. The site is built entirely on a primitive, but useful, Semantic Web framework, and I'll explain why using this framework was in the best interests of both the magazine and the readers, and how its code base allows it to re-use content in hundreds of interesting ways."

At the time, I expressed a sincere interest in seeing that site in action, but what with one thing and another, I never checked back to Ford's site until now, though it's well worth reading for other reasons.

On 12/1/2003, Ford did exactly as he promised. The site that he built on "a primitive, but useful, Semantic Web framework" turns out to be harpers.org, the online presence of Harper's Magazine. His discussion of the role of the Semantic Web on the Harper's site is interesting.

The funny thing is, I've read (things on) the Harper's site several times in the past month, without noticing the Semantic Web bits, which are mainly visible in the Connections area. I'd never clicked there before. If I'd seen the top-level menu (Human Beings, Human Endeavor, Human Attributes, Human Needs, Ideas, Supernatural Beings, Nature, Geopolitical Regions, Organizations & Bureaucracies), I might have suspected something.

I like the overall layout of harpers.org -- it's graphically and navigationally crisp. But in a few minutes of poking around in the connections hierarchy, I didn't have a conversion experience. I'm referring to the sort of quasi-religious sense of awe at the possibilities of a newly-experienced technology that I felt the first time I used a computer, or the first time I created an html page and viewed it with a browser, or the first time I searched with Altavista. Maybe I'm just too dense to get it. But so far, I don't see any reason to update Borges on metadata.

Posted by Mark Liberman at 08:38 AM

January 30, 2004

Sasha Aikhenvald on Inuit snow words: a clarification

Oh, dear. It had to happen. People are so convinced that language is all about words. The New Scientist's interview with Alexandra Aikhenvald about working with endangered languages, cited recently by Mark Liberman, even got assigned "For want of a word" as its headline -- the familiar nonsense about language being a question of how many words you've got. Aikhenvald (known as Sasha to her friends, i.e., just about everybody who's ever met her) has done most of her fascinating work on grammar (and some sociolinguistics), not lexicography. So faced with a question about a favorite difference between languages she picked evidentials (required sentence marking of the evidential basis for the statement made). But the interviewer, Adrian Barnett, knew about (and probably shares) the general public's lust for word lore, so of course he forced vocabulary into the conversation: "And what about different types of vocabulary?" And so it was that, knowing what was expected of her, Sasha dutifully commented on the Eskimoan languages:

The story about Inuit words for snow is completely wrong. That language group uses multiple suffixes, so you can derive not 50, but 150 words for snow.

Sasha speaks fast; sometimes too fast. I think I see what she might have meant, but what she said here (or what Barnett scribbled down in his notes, perhaps) is highly misleading at best: it actually suggests there is an answer to the perennial question, namely 150. Not so.

Here's a replacement answer that she could have given. It's a bit closer to the extremely complex truth (for which you should consult a proper Eskimologist; I have merely an interested onlooker's acquaintance with this topic, but I've done a little reading in widely available sources like the Comparative Eskimo Dictionary).

The story about Inuit (or Inuktitut, or Yup'ik, or more generally, Eskimo) words for snow is completely wrong. People say that speakers of these languages have 23, or 42, or 50, or 100 words for snow --- the numbers often seem to have been picked at random. The spread of the myth was tracked in a paper by Laura Martin (American Anthropologist 88 (1986), 418-423), and publicized more widely by a later humorous embroidering of the theme by G. K. Pullum (reprinted as chapter 19 of his 1991 book of essays The Great Eskimo Vocabulary Hoax). But the Eskimoan language group uses an extraordinary system of multiple, recursively addable derivational suffixes for word formation called postbases. The list of snow-referring roots to stick them on isn't that long: qani- for a snowflake, api- for snow considered as stuff lying on the ground and covering things up, a root meaning "slush", a root meaning "blizzard", a root meaning "drift", and a few others -- very roughly the same number of roots as in English. Nonetheless, the number of distinct words you can derive from them is not 50, or 150, or 1500, or a million, but simply unbounded. Only stamina sets a limit.

That does not mean there are huge numbers of unrelated basic terms for huge numbers of finely differentiated snow types. It means that the notion of fixing a number of snow words, or even a definition of what a word for snow would be, is meaningless for these languages. You could write down not just thousands but millions of words built from roots that refer to snow if you had the time. But they would all be derivatives of a fairly small number of roots. And you could write down just as many derivatives of any other root: fish, or coffee, or excrement.

And the derivatives wouldn't all be nouns. If you wanted to say "They were wandering around gathering up lots of stuff that looked like snowflakes" (or fish, or coffee), you could do that with one word, very roughly as follows. You would take the "snowflake" root qani- (or the "fish" root or whatever); add a visual similarity postbase to get a stem meaning "looking like ____"; add a quantity postbase to get a stem meaning "stuff looking like ____"; add an augmentative postbase to get a stem meaning "lots of stuff looking like ____"; add another postbase to get a stem meaning "gathering lots of stuff looking like ____"; add yet another postbase to get a stem meaning "peripatetically gathering up lots of stuff looking like ____"; and then inflect the whole thing as a verb in the 3rd-person plural subject 3rd-person singular object past tense form; and you're done. Astounding. One word to express a whole sentence. But even if you choose qani- as your root, what you get could hardly be called a word for snow. It's a verb with an understood subject pronoun.

Of course, you can make lots of noun derivatives too. But although various lists of supposed snow words are passed around (public libraries in Alaska compile them, Canadian Indian affairs bureaux hand them out, skiing magazines publish them, that sort of thing), they fail to back up the familiar myth. These lists tend to cite multiple derivatives of the qani- root; they usually have a bunch of derivatives of the api- root; they often include a word for a sort of rain-pockmarked snow that looks like herring scales, only that word is visibly based on the root meaning "herring"; they include a word for soft snow that is clearly based on the root meaning "soft"; and so on.

So, Eskimoan languages are really extraordinary in their productive word-building capability, for any root you might pick. But that very fact makes them exactly the wrong sort of language to ask vocabulary-size questions about, because those questions are virtually meaningless -- unless you ask them about basic non-derived roots, in which case the answers aren't particularly newsworthy.

That's the sort of thing Sasha would probably have said in the interview if she'd had another few seconds.

[Thanks to Mark Seidenberg for a comment by email that enabled me to make this clearer.]

Posted by Geoffrey K. Pullum at 01:14 PM

Tariana tales

This week's New Scientist has an interesting interview with Alexandra Aikhenvald.

The editors' spin is to invite the reader to "[i]magine how different politics would be if debates were conducted in Tariana, an Amazonian language in which it is a grammatical error to report something without saying how you found it out." I suppose that this is a sly dig at Andrew Gilligan and the whole BBC "who's your source for that" scandal.

The implication seems wrong, though, since (in the systems of this kind that I'm familiar with) the epistemic status "somebody told me ..." is always available, and would always serve the needs of both honest and dishonest journalists. The linguistic system that would really improve journalism would be one in which witless topical framing and silly pandering were impossible to express. In pursuit of this Leibnizian dream, we'll scour the Amazon basin in vain. We can't, it seems, even hope for syntactic/semantic coherence in XML streams.

Myself, I think it's more interesting that Tariana speakers practice linguistic exogamy, that is, they "traditionally marry someone speaking a different language." This generalization of incest taboos (?) is apparently common in Eastern Amazonia, and I gather is also found in New Guinea.

Anyhow, it's nice that the documentation of endangered languages has become a popular topic.

Posted by Mark Liberman at 06:19 AM

The memetic phylogeny of "our new * overlords"

Michael Leuchtenburg emailed (much) more "I-for-one-welcome" lore, following up on this discussion. I, for one, wonder whether a simple mathematical model of memetic dissemination of strings (similar to the models of mutation and inheritance in computational biology) would enable a more systematic version of Michael's observation (that "[i]t seems obvious that 'for one ... overlords' is the original, while the others are variations. The number of hits decreases as the number of changes ... increases").

Here's Michael's note:

In addition to "I, for one, welcome our new * overlords", you might also consider the variations on this theme, listed below with google hits for *, * *, * * *, * * * *, and total. For convenience, the numbers for the root snowclone are also listed. I don't get the same numbers as you do, even using the link you provided, so I have listed the numbers I get.

I, for one, welcome our new * overlords (1880, 334, 213, 36; 2463)
I, for one, welcome our new * overlord ( 39, 13, 12, 2; 66)
I welcome our new * overlords ( 25, 40, 0, 0; 65)
I welcome our new * overlord ( 0, 0, 0, 0; 0)

I, for one, welcome our new * masters ( 152, 65, 36, 12; 265)
I welcome our new * masters ( 11, 0, 0, 0; 11)
I, for one, welcome our new * master ( 8, 0, 0, 0; 8)
I welcome our new * master ( 0, 0, 0, 0; 0)

It seems obvious that "for one ... overlords" is the original, while the others are variations. The number of hits decreases as the number of changes from "for one ... overlords" increases. At the same time it's evident that it's the meaning behind the words and not blind textual copying that is behind the pattern. This is evidenced by the much greater prevalence of "masters" over other words as the object of the sentence.

Looking a bit deeper, I can find other variations, such as these:
"I, for one, welcome our new robot fundraisers"
"I, for one, welcome our new robot grad students"
"I, for one, welcome our new robot employees."

These are the only non-overlord/master hits in the first hundred on Google, and they all refer to robots. Interestingly, this page from the Jargon File says that the original value for X was "insect". That is certainly the correct quote, but I can't say with absolute certainty that this is the correct original source.

Hopefully you found this at least half as interesting as I did.

The patterns are neat in themselves, and suggest an interesting modeling exercise that might be the first example of genuinely scientific memetics (excuse my ignorance if I'm slighting some other work).

Linguistic data of this kind ought in principle to be richer in several dimensions than the available biological data, since we can have easier access to past-time patterns (via the wayback machine for example), and even to some independent evidence about the social networks involved. On the other hand, the networks along which the linguistic patterns spread may be harder to model, because there is no counterpart to the constraints imposed by geography and physiology.

This research direction might also count as the first-ever example of computational Bakhtinian analysis, since the uptake patterns of quotes from The Simpsons (and Buffy, and so on) might have been invented on purpose to illustrate "the interaction of different social values being registered in terms of reaccentuation of the speech of others".
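As a first step toward that modeling exercise, Michael's observation can be checked against his own table. The sketch below is only illustrative: the hit counts are copied from his note, but the way "changes" are counted (one for dropping "for one", one for changing the number of the noun, one for swapping overlord for master) is my own assumption, not something Michael specified.

```python
# Hit counts copied from Michael's note; the change-counting scheme is an assumption.
VARIANTS = [
    # (has "for one", final noun, hits for *, * *, * * *, * * * *, reported total)
    (True,  "overlords", (1880, 334, 213, 36), 2463),
    (True,  "overlord",  (39, 13, 12, 2),      66),
    (False, "overlords", (25, 40, 0, 0),       65),
    (False, "overlord",  (0, 0, 0, 0),         0),
    (True,  "masters",   (152, 65, 36, 12),    265),
    (False, "masters",   (11, 0, 0, 0),        11),
    (True,  "master",    (8, 0, 0, 0),         8),
    (False, "master",    (0, 0, 0, 0),         0),
]

# Hypothetical coding: how far each noun is from the original "overlords".
NOUN_CHANGES = {"overlords": 0, "overlord": 1, "masters": 1, "master": 2}


def changes(has_for_one, noun):
    """Count departures from the original 'for one ... overlords' frame."""
    return (0 if has_for_one else 1) + NOUN_CHANGES[noun]


def mean_hits_by_changes(variants):
    """Average the reported totals over all variants with the same change count."""
    buckets = {}
    for has_for_one, noun, per_gap, total in variants:
        if sum(per_gap) != total:
            raise ValueError(f"row total mismatch for {noun!r}")
        buckets.setdefault(changes(has_for_one, noun), []).append(total)
    return {k: sum(v) / len(v) for k, v in sorted(buckets.items())}
```

Under this coding the mean falls from 2463 hits at zero changes to 132 at one change, and keeps falling after that. A real model would need a principled distance between variants rather than this hand-assigned one.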

Posted by Mark Liberman at 05:22 AM

January 29, 2004

In Soviet Russia, snowclones overuse you

Bob Fleck wrote in to point out another variety of the ironic phrasal templates exemplified by "I, for one, welcome our new ___ overlords". The origin for the pattern Bob cites is a set of jokes by the emigré comedian Yakov Smirnoff, most famously "In America, you can always find a party; in Russia, the party can always find you." The pattern is generalized to a frame of the form "In [America or a substitute], you [verb group headed by X] Y; in [(Soviet) Russia or a substitute], Y [verb group headed by X] you." The first clause is now often omitted.

Here's a wiki page with lots of examples and discussion. Many of the examples are not funny -- or even interpretable -- except in the limited social sense in which out-of-context repetition of a currently-hot tag line always gets a sort of phatic yuk. However, there are a few thoughtful examples like Tim Lesher's "In ExtremeProgramming, you continually test your code. In WaterFall, your code continually tests you."

These phrasal templates are like the patterns for which Glen Whitman proposed the term snowclone. They're slightly different, because these are self-conscious and ironic evocations of a pattern viewed as intrinsically funny, rather than serious uses of a pattern that is felt to be intelligent and perhaps even original.

The Soviet Russia template has an interesting linguistic aspect: the paired contrastive accents that indicate role reversal. This phenomenon was first (as far as I know) pointed out by George Lakoff in 1971 ("Presupposition and Relative Well-formedness", in Steinberg & Jakobovits, eds., Semantics. CUP), using the example sentence

John called Mary a Republican, and then SHE insulted HIM.

You can see that George was busy framing political discourse, even then. Quite a bit of ink has been dispensed since then in proposing and debunking models for the contrastive role-reversal phenomenon, for example here.

Posted by Mark Liberman at 08:57 PM

igry: serendipity or glemphy?

Language Hat ranks igry as a coinage on the level of Walpole's serendipity, provoking several insightful comments on his site, including the observation that the Spanish term "vergüenza ajena" has a similar meaning, and has resulted in Spanish shame being used as a psychological term of art. This whole distributed igry thread should thus be enlightening to the philosopher José Antonio Marina, who says here that "[s]omos los únicos que sentimos 'vergüenza ajena', por eso en los libros de psicología se la conoce como 'spanish shame', vergüenza española." (Translation: "[We Spanish] are the only ones who feel 'other's shame', so that it is known in psychology texts as 'Spanish shame'.")

I wonder, by the way: are sentences of the form "We [insert ethnic group here] are the only ones who feel [insert specific emotion here]" ever true?

Inspired by Language Hat's post, Kerim Friedman at keywords.oxus.net adds a note of class to the enterprise by bringing in Bourdieu's theory of Symbolic Violence. If I understand him, this construes feelings of embarrassment as a form of violence committed by those who maintain the standards that provoke the feelings. That seems to me to get things backwards, in a characteristically French-social-theory sort of way, but I guess that's the point.

Google has already indexed this day-before-yesterday Language Log post as its first result for igry, with Francis Heaney's 12/12/2003 post (from which I learned about the word) unfairly in second place. There are 20,400 other results, but pretty much all of them seem to be in various Slavic languages. So if igry is going to turn out like serendipity rather than like glemphy, we've got our work cut out for us.

[Update: here is a MonkeyFilter thread on the topic (via Language Hat).

Also, I wonder whether 'Spanish shame' is actually used -- or even has ever been used -- as a technical term in English-language psychology books. I checked the indices of two introductory texts without finding it. And the PsycINFO database (which indexes more than 8,000,000 references in "psychology and related fields" from 1840 to 2004) returns nothing for the string "Spanish shame".

It seems clear that Spanish people believe that English-speaking psychologists use the term 'Spanish shame', since Language Hat's commenter aa and the philosopher José Antonio Marina independently report the same thing, in almost the same words. But maybe it's just a sort of translingual "idée reçue".]

[Update 1/30/2004: Trevor emails that

Feeling a sense of embarrassment on behalf of someone else is expressed by the Dutch using the phrase plaatsvervangende schaamte ("place-replacing shame"). They (and cats) also probably believe that they are unique in this respect. Plaatsvervangende schaamte is very widespread in Holland and I suspect that its use goes back to the C19th. It certainly seems the kind of thing one might come up with quite naturally in an intrinsically urban, bourgeois society in which curtains are commonly eschewed. Much Dutch emotional vocabulary comes from the German, but I can't think of a German equivalent.

Empiricists may look to the historical connection between Spain and the Netherlands, but in this case, my money's on innateness. Or at least, natural development from universal experience -- as usual, we can't tell the difference.]

Posted by Mark Liberman at 06:25 PM

Stupid fake pet communication tricks

I'm perfectly well disposed toward parrots; I have mingled with them in the wild on a friendly basis, and I have pictures to prove it. And N'kisi the grey parrot whose press coverage Mark recently discussed is said to have a sense of humor. Well, I'm normally all in favor of laughter and merriment, but I don't have a sense of humor when it comes to stupid animal communication stories like the one by Alex Kirby (another candidate for an early resignation from the BBC, I'd say). I'm just appalled at the kind of ridiculous, credulous garbage that sails out into the media universe the moment anyone claims they have located a communicative animal. People seem to completely lose their critical faculties when a bird with a brain the size of a macadamia nut creaks out a few imitated syllables, or (we've seen this before, with Koko) a gorilla waves its hairy hand vaguely in the air in a way that its trainer thinks resembles the very sign she was expecting. What is going on? Are we so desperate for communication with other intelligences that we will throw away our own the moment some dumb creature gives us an imitative squawk or a hand sign?

"PARROT'S ORATORY STUNS SCIENTISTS" burbles Kirby's headline. The scientists would have to be stunned to accept this slop. Set aside the claim about N'kisi being telepathic, which just shows Kirby is a shameless hack who will follow Jayson Blair into the annals of journalism without facts. (The website of the N'kisi project admits: "Aimee found that her state of mind was critical, and if she intentionally tried to "send" the information, it wouldn't work. N'kisi responded best when Aimee's full attention was genuinely immersed in exploring the images, without any thought of the experiments": yeah, right -- when it works it's telepathy, when it doesn't it was Aimee's fault for thinking wrong!) Forget that. Just consider the communication claim.

I will state a view here that I don't believe has been explicitly set out in public very often: I am prepared to voice doubt that there has ever been an example anywhere of a non-human expressing a single opinion, or even asking a question, ever.

I don't mean blurting out "pretty smell medicine" when in the presence of aromatherapy oils. I mean actually saying things, not just naming things in the vicinity or appearing to do so. Here is what would convince me: N'kisi drops some fecal matter into the water dish while sitting above it, looks down, and says "Oh, dear, I pooped in my drinking water" (without having been carefully trained to do so, of course). Or N'kisi sees his owner going out shopping and says "If you're going to the store, don't forget to buy some more birdseed", and when the owner comes back with it the bird says, "Oh, good, you remembered."

Kirby's parrot story is much more ridiculous than the familiar signing ape stories, of course. But the ape stuff is absurdly over-blown too. Don't get me wrong, I think bonobos in particular are wonderful creatures: a society that has figured out how to use sex to make peace is a society we could learn from. But language use? If just one bonobo would ever say (or sign) to her keeper: "I'm bored; I think I'll go have sex with Mandy again", that would be interesting. If a bonobo or any other ape would even just say, "How are you feeling today?", I would sit up and take notice. It just doesn't happen. Apes do learn to produce utterances designed to get the investigators to give them bananas. What they don't do is state opinions, or ask what your opinions are, or comment in even the most trivial way on what it's like to be an ape.

If you truly imagine that American Sign Language has ever been taught to an ape of any species other than ours, just stop relying on the trainer to tell you what the gesticulating creature is saying, and ask a native user of ASL to view a videotape and pass judgment. It isn't there. Apes cannot control ASL, or come anywhere close. (Joel Wallman's book Aping Language is a very useful aid for those who find themselves slipping into credulousness.) And parrots can't talk. All the press stories about this topic are just hokum.

Even Alex Kirby (he of the deathlessly moronic opinion "About 100 words are needed for half of all reading in English, so if N'kisi could read he would be able to cope with a wide range of material") goes way beyond any non-human animal or bird in his linguistic capacities. We all do. I know you animal-lovers will all hate me for saying this, but it's true.

Update, January 15, 2007: For a detailed critique of the ridiculously fraudulent (yet nonetheless published) experiment on N'Kisi's telepathy, see Robert Carroll's article in The Skeptic's Dictionary. And note also that David Beaver recently discovered that the BBC retrospectively altered their article to remove all reference to telepathy (see the article at the same link referenced above), without altering the 2004 post date ("last updated: Monday, 26 January 2004") — an astonishing act of dishonesty for a mainstream science news source. The BBC sinks lower and lower in one's estimation as time goes on.

Posted by Geoffrey K. Pullum at 06:14 PM

"I, for one, welcome our new * overlords"

John Hardy of the wonderful blog Laputan Logic writes that

Apparently another of N'kisi's remarkable abilities is to uncover yet more snowclones.

This instance was spotted on Metafilter yesterday:

"I, for one, welcome our new telepathic parrot overlords "

Plugging this snowclone template

"I, for one, welcome our new * overlords"

into Google returns 1940 hits.

Indeed it does! And "welcome our new * * overlords" garners another 442, while "welcome our new * * * overlords" nets 235, and "welcome our new * * * * overlords" snares another 38. Perhaps someone with a few spare minutes will compile a histogram of the 2600+ substitutions in these templates, thus giving us all new insight into the mythic fears of the net (or at least of the net segments that have picked up this meme from slashdot, Jonah Goldberg and similar sources).
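For anyone with those few spare minutes, here is a minimal sketch of the histogram step. It assumes the matching text snippets have already been collected somehow (it does not query Google), and the function name and the one-to-four-word limit on the filler are my own choices, mirroring the *, * *, * * *, and * * * * searches above.

```python
import re
from collections import Counter

# One to four whitespace-separated filler words between "new" and "overlords".
TEMPLATE = re.compile(
    r"welcome our new ((?:\S+\s+){1,4}?)overlords", re.IGNORECASE
)


def filler_histogram(snippets):
    """Count what fills the * slot(s) of the template across text snippets."""
    counts = Counter()
    for text in snippets:
        for match in TEMPLATE.finditer(text):
            # Normalize case and internal whitespace so variants pool together.
            filler = " ".join(match.group(1).lower().split())
            counts[filler] += 1
    return counts
```

Calling `filler_histogram(snippets).most_common(20)` on a list of page excerpts would give the top substitutions. Real web text would also need deduplication of quoted and reposted copies, which this sketch ignores.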

Danyel Fisher wrote in with the original Simpsons quote (from Deep Space Homer) in context:

News announcer Kent Brockman mistakes a floating ant in a space shuttle experiment floating close to the camera for a giant space ant:

"Ladies and gentlemen, uh, we've just lost the picture, but what we've seen speaks for itself. The Corvair spacecraft has apparently been taken over -- 'conquered' if you will -- by a master race of giant space ants. It's difficult to tell from this vantage point whether they will consume the captive earth men or merely enslave them. One thing is for certain: there is no stopping them; the ants will soon be here. And I for one welcome our new insect overlords. I'd like to remind them that as a trusted TV personality, I can be helpful in rounding up others to toil in their underground sugar caves."

[Update 11/8/2006: note that Goldberg's use of this phrasal template is distinctly not submissive, as explained here with respect to the 2006 midterm elections.]

Danyel's own list of snowclone candidates is here.

[Update: more on variants of this pattern is here.]

[And more on the phrasal templates that some of us call snowclones is here.]

Posted by Mark Liberman at 12:46 PM

January 28, 2004

Parrot telepathy at the BBC

I hate to pile on, what with the "sense of shell shock" and "a bit of a meltdown" at the BBC. However, I have to take BBC News Online environment correspondent Alex Kirby to task, for sexing up this story about N'kisi the African grey parrot.

I yield to no one in my admiration for parrots' communicative efforts, and N'kisi does sound like a remarkable fellow, with a vocabulary said to number 950 words, but you have to wonder what is happening at the BBC when Mr. Kirby writes that:

N'kisi's remarkable abilities, which are said to include telepathy, feature in the latest BBC Wildlife Magazine.

[Update: if you've noticed that BBC News has removed the "telepathy" reference, see here for discussion and a link to an earlier version of the article.]

As a mere linguist, I'll leave this one to the experts at the Skeptical Inquirer, but let's just say that throwing in a claim about pet telepathy doesn't do a lot for my confidence in the rest of the story. It's like reading about a hypothetical engineering genius whose remarkable new windmill design and perpetual motion machine are both covered enthusiastically in the latest issue of National Geographic.

But I do feel that I have some standing to comment on Mr. Kirby's observation that

About 100 words are needed for half of all reading in English, so if N'kisi could read he would be able to cope with a wide range of material.

Mr. Kirby seems to think that since the commonest 100 words cover about half of typical English text (about 47% of the New York Times, for instance), those 100 words would allow you to read half the material. Say, the international news and the sports page, but not the national news or the wedding announcements. Or in terms of more enduring texts, you could read Pride and Prejudice but not Sense and Sensibility.

Well, needless to say, that's not how it works. If you could read just the commonest 100 words in English, Mr. Kirby's own text would begin like this (replacing the letters in unknown words with x's):

Xxxxxx'x xxxxxxx xxxxx xxxxxxxxxx
By Xxxx Xxxxx
XXX News Xxxxxx xxxxxxxxxxx xxxxxxxxxxxxx
The xxxxxxx of a xxxxxx with an xxxxxx xxxxxxxxxxxx xxxxx to xxxxxxxxxxx with people has xxxxxxx xxxxxxxxxx up xxxxx.

The xxxx, a xxxxxxx Xxxxxxx xxxx xxxxxx X'xxxx, has a xxxxxxxxxx of xxx xxxxx, and xxxxx xxxxx of a xxxxx of xxxxxx.

He xxxxxxx his xxx xxxxx and xxxxxxx if he is xxxxxxxxxx with xxxxx xxxxx with which his xxxxxxxx xxxxxxxxxx xxxxxx xxxx -- just as a xxxxx xxxxx would do.

X'xxxx'x xxxxxxxxxx xxxxxxxxx, which are said to xxxxxxx xxxxxxxxx, xxxxxxx in the xxxxxx XXX Xxxxxxxx Xxxxxxxx.

If N'kisi's 950-word vocabulary were the 950 commonest words, the same passage would look to him like this:

Xxxxxx'x xxxxxxx xxxxx xxxxxxxxxx
By Xxxx Xxxxx
XXX News Xxxxxx xxxxxxxxxxx xxxxxxxxxxxxx
The xxxxxxx of a xxxxxx with an almost xxxxxxxxxxxx power to xxxxxxxxxxx with people has brought xxxxxxxxxx up short.

The xxxx, a xxxxxxx Xxxxxxx xxxx called X'xxxx, has a xxxxxxxxxx of xxx words, and shows xxxxx of a sense of xxxxxx.

He xxxxxxx his own words and xxxxxxx if he is xxxxxxxxxx with xxxxx xxxxx with which his xxxxxxxx xxxxxxxxxx xxxxxx xxxx -- just as a human child would do.

X'xxxx'x xxxxxxxxxx xxxxxxxxx, which are said to include xxxxxxxxx, xxxxxxx in the xxxxxx XXX Xxxxxxxx Xxxxxxxx.

No doubt N'kisi's vocabulary choice is not strictly frequentistic -- but that will just mean that fewer of the words in the passage will be covered. For your convenience, the original is below:

Parrot's oratory stuns scientists
By Alex Kirby
BBC News Online environment correspondent
The finding of a parrot with an almost unparalleled power to communicate with people has brought scientists up short.

The bird, a captive African grey called N'kisi, has a vocabulary of 950 words, and shows signs of a sense of humour.

He invents his own words and phrases if he is confronted with novel ideas with which his existing repertoire cannot cope - just as a human child would do.

N'kisi's remarkable abilities, which are said to include telepathy, feature in the latest BBC Wildlife Magazine.
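The masking used in the two x'd-out versions above can be sketched as follows. This is a rough reconstruction, not the script actually used for this post: the function names are hypothetical, and the set of known words has to be supplied from whatever frequency list you trust. The second function computes the token coverage figure (the "about half" statistic) for a given word list.

```python
import re

# A word is a run of letters/digits, optionally joined by apostrophes (N'kisi's).
WORD = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*")


def mask_unknown(text, known):
    """Replace the letters and digits of words not in `known` with x or X.

    `known` is a set of lowercase words; punctuation and spacing are kept.
    """
    def mask(match):
        word = match.group(0)
        if word.lower() in known:
            return word
        return "".join(
            "X" if ch.isupper() else "x" if ch.isalnum() else ch
            for ch in word
        )
    return WORD.sub(mask, text)


def coverage(text, known):
    """Fraction of word tokens in `text` that appear in `known`."""
    words = WORD.findall(text)
    return sum(w.lower() in known for w in words) / len(words)
```

High coverage and a readable text are different things: in the first version above, the words that stay visible are mostly function words, and nearly every word that carries the content is masked.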

At first, I thought that "Alex Kirby" might be a pseudonym for Andrew Gilligan, hiding out on the talking parrot beat and perhaps a bit out of his depth. But no, there really has been for some time an Alex Kirby who is the Beeb's "environment correspondent". Let's hope that his coverage of global warming is less credulous and more mathematically sophisticated than his coverage of talking parrots.

[Update: see this post by Geoff Pullum for more on N'kisi and talking animals in general.]

[Update: Ray Girvan has also blogged on this, with links to other Parrot tall tales.]

Posted by Mark Liberman at 11:14 PM

Google frequency agen (um, agin) (oops, again)

Apparently linguists aren't the only ones out there using Google frequencies to make inferences about language. So there's a wider community than we thought of people who need to be warned to do such things with care. Here's an example, plus a few constructive ideas.

Jason Eisner pointed me to a story in today's NY Times, which describes how some sellers on eBay are losing money by failing to spell their offers correctly. As an example, one eBay auctioneer was unable to sell a pair of chandelier earrings. (Linked in case you, like I, had no idea what these are. Useful information, come to think of it, what with Valentine's Day approaching. Curiously, though, my wife did the same search and came up with a different page.)

The problem was failure to use Google properly in doing a frequency comparison. From the Times article:

    Ms. Marshall, who lives in Dallas, said she knew she was on shaky ground when she set out to spell chandelier. But instead of flipping through a dictionary, she did an Internet search for chandaleer and came up with 85 or so listings.

    She never guessed, she said, that results like that meant she was groping in the spelling wilderness. Chandelier, spelled right, turns up 715,000 times.

Apparently eBay does try to warn users when they are using a common misspelling, but

    wrong spellings can also turn up similar misspellings, so that buyers and sellers frequently read past the Web site's slightly bashful line asking, by any chance, "Did you mean . . . chandelier?"

For what it's worth, three solutions are worth noting. First, a simple Google search may result in Google's version of "Did you mean...". If the suggested correction is the same as eBay's suggested correction, that should increase your confidence that it's right -- two data points are better than one.

Second, if you're going to use search engine frequencies, at least try a bunch of different alternatives. And if the tallies are a closer call than 715,000-to-85, it's easy to do an on-line statistical test to see if the difference is significant. Read details about one such test, or just try one of the many on-line applets that let you do the computation. For example, here's a link to one particularly easy to use tool. Should you trust a Google frequency difference between, say, 5000 hits versus 6000 hits? Enter X1=5000, X2=6000, and both N1 and N2 as 3307998701 (from the bottom of http://www.google.com). Click "Submit". If the p value is less than 0.05, conventional statistical wisdom says you can trust that the difference between 5000 hits and 6000 hits did not happen by chance. (I'm sure someone can suggest a better statistical test, taking advantage of the fact that N1=N2. And of course beware that just because something is statistically significant, it doesn't necessarily mean it's meaningful! With such a huge N, even relatively small differences will give you significance on this test.)

The third alternative, of course, would be to use the dictionary.
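
For the frequency comparison in the second suggestion above, here is a minimal sketch in Python of one standard choice, a two-proportion z-test. That the on-line tool uses exactly this test is an assumption; the function name and variables are illustrative only. X1 and X2 are the hit counts, and N1 and N2 are the index size quoted above.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Return (z, two-sided p-value) for the difference between x1/n1 and x2/n2.

    Uses the pooled-proportion z-test with a normal approximation, which is
    reasonable when the counts are large, as search hit counts usually are.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    # Two-sided tail probability of a standard normal, via erfc.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

N = 3307998701  # index size reported at the bottom of the Google home page
z, p_value = two_proportion_z_test(5000, N, 6000, N)
print(f"z = {z:.2f}, p = {p_value:.3g}")
```

For 5000 versus 6000 hits, z comes out at roughly -9.5, so the p value is far below 0.05. As noted above, that says the difference is unlikely to be chance, not that it is meaningful.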

Posted by Philip Resnik at 11:55 AM

Snowclones are the dark matter of journalism

Paul Boutin in Wired magazine quotes Glenn Reynolds to the effect that "Email is the dark matter of the blogosphere." Now where have I read something like that before?

Probing Google with "is the dark matter of", we learn that "The PC is the Dark Matter of the Internet", "Global technoscience is the dark matter of social theory", "Networking is the dark matter of high-speed internet", "Terrorism is the dark matter of the civilized world", "The extraterrestrial hypothesis is the dark matter of political science and science policy in the second half of the twentieth century", "Euroscepticism is the 'dark matter' of German politics", "the Boswell Co. now stands revealed for what it is: the dark matter of 20th century California history", "Intellectual property is the “dark matter” of the corporate universe," and quite a few others.

Like other snowclones, X is the dark matter of Y is more than a fixed phrase or cliché. It's a pointer to a little conceptual universe, bringing along with it a metaphorical framework that structures the surrounding chunk of discourse. If X is the dark matter of Y, then X is crucial to Y, is even the biggest part of Y, but it is not directly visible, and must be inferred because of the strong effects it has on visible things.

In the original citation, Reynolds goes on to say that "I have a few readers who function as virtual stringers, sending me several links throughout the day. Professional journalists sometimes send me links to articles or topics they can't get assigned to write about, in hopes that I might get the story more attention." By prefacing this with the dark matter business, he's positioning his own experiences with emailed leads as characteristic of a universal phenomenon, suggesting that weblog publication is the visible manifestation of underlying social networks that operate via hidden email connections. This is a pretty efficient use of five little words.

Is there anything wrong with this? Why do we sometimes make fun of journalists who recycle these little conceits?

Well, there's certainly nothing wrong with using metaphors. That's how most thinking and writing works. Nor is there anything wrong with using metaphors that have been used before, or with echoing the words previously used to express them. That's culture, and where would we be without it?

It comes down to thoughtfulness. Take a look at this little corpus of examples of the trope "If Eskimos have Q words for snow, [some discussion of the importance of topic X to group Y]". Ask yourself how many of these are likely to reflect any real thought about language and culture, even with respect to X and Y, much less the poor Eskimos. In most of these examples, the whole Eskimo/snow frame is nothing but a fancy way to say that X is important or salient for Y. Far from adding any useful insight, the talk about snow vocabulary is just a stereotyped rhetorical distraction.

Reynolds's dark matter reference, in contrast, really does add something essential to his paragraph. He really means to imply that email is unseen, that it is nevertheless a crucial cause of the visible motions of web publication, etc. I'm not sure how far he wants to take it -- is email really, by analogy to dark matter's role in the universe, 87% of the mass of the blogosphere? But that's a quibble.

My own interest in this kind of thing is mainly descriptive, not prescriptive. I don't have strong feelings about how often writers should use formulaic language, whether traditional or trendy, or even how carefully they should think through the metaphors they invoke. But as Geoff Pullum (paraphrasing Dwight Bolinger) has recently reminded us, "people remember whole chunks of language, even as they recognize how to chop those chunks up into individual lexical items; we store both the parts and the wholes, and retrieve them when we need them."

I'd add that we retrieve chunks of different levels of abstractness as well as different sizes and shapes. And these chunks are not just structured word strings with meanings, they also project a sort of conceptual and rhetorical aura into their textual neighborhood. Figuring out how to investigate and model these chunks and their interactions is a key research problem these days, because of the very large corpora that we can now access, index and analyze.

[Update: Oops. Right after the "dark matter" paragraph, the very next line in the article is the signature "-Paul Boutin", to whom I originally attributed the quotation. However, a slightly more careful reading suggests that the author of the "dark matter" quote is almost certainly Glenn Reynolds, whose "reading list" it concludes.]

Posted by Mark Liberman at 08:12 AM

January 27, 2004

Non-Bushism of the day

From the middle of the New Hampshire primary, the well-known political blogger Kos writes:

And btw, I don't have any compulsion in posting mid-day tracking polls. They are the half-time score in the Super Bowl. Democracy will march onward. Heck -- if you're a Kerry voter, it means get your ass to the poll, since your guy doesn't have it in the bag. If you're a Dean voter -- get your ass to the poll since your guy has a serious shot. And if you're a Clark or Edwards voter, you have every motivation to get your guy that important third place finish.

This is a garden-variety malapropism, substituting compulsion for the similar-sounding word compunction, though the meanings are radically different. If George W. Bush said this, Jacob Weisberg would be all over him.

This particular substitution is pretty common. For example, Walter Rogers, "CNN Senior International Correspondent", says here about an IED explosion on 12/5/2003 in Baghdad that

It shows, once again, that the Iraqi guerrillas are totally indiscriminate in their violence and killing. While targeting an American convoy here, they apparently had no compulsion about setting off their charges while a civilian bus was going by.

Sometimes such malapropisms happen because the speaker (or writer) genuinely doesn't know the difference, and sometimes it's just a little neural noise that causes one word to pop out when you really had another one in mind. Everybody makes both kinds of mistakes. It's hard to know whether George W. Bush really does this kind of stuff more often than the rest of us do, because there's nobody collecting Kos-isms or CNN-isms, or for that matter Jacob Weisbergisms.

For my part, I'm glad that nobody's keeping score on me. Well, almost nobody. I'd like to take this opportunity to say that Geoff Pullum has not asked his department chair for teaching relief in compensation for the amount of time he spends correcting my spelling errors. Whatever you may have heard.

Posted by Mark Liberman at 04:32 PM

Minding and not minding ambiguity

In a recent note to a linguistics-related mailing list, a student is looking for a sentence ambiguator (he means sentence disambiguator). This puts me in mind of the Amelia Bedelia series of children's books, where the title character is forever plagued by sense ambiguity. (She dusts the furniture by spreading dust on it, puts out the lights by taking them outside, etc. Wonderful stuff, linguistically and otherwise. Makes for fun exam questions.)

It also occurs to me to wonder if there might be some government funding right now for development of a good sentence ambiguator. I'm sure that there would be a great deal of utility in a tool to subtly rewrite government communications in order to ambiguate official statements concerning the reasons for going to war in Iraq, the anticipated size of the budget deficit, etc.

By the way, the student concludes with a request to "please mind my English". In addition to being very conscientious, this is kind of neat, because he could just as easily have said "please don't mind my English". Sense ambiguity again: mind as "be on one's guard; be cautious or wary about; be alert to" (WordNet sense 5) versus "be offended or bothered by; take offense with, be bothered by" (sense 1).

Posted by Philip Resnik at 12:17 PM

Eggcorns make us igry

Doug Orleans, commenting on a note by Jason McIntosh, writes that "Eggcorns make me igry."

Eggcorn is Geoff Pullum's recently suggested term for sporadic or idiosyncratic lexical re-analysis, like "egg corn" for acorn, "wedding vowels" for wedding vows, "reigns of power" for reins of power, and so on.

Igry was recently coined by Francis Heaney and others with the meaning "painfully embarrassed for or uncomfortable about someone else's incredibly poor social behavior, or descriptive of such poor social behavior."

The core example of igriness is not quite right in this connection, because it suggests a moral failing rather than a misunderstanding: "Like, say you're at a restaurant, and one of the people at your table summons the waiter by snapping their fingers." However, it's a small extension of the meaning of igry to cover the embarrassed sympathy we feel for the linguistic cluelessness behind a malapropism or an eggcorn. And Doug deserves points for using two words coined by others on known occasions within the past four months within the same (contextually apt) four-word sentence.

Speaking for myself, I do try to transcend my natural feelings of igriness in the face of eggcorns, by admiring the boundless creativity with which people constantly reinvent our common language. But it's hard.

[Update: Francis Heaney writes:

Igry was actually coined quite a few years ago (I see now that I neglected to mention that detail in the blog). I've just been keeping the fire alive.

The "eggcorns make me igry" usage doesn't seem wrong to me. I mean, when I see someone I respect writing about "reigning in one's impulses" or something, it does make me feel embarrassed for them, and it definitely generates a little of that dying-inside feeling that is the core of igriness. Limiting my definition to merely reactions to poor behavior might be too narrow. Like, here's another f'rinstance: watching the trailer to the new Ben Stiller movie makes me igry, not because the subject matter of the movie seems offensive, but because it just pains me so much that Ben Stiller keeps taking such embarrassing roles in crappy movies. So that doesn't really fall under the "bad behavior" umbrella either. I welcome further refinements to the definition.

OK, we're on the same lexicographic page here: what I saw as an extended sense, bleached of the connection to an associate's bad behavior, is revealed as part of the core constellation of senses, linked phenomenologically by the common emotion that is experienced. It's notoriously hard to get people to agree about the ontology of feelings, but igriness (igritude?) should be a welcome addition to anyone's set of basic emotional categories, in my opinion.]

[Update: more on igry and its translations in Spanish and Dutch here.]

Posted by Mark Liberman at 06:29 AM

January 26, 2004

Unbridled eggcorns

In Straight Man, Richard Russo's 1997 comedy of academic manners, we find this passage:

My not having concrete information to report was evidence to Finny, were any needed, that I was attempting to scuttle the search for a new chair, a search that I've not been in favor of from the beginning. My position has been that our department is so deeply divided, that we have grown so contemptuous of each other over the years, that the sole purpose of bringing in a new chair from the outside was to prevent any of us from assuming the reigns of power. We're looking not so much for a chair as for a blood sacrifice. [p. 17]

English is full of horse-harness metaphors, most of them having to do with establishing or losing control ("it's time to rein in ICANN"; "carnal excess, unbridled lust and limitless perversity"; "the new Johnston Board of Education chairman took the bit in his teeth last week") or with struggling against curbs and burdens ("manufacturing is saddled with an image problem"; "economic gains hobbled by spending spree").

All of these expressions are moribund now, because so few of us have anything to do with horses in our daily lives. The result? Eggcorns. Is there another word that sounds like the name of a piece of horse harness, with a meaning that resonates in any way with the force of one of these expressions? Then it's likely to get substituted.

Google has 22,900 instances of "reins of power" (the original horse-harness metaphor) and 7,120 instances of "reigns of power" (the eggcorn substitution). Aside from novelist Russo, reign fans include Human Rights Watch ("When President Yoweri Museveni ... took over the reigns of power in Uganda in 1986 ..."), the BBC ("Having emerged from relative obscurity, General Suharto carefully set about grasping the reigns of power ..."), the New York Observer ("... the prematurely middle-aged, finally, somewhat belatedly taking the reigns of power for which they had long practiced"), and many other reputable sources.

We can also find "packed full with moments of unbrided genius", "unbrided fury of winter frost", "it brings unbrided joy to my heart", "carnalised unbrided crazyness", and "Blair called for unbrided access to genetic data". And "If anything's guaranteed to blight a celestial coaching career, being yolked to a rotting corpse of a club like Tottenham is it", and "the colonised displaying the colonial maladies that stem from being yolked to the coloniser". OK, yokes are for oxen, but the idea is the same.

I'd prospect on the web for more unbridled eggcorns, but I need to brave the "unbrided fury of winter" to walk across campus for a meeting. I could get to like that expression. If you think about it, the aggression of a young man who hasn't been mellowed by marriage is more familiar to most of us, these days, than the willfulness of a horse without a headstall.

Posted by Mark Liberman at 01:01 PM

January 25, 2004

Teaching the difference between right and wrong

Having spoken out for the right of indigenous peoples such as the Parents Television Council to defend their traditional culture against the forces of globalization, I want to dissociate myself from one of the PTC's arguments. In his weekly syndicated column of 01/21/04, the PTC's president, L. Brent Bozell III, says:

NBC ... lobbied furiously that the F-word was but an "adverbial intensifier" that fit well within the fine legal points of not referring to a sexual or excretory act. That's a laughable argument against common sense. And it's an insult to any parent trying to teach a child the difference between right and wrong. Just imagine:

"Hey Mom, what a f---ing awful day I had at school!"
"WHAT did you say, Tommy?!"
"Mom, get with it. It was an adjective, not a noun."

If I were the sort of person who makes formal complaints about things that are none of my business, I'd be tempted to report Mr. Bozell to another beleaguered indigenous group, the Traditional Grammar Project. Although I doubt that the TGP have the ear of any government agencies with the power to impose fines or even community service, perhaps they can persuade Mr. Bozell to make amends by working with them on a lesson plan that America's moms can use, in the poignant scene depicted in his column, to teach our children the rights and wrongs of grammatical terminology.

Posted by Mark Liberman at 11:29 PM

The Ngadjonji and the PTC

In response to my characterization of the Parents Television Council's complaints about "indecent" language as "stupid", Mark Liberman suggests that:

...a decent respect to the cultural norms of indigenous populations requires that we should avoid using the S-word in discussing such taboos, whether among the Ngadjonji of north east Queensland, or the Parents Television Council of south California.

While I agree that we shouldn't go around beating up on every cultural institution we find irrational, it seems to me that there are several differences between the behavior of the Parents' Television Council and that of the Ngadjonji which make the former more worthy of criticism than the latter:

  • The Ngadjonji, and other peoples with similar practices, don't as far as I know pretend that there is a rational basis for them.

  • Though there may be social pressure to conform, the sanctions are not as extreme as invoking the law. The PTC isn't just expressing its disapproval; it is trying to get the FCC to fine NBC a large amount of money.

  • There is apparently a consensus among the Ngadjonji and other such peoples as to the desirability of observing their linguistic taboos. In American society, I see no such consensus. The taboos in question are violated constantly. It seems to me that there is a large segment of the population that doesn't care, a small segment that really does, and a large segment that doesn't really care but tends to acquiesce in the complaints of those who do. It's one thing for people to agree on social conventions. It is quite another thing for one group to impose its prejudices on everyone else.

  • The use of mother-in-law languages doesn't do any damage. Prissiness about sex causes real harm.

Posted by Bill Poser at 11:18 PM

The FCC and the S word

Bill Poser suggests that the FCC's renewed interest in rules about indecent words is "stupid."

This reminds me of a conversation among three 4-year-olds in the back of a car that I overheard a few years ago. To protect the innocent, I'll call the speakers A, B and C. Their exchange went something like this:

A: Do you know the bad words?
B: Yes. My mom says them all the time.
C: Mine too.
A: I know the S word.
C: [covering her ears] Don't say it! Don't say it!
B: [trying to put his hands over A's mouth] That's the worst one! Don't say it, we'll get in trouble!
A: I'm going to say it! "STUPID." There, I said it.
C: No! No! You can't say that! Don't say it again!

Their (admirably kind and caring) preschool had a strict rule against calling people names, and stupid was high on the list of proscribed insults. The kids had assimilated this prohibition into the natural class of lexical taboos.

In this context, I need to come to the defense of the FCC. It's common and natural for cultures to impose strongly-felt restrictions on who can be heard by whom to use which words when. The standard extreme example is the "mother-in-law languages" of the Australian language family Dyirbal:

All dialects of Dyirbal had two separate languages, everyday language and "mother-in-law language" which was used in the presence of certain 'taboo' relatives. While these languages shared phonology and grammar, they had entirely different vocabularies.

Though harder on learners (especially second language learners), this is more systematic and perhaps more logical than the FCC's regulations. The distinction represented by feces vs. shit is extended to split the entire lexicon, and rather than distinguishing between broadcast and cable, prime time and late night, and so on, the distinction is simply based on kinship relations. Still, I feel that a decent respect to the cultural norms of indigenous populations requires that we should avoid using the S-word in discussing such taboos, whether among the Ngadjonji of north east Queensland, or the Parents Television Council of south California.

I'd also like to point out that there are some contexts in which certain vocabulary may be required rather than forbidden, though anthropologists have not studied such cases as extensively. As an example, I can cite a joke that was current when I was in the U.S. Army a few decades ago. It involves a conversation between two mechanics in the motor pool, A and B:

A: Hey, could you pass me the pliers?
B: Say what?
A: Please pass me the pliers.
B: Pass you what?
A: The pliers.
B: What did you say?
A: Pass me the fucking pliers!
B: Oh, why didn't you say so in the first place?

The exchange is pretty realistic. At least, I can certainly imagine that if one of my friends in those days had asked me to "please pass the pliers", in just those words, I would have responded by asking him what the fuck I'd done to piss him off.

Posted by Mark Liberman at 10:42 PM

Some people should get a life

In a bit of surprisingly clear thinking for a government agency, which it appears to be on the verge of recanting, the Federal Communications Commission has already pointed out one linguistic error in the complaint by the Parents' Television Council about the use of the word fucking on television discussed in Mark Liberman's post. In the sentence:

This is really, really fucking brilliant.

fucking is an adverb meaning more-or-less the same thing as really and does not refer to sex.

But there is another error of a psycholinguistic nature in the premise underlying both the FCC's regulations and the complaint, namely that it is somehow harmful to children to be exposed to discussion of sex or excretion. This idea is to my knowledge wholly unsupported by evidence. The Parents' Television Council web site doesn't seem to contain any. On the other hand, acting prissy about such matters pretty definitely contributes to feelings of shame and, in layman's terms, gives children complexes. That's bad enough in itself, but even worse, it discourages openness and free discussion. That leads to unwanted pregnancy, failure to report and address sexual abuse, and the spread of sexually transmitted diseases.

What is even odder is that it is only "indecent" language referring to sex and excretion that is supposed to be harmful. Apparently, there is no objection to references to copulation or feces. Is this because they figure children won't understand such words? I don't think so. They object to bitch, but not to girl dog, though in this case it is the "indecent" word that children are less likely to understand.

Different languages and cultures seem to taboo different words, and to different degrees. French merde, for example, is considered off-color, but it is much milder than English shit. The Japanese equivalent kuso is stronger. Carrier tsan is not tabooed at all. There is no distinction between a tabooed term like shit and acceptable terms like excrement and feces. There are vulgar and euphemistic terms for sex.

What amazes me is that people can devote so much energy to such utterly trivial, if not actually wrong-headed, causes and waste taxpayer money on them, and in the process infringe on the fundamental and fragile right of freedom of speech, when there are real problems that deserve our attention. This is really, really fucking stupid.

Posted by Bill Poser at 09:28 PM

19th-century asterisks: at last it can be told

At the end of the movie "Play It Again, Sam", Woody Allen's character makes an unusually moving speech, selflessly urging a woman he is in love with to get on a plane and fly away to be with some other man who will be better for her. She comments on how beautiful the speech was, and he replies, "It's from Casablanca; I've been waiting my whole life to say it."

And I have been waiting my whole life (or at least, twenty years of it) for someone to say, "I am unsure who brought the [*] symbol into theoretical linguistics." Now that Chris Potts (God bless him) has finally raised that question, I can give the answer that I have been waiting so long to give.

The late Fred Householder once claimed in an article in the defunct journal Foundations of Language that he introduced the asterisk into modern linguistics in a course he taught on syntax at Indiana University in the 1960s, but he was wrong. The asterisk, I claim, makes its very first appearance in syntax in Henry Sweet's New English Grammar, Logical and Historical: Part II, Syntax (Oxford: Clarendon Press, 1898), on the top line of page 3, where the ungrammatical order *dogs big black is contrasted with the grammatical big black dogs, the asterisk prefix being used without comment to mark ungrammatical strings. Later, on page 9, it is used again: "we cannot make old sage into *old wise man," Sweet remarks. Notice that this claim has nothing diachronic (historical) about it: he is not talking about a pattern from an earlier stage of the language (indeed, he explicitly introduces a different prefix, a dagger, to mark sentences that are literary or somewhat archaic as of 1898); he is talking about the frozenness of old man as being sufficient to block separation by another intervening adjective like wise. He's identifying the ungrammatical strings that the grammar should not describe; he's doing modern empirical synchronic syntax.

This is not the only way in which Sweet (who was such an appallingly grouchy individual that Oxford University never did give him a professorship) showed a prescient brilliance concerning what modern linguistics would become. As Lisa Selkirk noted in her 1972 dissertation, he also was the first to describe the prosodic behavior of the unstressed grammatical words of English -- for example, the four pronunciations of have (count them: (1) like the first syllable of havoc in I already have; (2) like the first syllable of Havana in I have often thought so; (3) like the first syllable of avoid in I'd've thought so; (4) just a [v] in I've forgotten).

Posted by Geoffrey K. Pullum at 08:01 PM

Nor'easter considered fake

In the Dec. 21, 2003 The Word column in the Boston Globe, Jan Freeman cites an interesting alleged mispronunciation: "nor'easter" (scroll down past the Sir and Lady stuff).

The snow kept coming, and so did the mail, during the region's recent storms: The Globe doesn't (wittingly) use nor'easter for a disturbance blowing from the northeast, but in other newspapers, and especially among TV weatherpeople, it's common. How, asked reader Bill La Pointe, did this "bogus term" gain acceptance?

It's not, after all, a regional pronunciation, as many journalists outside New England now believe. "I grew up on Cape Cod when there still existed a pronounced local accent," wrote George Hand. "The word -- spelled phonetically -- was nawtheastah." Sailors disclaim it, too: They may say sou'wester, but never nor'easter.

The facts, however, have not slowed the advance of nor'easter: Even in print, where it's probably less common than in speech, it has practically routed northeaster in the past quarter-century or so. From 1975 to 1980, journalists used the nor'easter spelling only once in five mentions of such storms; in the past year, more than 80 percent of northeasters were spelled nor'easter. It's no more authentic than "nucular" for nuclear or "bicep" for biceps, but it would take a mighty wind, at this point, to blow nor'easter back into oblivion.

In rural eastern Connecticut, where I grew up, locals also pronounced northeaster without any tendency to drop the final consonant of north. However, the OED cites a bunch of examples from 1837 onwards:

1837 B. D. WALSH tr. Aristophanes Knights I. iii in Comedies I. 175 Slack your sheet! A strong nor'-easter's groaning. 1891 A. AUSTIN Lyrical Poems 9 Nobody..could ever dream of holding up as the model of a delicious climate that alternation of swirling, dusty nor'-easters and boisterous, drenching sou'-westers which we in England recognise as spring. 1931 A. J. CRONIN Hatter's Castle II. ix. 368 Did you see that shot of mine, cocky?.. It was a regular nor'easter, a pickled ripsnorter. 1972 F. MOWAT Whale for Killing (1988) x. 99 By Monday morning a bitter nor'easter..had shrouded Burgeo under a low and scudding overcast. 1997 A. R. AMMONS Glare 193 Well, it's Easter morning right now, with a nor'easter, out-of-whack, whipper-jawed, eight-inch dump load of snow on the ground.

I'm not very impressed with the credentials of these writers for establishing the pronunciation (or spelling) of a characteristically New England meteorological phenomenon: A. J. Cronin is from Scotland; Farley Mowat comes from inland western Canada (though he lived in Nova Scotia for some time); A. R. Ammons is from North Carolina. Of course, as A. Austin indicates, they have (very different things called) northeasters in England as well, but I'm not sure that should count.

I'm no weatherman, but as Gertrude Burnham explained it to me when I was a kid, a northeaster is a winter storm that travels (from southwest to northeast) up the coast, with its center off shore, so that the counter-clockwise circulation of the storm blows in off the ocean full of moisture (from the northeast), dumping the load of moisture as snow and ice when it cools down over the land. Here's a page with pictures that seems to be talking about the same thing.

Subject to correction, the picture that seems to be emerging is that nor'easter is a literary affectation. This would make it something like e'en for even and th'only for the only, which I have been told originated as an indication in spelling that two syllables count for only one position in metered verse, with no implications for actual pronunciation. The comparison with sou'wester is interesting. There are two obvious differences between the two words: the theta in southwester is preconsonantal, whereas in northeaster it's prevocalic; and southwester is also an old word for a common article of clothing, namely an oilskin rain hat, with equivalents in Dutch zuidwester and German südwester.

I wonder if there have ever been American speech communities -- other than journalists -- in which nor'easter was the normal pronunciation? Maybe nor'easter should be added to yourDictionary.com's list of 100 Most Often Mispronounced Words. Or a new list of 100 most inauthentic pronunciations.

Posted by Mark Liberman at 07:49 PM

Maybe better make that "freaking brilliant"

Jan Freeman, in the Boston Globe's The Word column today, alerts those of us who missed the AP and Dow Jones stories to a new chapter in the story of the FCC and Bono's Golden Globe acceptance speech last fall. Bono said "This is really, really fucking brilliant"; the Parents' Television Council complained; and the FCC decided not to fine NBC, arguing that

The word "f---ing" may be crude and offensive but, in the context presented here, did not describe sexual or excretory organs or activities. Rather, the performer used the word "f---ing" as an adjective or expletive to emphasize an exclamation [and that] ... is not within the scope of the commission's prohibition of indecent program content.

After three months of protests, FCC chairman Michael Powell "has scheduled a meeting for Wednesday to revisit the issue, hoping to reinstate the f-word ban."

Freeman cites Language Logger John McWhorter's Washington Post op-ed, where John wrote:

We obsess over the encroachment of vulgar words into public spaces on pain of a stark inconsistency, one that will appear even more ridiculous to future generations than some Victorians calling trousers "nether garments" does to us.

Based on the wire stories, it seems to be a foregone conclusion that the FCC will reverse itself, and the only question is how big NBC's fine will be.

Posted by Mark Liberman at 07:47 PM

#ing out @ Language Log

Are English speakers growing more adept at realizing lexical items as individual symbols? Are you familiar enough with the forms in (1) to use them in sentences like those in (2)?

(1) #         @         *
(2)a. Okay, I'm #ing out now.
b. My address is president @ whitehouse . gov.
c. The following example is therefore *ed.

Virtually everyone has a pronunciation for @ these days. It might not be common knowledge that it has a fairly literal interpretation (you're at a particular server), but speakers are able to articulate its appropriate conditions of use. This is quite common for lexical items. (How would you define I, as, or but? Probably you would mutter something incoherent or false and then resort to giving some examples of the word in action.)

At my father's office, the voice-mail system permits callers to leave a message or "press the pound (#) sign" to speak to a secretary. In November, my father told me that a broker from Merrill Lynch finished his message by saying that he was "pounding out" to see if he could get the information from a secretary. Or should that be "#ing out"?

In linguistics, an asterisk, *, affixed to the front of an example indicates that the example is ungrammatical. Linguists often speak of "starring examples", or, rather, "*ing examples".

Note: The synchronic theoretical linguist's * is adapted and imported from historical linguistics, where it means "reconstructed but unattested" --- it marks cases that are missing from the historical record but that the theory predicts to be grammatical. Synchronic theoretical linguists retained the "unattested" part, but the sense is quite different in that subfield, where the * marks a form that doesn't exist (according to the author).

Davies (1992:185 nt. 31) attributes the first use of * to mean "reconstructed but unattested" to the linguist Pott, who used * ("ein Sternchen" ('little star')) in 1833. I am unsure who brought the symbol into theoretical linguistics, though.

Davies, Anna Morpurgo. 1992. History of Linguistics. Volume IV: Nineteenth-Century Linguistics. London and New York: Longman.

Posted by Christopher Potts at 04:58 PM

When did you first hear this pattern?

The topic of snowclones is related to the more general topic of non-spontaneous fixed formulae in putatively spontaneous language. Andrew Pawley of the Research School of Pacific and Asian Studies at the Australian National University has written a few papers about this, using the term speech formulas. I haven't found anything of his that is web-accessible, but his 1985 paper "On speech formulas and linguistic competence" in Lenguas Modernas 12: 84-104 would be one place to start.

One of the things that Andrew has noted in papers that I've read is that an astonishingly large number of the subordinate clauses in spoken conversation are in fact included in such speech formulas; in other words, ordinary people hardly ever make up novel subordinate clauses spontaneously when speaking. Dwight Bolinger also noted a long time ago (1975) that people remember whole chunks of language, even as they recognize how to chop those chunks up into individual lexical items; we store both the parts and the wholes, and retrieve them when we need them.

Snowclones, as defined here on Language Log, are simply an extension of this aspect of speech into journalistic writing and advertising copy. We're not quite as creative as the standard generativist dogma says we are.

I suspect that anyone who tries to make a comprehensive collection of snowclones is going to find that the collection is very large. Another one has just been pointed out to me in correspondence, this one by Anthony Hope. The pattern is this:

the X  that put the  Y   in(to)  Z

What reminded Anthony of it was an emailed advertisement from Virgin Atlantic:

Virgin Atlantic, the carrier that puts the "v" into "value" for money, brings you these truly tantalising treats - a veritable cornucopia of fabulous fares.

The pattern is not that common (there appear to be no examples of it in the Wall Street Journal corpus, for example), but it shares with nearly all snowclones the interesting property that although everyone knows immediately they have heard the pattern before, hardly anyone can remember where or when they first heard an instance of it.
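For anyone who wants to check a corpus for this pattern, here is a minimal sketch of a search in Python. The regular expression, the function name `find_snowclones`, and the limit of one to three words for X are all my own assumptions for illustration, not anything from Anthony's message; a real search would need a looser notion of what X, Y and Z can be.

```python
import re

# Hypothetical pattern for "the X that put(s) the Y in(to) Z".
# Assumptions: X is one to three words; Y and Z are single words,
# optionally wrapped in straight quotation marks.
SNOWCLONE = re.compile(
    r"""\bthe\s+(?P<x>\w+(?:\s+\w+){0,2}?)\s+   # X: a short noun phrase
        that\s+(?:put|puts)\s+the\s+
        ["']?(?P<y>\w+)["']?\s+                  # Y: the letter or word put in
        in(?:to)?\s+
        ["']?(?P<z>\w+)["']?                     # Z: the word it is put into
    """,
    re.IGNORECASE | re.VERBOSE,
)


def find_snowclones(text):
    """Return a list of (X, Y, Z) tuples for each match in text."""
    return [(m.group("x"), m.group("y"), m.group("z"))
            for m in SNOWCLONE.finditer(text)]


ad = ('Virgin Atlantic, the carrier that puts the "v" into "value" '
      'for money, brings you these truly tantalising treats')
print(find_snowclones(ad))
```

Run on the Virgin Atlantic sentence, this picks out carrier, v and value as X, Y and Z. It will miss instances with curly quotes or multi-word Y and Z, so a zero count from a sketch like this would not show that a corpus lacks the pattern.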

Posted by Geoffrey K. Pullum at 12:20 PM

Studying child language in the wild

Harper's Magazine (February 2004, p. 18) reports that children attending the Chicago International Children's Film Festival were given a survey including a question that said, "What new and interesting things did you learn from seeing these films?" One child wrote this answer:

Because when he turned his self into a chicken the two chicken.

I don't really have anything to say about this. Except perhaps that people who study child language by actually looking at things children say or write face an extremely difficult task and deserve our sympathy and respect.

Posted by Geoffrey K. Pullum at 11:34 AM

Star-struck coeds

Non-linguists often don't have a clear view of what it's like to be one, but from time to time, someone shows a proper appreciation. Here's how author Dan Brown characterizes the leading man in his techno-thriller Digital Fortress (St. Martin's Press, 1998, p.8):

His university lectures on etymology and linguistics were standing room only, and he invariably stayed late to answer a barrage of questions. He spoke with authority and enthusiasm, apparently oblivious to the adoring gazes of his star-struck coeds.

I'd better go work on my tan.

Posted by Bill Poser at 10:11 AM

January 24, 2004

Speaker change in mid sentence

While preparing for the speaking visit Barbara Scholz and I just made to UC San Diego (which was really great -- a superb intellectual community for cognitive science, and wonderful hospitality), I had occasion to call up my friend Farrell Ackerman in his office. His deep voice answered the phone: "Farrell Ackerman," he said, rather languidly.

But then something shocking happened.

Without warning, a totally different excited-sounding female voice seized control of the clause and completed the predicate for him: "is not available!" she said in tones of bubbling delight; "Please leave a message after the tone!"

I think what was so horrible about it for a syntactician like me was not just that for half a second I thought I was hearing Farrell answer the phone live and then he was snatched away; it was that you just don't expect there to be a speaker change between the end of the subject noun phrase and the beginning of the immediately following verb phrase which agrees with it in person and number.

There are plenty of other horrible things about voicemail systems, of course. One example is the ghastly practice of recording a message that begins slowly and patronizingly, "You have reached the office of . . .". No live human being ever picks up the phone and says anything that begins with "You have reached." (And of course, it is always a lie: you only get to hear "You have reached X" when you haven't reached X because X isn't there.)

But the speaker-change device is a particularly nasty one for grammarians. Next time I need Farrell, I'll use email.

Posted by Geoffrey K. Pullum at 02:55 PM

Homing in on honing in on

Language Hat wrote yesterday, inviting me to be less incoherent about the origin and status of the phrase honing in on:

I'm not quite sure what you're saying. Surely you're not implying that "hone in" is possibly the original? According to the M-W Concise Dict of Eng Usage, it's attested only from 1965 (George Plimpton, Paper Lion), and since then mainly in reported speech. "Home in" is much earlier. There's no question that "hone" is a mishearing/misunderstanding. It is, of course, rapidly gaining ground and may well wind up the victor, but that's a separate issue.


Steve's right, of course. And several people, David Nash first, wrote to point out that I somehow missed the OED's history of "home in on."

Let me try to clarify what (little) I now know about the history, and what I think some of the forces at work are, and why I'd like to know more. Though I'll confess to a recreational curiosity about the expressions themselves, there are a couple of larger issues here that may excuse an indulgence in lexicographical minutiae. (And if not, it's our policy to refund your subscription fees, cheerfully and in full!)

I had guessed that home in on was a recent expression, probably from WWII, and the OED's history supports this. Since Bush 41 was a bomber pilot in the Pacific theater, he would probably have learned the phrase just as it was coming into existence, and if he really learned it as hone in on, then the two variants must have co-existed pretty much from the beginning.

And as is usually the case with common eggcorns, there is quite a bit of semantic support for the hone in on variant. There's so much semantic support in this case that hone in on is by no means implausible as a coinage on its own. Google suggests that the two variants are about equally common today, and the hone in on variant now occurs in places like MIT press releases and Washington Post presentations of Reuters newswire stories.

So maybe hone in on appeared as an independent (if echoic) coinage, just at the time when home in on began to be used in general discourse. If all that is true, then the whole thing is an interesting example of the interplay of social, semantic and phonetic forces in vocabulary development. The process is likely to be well documented in texts (e.g. newspapers from the last half of the 20th century) that are increasingly available in digital form and searchable on line.

For those who still haven't had enough and too much of homing and honing, here are some of the details.

The OED has homing used for "the faculty possessed by animals (e.g. pigeons, turtles, etc.) of returning home from a distance". There's one marginal citation from 1765 ("When they come to be trained for the homing part") but the real uses start in 1875. The first citation for the verbal use of home with animals is likewise 1875 Live Stock Jrnl. 23 Apr. 57/3 Pigeons home by sight and instinct. The first extended use with humans is 1893 Nat. Observer 14 Oct. 559/1 Your tourist is homing from abroad. So the verb to home, in the relevant sense, seems to be a late-19th-C innovation, at least in general use.

The extended use meaning "[o]f a vessel, aircraft, missile, etc.: to be set, or guided, to its target or destination, by use of a landmark or by means of a radio beam" is first attested in 1920 Wireless World Mar. 728/2 The pilot can detect instantly from the signals, especially if ‘homing’ towards a beacon. Other citations continue to have scare quotes through 1947. The first cited use of home on is 1940 Jrnl. R. Aeronaut. Soc. XLIV. 569 The tanker must be equipped with D.F. gear, so that the two aircraft may ‘home’ on each other if visibility is poor. And the first citation for home in on is 1956 Amer. Speech XXXI. 228 A good officer could even ‘home in on a bottle of whisky’ placed on the landing field.

Several other citations are given during the 1950s and 1960s using home on where I would have written home in on, e.g. 1958 ‘P. BRYANT’ Two Hours to Doom 58 Infra-red missiles which homed on the radiations given off from jet engines. 1962 F. I. ORDWAY et al. Basic Astronautics ix. 386 The guided vehicle then homes on the reflected signals as in the active case. The usage home in on doesn't seem to become the norm until the mid to late 1960s -- the OED's first citation for home in on without scare quotes is from 1971.

Steve's Plimpton citation for hone in on (in 1965) is barely 9 years after the earliest home in on citation in the OED (1956), and is apparently more or less from the same time when home in on was becoming a common expression in general usage. And as I said, if Bush 41 learned the expression as hone in on, some people must have adopted that variant pretty much from the beginning. So while I agree that home in on was almost certainly the original pattern, hone in on apparently followed it almost instantly.

Now what about the route for the (semi-) independent development from hone?

Let's ignore the old versions of hone meaning "to delay" and "to grumble" (which were news to me), and zero in on hone as "to sharpen on a hone". Though the noun hone is old (citations from 1300), its verbing is apparently more recent. The phrase "grinder or honer" is cited from 1824, and the first citation of hone as a straight-out verb is from 1826 (in Webster's dictionary!) with the meaning "to hone a razor."

Unfortunately, the OED's entry is very short and does not give any examples with extended meanings. However, at some point since Webster, hone began to be used in the various metaphorical senses that are now widespread and even clichéed: practicing to perfect a skill; adjusting the content or form of something to make it work better; etc. (Headline writers are especially fond of hone because it's short!)

Dennis Miller to hone his HBO special material at Irvine Improv this week
Poet heads for Virginia to hone his craft
Big names help Kerry hone his foreign policy message
Volunteer Work Helps Joyce Moran Hone Her Leadership Skills

Sometimes the meaning seems to be nothing more specific than "develop" or "improve" or "perfect": "Hogan began to hone her mellifluously spooky welter of torch songs and honky tonk anthems when she fronted the legendary peg-legged cabaret quartet, The Jody Grind."

But (consistent with its origins) hone often continues to have connotations of sharpening, suggesting that the development or improvement is accomplished by increases in precision and decreases in size or scope. It's often used with "down", perhaps as a kind of blend of hone "sharpen, improve" and pare down "remove excess material":

Bush has tried to hone down his message as much as possible to just two issues: tax cuts and education.
The eGGsters first heard about Raku in early 1995, when Mark was just beginning to hone down the concept.
You hone down your ideas so that when you go into the studio you're just going right for it.

Though some of these examples feel to me like they should have been "pare down", I reckon that's just a matter of stylistic choice. After all, we're talking about which metaphoric cliché is the appropriate one. And some examples don't trouble me at all, such as this passage from Alicia Jones' poem Anorexia:

Trying to undo all
the knots the female body has
tied, all the cyclical obligations,
to gush, to feed, she chooses

to hone her shape down,
her scapulae prepared like
thin birds, to fly away from
the spine.

You also see examples with the preposition in, like the shortstop at Mississippi State who is said to have "developed into a good situational hitter while honing in his offensive skills in fall practice", or the young woman in San Francisco who "is focusing on her new digital media company, Steakhaus Productions, after honing in her digital design skills at SFSU Multimedia Studies Program." Examples like this are clearly developments from hone, not echoes of any likely use of home.

From the uses of hone down X to mean "improve X by sharpening focus on the essentials and eliminating or ignoring extraneous material", it's not a very big step to hone in on Y = "reach Y by a process of successively sharpening focus while eliminating extraneous material." The independent plausibility of hone in suggests that this step might sometimes take place spontaneously. Of course, the fact that home in on is also floating around in the world doubtless often helps to motivate the step. And there is surely also a sort of resonance effect, with home in on thereby playing a causal role in increasing the frequency of hone in on to the tipping point. But I'd still like to know when people started talking about someone "honing her skills" and "honing down his message" -- and "honing in on his target". It seems possible to me that hone in on began to develop out of hone down (and related sources) in parallel with the transfer of home in on from the specialized vocabulary of aeronautics to its popular use as a metaphor for refinement by successive approximation.

Finally, let me point out that there is a bit of leakage in the other direction, from hone down to home down. For example, one "Professor J. Crowley" of University College, Dublin, writes (for the Pompidou Group at the Council of Europe) that

"There is undoubtedly need for more experimental work and more epidemiological investigation, all the time homing down and fine tuning the precise impacts of the target drugs ... "

Malapropism or metaphor? Eggcorn or expressive originality? We report, you decide.

Posted by Mark Liberman at 12:29 PM

January 23, 2004

Shoe in and hone in

Neal Whitman writes:

"I was recently thinking about how annoying it is when people write that someone is a "shoe-in" and not a "shoo-in," when all of a sudden I realized that I had an eggcorn on my hands. I checked to verify that it really was "shoe-in" and not "shoo-in" that was the eggcorn, at http://www.word-detective.com/100297.html#shoe-in."

It looks to me like Geoff Pullum's coinage is catching on.

While we're on the subject, maybe someone can enlighten me about the history and status of home in on vs. hone in on.   I've always assumed, based partly on the meaning and partly on the fact that home in was what I learned first, that home in is the original construction, and hone in is an alternative based on misconstruing it.

Google has 33,000 hits for "hone in on" and 56,000 for "home in on." The returns for "hone in on" include an entry in the Columbia Guide to Standard American English telling us that "hone in on is an erroneous version of home in on, attributed to George Bush among others." (From the 1993 date of publication, this must be Bush 41 not Bush 43.) The OED doesn't have either home in on or hone in on. The American Heritage Dictionary has hone in on without any usage note (though the etymology field does call hone in "an alteration of home in"), and of course gives the verbal meaning of home from which the phrasal verb is derived: "to go or return home ... to be guided to a target ..." The Word Detective (cited by Neal Whitman above) is silent on the subject.

Either expression requires a metaphorical broadening of the core meaning of the base word (home or hone), and both metaphors are somewhat plausible. I personally still find the home usage more persuasive -- the hone version grates on me, as evidence of such differences often does -- but the expression hone in on has recently been used in an MIT news release, a Reuters news story (picked up and printed in the Washington Post among other places), and many other reputable outlets.

So in the end, I'm not sure what's going on here.

Perhaps Columbia should change their entry to read "hone in on is considered by some to be an erroneous version of home in on, attributed to MIT and Reuters among others," and give the Bush bashing a rest already.

[Update: more here. ]

Posted by Mark Liberman at 06:47 AM

Challenge as negation

A couple of days ago, I cited a quotation with an apparently extra or redundant not: "I challenge anyone to refute that the company is not the most efficient producer in North America." It's occurred to me since then that the speaker may have been confused not only by "three phones going nonstop for more than 12 hours each day" for "39 consecutive days", but also by the fact that challenge can function as a sort of negative. Phrases of the form "I challenge anyone to X" are often used to mean "No one can X" or "No one will X".

Looking for "I challenge anyone" on the web, we find some (perhaps more basic) cases where the anyone means "anyone at all": "I challenge anyone that's willing to a Mechasummon duel." However, there are other cases where anyone seems to be the negative polarity version, and may be followed by other polarity-sensitive items:

"I challenge anyone to find a school or department at UF or any other major Florida state supported university that comes anywhere close to the decadence of this law school."

"I challenge anyone to find another net company that offers this great of a net at an affordable price."

I don't think this really makes the original quotation semantically balanced. "No one can deny that the company is not the most efficient producer" is still not what the speaker meant -- he meant "no one can deny that the company is the most efficient producer." But it helps explain why it was hard for him to figure out whether he needed another not or not. The difficulty of calculating the meaning of sentences with multiple negations and a scalar predicate or two is hard to underestimate.

[Update: Kai von Fintel writes:

I posted a related item on my blog on the old example "No head injury is too trivial to ignore", which illustrates your point rather well, I believe.

Kai's post is here.]

Posted by Mark Liberman at 06:03 AM

January 22, 2004

Cats -> naps, get it?

Another cartoon about how cultural salience creates vocabulary diversity. The Eskimos are off the hook for once.

No doubt this is one of a series: the squirrels have 79 words for nuts, the robins have 93 words for worms ...

[via Wolf Angel]

Posted by Mark Liberman at 09:42 PM

Skepticism about the SARS--phonology connection

I earlier reported on Dr. Sakae Inouye's hypothesis that aspiration in Chinese and English might be among the causes of the SARS outbreaks in areas in which these languages are dominant. Inouye's support for the hypothesis rests on a comparison with Japan: the Japanese have yet to suffer a SARS outbreak, and Japanese lacks aspiration.

My colleague John Kingston, who originally brought this item to my attention, has this to say in response to my report of Inouye's letter in The Lancet:

English, Chinese, and Japanese all share sounds that involve very high rates of air flow out of the mouth -- the sibilant fricatives. In fact, air flow continues at a very high rate for a very long time in these sounds relative to aspirates. More importantly, a Chinese speaking Japanese is most likely to substitute aspirated stops for Japanese voiceless unaspirated stops, while using unaspirated stops for the Japanese voiced ones. That is, even if Inouye's hypothesis about the languages shopkeepers use in talking to tourists is correct, there's no reason to expect that the Chinese speaker will actually pronounce the sounds of the foreign language in a way that accurately reflects the foreign language's phonetics rather than his own language's.

Posted by Christopher Potts at 12:19 PM

Deny, disprove, refute

Here's some more on the verb refute, which Geoff Nunberg and I have written about here recently. It's more than any sane person would care to know about refute, actually, but I hope I can convince you that some more general issues emerge.

Originally, I cited a case where a college newspaper reporter wrote that "[i]n his weekly radio address last Saturday, Bush refuted that the law sets unreasonable standards." This differs from standard usage in two ways: first, the context suggests that the writer means only to say that Bush denied something, not that he presented arguments to prove it wrong; second, refute is here used with a "sentential complement" rather than a "noun phrase object", i.e. refuted that such-and-such is so rather than refuted such-and-such. Geoff assured us that the American Heritage Dictionary's august usage panel has this problem in its sights.

In my earlier post, I did a bit of googling, and concluded that refute with sentential complement is a pretty rare bird: "refute/refutes/refuted the" is about 115 times commoner than "refute/refutes/refuted that the". By comparison, claim/claims/claimed the is only about 1.17 times commoner than claim/claims/claimed that the, a difference in ratios of two orders of magnitude. Philip Resnik pointed out that AltaVista (unlike Google) allows both start and end times for searches, making it easier to track changes over time, so I redid the same searches on AltaVista for three two-year periods starting at the beginning of 1998. The results were pretty stable, or at least don't show any consistent trend (except that the volume of indexed material grows!):

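[Table: AltaVista counts for "refuted that the" and "refute the" in each of the three two-year periods; the figures are not preserved here.]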

So if the sentential-complement use is a change in progress, it's not happening fast enough to be seen at this temporal resolution. Of course, this is what we expect if it's happening on the time scale of a generation or two. But another possibility is that this is just a sporadic difference in how learners of English generalize from the examples they hear and read.
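The "two orders of magnitude" comparison in the Google figures above is easy to check. Here is a small sketch that uses only the two ratios quoted in this post (about 115 for refute, about 1.17 for claim); the function name `orders_of_magnitude` is just an illustrative choice, and the raw hit counts behind the ratios are not reproduced.

```python
import math

# Ratios quoted in the post: how many times commoner "VERB the"
# is than "VERB that the" in Google counts.
refute_ratio = 115.0
claim_ratio = 1.17


def orders_of_magnitude(a, b):
    """Return log10(a / b): how many powers of ten separate a and b."""
    return math.log10(a / b)


gap = orders_of_magnitude(refute_ratio, claim_ratio)
print(round(gap, 2))  # prints 1.99
```

So the nominal preference for refute is a little under a hundred times stronger than the one for claim, which is the sense in which the two ratios differ by about two orders of magnitude.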

Now, this particular usage question is of little real interest in itself. However, it exemplifies a range of interesting questions about how and why idiolects vary, and how individual variation is related to large-scale change across space, time and communicative connectivity. The technology of networked computing might have been designed by a cosmic sociolinguist specifically for the purpose of instrumenting these patterns in increasingly minute detail, and over the next few decades, we'll learn a lot about how this works.

Some other issues seem to be lurking here. In general, intrinsically negative verbs having to do with attitudes towards propositions seem to occur with nominal objects much more often than with "that S" complements. In some cases, the preference is pretty much categorical: for me it seems impossible to say "I refuse that S" or "I reject that S". In other cases, it's more of a statistical preference. By comparison, intrinsically positive verbs of similar sorts seem much more likely to permit "that S" complements, and to use them more often when they are possible. The table below glosses over a multitude of problems, but may be enough to support the plausibility of the generalization:

[Table: for each verb, counts of "___ the" and "___ that the" and their ratio; the rows are not preserved here.]

So "refute that S" is fighting a larger-scale battle, it seems. Is this apparent pattern real? Is it part of a larger one? What's the cause, if any?

Here's another issue. At least in principle, the choice of syntactic frame for refute -- sentential complement vs. nominal object -- is independent of the choice of meaning -- "prove to be wrong" vs. "deny". Some of the sentential complement uses clearly mean "prove to be wrong", e.g. this statement from a document on the EPA's web site:

The On-site Verification test should be able to be completed within an hour, and is used in the field to verify or refute that the waste is behaving as predicted from the Characterization and Compliance testing.

and similarly, this sentence from a dissertation on "Vertical Integration in Commercial Fisheries":

Speculation aside, however, there is no empirical evidence to confirm or refute that the use of quota management actually leads to increased vertical coordination.

Other uses of the same syntactic pattern clearly mean nothing more than "deny" or "dismiss":

Pavkovic did not admit - but neither did he refute - that the Battallion was filled by members and supporters of Momir Bulatovic's Party.

I have no doubt that this woman was the sneak who called security and hotel management and I absolutely refute that the shout "Look out, here come the Indians! Circle the wagons!" was made.

But are the two variables really uncorrelated? Since deny is one of the most sentential of the negative verbs in the list above, whereas disprove is about five times more nominal, and falsify is the most nominal of all, might we not expect that refute would be more likely to become sentential for people who use it to mean deny, as opposed to people who retain the standard meaning of falsify? I didn't tally things up to check on this, but as I was poking around, I got the impression that it's not (very) true.

Finally, here's another kind of evidence about usage standards, brought to my attention by David Nash.

According to John Quiggin, the New York Times ran an AP wire story under the headline "Powell Refutes Report saying U.S. Overstated Iraq Threat", where "[t]he body of the article makes it clear that Powell said he disagreed but produced nothing that would prove the report false." However, within hours of his post, the headline had been changed to read "... dismisses ..." This kind of correction of on-line news stories is very common, though the changes are more often to add or fix information than to fix language. Still, if you automatically tracked the changes and aligned the versions, it might be possible to see how often various usages are corrected -- a sort of automated editorial usage panel.
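To make the idea of tracking and aligning versions a bit more concrete, here is a minimal sketch using Python's standard difflib module to align two versions of a headline word by word and list the substitutions. The function name `word_substitutions` is hypothetical, and the second headline is an assumed reconstruction: the post quotes only "... dismisses ...", so I have simply swapped the verb in the original headline for illustration.

```python
import difflib


def word_substitutions(old, new):
    """Align two headlines word by word and return the replaced spans
    as (old_words, new_words) pairs."""
    a, b = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag == "replace"
    ]


# First headline as quoted above; the second is an ASSUMED reconstruction
# (only the verb is changed), not the headline as actually published.
before = "Powell Refutes Report saying U.S. Overstated Iraq Threat"
after = "Powell Dismisses Report saying U.S. Overstated Iraq Threat"

print(word_substitutions(before, after))
```

Counting such pairs over many corrected stories would give a rough tally of which usages editors change, though a real system would also have to separate language fixes from changes in the facts reported.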


Posted by Mark Liberman at 08:24 AM

Why there should be more scholarly and scientific weblogs

I have to admit that the folks who complain about "blognoise" in Google's page rank and similar measures have got a bit of a point. Checking on our referrer logs, as I do from time to time, I discovered a few minutes ago that Language Log provides the #1 page returned by a query for "philosophy lessons." I yield to no one in my admiration for John McWhorter's writing, and his post on false exoticism in linguistic description is a gem, but should it really be Google's top guess for a source of philosophical instruction?

I've already pointed out that Geoff Pullum's bit of deadpan humor about universities named after linguists is the top answer to the question "who is Harvard University named after?" We also come out #1 for "psycholinguistics career", "what does outcall massage mean?", "bush mispronunciations", "talking seals", "cold reading", "same procedure as every year", "CPEB", "'new yorker' judicial nominations", "invention of pickle", "words made up by mistake", "Edward Sapir's beliefs" and "sic meaning", #2 for "causes of speech impediments", "italics rule grammar", "right-justified", "different kinds of human races", "strange alphabets" and "history of emo", #3 for "ordinary language philosophy", "separation of subject and verb", "how to say English sentences in French" and "Pete Rose's sorry", #4 for "examples of sarcasm", #5 for "how new words are added to the english language", and so on. Except for the Harvard thing, all of these examples are from searches conducted within the past couple of hours -- at this point, we get about 250 visitors referred from search engines per day. [Note -- since the search indexes are constantly updated, your results may vary from what is asserted above, which reflected the results as of roughly 11:30 p.m. on 1/21/2004].

My point in listing these examples is not to trash the search algorithms, which on balance are doing a great job of helping people to find the information they are looking for (along with the odd serendipitous gem from our little enterprise here, of course). Instead, I'd like to draw my colleagues' collective attention to the fact that we scholars and scientists are missing an opportunity here. Aside from all the personal and discipline-internal benefits of weblogs, they're a wonderful way to reach the public at large! This is especially true for the general audience of high school and college students, who spend a lot of time online and rely heavily on search engines. And the more mutually-connected weblogs a discipline maintains, the higher the average rank of its postings will be.

This is not a proposal to subvert search engines by having researchers engage in google-bombing on a grand scale. On the contrary, my point is that we have a chance for our voices to be heard by the public at large, on the topics we know and care about, without journalistic intermediaries. Most of us spend much of our time communicating with one another on these topics anyhow, by email, in mailing lists, and so on. All we need to do is to move some (more) of this activity into a slightly different medium.

I'll close with a quote from the Wired article I started with, suggesting that

the trick to achieving prominent search rankings is fairly straightforward: "update frequently and provide good content."

If you're looking for a philosophy lesson, you could do worse.

Posted by Mark Liberman at 12:10 AM

January 21, 2004

I challenge anyone to refute that this negative is not unnecessary

In the course of further poking around on the refute that X story, I stumbled over one of the curious non-negative nots that Chris Potts has recently discussed. In today's Oswego Daily News, Nicole M. Reome tells us about Richard M. Duffy's plans to start a new company to make chocolate in an abandoned Nestle's plant in Fulton.

"If this weren't a Nestle plant, we probably wouldn't have pursued it," Duffy, a consultant to Nestle, noted. "Because it was, we didn't even have to look at it. Nestle is revered as being the best in the business. I challenge anyone to refute that the company is not the most efficient producer in North America."

We all know what he means. But we would have been at least as happy if he had said "I challenge anyone to refute that the company is the most efficient producer in North America." (Substitute deny for refute if the usage bothers you; that's not the point here.)

Is this a case where the force of the sentence is logically the same with or without the extra not?

Or did Mr. Duffy just get confused? He tells us that reviving the factory was pretty linguistically hectic for him:

"For 39 consecutive days, I had three phones going nonstop for more than 12 hours each day," Duffy explained. "It was exciting. I just got into it."

That's about 39*12*3*60*150 = 12,636,000 words. After all that verbal action, what's another not more or less?
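For anyone who wants to check the arithmetic, here it is spelled out. The variable names are labels added for illustration; reading the 60 as minutes per hour and the 150 as words per minute is an assumption about what the factors mean.

```python
days = 39
hours_per_day = 12
phones = 3
minutes_per_hour = 60
words_per_minute = 150  # assumed speaking rate behind the final factor

total_words = days * hours_per_day * phones * minutes_per_hour * words_per_minute
print(f"{total_words:,}")  # 12,636,000
```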

Posted by Mark Liberman at 09:48 PM

SARS spread via aspiration

Dr. Sakae Inouye (Otsuma Women's University, Tokyo) has published a letter in the medical journal The Lancet arguing that we should in part blame the phonology of Chinese and English for the outbreak of SARS in China and among English speakers.

Inouye observes that SARS is "transmitted via droplets spread by infected individuals. Droplets are generated when patients cough and, to a lesser extent, when they talk during the early stages of disease". She goes on to note that both English and Chinese make use of aspiration in their consonantal systems. This aspiration is, she claims, a likely source of droplet spread. Her support for this linguistic explanation is that Japanese lacks aspirated consonants and, "as of mid June 2003, the number of probable cases of SARS in Japan remained zero", though millions of Japanese people visit China each year.

So we have a nice argument for the old "say it, don't spray it" mantra:

A Chinese attendant in a souvenir shop probably speaks to American tourists in English, and to Japanese tourists in Japanese. If the shop assistant is in the early stages of SARS and has no cough, I believe American tourists would, hence, be exposed to the infectious droplets to a greater extent than would Japanese tourists.

A story about Inouye's hypothesis appeared in yesterday's Guardian (here). It was also picked up by the Annals of Improbable Research (here). The original letter in The Lancet is not available freely on the Net, but here is the reference for those of you who have subscriptions of the e- or paper kind:

Inouye, Sakae. 2003. SARS transmission: language and droplet production, The Lancet, Volume 362, Issue 9378, 12 July 2003, Page 170.

Many thanks to John Kingston for bringing this item to my attention.

Posted by Christopher Potts at 09:40 PM

Names and descriptions

Barbara Scholz and I are travelling to UC San Diego tomorrow to give a seminar on the problem of explaining first language acquisition at the Center for Human Development. We need a rental car, so I just looked up a few prices on the web via Travelocity. Ten proposals with prices came up. The lowest price on the list was $27 (not that Travelocity would let me actually book it: they insisted on knowing the name of the airline we're flying in on, and you had to pick it off their popup list, but Southwest Airlines wasn't on their list, so the transaction was impossible to complete; but don't get me started on the subject of crappy web interfaces that prevent business from being done).

However, my eye was attracted to the other end, where one company was quoting $99.89 for exactly the same rental period and size of car. With $11.24 in taxes and fees, that's $111.13 for an economy car for one 24-hour rental period. The name of that company was Payless.

So that, in case you needed a really clear example, is the difference between a name and a description.

Posted by Geoffrey K. Pullum at 05:43 PM

Negated, or not

There's a handful of English constructions in which, quite surprisingly, one can add or remove a negation without change of meaning. Paul Postal discusses my favorite of these in 'The structure of one type of American English vulgar minimizer', from his new collection Skeptical Linguistic Essays. (I am pretty sure that this paper was once called 'The grammar of squat', but I guess that was too skeptical even for Paul.) In (1)-(2), the vulgar minimizer, the item to watch, is squat:

(1) Eddie knows squat about phrenology. (jack, beans, diddley)
(2) Eddie doesn't know squat about phrenology.

I'm delighted to report that both (1) and (2) mean that Eddie knows nothing about phrenology. This is potential trouble for the hypothesis that natural language negation is just like the logician's negation, which takes any statement and reverses its value. That hypothesis predicts that if (1) is true then (2) is false (for example). But in fact they are semantically equivalent.

The vulgar minimizers aren't the only items that refuse to have their polarity reversed by negation.

In (3)-(6), I exemplify the other negation-indifferent items that I know about. This is a heterogeneous bunch, and in some the negation might effect a subtle shift of meaning. But in no case does the logician's negation seem to be much help.

(3)a. That'll teach you not to tease the alligators.
b. That'll teach you to tease the alligators.
(4)a. I wonder whether we can't find some time to shoot pool this evening.
b. I wonder whether we can find some time to shoot pool this evening.
(5)a. You shouldn't play with the alligators, I don't think.
b. You shouldn't play with the alligators, I think.
(6)a. I couldn't care less about monster trucks. (see Skeptical Linguistic Essays, page 361, footnote 3)
b. I could care less about monster trucks.

Example (6b), could care less, comes in for a hard time from some prescriptivists. But the others haven't caused a stir, as far as I know.

Posted by Christopher Potts at 02:12 PM

January 20, 2004

The Dangling Conjunct

I've come across a candidate for something that might reasonably be called a "dangling conjunct", analogous to the much more familiar "dangling modifier".

A free adjunct is a sentence adverbial of the form

(Intro) X

where Intro is some sort of introductory element and X is an expression that denotes a predicate; the (semantic) subject for this predicate is supplied from context. On one hypothesis -- an incorrect one, I maintain -- a free adjunct is in fact a modifier of the (grammatical) subject of the clause the free adjunct is associated with, so that on the further assumption that modifiers must (except in very special circumstances) be adjacent to their heads, it's predicted that the semantic subject of the free adjunct is simply the denotation of the (grammatical) subject of the main clause. Cases in which the grammatical subject of the main clause does not provide the needed denotation -- cases where some other phrase in the main clause does so, or where the denotation comes from the larger (linguistic or non-linguistic) context -- are labeled "dangling modifiers" and are widely said to be unacceptable.

The crucial bits that I want to pull out of this are (a) the intuition that something's missing from a free adjunct, and (b) the claim that the semantics for the missing stuff is provided by a specific phrase within the sentence.

Now consider constituent coordination, in particular things of the form

(1) Su Pr1 and Pr2
(like My cousin walked up to the penguin and kissed it).

(1) is interpreted like

(2) Su Pr1 and Su Pr2

So, the intuition is that the second part of (1) is missing something (specifically, a semantic subject), which is, however, provided by a specific phrase within the sentence, namely the Su in the first part of (1). If some other phrase in the sentence provided the missing stuff, or if it came from the larger context, then we'd have a "dangling conjunct" (Pr2).

This is, I think, what we have in the following sentence from the Menlo Park (CA) Police Department's printed flyer "DRINKING AND DRIVING IS A DEADLY COMBINATION" (which is provided to drivers stopped at sobriety checkpoints):

We anticipate that a substantial benefit will be gained from the use of Sobriety Checkpoints by increasing the drunk drivers perception of the risk of being detected

and consequently may deter him or her from driving while impaired at any level.

[I've let caps and missing apostrophe stand. And of course the elevated diction. I've also split off Pr2 visually.]

What phrase, then, denotes the thing that may deter him or her from driving while impaired? One possibility is that it's the main-clause subject "we"; but surely that's not what the police department meant to say. Let's agree to discard "we anticipate that" and just look at the embedded material. This has a grammatical subject, "a substantial benefit", and has only two finite VPs afterwards -- what's clearly Pr1 ("will be gained...of being detected") and what looks like it has to be Pr2 ("consequently may deter...at any level"), except that the cops surely didn't mean that benefits deter. So Pr2 is a dangling conjunct.

There are three candidates for the phrase supplying a Su for Pr2: "the use of Sobriety Checkpoints" (unlikely, since the following material "by increasing..." doesn't belong with "the use of Sobriety Checkpoints"), or either "the drunk driver's perception of the risk of being detected" or "the risk of being detected" (which at least make sense, but don't have a Pr1 parallel to Pr2).

Maybe the writer was aiming, sort of, at "by increasing the drunk driver's perception of the risk of being detected and consequently deterring him or her from driving while impaired at any level", but wanted to introduce the modal semantics of "may" and then got stuck with a (non-parallel) finite Pr2. Or perhaps the writer framed a thought along the lines of "because the use of Sobriety Checkpoints increases..." or "because Sobriety Checkpoints increase..." and compacted it into an all-purpose "by increasing...", but then forgot that Pr1 was non-finite.

But this is mind-reading. In any case, the result is a dangling conjunct. A stunner, in fact.

Posted by Arnold Zwicky at 06:44 PM

Linguist's Search Engine

I'm happy to announce that the Linguist's Search Engine is now up and running and available for use.

As I mentioned in an earlier post, we've been at work on an easy-to-use Web tool that permits linguists to do searches they could not easily do on Google or Altavista -- for example, searches involving syntactic structure, non-contiguous constructions, and the like. (I myself am interested in phenomena having to do with verb-argument realization, and there's just no way to ask a standard bag-of-words search engine for, say, all inflections of such-and-such verb used without a direct object NP.)

If we've done it right, what you'll find at http://lse.umiacs.umd.edu/ should be pretty self-explanatory. For those who prefer explanations of the non-self variety, we also have a Getting Started Guide. For those who prefer to RTFM, feel free to R TFM. Finally, there is a discussion forum that will, we hope, give rise to a genuine LSE user community.

Please bear with us if there are technical glitches as we get started!

(One such glitch I just discovered: once you've registered, if your confirmation message sends you to http://sprinkles.umiacs.umd.edu, don't go there: that's an old message. Instead use the Log In link at http://lse.umiacs.umd.edu.)

Posted by Philip Resnik at 06:30 PM

January 19, 2004

Derivation and Disparagement

Mark Liberman's mention of Audhumlan Conspiracy's puzzle about when adjectives can double as vocative nouns (as in "Come here, gorgeous") put me in mind of another adjective-to-noun conundrum I've never been able to come up with a plausible solution for. This involves the names of social groups that are derived from monosyllabic adjectives, like black and gay. These words are routinely used as bare plurals without any stylistic or affectual import:

Blacks (gays) have generally opposed the new policy.

Many blacks (gays) find the depictions offensive.

What's odd is that the words sound either disparaging or condescending when they're used in quantified NP's like the following (where the dollar sign denotes a usage that many would find socially unacceptable):

$ There are only two blacks (gays) on the faculty.

$ Would you object to having a black (a gay) for a roommate?

$ He's the first black (gay) to serve on the city council.

Note in fact that the same effect is present with white -- "Southern whites vote Republican" is fine; "She's dating a white" is odd at best. But the effect is absent with denotatively equivalent group terms that aren't derived from monosyllabic adjectives, like African American, Caucasian, or even the de-adjectival homosexual. (Note also that this effect is independent of the various complications associated with Jew and Jewish, which people seem to want to bring up when I ask about this. "Many Jews are opposed to the plan" is no better or worse than "There are only two Jews on the board.")

I have no idea why this should be, and am not even sure how to characterize the exact quantificational differences that are relevant here (it doesn't seem to come down exactly to a difference between individual- and kind-denoting terms). But given the nature of the regularity, the effect surely follows from the interaction of the semantics associated with this derivational process with some semantic entailment of these quantifiers rather than from an arbitrary convention. It's a mystery to me why things should fall out this way, but a tip of the hat to anyone who can nail this.

Posted by Geoff Nunberg at 12:55 PM

gloof, spooce, gloof twain, spooce, gairk

ShortTalk, invented by Nils Klarlund (formerly of AT&T and the University of Aarhus), is the jaw-dropping-est thing I've seen so far this year. It's "a new method for composing text by speech," designed to be "fluently interspersed with dictation," by "[codifying] natural and universal editing concepts that can be combined in command phrases, typically consisting of only two syllables." For example, speece truo "applies the no-space operator to the three identifiers preceding the cursor", and skoop cam "skips backwards until before the first comma."

Here's a more elaborate example, taken from Nils' page of audio demos:


Before:

z = x+y|

After:

z = x + y|

ShortTalk solution (2.4s)

gloof, spooce, gloof twain, spooce, gairk


"gloof" means "press the left arrow" and "spooce" means "press the spacebar". The modifier "twain" means "do it twice".

The demo page doesn't gloss gairk, but according to the ShortTalk Quick Reference it means "unravel" (with an optional positive integer, meaning "go to the nth last mark"), so I think it puts the cursor back where it started.
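To make the example concrete, here is a toy simulation of just these four words in Python. It is a sketch under stated assumptions, not ShortTalk itself: the function `run_commands` is hypothetical, and treating gairk as "return to the position where the cursor started, shifted by any text inserted before it" is a guess based on the reading above rather than on the Quick Reference.

```python
def run_commands(text, cursor, commands):
    """Apply a comma-separated command string to (text, cursor)."""
    mark = cursor  # assumed: the starting position is the mark gairk returns to
    for phrase in commands.split(","):
        words = phrase.split()
        if not words:
            continue
        verb = words[0]
        repeat = 2 if "twain" in words[1:] else 1  # "twain" = do it twice
        for _ in range(repeat):
            if verb == "gloof":        # press the left arrow
                cursor = max(0, cursor - 1)
            elif verb == "spooce":     # press the spacebar
                text = text[:cursor] + " " + text[cursor:]
                if cursor <= mark:     # keep the mark attached to its text
                    mark += 1
                cursor += 1
            elif verb == "gairk":      # assumed: jump back to the mark
                cursor = mark
            else:
                raise ValueError(f"unknown command: {verb}")
    return text, cursor


text, cursor = run_commands("z = x+y", 7, "gloof, spooce, gloof twain, spooce, gairk")
print(text[:cursor] + "|" + text[cursor:])  # z = x + y|
```

Under those assumptions the five commands turn the first line of the demo into the second, with the cursor back at the end of the line.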

Since ShortTalk has been around for a couple of years without making it to Slashdot or Metafilter, we can safely predict that it is not a strongly infectious meme. However, I feel that there is an opportunity here for a science fiction writer. Text editing is a bit tame as an application, but analogous concepts could be applied in any task where limbs and eyes are occupied, users are highly trained, and rapid interaction is at a premium. So I look forward to reading about warriors of the future shouting things like "go ooft strange, clam ane, push lairk!" as they stride into battle. Or perhaps it'll be cyber-knitters, chanting some elaborated version of "knit one, purl two" as they create mythic tapestries or heal rifts in the fabric of space-time. In any variation, ShortTalkish has a nice sort of incantatory quality to it, and you could add chanted melodies in order to increase the information rate beyond the average 16 bits per second that Nils Klarlund says he can achieve, thus creating a more complete approximation to traditional stereotypes of spell casting.

Let's hope that Nils' invention doesn't have to wait more than three centuries to be picked up by a novelist, as John Wilkins' Philosophical Language did.

Posted by Mark Liberman at 10:37 AM

Space talk: Breathless speech

As human beings head into the cosmos, what will happen to language and speech? In the thin atmosphere of other planets, perhaps it would be useful to be able to speak without breathing. In fact, I fancy that I just heard an example of this 'breathless speech' in none other than President Bush's address about his new space policy.

Here's the recording as it was relayed here in Australia. Listen around the 35 minute mark, and decide for yourself: is he speaking without breathing, or was the breathing just edited out? (Is it circular breathing?) Whatever happened, this certainly makes for uncomfortable listening. I couldn't help being distracted, imagining him getting very red in the face and wondering when he would pause for breath!

Posted by Steven Bird at 06:51 AM

January 18, 2004

Another snowclone

Jonas Söderström, an information architect in Sweden (and major Procol Harum fan), reports another snowclone:

an Xer shade of Y

He notes that a whole page on the variations in common parlance stemming from the title line of the 1967 classic "A whiter shade of pale" can be found on a Procol Harum fan site called Beyond the Pale.

Posted by Geoffrey K. Pullum at 09:29 PM

The shadow of stylometry

Margaret Marks at Transblawg (here and here) and The Discouraging Word have commented insightfully on an article at Science News Online about advances in stylometry.

In this connection, I think we need to consider carefully the consequences for living writers of turnkey software for statistical stylistics that they can apply to their own digital archives. Life may already be imitating the art of David Lodge, who dealt with this problem presciently in his 1984 novel Small World. I don't happen to have a copy of this novel at hand, but the same passage has clearly impressed others as well, since I was easily able to find it quoted on the web, for example in this July, 2003 post by Dr. Weevil entitled "Be Careful What You Wish For." Here's the passage that Dr. Weevil quotes:

“How did you come to lose faith in your style?” Persse enquired.

“I’ll tell you. I can date it precisely from a trip I made to Darlington six years ago. There’s a new university there, you know, one of those plateglass and poured-concrete affairs on the edge of the town. They wanted to give me an honorary degree. Not the most prestigious university in the world, but nobody else had offered to give me a degree. The idea was, Darlington’s a working-class, industrial town, so they’d honour a writer who wrote about working-class, industrial life. I bought that. I was sort of flattered, to tell the truth. So I went up there to receive this degree. The usual flummery of robes and bowing and lifting your cap to the vice-chancellor and so on. Bloody awful lunch. But it was all right, I didn’t mind. But then, when the official part was over, I was nobbled by a man in the English Department. Name of Dempsey.”

“Robin Dempsey,” said Persse.

“Oh, you know him? Not a friend of yours, I hope?”

“Definitely not.”

“Good. Well, as you probably know, this Dempsey character is gaga about computers. I gathered this over lunch, because he was sitting opposite me. ‘I’d like to take you over to our Computer Centre this afternoon,’ he said. ‘We’ve got something set up for you that I think you’ll find interesting.’ He was sort of twitching in his seat with excitement as he said it, like a kid who can’t wait to unwrap his Christmas presents. So when the degree business was finished, I went with him to this Computer Centre. Rather grand name, actually, it was just a prefabricated hut, with a couple of sheep cropping the grass outside. There was another chap there, sort of running the place, called Josh. But Dempsey did all the talking. ‘You’ve probably heard,’ he said, ‘of our Centre for Computational Stylistics.’ ‘No,’ I said, ‘Where is it?’ ‘Where? Well, it’s here, I suppose,’ he said. ‘I mean, I’m it, so it’s wherever I am. That is, wherever I am when I’m doing computational stylistics, which is only one of my research interests. It’s not so much a place,’ he said, ‘as a headed notepaper. Anyway,’ he went on, ‘when we heard that the University was going to give you an honorary degree, we decided to make yours the first complete corpus in our tape archive.’ ‘What does that mean?’ I said. ‘It means,’ he said, holding up a flat metal canister rather like the sort you keep film spools in, ‘It means that every word you’ve ever published is in here.’ His eyes gleamed with a kind of manic glee, like he was Frankenstein, or some kind of wizard, as if he had me locked up in that flat metal box. Which, in a way, he had. ‘What’s the use of that?’ I asked. ‘What’s the use of it?’ he said, laughing hysterically. ‘What’s the use? Let’s show him, Josh.’ And he passed the canister to the other guy, who takes out a spool of tape and fits it on to one of the machines. ‘Come over here,’ says Dempsey, and sits me down in front of a kind of typewriter with a TV screen attached. 
‘With that tape,’ he said, ‘we can request the computer to supply us with any information we like about your idiolect.’ ‘Come again?’ I said. ‘Your own special, distinctive, unique way of using the English language. What’s your favorite word?’ ‘My favorite word? I don’t have one.’ ‘Oh yes you do!’ he said. ‘The word you use most frequently.’ ‘That’s probably the or a or and,’ I said. He shook his head impatiently. ‘We instruct the computer to ignore what we call grammatical words—articles, prepositions, pronouns, modal verbs, which have a high frequency rating in all discourse. Then we get to the real nitty-gritty, what we call the lexical words, the words that carry a distinctive semantic content. Words like love or dark or heart or God. Let’s see.’ So he taps away on the keyboard and instantly my favourite word appears on the screen. What do you think it was?”

“Beer?” Persse ventured.

Frobisher looked at him a shade suspiciously through his owlish spectacles, and shook his head. “Try again.”

“I don’t know, I’m sure,” said Persse.

Frobisher paused to drink and swallow, then looked solemnly at Persse. “Grease,” he said, at length.

“Grease?” Persse repeated blankly.

“Grease. Greasy. Greased. Various forms and applications of the root, literal and metaphorical. I didn’t believe him at first, I laughed in his face. Then he pressed a button and the machine began listing all the phrases in my works in which the word grease appears in one form or another. There they were, streaming across the screen in front of me, faster than I could read them, with page references and line numbers. The greasy floor, the roads greasy with rain, the grease-stained cuff, the greasy jam butty, his greasy smile, the grease-smeared table, the greasy small change of their conversation, even, would you believe it, his body moved in hers like a well-greased piston. I was flabbergasted, I can tell you. My entire oeuvre seemed to be saturated with grease. I’d never realized I was so obsessed with the stuff. Dempsey was chortling with glee, pressing buttons to show what my other favourite words were. Grey and grime were high on the list, I seem to remember. I seemed to have a penchant for depressing words beginning with a hard ‘g’. Also sink, smoke, feel, struggle, run and sensual. Then he started to refine the categories. The parts of the body I mentioned most often were hand and breast, usually one on the other. The direct speech of male characters was invariably introduced by the simple tag he said, but the speech of women by a variety of expressive verbal groups, she gasped, she sighed, she whispered urgently, she cried passionately. All my heroes have brown eyes, like me. Their favourite expletive is bugger. The women they fall in love with tend to have Biblical names, especially ones beginning with ‘R’—Ruth, Rachel, Rebecca, and so on. I like to end chapters with a short moodless sentence.”

“You remember all this from six years ago?” Persse marvelled.

“Just in case I might forget, Robin Dempsey gave me a printout of the whole thing, popped it into a folder and gave it to me to take home. ‘A little souvenir of the day,’ he was pleased to call it. Well, I took it home, read it on the train, and the next morning, when I sat down at my desk and tried to get on with my novel, I found I couldn’t. Every time I wanted an adjective, greasy would spring into my mind. Every time I wrote he said, I would scratch it out and write he groaned or he laughed, but it didn’t seem right—but when I went back to he said, that didn’t seem right either, it seemed predictable and mechanical. Robin and Josh had really fucked me up between them. I’ve never been able to write fiction since.”

He ended, and emptied his tankard in a single draught.

“That’s the saddest story I ever heard,” said Persse.
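The procedure Dempsey describes in the passage (drop the grammatical words, then rank what is left) is easy to sketch in Python. This is only an illustration: the function name `favourite_words` and the tiny stop list are made up for the example, the sample text is stitched together from phrases in the quotation, and unlike Dempsey's program it does not merge forms of a root such as grease, greasy and greased.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny list of "grammatical words".
GRAMMATICAL_WORDS = {
    "a", "an", "and", "he", "her", "his", "in", "of", "on", "or",
    "she", "the", "their", "to", "with",
}


def favourite_words(text, n=3):
    """Return the n most frequent lexical words with their counts."""
    tokens = re.findall(r"[a-z]+", text.lower())
    lexical = [t for t in tokens if t not in GRAMMATICAL_WORDS]
    return Counter(lexical).most_common(n)


sample = "The greasy floor, the roads greasy with rain, his greasy smile"
print(favourite_words(sample, 1))  # [('greasy', 3)]
```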

Posted by Mark Liberman at 08:19 PM

Of limes and racial epithets

In an early Language Log post, Geoff Nunberg commented on the D.C. District Court's decision to allow the Washington Redskins to retain their name, though the Court acknowledged that redskin is a derogatory, offensive term for Native Americans. The court's decision seems to rest on the following premise:
(A) A word W is inappropriate as a name for a product or corporation in a speech community C just in case every speech community within C regards W as offensive on every meaning that W can have in C.
I followed up on Geoff's post with examples strongly suggesting that this isn't how people use racial epithets. I'd like to offer some new evidence for my position.

I think (S) is much closer to the relevant convention:
(S) A word W is inappropriate as a name for a product or corporation in a speech community C just in case some speech community within C regards W as offensive on at least one of the meanings that W can have in C.

My father, Art Potts, found additional support for generalization (S), in a rather unlikely place: a recent New York Times article on limes:
Karp, David. Latest green fashions come in many styles. The New York Times, Wednesday, January 14, 2004.
The relevant passage is about the Kaffir lime. It reads:
The fruit's name, however, remains problematic, because kaffir, originally an Arabic word for unbeliever, is used by whites in South Africa as a derogatory term for blacks. The name kaffir lime derives from Asia rather than South Africa, perhaps from Indian Muslims who encountered the fruit as an import from Thailand and Sri Lanka, where non-Muslims predominated. Nevertheless, the term is offensive to some, and the Thai name, makrut, is sometimes used as a substitute.
So, it seems that there exists a speech community in which the word kaffir has meanings (uses) on which it is offensive and meanings (uses) on which it is not offensive. If generalization (A) were correct, people in this community would not be shy about using the word in commercial transactions. But the truth, as reported in the article, is that they are shy about using it, just as generalization (S) predicts.
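The contrast between (A) and (S) comes down to "every ... every" versus "some ... at least one", which a few lines of Python can make explicit. This is a toy sketch: the function names, the data layout (community, then meaning, then whether that meaning is offensive there) and the placeholder labels are all assumptions made for illustration, not data about any real word.

```python
def inappropriate_A(judgments):
    """(A): every community finds the word offensive on every meaning."""
    return all(all(by_meaning.values()) for by_meaning in judgments.values())


def inappropriate_S(judgments):
    """(S): some community finds the word offensive on at least one meaning."""
    return any(any(by_meaning.values()) for by_meaning in judgments.values())


# Hypothetical judgments: one meaning is offensive in one community only.
toy = {
    "community_1": {"meaning_1": True, "meaning_2": False},
    "community_2": {"meaning_1": False, "meaning_2": False},
}
print(inappropriate_A(toy))  # False
print(inappropriate_S(toy))  # True
```

On a mixed pattern like this one, (A) counts the word as acceptable while (S) counts it as inappropriate, which is the direction in which the reported shyness about the name points.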

If people do in fact operate under condition (S), then this should be the court's guiding principle. The conventions of the relevant speech community are our only coherent standard for linguistic issues such as this one.
Posted by Christopher Potts at 02:24 PM

Pullum in the Times

Language Logger Geoff Pullum contributes many of the insights, and nearly all the one-liners, to a piece in the Week in Review section of today's New York Times:
Glater, Jonathan D. Cold, Colder, Coldest: Then What? The New York Times, Sunday Week in Review, January 18, 2004.
Geoff's quips are brisk, and sometimes sexy, but they nonetheless work to establish that languages are not just big bags of words, and that, in particular, Eskimo languages are not just big bags of words for snow.
Posted by Christopher Potts at 12:43 PM

Funding terrorism is, like, wrong

According to an AP story about financial probes in the United Arab Emirates:

Sultan bin Nasser al-Suweidi, head of the Central Bank, told AP that financial officials are working closely with the U.S. government to block terror funding avenues. Central Bank officials have trained with U.S. investigators and other international experts to help them identify money launderers and suspicious transactions, he said.

"We don't want wrongdoers. We totally don't need them," said the U.S.-educated al-Suweidi, who arrives at his office before 8 a.m. and is often seen working until 10 at night in the fortress-like Central Bank headquarters in the Emirates' capital, Abu Dhabi.

I think it's great that World English is assimilating the expressive resources of America's youth.

Posted by Mark Liberman at 11:26 AM

C'mere, pedantic

Ryan Gabbard at The Audhumlan Conspiracy has asked for help in figuring out "when [it is] grammatical to call someone by an adjective." He says

These all sound fine to me: Come here

But these sound horrible: "Come here

My judgments are the same. The context is a discussion of the proper way to address a solicitor general.

The set of adjectives that work in T-shirt phrases like "I'm with stupid" seems to be similar. There's also the question of which adjectives can be used as nicknames, like "Slim", "Red", and so on -- these can of course also be used as vocatives.

Posted by Mark Liberman at 08:28 AM

January 17, 2004

Public virtue and historical subversion

I'm feeling virtuous about my wedding vowels public service announcement -- I trust that it's been helpful to the 19 internet pilgrims who have reached our site so far this morning by asking various search engines about "wedding vowels", "marriage vowels", "renewal wedding vowel announcement", "renewing wedding vowels", "renewing vowels", "vowel renewal ceremonies", and "vowel renewal ceremonies in vegas". If you're looking for a good name for a new phonology-related weblog, you could do worse than "Renewing Vowels", you know?

I'm also feeling good about the 10 individuals who found Geoff Pullum's piece on (non-) recursive sic by searching for "sic meaning", "(sic) meaning", "[sic] meaning", "meaning of sic", or "the meaning of the (sic) notation". They probably got more than they bargained for, but it's all good, and the basic information is right up there at the beginning. Maybe we should take this as a cue to say something enlightening about e.g., i.e., etc., et al., ibid. and whatnot.

I'm even happy about the 7 souls who have had the opportunity to be saved from error by reaching our site on the basis of a search for "names for snow", "snow words in eskimo", "meaning snow other languages", or "eskimo snow", though I suspect that during the same few hours, another 700,000 innocents have picked up the false Eskimo snow meme.

Finally, it's nice to realize that another hundred-odd folks have found something interesting and useful in our archives this morning by searching for "history of emo", "languages of middle earth", "barnum statement", "malapropism", "paracingulate cortex", "larry horn negation", "phineas gage", and so on.

Amid all this flood of public virtue, it doesn't bother me at all that a few people have struck out by finding irrelevant Language Log postings somewhere down the list of pages returned by difficult searches like "american indian language containing the word wax", or "evolutionary tree of the rattlesnake".

However, I do feel rather guilty about the person who found us first on the list of (647,000) pages returned when they asked Google "who is harvard university named after", or the (apparently different) individual who asked Yahoo "who was harvard university named for", and found, at the top of the list of 436,000 answers, the same Geoff Pullum piece on Universities named after linguists (now altered to include a public service announcement). I have a nagging worry that these folks may have gone off feeling satisfied, believing Geoff's deadpan assertion that "Harvard University was named after Sir Walter Montmorency Belgrave Harvard, who in 1689-1691 traveled by donkey through much of what is now western Massachusetts and parts of upper New York State, recording food terms in the languages of the local Indians."

The problem here is that linguists have no conventionalized equivalent to the canid play bow. And even if Geoff had crouched on his forelimbs while leaving his hind legs fully extended, wagging his tail, and barking, it would probably have been misinterpreted by anyone watching him compose his post, and certainly would have been missed by those reading it.

Posted by Mark Liberman at 10:59 AM

Hi Lo Hi Lo, it's off to formal language theory we go

In my advertisement for Fitch and Hauser's new Science paper, I suggested that "one should be careful not to overinterpret these results." I'd like to explain what I meant. The experiment is a very interesting one, but Fitch and Hauser describe it in terms that are likely to mislead many readers.

Fitch and Hauser write:

Rule systems capable of generating an infinite set of outputs ("grammars") vary in generative power. The weakest possess only local organizational principles, with regularities limited to neighboring units. We used a familiarization/discrimination paradigm to demonstrate that monkeys can spontaneously master such grammars. However, human language entails more sophisticated grammars, incorporating hierarchical structure. Monkeys tested with the same methods, syllables, and sequence lengths were unable to master a grammar at this higher, "phrase structure grammar" level.

They are careful to say that their monkeys "were unable to master a grammar" at the phrase structure level. However, they assert in a more general way that "monkeys can spontaneously master such grammars", referring to finite-state grammars as a class. But the experiment was symmetrical -- it showed that tamarins could recognize deviations from the pattern imposed by one particular grammar, not all grammars of that class.

However, the interpretive problem is a much deeper one. The two particular grammars that F & H used in their experiment were so simple -- effectively generating only two short sentences each -- that it seems wrong to elevate the discussion to the level of distinctions among grammar types at all. Their (very interesting) result could alternatively be described as follows:

Given exposure to instances of the patterns ABAB and ABABAB, tamarin monkeys showed increased interest in patterns AABB and AAABBB, perhaps because these contained two to four copies of the salient (because repeated) two-element sequences (bigrams) AA and BB, which they had not heard before. By contrast, given exposure to instances of the patterns AABB and AAABBB, other tamarins did not show significantly increased interest in the patterns ABAB and ABABAB, perhaps because they contained only one or two copies of the previously-unheard bigram BA, which may also be less salient because it does not involve a repetition.

Given the same stimulus sequences, human subjects were able to categorize the new patterns as different, regardless of the direction of training and testing, perhaps because their threshold for noting statistical sequence differences was lower, and perhaps because they were able to remember longer sequences, thus noting that the training material AABB and AAABBB did not contain the four-element sequence ABAB.

Put this way, it's an experiment about memory span and/or sensitivity to statistical deviations. No talk about grammars, much less hierarchies of grammatical complexity, is required.
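The bigram counts behind this alternative description are easy to check mechanically. Here is a minimal sketch in Python, assuming each stimulus is reduced to a string over the two class labels A and B; the names `bigrams` and `novel_bigrams` are hypothetical helpers for illustration, not anything taken from F&H's materials.

```python
def bigrams(pattern):
    """Return every adjacent two-symbol sequence in the pattern, in order."""
    return [pattern[i:i + 2] for i in range(len(pattern) - 1)]


def novel_bigrams(training, test):
    """List the bigrams in `test` that never occur in any training pattern."""
    seen = {bg for pattern in training for bg in bigrams(pattern)}
    return [bg for bg in bigrams(test) if bg not in seen]


# The two training sets, written as class-label strings (an assumption
# of this sketch: syllable identity within a class is ignored).
FSG_TRAINING = ["ABAB", "ABABAB"]
PSG_TRAINING = ["AABB", "AAABBB"]

for test in PSG_TRAINING:
    print("trained on FSG, tested on", test, "->", novel_bigrams(FSG_TRAINING, test))
for test in FSG_TRAINING:
    print("trained on PSG, tested on", test, "->", novel_bigrams(PSG_TRAINING, test))
```

Under these assumptions the first loop reports two and four novel bigrams (all of them AA or BB), and the second reports one and two (all of them BA), which matches the counts given in the description above.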

Here are the details. Fitch and Hauser explain about their stimuli that

The FSG was (AB)^n, in which a random "A" syllable was always followed by a single random "B" syllable, and such pairs were repeated n times. The corresponding PSG, termed A^nB^n, generated strings with matched numbers of A and B syllables. In this grammar, n sequential "A" syllables must be followed by precisely n "B" syllables.

So the "finite state" language is (AB)^n for n=2 and n=3, i.e. exactly the set of two patterns {ABAB, ABABAB}, while the "phrase structure" language is A^nB^n, for n=2 and n=3, i.e. exactly the set of two patterns {AABB, AAABBB}.

F&H motivate the upper limit on n (but not the lower one) as follows:

Because previous work demonstrates that tamarins can readily remember and precisely discriminate among strings up to three syllables in length, we restricted n to be two or three in both of the above grammars.

So it seems that these two "languages" -- intended to represent whole classes of formal grammatical power -- consisted of just two strings each, one four symbols long and the other six symbols long? Well, superficially, no -- the languages are much bigger than that, though still finite. A and B represent classes of syllables, with A being one of {ba di yo tu la mi no wu}, while B is one of {pa li mo nu ka bi do gu}. There are eight options for each class, and strings of syllables are formed by random selection without replacement, so the number of possible syllable strings in the FSG language is

8*8*7*7 + 8*8*7*7*6*6 = 116,032

and the number of possible syllable strings in the PSG language is the same.
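The arithmetic can be reproduced in a few lines. This is only a sketch, assuming (as the formula above does) that the n "A" syllables and the n "B" syllables in a string are each drawn without replacement from their own class of eight; the function names are hypothetical.

```python
def ordered_draws(pool_size, n):
    """Number of ways to pick n distinct items in order from pool_size items."""
    count = 1
    for i in range(n):
        count *= pool_size - i
    return count


def strings_for(n, pool_size=8):
    """Syllable strings with n A-syllables and n B-syllables, no repeats in a class."""
    return ordered_draws(pool_size, n) ** 2


# n is restricted to 2 or 3 in the experiment.
total = strings_for(2) + strings_for(3)
print(strings_for(2), strings_for(3), total)  # 3136 112896 116032
```

The count depends only on how many syllables come from each class, not on their order, which is why the FSG and PSG languages come out the same size.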

Much better! Or maybe not... As Fitch and Hauser explain,

The A and B classes were perceptually clearly distinguishable to both monkeys and humans: different syllables were spoken by a female (A) and a male (B) and were differentiated by voice pitch (> 1 octave difference), phonetic identity, average formant frequencies, and various other aspects of the voice source.

In other words, the listener (human or tamarin) could forget about all the ba di yo tu stuff, and just pay attention to whether the syllable was spoken by a high-pitched female speaker or a low-pitched male one. To make it easier, there was just one female speaker and one male speaker, so you could also distinguish the classes by speaker identity.

Now the languages are down to two sentences each again. The "finite state grammar" language contains the two sentences

{ Hi Lo Hi Lo , Hi Lo Hi Lo Hi Lo }

and the "phrase structure grammar" language contains the two sentences

{ Hi Hi Lo Lo , Hi Hi Hi Lo Lo Lo }
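To make the collapse concrete, here is a small Python sketch that maps any syllable string onto the Hi/Lo pattern a listener could attend to instead. The syllable inventories are the ones quoted above; the function name `collapse` and the sample strings are hypothetical illustrations, not actual stimuli from the experiment.

```python
A_SYLLABLES = {"ba", "di", "yo", "tu", "la", "mi", "no", "wu"}  # female voice
B_SYLLABLES = {"pa", "li", "mo", "nu", "ka", "bi", "do", "gu"}  # male voice

FSG_PATTERNS = {"Hi Lo Hi Lo", "Hi Lo Hi Lo Hi Lo"}
PSG_PATTERNS = {"Hi Hi Lo Lo", "Hi Hi Hi Lo Lo Lo"}


def collapse(syllables):
    """Replace each syllable by the pitch class of the voice that spoke it."""
    labels = []
    for syllable in syllables:
        if syllable in A_SYLLABLES:
            labels.append("Hi")
        elif syllable in B_SYLLABLES:
            labels.append("Lo")
        else:
            raise ValueError(f"unknown syllable: {syllable}")
    return " ".join(labels)


# Made-up example strings, one of each shape.
print(collapse(["ba", "pa", "di", "li"]))              # Hi Lo Hi Lo
print(collapse(["yo", "tu", "la", "mo", "nu", "ka"]))  # Hi Hi Hi Lo Lo Lo
```

However the individual syllables are chosen, every string from a given grammar lands in the same two-member set of patterns.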

Fitch and Hauser consider and reject a version of the alternative memory-span interpretation given above:

An alternative explanation for these results might be that tamarins fail the PSG because their ability to differentiate successive items is limited to runs of two. If this were true, it would account for the asymmetric results we obtained because they would be able to encode AB AB AB patterns but be unable to process the longer runs of AAA BBB. However, a subanalysis gave the same pattern of results even when n was limited to two (ABAB versus AABB).

This addresses a different alternative interpretation from the one I offered above. It doesn't affect my suggested alternative. In the n=2 case, the FSG deviation (AABB given experience with ABAB) involves two novel bigrams, both repetitions; while the PSG deviation (ABAB given experience with AABB) involves only one novel bigram, not a repetition. This is plenty of differentiation to base an explanation on. It's also possible, as they suggest here, that the alternating sequences are grouped as (AB)(AB), which would make bigrams starting in odd-numbered positions more salient than those in even-numbered positions. This would make my account work even better, since the novel bigram BA might not even be registered as a unit. It's plausible that such binary grouping of alternating sequences is done by humans, and if it were true of monkeys as well, that would be interesting.

The familiarization/discrimination paradigm is a promising one for animal studies, as it has been for studies of human infants, and the results so far are interesting, but let's face it, it's really stretching things to claim that we've learned anything about embedding, recursion, etc. -- or even about the various kinds of dependencies that finite-state grammars can express. It would be very unwise, for example, to place any wagers on the ability of tamarins to learn to recognize exactly those symbol sequences that an arbitrary egrep-style regular expression matches -- though the implication of F & H's claim that "monkeys can spontaneously master [finite state] grammars" is that they should be able to do this.

You could use the familiarization/discrimination paradigm, with appropriately varied patterns in training and testing, to explore the limits of monkeys' abilities in that area. Similarly, the plausibility of my alternative explanation in terms of sequence statistics could be tested against a more general account in terms of grammar types. We have to recognize that it's going to be hard to design these experiments, since each will involve a necessarily finite sample from each of a small number of "grammars" or other pattern spaces, and each such sample will be subject to multiple alternative descriptions. In fact, in a mathematical sense, the problem is impossible. However, it should be possible to explore the question in a way that will lead reasonable people towards provisional acceptance of interesting general conclusions about how to characterize the abilities of different animals in such experiments.

I expect that Fitch and Hauser, who are both serious researchers with a history of excellent work, will do such things, along with other cognitive scientists. But let's hold off on the general claims until the research has been done!

I wish that I could say that I'm surprised that Science let these excessive claims pass into print. There are two forces at work here -- the desire for big results, and the vagaries of reviewing at an interdisciplinary journal -- that together lead to such outcomes all too often.

[Update 9/1/2004: A later paper by Perruchet and Rey, reporting results on human subjects that call F&H's characterization of these experiments into question, is discussed here.]

Posted by Mark Liberman at 07:18 AM

January 16, 2004

President Roh and Dr. No

The romanization of Korean President Roh Moo-Hyun's family name, spelled Roh, pronounced [no], seems to be a cause of on-going bewilderment. Bob Woodiwiss wondered about it, en passant, in a 1997 screed in the Cincinnati alternative newspaper Citybeat devoted to the romanization of the name of the late Chinese leader Deng Hsiao-Ping. More recently, this topic has been a concern of commentators on the right.

In December of 2002 Jay Nordlinger of the National Review had this comment:

I have always chafed at nonsensical transliteration - for example, we're supposed to call President Roh - South Korea's newly elected leader - President No. (Does he have a Ph.D.? With an eye to James Bond, he could be Dr. No!) (Then again, Jesse Helms was "Senator No" - Senator Roh?)
If I had more time, I'd start a For Common Sense in Transliteration committee. And eternally, I'm reminded of Arsenio Hall's complaint about the spelling/pronunciation of Sade (moniker of the pop singer): "That's like me saying, `My name is B-o-b, but I pronounce it `linoleum'"

A month later, in his column "On Language" of January 19, 2003, language pundit William Safire raised the same issue. He reports that he was unable to obtain a satisfactory explanation from President Roh's media adviser, from a spokesman for the Korean Embassy, and from a British-born Buddhist monk at Sogang University named An Sonjae. He finally decided that the best explanation was that offered by former Korean President Roh Tae Woo, who shifted from the spelling No to Roh because the NO on the nameplate of his army uniform reminded him of the English word no, which he felt was too negative for his positive personality.

All the President's Media Advisor, Ben Limb, could come up with is that this spelling is common practice. Safire is right: that isn't much of an explanation.

The Korean Embassy spokesman explained:

The r spelling is a function of the hangul letter and how it is pronounced when that Korean initial letter is followed by that vowel. It is a weird grammar rule.

This is more in the nature of an explanation, but aside from the fact that it doesn't spell things out adequately for someone who doesn't know Korean and its writing system, it doesn't work. It would make sense if President Roh's name were spelled with the hangul letter ᄅ, which is the letter used to write [r], but it isn't. In hangul, this family name is spelled with the letter /n/ (ᄂ), as you can see on the Korean version of his web page: 노무현 대통령. (This says Roh Moo-Hyun Dae Tong-Ryong "President Roh Moo-Hyun".)

Here is Safire's account of what he learned from Dharma Teacher An Sonjae:

The Korean alphabet, known as hangul, contains a symbol that is usually romanized (spelled in English) as r. When this symbol comes first, it is pronounced as ''a liquid n'' if the vowel following is a simple one, but disappears completely if it is followed by a diphthong, a gliding sound like oi. So? ''The English spelling Roh,'' An says, ''reflects the original Chinese pronunciation more accurately than the spelling in Korean does, but the pronunciation Noh reflects the modern Korean pronunciation.''

The first part of this is a slightly garbled version of the same explanation as that offered by the Korean embassy. And it fails for the same reason: President Roh's name does not begin with the hangul letter ᄅ. The second part invites the inference that President Roh chose the romanization Roh because he wanted to preserve the original Chinese pronunciation, which immediately raises the question of why he would want to do that. It's easy to see why Safire found all of this confusing.

To get to the root of this, we need to understand a bit about the phonology of Korean. Korean has both an [l] sound and an [ɾ] sound. [ɾ] is a kind of r, though it isn't exactly the same as the [r] that most English speakers use. It's the sound that most American English speakers produce in the middle of words like butter and writer if they say them casually. These two sounds do not occur in the same environment; they are in complementary distribution. To be precise, the [l] occurs in syllable-final position and when adjacent to another /l/, and the [ɾ] occurs elsewhere. We therefore say that there is a single phoneme, which we somewhat arbitrarily name /l/, which has two allophones, [l] and [ɾ]. The same morpheme can be pronounced with both if we change the context in which it occurs. For example, the Korean word for "language" is /mal/. When pronounced by itself it has an [l], that is [mal]. But when we add the nominative case marker /i/ the /l/ is no longer syllable final and we get [maɾi].

In Southern Korean dialects, in most cases, /l/ becomes /n/ in word-initial position. That is why all of the old loans from Chinese that began with /l/ in Chinese have initial /n/ in Korean. (In this case, the /l/ and the /n/ are not allophones of the same phoneme. Since this rule changes one phoneme into another, it is what is called a morphophonemic rule.) Until recently, this rule applied without exception, but it has been disrupted to some extent by the influx of loans with initial /l/. The word "radio", now pronounced with an initial [ɾ], used to be pronounced with initial [n]. It has always been spelled 라디오 /ladio/. A Southern Korean reader would know to pronounce the letter ᄅ as [n] in word-initial position.

In Chinese characters President Roh's name is written like this: 盧武鉉. The character 盧, which means "vessel, pot, stove", was pronounced something like [lwo] in Middle Chinese, which is when it was borrowed into Korean. Its current pronunciation in Cantonese is [lo], in Japanese [ɾo]. So it was borrowed into Korean as /lo/. In Northern Korean dialects this is pronounced [ɾo]. In Southern Korean dialects it is pronounced [no] (assuming it is word-initial). Such /n/s not only derive historically from Chinese /l/s, but if they can appear in non-word-initial position, they actually alternate with /l/ (usually pronounced [ɾ]). For example, consider the two morphemes 理 /li/ and 論 /lon/. These can be combined in either order, yielding:

理 /li/ + 論 /lon/ = 理論 [iɾon] "theory"
論 /lon/ + 理 /li/ = 論理 [nolli] "logic"


論 /lon/ is pronounced with an initial [n] in "logic" where it is word-initial, but with an initial [ɾ] in "theory" where it is intervocalic. (The reason that 論 /lon/ ends in /l/ in "logic" is that its final /n/ assimilates to the initial /l/ of 理 /li/. The absence of an [n] at the beginning of "theory" is explained below.)

So, what Dharma Teacher An and the Korean Embassy spokesman were getting at is that word-initial [n] in Southern Korean, as in President Roh's name, can have two sources, /n/ and /l/. Historically, and arguably synchronically as well, the particular [n] that begins President Roh's family name is underlyingly an /l/, not an /n/. The spelling Roh reflects the underlying form (keeping in mind that [l] and [ɾ] are allophones of a phoneme that we have been calling /l/ but with no greater arbitrariness could call /ɾ/). The reason that their explanations didn't quite work is that they made the mistake of referring to letters rather than to sounds. President Roh's name is no longer written with the hangul letter ᄅ /l/. It once was, but the spelling changed quite some time ago. It does, however, underlyingly begin with the sound /l/, and so can legitimately be romanized with an initial l or r.

If you've gotten this far, you may be wondering what the Korean Embassy spokesman was talking about when he said "when that Korean initial letter is followed by that vowel". He was referring to another rule of Southern Korean phonology, one that deletes both /n/ and /l/ immediately preceding the vowel /i/ and its semi-vowel counterpart /j/. That is why the common family name 李 (hangul 이, formerly 리) is pronounced [i] in Southern Korean dialects, and why it is variously romanized Lee, Li, Yi, Yee, Ri, Ree, and Rhee. So the embassy spokesman was referring to the fact that word-initial /l/ is pronounced [n] except before /i/ and /j/, where it is not pronounced at all. Dharma Teacher An was referring to the same thing when he said that the [n] "disappears completely if it is followed by a diphthong", though he didn't state the environment for deletion quite correctly.
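The rules described so far can be put together in a toy Python sketch. It is only an illustration under loud assumptions: words are given as lists of romanized phonemic syllables, the rule ordering is my own guess at what makes the examples in this post come out right, and the function name `pronounce` is hypothetical. It is in no way a real model of Korean phonology.

```python
VOWELS = set("aeiou")


def pronounce(syllables):
    """Map a word (list of phonemic syllables) to a rough Southern Korean surface form."""
    syls = list(syllables)

    # Word-initial /l/ or /n/ is deleted before /i/ or /j/;
    # otherwise word-initial /l/ becomes [n].
    first = syls[0]
    if first[:1] in ("l", "n"):
        if first[1:2] in ("i", "j"):
            first = first[1:]
        elif first[0] == "l":
            first = "n" + first[1:]
    syls[0] = first

    # A syllable-final /n/ assimilates to a following /l/.
    for i in range(len(syls) - 1):
        if syls[i].endswith("n") and syls[i + 1].startswith("l"):
            syls[i] = syls[i][:-1] + "l"

    # A final consonant moves into the onset of a following vowel-initial syllable.
    for i in range(len(syls) - 1):
        if syls[i][-1:] not in VOWELS and syls[i + 1][:1] in VOWELS:
            syls[i + 1] = syls[i][-1:] + syls[i + 1]
            syls[i] = syls[i][:-1]

    # Allophony: /l/ is [l] when syllable-final or next to another /l/, else [ɾ].
    word = "".join(syls)
    finals, pos = set(), 0
    for syl in syls:
        pos += len(syl)
        finals.add(pos - 1)
    out = []
    for k, ch in enumerate(word):
        if ch == "l":
            beside_l = word[k - 1:k] == "l" or word[k + 1:k + 2] == "l"
            ch = "l" if (k in finals or beside_l) else "ɾ"
        out.append(ch)
    return "".join(out)


print(pronounce(["lo"]))        # no   (the family name 盧)
print(pronounce(["li"]))        # i    (the family name 李)
print(pronounce(["mal", "i"]))  # maɾi ("language" plus the nominative marker)
```

The same function gives [mal] for /mal/ on its own, since the /l/ is then syllable-final.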

You may still be wondering why President Roh would romanize his name in a way that doesn't correspond straightforwardly to its pronunciation. Safire did:

Wait a minute. If the name in Korea is pronounced with what we romanize as an n, why do we write it in English to make it sound like an r? That defeats the whole idea of transliteration -- imitating the sound of one language in the alphabet of another. Makes no sense.

The short answer is that transliteration serves a number of sometimes conflicting purposes. Enabling foreigners to pronounce the words of your language is only one of them. I haven't discussed this (or anything else) with President Roh, but it's a fair guess that when he decided how to romanize his name, how foreigners would pronounce it wasn't his chief concern. The romanization may well have been intended in the first instance for other Korean speakers.

Anyhow, there shouldn't really be much need to transliterate 한글 (hangul). 한글 is a terrific writing system and is not hard to learn. Everybody should. Why bother transliterating something as easy and pretty as 노?

Posted by Bill Poser at 09:33 PM

The Canid Play Bow

I don't know about any research on doggy sighs, but there's a serious biological and philosophical literature on the canid play bow, "a highly ritualized and stereotyped movement that seems to function to stimulate recipients to engage (or to continue to engage) in social play." It was first studied by ethologist Marc Bekoff in the 1970s, and was featured in philosopher Ruth Millikan's 1984 work Language, Thought and other Biological Categories. The Cogprints archive includes a 1997 chapter by Bekoff and philosopher Colin Allen, Intentional Communication and Social Play: How and Why Animals Negotiate and Agree to Play.

Here's a quote from the Bekoff and Allen chapter that may help to explain why this is serious stuff:

From the intentional stance, if a believes that b believes that a desires to play (third-order) it would seem that ideal rationality would also require that a believes that b has a belief (second-order). But from a Millikanian perspective this more general second-order belief, if it requires a to have a general belief detector, may actually be more sophisticated than the third-order belief which supposedly entails it. A general belief detector may be much more difficult to evolve than a specific belief detector, for the detection of specific beliefs may be accomplished by the detection of correspondingly specific cues.

If this is correct, then on Millikan's account Jethro (Marc's dog) may be capable of the third-order belief that (or, at least, a state with the intentional content that) Sukie (Jethro's favorite canid play pal) wants Jethro to believe that her bite was playful not aggressive, even though Jethro is perhaps limited in his ability to represent and hence think about Sukie's second-order desires in general.

Linguists who work on Gricean accounts of what people mean by what they say should think more carefully about the analogous issues, though for our species, evolved cultural norms may be just as important as evolved biological systems.

I've observed, in an entirely unscientific way, that the canid play bow causes friction between domestic dogs and cats who may otherwise get along very well, since cats don't understand it, and dogs don't understand that cats don't understand it. I've had relationships like that -- haven't you?

Posted by Mark Liberman at 09:24 PM

Northern Exposure

The media are currently excited about the frigid temperatures in the Northeast. Most of the news reports aren't linguistically focused, maybe because it's hard to talk through frozen lips. But yesterday's New York Times had one gem, a quotation from a spokesperson for a Vermont ski resort: "People can certainly come up and ski tomorrow and have a nice day, as long as their exposed skin is covered." Unexposed exposed skin, now there's a concept!

But it's still interesting, because we can all interpret it accurately, even though it doesn't make much literal sense: it's obvious to all readers that "exposed skin" refers not to armpits or knees or even more private body parts, but to bits of skin that would ordinarily see the light of day if it weren't really really cold out. Still, as the man said, we're all naked under our clothes.

Posted by Sally Thomason at 07:48 PM

Language in humans and monkeys

The current (January 16) issue of Science magazine has a fascinating article by Tecumseh Fitch and Marc Hauser entitled Computational Constraints on Syntactic Processing in a Nonhuman Primate, and an equally thought-provoking "Perspective" piece by David Premack entitled Is Language the Key to Human Intelligence?

Anyone interested in the biology of language should read both. I'll see if I can find accessible copies for those without subscription access to Science. Or you could (shudder) go to the library...

Fitch and Hauser's abstract:

The capacity to generate a limitless range of meaningful expressions from a finite set of elements differentiates human language from other animal communication systems. Rule systems capable of generating an infinite set of outputs ("grammars") vary in generative power. The weakest possess only local organizational principles, with regularities limited to neighboring units. We used a familiarization/discrimination paradigm to demonstrate that monkeys can spontaneously master such grammars. However, human language entails more sophisticated grammars, incorporating hierarchical structure. Monkeys tested with the same methods, syllables, and sequence lengths were unable to master a grammar at this higher, "phrase structure grammar" level.

I believe that the techniques they used grew out of methods developed by Saffran, Aslin and Newport for testing the abilities of human infants to learn stochastic grammars of various sorts. In any case, the stimuli and details of the testing procedures are available from Science online.

The basic method is to expose the subjects (whether humans or tamarin monkeys) to a body of examples generated by a certain kind of artificial grammar, and then to see how they react to new examples consistent or inconsistent with that grammar. This is actually done by using two different grammars in a symmetrical way on two different subject populations. The key result is shown in the figure below (their Fig. 2):

One should be careful not to overinterpret these results. [Update: read this to see why.] I have no doubt that the observed effect could be modeled (e.g. using connectionist techniques) as a quantitative difference in capacity rather than a qualitative difference in accessible grammar level. Still, the experiments are suggestive.

Premack opens his Perspective by asking:

Dobzhansky's quip "All species are unique, but humans are uniquest" raises the question: Is it language, the symbol system that evolved only in humans, that makes humans the "uniquest"?

and closes it by answering:

Human intelligence and evolution are the only flexible processes on Earth capable of producing endless solutions to the problems confronted by living creatures. Did evolution, in producing human intelligence, outstrip itself? Apparently so, for although evolution can do "engineering," changing actual structures and producing new devices, it cannot do science, changing imaginary structures and producing new theories or explanations of the world. Clearly, language and recursion are not the sole contributors to human uniqueness.

with a lot of interesting stuff in between.

[tip from Fernando Pereira]

[More on this here and here.]

Posted by Mark Liberman at 02:42 PM

Snowclones: lexicographical dating to the second

At last a suitable name has been proposed for the some-assembly-required adaptable cliché frames for lazy journalists that have received occasional discussion on Language Log (here, in the first instance). I mean formulae like these (where the N, X, Y, Z are filled in to taste):

If Eskimos have N words for snow, X surely have Y words for Z.

In space, no one can hear you X.

X is the new Y.

Glen Whitman, who discussed this topic on Agoraphilia, taking his cue from the first example, proposes calling these non-sexually reproduced journalistic textual templates by an appealingly simple name: we can call them snowclones.

Hearing no other nominations, I now hereby propose that they be so dubbed. The clerk shall enter the new definition into the records.

Since we have a record of the exact time at which Glen hit Send and transmitted the new term to me (the first person to read it), lexicographers are in luck here: they can date the coining of snowclone to the second. So they may like to note for their future reference that this term was coined at 22:56:57 (that's 3 seconds before 10:57 p.m.) on Thursday, January 15, 2004, in Northridge, California.

[Update 10/19/2005 (myl) -- some other Language Log posts on Snowclones, added for those who find this via the Wikipedia entry for "snowclone":

Bleached conditionals (10/21/2003)
Phrases for lazy writers in kit form (10/27/2003)
Clear Thinking Campaign gives "Fogged Spectacles" Award to John Lister (12/02/2003)
Another snowclone (1/18/2004)
When did you first hear this pattern? (1/25/2004)
Snowclones are the dark matter of journalism (1/28/2004)
"I, for one, welcome our new * overlords" (1/29/2004)
In Soviet Russia, snowclones overuse you (1/29/2004)
The memetic phylogeny of "our new * overlords" (1/30/2004)
Expression's vast varieties (3/3/2004)
The right X and the right Y (4/3/2004)
The backpack of it all (4/18/2004)
Putting the X in Y (4/19/2004)
The A-er the B, the C-er the D (4/19/2004)
Is 30 the new 42? (4/23/2004)
Cuteness (4/24/2004)
X are from Mars, Y are from Venus (4/27/2004)
Have X, will travel (5/25/2004)
Not the * I know: let * be * (7/3/2004)
Snowclone sightings (10/21/2004)
Twos and threes (11/27/2004)
Homeric objects of desire (1/7/2005)
* me P and call me * (1/25/2005)
Defecated to eggcorn fans everywhere (1/31/2005)
Not everything that passes (1/31/2005)
Smart kids (2/27/2005)
Liberalism is the new communism (3/27/2005)
Once a snowclone, always a snowclone (5/17/2005)
Antique snowclones (5/17/2005)
The hounds of ADS-L (5/18/2005)
An avalanchlet of snowclones (5/21/2005)
X-ing outside the Y (6/2/2005)
That's why they call it X (6/3/2005)
Polysemy in action (6/3/2005)
What is this 'snowclone' of which you speak? (7/3/2005)
Documenting snowclones, dating them (7/4/2005)
A few players short of a side (7/14/2005)
You can call it X all you want (8/23/2005)
Two, three ... many prefabricated phrases (9/13/2005)
Wikipedia on Simpsons words (9/26/2005)
What is this Harvard? (10/12/2005)
Playing one (10/12/2005)
To snowclone or not to snowclone (10/12/2005)
Playing one 2 (10/13/2005)
Playing one 3 (10/176/2005)
Critical tone for a new snowclone (10/18/2005)
My big fat Greek snowclone (10/19/2005)
Is splanchnic just another word for schmuck? (10/28/2005)
Snowclone shortening (11/14/2005)
Eating, drinking, sleeping snowclones (11/15/2005)
Eating, drinking, sleeping snowclones, part 2: the early years (11/16/2005)
Snowclones hit the big time (12/5/2005)
The proper treatment of snowclones in ordinary English (2/4/2006)
Not your mother's snowclone (2/20/2006)
No snowclone left behind (2/25/2006)
Crazy talk (3/1/2006)
Tracking snowclones is hard, let's go shopping! (1/2/2006)
The entire United State wept (1/3/2006)
Noclone (4/5/2006)
The Agenbite of Onion Wit (4/7/2006)
Brokeback generalizations (4/7/2006)
Respect (4/9/2006)
More brokeback generalizations (4/9/2006)
Best. Snowclone. Evar (4/10/2006)
A pirated Barbie-ism (4/11/2006)
X-back mountain (4/13/2006)
Snowclone Mountain (4/13/2006)
It's not hard out here for a cliché (4/17/2006)
It's X's world, we just live in it (4/21/2006)
I found my snowclone in Palo Alto (4/23/2006)
Springtime for Snowclones (6/6/2006)
Snowclones of linguification (7/9/2006)
X as the Y of Z (7/28/2006)


Posted by Geoffrey K. Pullum at 01:37 PM

Pet Communication

Well, it doesn't really merit mention as a language issue, but I'm curious: do cats sigh in a communicative way (as opposed to, for example, purr-sighs indicating contentment)? I ask because my dog most definitely does, when she fails to get what she wants -- a sound best glossed as "harumph!". A quick search confirms that others do list sighs among the forms of canine communication.

And of course, there's the famous quote from Snoopy: "Yesterday I was a dog. Today I'm a dog. Tomorrow I'll probably still be a dog. Sigh! There's so little hope for advancement."

Posted by Philip Resnik at 10:08 AM

January 15, 2004

Gender: just a quick check

When Mark Liberman recently pointed out that the Gender Genie is freely available on the web to check the sexual identity of writers, I realized I had a perfect opportunity to just run a quick double-check on my own sexual identity, not that I would need to, of course.

I chose a representative short passage to run through the algorithm -- the top paragraph of my most recent blog post, "This isn't poetry, this is abuse":

Cursed as I am with the habits of a scholar, I actually read the little bits of paper and newsletters that arrive in the envelope with every bill I get, just in case I need to know anything that they tell me. With my latest mortgage bill I received a special extra slip of paper containing a poem called "What we can do for you!". It is so bad that readers of a delicate disposition may decide they don't want to look at it.

The result: female score 155, male score 127. So I am... female.

I guess a lot of guys would be a bit shaken by a result like this. But here in laid-back, tolerant, bisexual Santa Cruz, a confident, secure guy like me can deal with it. Yes, sirree. No problem about my masculinity. Nope, none at all. Oh, no. Babe magnet, that's me. Did I mention I drive a truck?

Posted by Geoffrey K. Pullum at 05:58 PM

A month late and a bunch of links short

Well, I don't feel so bad now. We might have been a couple of weeks late on the Alphaville Herald story, but Amy Harmon in the New York Times is several weeks later. Seriously, Harmon's piece is well worth reading. It's rather detached, though -- you'd think that an intrepid band of investigative reporters would fan out into the seamier side of The Sims Online, to check out Peter Ludlow's charges first hand.

Also, I miss the hyperlinks. Harmon's article is full of references to online sources: just in the top half of the article's first browser screenful, she cites Ludlow's online newspaper The Alphaville Herald; the salon.com article; a weblog run by Yale law students. But of course, the NYT doesn't have links. You can find them with google, but it takes time and effort.

The linguistic relevance? Well, Peter Ludlow is a philosopher of language, in his day job. Or how about this: networked artificial communities offer an interesting opportunity for modeling communicative interaction. You can see the whole thing as one big Turing Test Bench. Of course, this is not how the participants see it, but that's the whole point.

Posted by Mark Liberman at 04:22 PM

A new linguistics weblog: Semantic Compositions

On January 13th, 2004, an anonymous author kicked off a weblog named "Semantic Compositions," announcing the intent "to expose and hopefully popularize some of the ideas to come out of theoretical and applied linguistics, as well as related disciplines like computer science and psychology." Welcome, neighbor!

SC's writer self-identifies as "a commercially employed computational linguist with an undergraduate degree in linguistics, a graduate degree in computational linguistics, and additional training in electrical engineering and music".

(S)he provides enough additional information that a nosy person could probably complete the identification with a bit of web searching. I'm not a nosy person, so I'll limit myself to running SC's text so far through the Gender Genie, which thinks SC is male -- Female Score: 2329, Male Score: 4186. I'll take this as license to refer to SC's author as "he", "him" and "his", subject to correction of course.

[Update: Language Hat points out that the very next sentence past the passage I quoted above reads "He also holds a patent in search engine design, and as of this writing, has a further patent pending for a related natural-language parser design." And while I'm in confessional mode, I have to add that the Gender Genie thinks that Rosanne at the X-Bar is even more male than Semantic Compositions: Female Score: 1937, Male Score: 4525. Adding it all up, the Overall Score is: Computational Linguistics 0, Careful Reading 1.]

Posted by Mark Liberman at 02:51 PM

The curious case of quasiregularity

A few days ago, I got a note from Mark Seidenberg, commenting on an earlier piece in which I cited his observation that quasiregularity is ubiquitous in language. I described a controversy among psycholinguists about where linguistic quasiregularity comes from, and Mark took me to task for leaving out some key points. With his permission, I'm reprinting his note below (in blue), interspersed with my own comments.

Hi Mark. I came across the discussion of quasiregularity (quasi-regularity?) in Language Log. Thanks for mentioning it, and positively! I think there's an important point that was misunderstood, however. The quasiregularity notion emphasizes statistical notions of language and in particular in the mappings between codes (e.g. spelling and pronunciation; the semantics of a verb and how it is realized phonologically in a language; the partial correlations between form and meaning that seem to give rise to morpheme-like units, etc).

As you said, we think these regularities emerge in connectionist networks that learn via statistical learning procedures. However, the Pinker theory does not maintain the same notion of quasiregularity but merely suggests that it has a different basis. To the contrary, he holds to a strict dichotomy between rule-governed forms and exceptions (in keeping with the traditional approach to the lexicon in which there are rules for generating various forms and the lexicon encodes the unpredictable information). Thus, the two types of forms are said to exhibit different structural properties, are learned by different mechanisms (rule-induction, rote memorization), produce different behavioral effects (e.g., exceptions are affected by frequency, rule-governed forms are not), and are represented in different parts of the brain (leading to "selective" impairments in generating one or the other type of form).

I agree, and I think Pinker, Ullman and others would as well. But I was arguing against the odd (though widespread) notion that linguistic irregularity is immoral, and I wanted to stress that everyone's psycholinguistic theory has mechanisms for creating morphophonological patterns but also mechanisms that tend to prevent these patterns from being perfectly regular.

For Seidenberg, the two mechanisms are the same, and he makes an argument (which I find persuasive) that quasiregularity is both ubiquitous as a descriptive fact and also inevitable as a theoretical consequence of a connectionist approach. From the perspective of people like Pinker and Ullman (and many others before them), irregularity arises because linguistic knowledge comes in two flavors, analogous to the distinction between semantic and procedural memory. I also find this idea somewhat persuasive: at least the mechanisms (and brain systems) involved in knowing a word and in putting words together in a sentence seem to be different.

The two-mechanism theory can account for similarities among the irregular forms (sing-sang ring-rang etc), via the "associative net" that has been part of the story since Pinker and Prince 1988. What it can't do is capture the similarities between rule-governed forms and exceptions, which are pervasive. As you know most of the irregular past tenses share structure with rule-governed forms; sent, hit, sat, bid, hid, and so on all end in a phoneme (for lack of a better word) that is one of the realizations of the regular past tense; exceptions like SLEPT pattern with regulars like STEPPED; the problem isn't with application of the rule but with deformation of the stem, and so on. I know you know this (as did Halle and Mohanan). Anyway, it is easy for our approach to capture these partial regularities, which arise from a variety of sources and can be encoded to whatever degree they are present by a multilayer net trained using a statistical learning procedure such as backprop. Pinker has to treat the overlap between rule governed forms and exceptions as coincidental at best, or the detritus of historical events like diachronic change (which of course I think can also be handled by the same mechanisms we use for acquisition and processing).

Mark is making a very important point here, highlighting something that has always puzzled me about the attitude of linguists towards the Pinker-Seidenberg controversy. To explain why this attitude is so weird, I have to beg your indulgence for a little intellectual history.

As of 1950 or so, a common (though not at all universal) view among American structuralist linguists was that the phonemic level of representation had to be connected to surface (phonetic) forms by relatively simple and transparent principles, without lexical or morphological exceptions. Other principles, qualitatively different in character and usually seen as nothing more than a sort of fossilized residue of older sound changes, related morphemes to phonemes. In the middle 1950s, a (then) young linguist named Morris Halle challenged this view, by showing (or let's say, by arguing persuasively) that these principles of analysis, applied to Russian and other Slavic languages, missed important generalizations about palatalization, which had to be seen as applying both "before" and "after" the structuralist phonemic level. The key argument was that "phonology" and "morphophonology" share sound patterns, and so should be merged into one system.

This was the opening salvo, on the phonological front, in the battle of "generative" linguists against their structuralist parents. The structuralists were quickly defeated, or perhaps I should say were driven from the plains and took refuge in a few mountain fortresses on Ivy League campuses and in Anthropology departments. The underlying issues -- how the sound patterns of morphemes, words and phrases are represented and inter-related -- of course remained to be investigated, and there have been many intricate and interesting campaigns ranging across this intellectual countryside over the past half century.

However, on the particular point in question, Halle has completely carried the day. I don't know any working phonologists today who think that morphologically-conditioned (or otherwise "irregular") sound patterns are ipso facto distinct in every way from perfectly transparent ones. There may be some, but they have roughly the status within linguistics of biologists who think that AIDS is not caused by HIV, or cosmologists who think that the big bang never happened. That doesn't mean they're wrong, just that they're intellectually isolated.

And then there's Steve Pinker. He might not be a card-carrying phonologist, since his degree is in psychology, and he doesn't design (or even use) the sorts of systematic, formal descriptions that are phonologists' stock in trade. But he wrote a whole book on one aspect of English morphophonemics -- Words and Rules -- and a carefully wrought, enormously entertaining book it is. A few years ago, Pinker gave a seminar on this book's material to a bunch of literary scholars here at Penn, and by the end of the hour, they were all vying with one another to supply examples and counterexamples. I could hardly believe it, it was sheer genius.

Anyhow, Pinker's theory is structuralist phonemics reborn. 'Rules' must be perfectly regular (though he has not been very specific about exactly what these rules are, how they can interact, what they can do and what they can't do); any forms whose derivation is not perfectly regular are 'words', which are stored and retrieved by a memory system completely disjoint from the rule system. If there appear to be processes in common between the rule system and the word system, it's just because the rule system long ago created some patterns that have been left as fossilized imprints on stored words. The storage system for words is seen as a connectionist one, but its emergent regularities are fundamentally different from the patterns created by the system of rules.

Mark Seidenberg, on the other hand, upholds Morris Halle's view that "morphophonemic" sound patterns can be (and typically are) made of exactly the same stuff as sound patterns that happen to be phonologically transparent. And what reward does he get for this? Phonologists root for Pinker, and psychologists choose sides in part based on what they think about modern phonology, with Seidenberg typically seen as the champion of those who would like to overthrow the whole generative-linguistics enterprise. There's a lot more to be said about this whole area, but believe me, it gets even stranger.

Mark's note to me continued:

You might be interested in the recent paper by Haskell, MacDonald and myself which addresses the rat-eater *rats-eater cases, where the contrast between a graded quasiregularity and the Pinker alternative is illustrated clearly (on the http://lcnl.wisc.edu website).

The paper Mark is referring to is here. His note continues:

So, it matters whether you think the system is quasiregular (graded, statistical, exhibits partial regularities) or not (rule governed vs. exception). Thus the disagreement is about the basic nature of the system, not different ways of explaining quasiregularity.

I think the same issue arises repeatedly in the characterization of linguistic knowledge, e.g., morphemes and phonemes that are not discrete beads on a string, sentences that are ungrammatical but overlap with grammatical ones, and so on.

I agree. On the other hand, the connectionist work has not yet produced a very convincing account of syntax (in the sense of general recursive compositionality). I'm impressed by some of Michael Ullman's recent work, elaborating and reinforcing older ideas about the linguistic role of frontal motor-skill circuits vs. posterior semantic memory systems. So I hold open the possibility that the McClelland/Seidenberg and Pinker/Ullman views might both turn out to be partly right about language in general, whoever turns out to win on the particular question of the English past tense.

I think that linguists in general, and phonologists in particular, should pay more attention to the details of this debate. They should press Pinker and his allies for more precise definitions of what "regular" means, and they should think hard about whether they're really willing to re-fight the battle of the two Slavic palatalizations, but on the other side this time.

I just can't get over how things like blogging, google, and archiving are changing the way we think and do things. It really is an incredible time. But excuse me I have to go check my bid on e-bay; I need a case that will keep my laptop from getting too banged up!

And I need to go prepare for class.

Mark and I had a further exchange about syntax, which I'll post later.

[Update: In this 1991 Linguist List posting, Bruce Nevin describes Halle's argument against the (structuralist) phoneme in some detail. I haven't checked any sources, but his summary rings true to me.

Aside from the details, there are two inter-related questions that are not always clearly separated -- certainly I didn't separate them in the discussion above. One question is how simple and transparent the phonemic/phonetic relationship is, and the other is whether morphological exceptionality is allowed as an integral part of that relationship. The questions are connected because morphological conditioning of sound patterns would automatically make them "irregular" by most definitions; and on the other hand one can nearly always encode (apparent or real) lexical exceptions by making underlying forms more abstract.

Bruce also makes explicit what I hinted at, namely that some structuralists (such as Bloomfield) were on the Seidenberg/Halle side rather than the Bloch/Pinker side.

In the end, I think that my point stands. Phonologists from Sapir, Bloomfield and Harris to Halle, McCarthy and Prince have believed that some quasiregular sound patterns are part of the same phonological system as completely regular sound patterns. In this, they are on Seidenberg's side against Pinker, whose team includes Trager/Smith, Bloch and some other structuralists, along with (some variants of) "natural phonology".

I don't have any deep convictions on this question. But if Pinker is right, most phonologists working today need to repudiate most of their own work. Maybe he's right, and they should -- but I don't see how they can applaud politely for his side of the debate, and then go on doing their own stuff as if he were wrong.]

Posted by Mark Liberman at 09:41 AM

Phrases for Lazy Writers in Kit Form Are the New Clichés

Glen Whitman at Agoraphilia has picked up Geoff Pullum's idea about "phrases for lazy writers in kit form" and run with it, citing the phrasal template "X is the new Y."

Glen quotes grey is the new black, butt crack is the new cleavage, Dean is the new McCain, and 16 others found in a quick Google search. He provides links to citations, and divides the examples into categories, suggesting that the pattern started in fashion-talk but has moved as far afield as astronomy. Continuing his search, I started to look for all the new things that Dean is, and discovered that Slate magazine already published an illustrated catalogue of instantiations of this phrasal template back in August. Checking out the whole phrase "X is the new Y", I learned that 45 other people have already gone explicitly meta on this one, including Warren Clements in the Globe and Mail back in December, Time Magazine in its cover story of September 8, 2003, and an anonymous writer at artforum.com back in March of 2001, who suggests that the real estate business was an early source.

I tried to discover whether Juvenal had complained about over-use of the Latin form of this pattern in the first century A.D., but the Perseus Digital Library "is unavailable from 5:00 to 7:00, US Eastern time, in order to rebuild its databases with new or changed meta-data." Sorry.

Posted by Mark Liberman at 06:54 AM

January 14, 2004

Linguist's Search Engine teaser

Mark's note on Google sociolinguistics kindly provided me with an advertisement, not to mention a gentle nudge to start writing here, something I've been meaning to do for a while. Appropriately, therefore, my first posting has to do with language data on the Web.

Mark asks whether refute that such-and-such is so is a construction that is coming or going. For some purposes, such as investigating the time course of a particular usage, I've found Altavista more convenient than Google, particularly because one can take advantage of the "advanced search" feature to access useful metadata. (Also, for linguistic purposes, I'm not sure preferring pages with a high Google rank is particularly helpful.) For example, when Mark looked for "refute" with sentential complements, he could have replicated his Googling using Altavista advanced search for "refuted that the" OR "refutes that the" OR "refute that the" with the date unconstrained, in order to get a rough idea of how Google and Altavista compare, and then done a time-based comparison by restricting the query to the time frame from 01/01/95 to 12/31/98 (18 hits) versus the time frame from 01/01/99 to 12/31/03 (739 hits).

As a matter of fact, a few years ago I did this same sort of operation for a pre-sentential exclamation I'd been noticing more and more frequently: "woo hoo!". (E.g. Woo hoo! I won!) At the time, my search using Altavista provided an estimated 15 instances of this expression in total prior to 1996, 144 in 1996, 459 in 1997, 2269 in 1998, and 6676 from January to August 1999. These were raw counts, but further analysis showed that the usage of this phrase increased two orders of magnitude even when counts were normalized to account for the growth of the Web. (At the time I used Web host count data at http://www.isc.org/, which have been saved from oblivion by the Internet Archive. Bless you, Brewster.) A bit more detective work led me to the probable origin of the expression, or at least of its increased popularity [sound clip]. This may not have been a linguistically deep example, but it did help convince me how powerful the Web could be, potentially, as a resource for data about language in use.
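
For readers who want to try this kind of normalization themselves, here is a minimal sketch in Python. The raw hit counts are the full-year figures quoted above; the host counts are purely hypothetical placeholders (the real ones would have to come from the archived ISC survey data), and the function names are mine, not part of any tool mentioned in this post. The partial 1999 figure is left out because it covers only January to August.

```python
# Raw Altavista hit counts for "woo hoo!" as quoted in the post (full years only).
raw_counts = {"1996": 144, "1997": 459, "1998": 2269}

# HYPOTHETICAL host counts per year: placeholders for illustration only.
# Substitute real figures from a host-count survey before drawing conclusions.
hypothetical_hosts = {"1996": 10_000_000, "1997": 20_000_000, "1998": 40_000_000}


def normalize(counts, web_size, per=1_000_000):
    """Return hits per `per` hosts for each period present in `counts`."""
    return {period: counts[period] / web_size[period] * per for period in counts}


def growth_factor(series, start, end):
    """Return how many times larger the value at `end` is than at `start`."""
    return series[end] / series[start]


if __name__ == "__main__":
    normalized = normalize(raw_counts, hypothetical_hosts)
    print("raw growth 1996 -> 1998:", growth_factor(raw_counts, "1996", "1998"))
    print("normalized growth 1996 -> 1998:", growth_factor(normalized, "1996", "1998"))
```

The point of dividing by a measure of Web size is simply that a phrase whose raw count grows no faster than the Web itself has not really become more popular; only growth that survives the division is interesting.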

The trouble was, Web search engines were not -- and still are not -- well suited to the needs of the ordinary working linguist. If you're able to approximate a phenomenon using a contiguous sequence of words like refute that the, great. If, on the other hand, you're interested in looking on the Web for a phenomenon involving syntactic structure that is not easily approximated in this way, you're out of luck.

As an example, someone once commented that a model of mine predicted (incorrectly, he thought) that the verb titrate should be grammatical with an implicit direct object. For example, "You should stop titrating" should be able to mean "You should stop titrating whatever it is that you are titrating" the same way that "You should stop eating" can mean "You should stop eating whatever it is that you are eating". I agreed with the prediction, but I had no intuitive judgment of my own. Where to find lots of people for whom titrate is an active vocabulary item? One obvious step was to look on the Web for this particular verb used intransitively -- or better yet, find a way to search more generally for sentences according to linguistically relevant lexical and/or syntactic criteria.

That didn't exist... so, to make a long story short, a few years ago I decided that such a thing needed to be built and convinced NSF that this was a good idea, and we built it. ("We" being a team at the University of Maryland, most notably a brilliant software designer/programmer named Aaron Elkiss, with input from collaborators Mari Broman Olsen and Christiane Fellbaum.) We call it the Linguist's Search Engine, or LSE for short.

I'm not going to say much more about the LSE in this post -- that's why the title says it's a teaser. Why am I doing this? Well, I would have liked to be giving out the URL for it right now, but in mid-December the LSE server was one of nine computers stolen out of an office at the University of Maryland (!!). Everything was backed up, fortunately, but it took a while to get a new machine and restore everything, so we had to push our going-public date back a month or so. I hope to have the URL for you next week.

Meanwhile, though, let me close with a few examples I found quite easily using the LSE:

  • http://www.ocdsb.edu.on.ca/JMCCweb/PROJECTS/SCIENCE/ChemLabs/ABTitr.html
    ...In this experiment, you will use a computer to monitor pH as you titrate.
  • http://ww2.lafayette.edu/~bonos/Week1.htm
    We only titrated with phenolphthalein if the pH was above or equal to 8.3.
  • http://drnelson.utmem.edu/med2.html
    Conversely, if we titrate in the opposite direction...
  • http://misterguch.brinkster.net/q9.html
    The endpoint of a titration is when the indicator tells you should stop titrating....

Woo hoo!

Posted by Philip Resnik at 10:01 PM

Authorization: Who or what? (Plus: Tracing the origins of The Language Log)

All entrances to the Fort River School property, down the street from my place in Amherst, Massachusetts, are marked with signs that read
Let's ignore the apparent but perhaps unavoidable redundancy and consider instead what the sign means. Does it ward off unauthorized individuals, or does it militate against unauthorized activities?

There are basketball courts on the property. So, presumably, playing basketball is an authorized use of the property. Does this mean that anyone can wander onto the property and play basketball? Or must one first obtain authorization to do anything (even authorized things) on the property? This is a serious concern to area sporting enthusiasts, who can freely play basketball on the property on one interpretation but cannot on another. It should also be a concern to, e.g., Fort River schoolteachers (authorized users of the property) who are interested in holding poker tournaments on the property (presumably not currently an authorized use).

Note: The above note originally appeared in WHISC (What's Happening in South College), the new weekly newsletter of the UMass Linguistics Department. Both the idea for the newsletter and its name are "inspired by" (swiped from) the weekly newsletter of the UCSC Linguistics Department, WHASC (What's Happening at Santa Cruz), which is the work of Connie Creel.

The UMass newsletter is somewhat livelier than WHASC tends to be. It boasts pictures, puzzles, and general observations about the department and its researchers, in addition to day-to-day news. But WHISC's style and tone clearly take their cue from one very special issue of WHASC: May 29, 2002. Readers of The Language Log will not be surprised by the signature at the bottom of that page. In retrospect, one can hear its author crying out for a regular outlet for things like this.

Posted by Christopher Potts at 09:40 PM

This isn't poetry, this is abuse

Cursed as I am with the habits of a scholar, I actually read the little bits of paper and newsletters that arrive in the envelope with every bill I get, just in case I need to know anything that they tell me. With my latest mortgage bill I received a special extra slip of paper containing a poem called "What we can do for you!". It is so bad that readers of a delicate disposition may decide they don't want to look at it.

What we can do for you!

We can't cook your dinner
Or make all your beds
Or mow your tall grass
Or scrub your kids' heads.

We can't clean your garage
Or paint your front stoop
Or fix your computer
Or scoop your dog's poop.

Though we can't help with these things
You must understand
We can offer you something...
Something quite grand!

We can offer you home loans
That meet every need,
At competitive rates
That can help you succeed.

New purchase, refinance,
Home equity loan,
Need help? We'll be here
Online or by phone.

We can give you great service,
Never settle for less,
We're here when you need us,
That's... The Power of Yes.®

Astoundingly, they have not only registered the phrase "The Power of Yes" as a trademark, but also copyrighted the whole of this drivel (© 2003 by my mortgage company [I pay them the courtesy of not naming them], all rights reserved), as if someone might try to steal it!

(By the way, you may be thinking that in light of this fact I shouldn't be putting it on a web page like this. You're wrong. I am a qualified linguistic professional. All of us are, here at Language Log. We can cite material for purposes of study or exemplification or derision, and it counts as fair use. As a linguist, you see, I have special rights to do things with linguistic material that ordinary people cannot do. I'm really like a doctor. Take off your clothes and lie down.)

What on earth is my mortgage company thinking of, paying someone to compose this doggerel, paying registration and copyright fees, and printing thousands of copies on glossy paper, and sending it to me? I'm already a loan customer; do they think they will earn extra money off my loan if I come to have a higher opinion of them? Do they think that sending me six stanzas of unspeakable slop will cause me to have a higher opinion of them? Are they flaming nuts?

And shouldn't it be illegal to expose people to poetry this bad without their consent? Bad poetry can be really harmful. Take Vogon poetry, for example (it is the third worst in the world according to The Hitchhiker's Guide to the Galaxy). It is well known that Vogon poetry has been used as torture. This is unethical. Unrequested exposure to execrable poetry should be made illegal. We need a campaign to stop mortgage companies and all other commercial concerns from continuing with this cruel practice. Perhaps the BBC's robot mechanism for emailing randomly generated Vogon poems could be used to give the company in question a taste of their own medicine. Unfortunately, they do not seem to have supplied me with their email address.

Posted by Geoffrey K. Pullum at 04:22 PM

"Refute" for "Deny" -- we're on the case

Some time last year I noticed the new use of refute that Mark pointed to in his posting on "Google sociolinguistics," and we included it as an item on the most recent ballot we sent to the members of the American Heritage Dictionary's usage panel (of which I bear the august if empty title of chair). I don't have the results of that one yet, but we were remiss in not checking on the use of the verb with a that-complement that Mark observed -- the item we gave the panelists was "In a press conference, the senator categorically refuted the charges of malfeasance but declined to go into details." We should ask about the sentential complements in a future ballot -- I'd be surprised if the panelists accept this one, but it's always useful to know the percentages and to see their comments.

In case you're curious, some of the other items on the last ballot were counterfactual may, as in "If John Lennon had not been shot, the Beatles may have gotten back together," and insoluble for "unsolvable," as in "Many racetracks are plagued by a seemingly insoluble problem: a shortage of horses that results in small fields." We also included some compounds with self-, like the ubiquitous self-identify, which has a number of uses. I'm okay with "self-identified lesbians," say, but have more qualms about intransitive uses like "Students with special needs should self-identify at registration." And I balk at "The program first and foremost works to establish relationships with women, and strives to meet their diverse self-identified needs" (though I don't know why this bothers me more than phrases like "Limbaugh's self-confessed addiction to prescription painkillers" or "His self-confessed 'obsession' with Indonesia," which are all over the place if you look for them). We'll see if the panel makes these distinctions, at least in their degree of collective approval.

Then there's the coinage self-irony (ca. 4000 Google hits), an item that suggests the final conflation of irony and sarcasm that signals the passing of an age as well as anything else I can think of. (Are modern-day freshmen writing "self-irony" in the margins of their copies of Emma?) But maybe the panelists will be more accepting of this than I am.

Posted by Geoff Nunberg at 02:12 PM

Strange linguistics in this week's Economist

It was a big week for our subject in The Economist (January 10th, 2004). The Science and Technology section led off with linguistics, the topic being David Gil's report about Riau Indonesian grammar, already discussed here by Mark Liberman. (Very suspicious stuff that I'm not yet ready to believe, incidentally; I wish more details were given. I have seen only superficial essays citing a single two-word utterance that is supposed to have dozens of meanings. I'd want to see a lot of very detailed argument before I would be prepared to believe that there is a natural language with no nouns or verbs. For one thing, that would mean no word order rules whatsoever, because word order rules always make reference to syntactic category, e.g., whether some item is a noun or a verb.)

But there was another linguistic item too, about an even stranger topic than the allegedly grammarless province of Riau.

After two other short items (about Mars and SARS, respectively), the Science and Technology section returned to language with a piece about the Voynich manuscript. This strange document may date back as far as the 16th century, and has been intensively analyzed since it was purchased in 1912 by antiquarian Wilfrid Voynich. It has never been deciphered, even in part. Its elegant script is unknown from any other source, and not one clue as to its semantic content has emerged. It might be randomly assembled nonsense, or it might be a genuine message that has been encrypted. Right now, nobody knows. The linguistic science story actually comes out of computer science: Gordon Rugg, a computer scientist at Keele University in the UK, has worked out a way of showing that the text shows certain regularities that might have been produced by a cryptologic technique due to an Elizabethan era con-artist, Edward Kelley, which I guess means that he just might have been the creator of the manuscript.

What an odd, wacky, thoroughly peripheral pair of language-related items. And they appeared in the week of the Linguistic Society of America meeting. You'd think there might have been actual results presented at the meeting that could have provided fodder for the science journalists (as happens every year with the meetings of the American Association for the Advancement of Science). If we linguists didn't have any positive, refereed results that would make a dozen column inches in the science section of a magazine or newspaper... Well, then perhaps we should try harder.

Posted by Geoffrey K. Pullum at 02:09 PM

Frozen precipitation in the networking infrastructure

This is better as a cartoon than as an article in the Economist. See also Geoff Pullum's post on "Phrases for lazy writers in kit form." Note that the phrase "flurry of packets" brings it all together nicely.

Posted by Mark Liberman at 01:41 PM

Google sociolinguistics

Following the revelation that some heartless test designers have arranged for 50% of the scores to fall below the median, a recent article in the Daily Northwestern by Mindy Hagen says that "[i]n his weekly radio address last Saturday, Bush refuted that the law sets unreasonable standards."

This took me aback, not politically but syntactically. For me, you can refute something but not refute that such-and-such is so. I checked a few dictionaries, and learned that there's a controversy about whether refute must mean "disprove", or can be used more loosely to mean just "deny" or "repudiate". However, nothing is said about the question of sentential complements, though all the (about 20) example sentences in the four on-line dictionaries I checked had noun-phrase objects.

Not being a bigot, I checked with Google. The patterns "refuted that the", "refutes that the", "refute that the" get a total of 1,969 hits, showing that Ms. Hagen's usage is not an isolated case. The patterns "refuted the", "refutes the", "refute the" get 225,800 hits, suggesting that sentential complements for refute remain relatively rare. By comparison, "claimed that the" scores 497,000, compared to 580,000 for "claimed the".
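To put those counts on a common scale, here is a minimal sketch that turns the four hit counts quoted above into proportions. The dictionary layout and the function name `clause_share` are assumptions made for illustration; only the numbers come from the post, and raw hit counts are of course a rough proxy (a hit for "claimed the" includes things like "claimed the prize").

```python
# Hit counts as reported in the post (Google, January 2004).
hits = {
    "refute": {"that_clause": 1_969, "noun_phrase": 225_800},
    "claim": {"that_clause": 497_000, "noun_phrase": 580_000},
}

def clause_share(counts):
    """Fraction of the counted hits in which the verb is followed by 'that the'."""
    total = counts["that_clause"] + counts["noun_phrase"]
    return counts["that_clause"] / total

# One share per verb, so the two verbs can be compared directly.
shares = {verb: clause_share(counts) for verb, counts in hits.items()}

for verb, share in shares.items():
    print(f"{verb}: {share:.1%} of counted hits are followed by 'that the'")
```

On these figures the sentential-complement pattern is under one percent of the counted hits for refute and a bit under half for claim, which is the contrast the paragraph above is pointing at.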

Not being a corpus fetishist either, I took a look at the examples and their sources. Some of the sentential complement examples are in national publications, like this review in Salon Magazine of Christina Hoff Sommers' The War against Boys: "Sommers seeks to refute that the 'girl crisis' ever existed." So I'm convinced that this construction has a linguistic toe-hold. But is it coming or going? Is it going to be the norm in another generation, or is it just a sporadic idiosyncrasy?

On the first four pages of Google's examples of "refuted that the", 17 of the 40 citations are from south Asia (India, Pakistan, etc.). This suggests that in "Indian English" and related varieties, sentential complements for refute have already become standard.

Several of the other citations on the first few pages are from (American) religious discussions -- perhaps this is just because refute is a relatively high-frequency word in religious debates, but maybe there are more interesting reasons.

A few of the other citations that I looked at are from other college newspapers. The Daily Illini writes: "While economics professor Fred Gottheil admitted that the nation is experiencing an economic dip, he refuted that the economy is in a recession." An article in the Grinnell paper says that "Leffler refuted that the U.S. has ignored Europe because there remained incomparable shared values and interests between the two giants." Is this an indication of "age grading"? Are younger writers more likely to use sentential complements with refute?

If I cared enough, or had enough time on my hands, I could probably answer these questions. I'd have to read through a large sample of the thousands of available citations, and categorize them according to the apparent age and background of the writer, the topic of the passage, and relate such variables to the frequency of linguistic variants. If web search indices included (even automatically-derived and approximately-correct) metadata of this type, I could evaluate such hypotheses with much less work. Actually, I should say: "When web search indices include automatically-derived and approximately-correct metadata of this type, ..."

Sometime soon I hope that Philip Resnik will tell us about his Linguist's Search Engine -- I will, if he doesn't :-)... -- and you'll see that we are not talking about an imaginary day far in the future. The independent variables that Philip's system happens to focus on are lexical and grammatical, but there's no reason that genre, author characteristics and so on could not be introduced into a service of this kind.

Posted by Mark Liberman at 08:53 AM

Attributional abduction

This passage is from an article about the "No Child Left Behind Act", in the Daily Northwestern's 1/09/2004 edition (emphasis added):

"There are serious problems in the legislation, and that was recognized when Congress passed the bill," said Education Prof. Fred Hess, director of NU's Center for Urban School Policy.
Hess said some of the act's problems go beyond funding. The tests being used are formulated so that 50 percent of the test-takers will fall below the median score -- in effect setting school districts up for failure no matter how much preparation students receive, he said.

That last sentence is lucidly expressed, but it is so spectacularly stupid that it imposes an unavoidable problem of interpretation. In the chain from Prof. Hess to us, someone is ignorant, malicious or careless. But who? and which?


  • the source actually said this, and the reporter didn't see any problem with it; or maybe
  • the source said something sensible, and the reporter garbled it out of ignorance; or maybe
  • the source said something sensible, and the reporter garbled it on purpose, in order to make the source look stupid; or maybe
  • the source actually said this, and the reporter saw the stupidity but included it without comment or correction as ironic subversion; or maybe
  • the source said something sensible, and the reporter wrote something sensible, but an editor garbled it, whether out of ignorance or malice ...

Or maybe the paper's production facilities were infiltrated by a squad from The Onion. You get the idea -- something is happening here, but we don't know what it is. We know exactly what the sentence means, but we're completely puzzled about how to interpret it. We're reduced to doing a kind of attributional abduction: reasoning to the most likely explanation for the publication of this bone-headed remark.
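As for why the quoted sentence is vacuous in the first place: the median is defined from the scores themselves, so at most half of them can fall below it however the test is built and however well the students do. A minimal sketch follows; the two score distributions are invented purely for illustration and are not data about any real test.

```python
import random
import statistics

def share_below_median(scores):
    """Fraction of scores strictly below the median of those same scores."""
    median = statistics.median(scores)
    return sum(s < median for s in scores) / len(scores)

random.seed(0)  # arbitrary seed, only so the sketch is repeatable
# Two made-up classes: one scoring low, one scoring high.
weak_class = [random.gauss(40, 10) for _ in range(1001)]
strong_class = [random.gauss(90, 3) for _ in range(1001)]

for name, scores in [("weak", weak_class), ("strong", strong_class)]:
    print(name, round(share_below_median(scores), 3))
```

Both classes come out at essentially one half, which is why "50 percent fall below the median" cannot be a complaint about how any particular test was formulated.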

Has anyone worked on formal models of this kind of reasoning? It has something in common with liar's paradoxes and the logic of communication. However, here we're not trying to determine the truth of a statement, but rather the responsibility for a stupidity; we're not unraveling who knew what when, but rather who garbled what when (and why!). And the influences on the decision are mostly gradient evaluations of how likely it is for a certain person, or a certain kind of person, to know or think or say or do something wrong or questionable, or to be influenced by a certain kind of external agenda.

It's amazing how often respected media leave informed readers in this interpretive bind. Over the last couple of months, Language Log has stumbled over a number of cases that require attributional abduction in this sense. Guy Bailey in the NYT says strange things about the sources of Texas dialect features; a Korean clinician asserts via Reuters that bilingualism causes autism; an Egyptian manuscripts scholar promotes the Protocols of the Elders of Zion; an Australian writer claims that English adds 20,000 new words a year; and now Prof. Hess complains that a test has been designed so that half the scores fall below the median. It sometimes seems that the only boggle-free articles are the ones where you start out completely ignorant of the relevant facts and principles.

Of course, it's unfair to the news media to single them out for this sort of examination. Sources, reporters and editors are not the only people who are sometimes stupid or malicious, and honest misunderstandings among well-intentioned members of the human race are more the rule than the exception. It's surprising, in the end, that anyone ever learns the truth about anything.

[Update: in case anyone is still wondering what Prof. Hess actually said, I should add that my rule of thumb in such cases is "blame the journalist." However, I've written to the parties involved and will let you know if I learn anything further.]

Posted by Mark Liberman at 06:49 AM

January 13, 2004

Linguistic Obstacles at Legal Sea Foods

All this talk about Legal Sea Foods reminds me of a dinner I had there many years ago. A friend of mine who speaks Japanese had met a Japanese visiting scholar who could read English but was quite incapable of speaking it. Since I also speak Japanese, he had arranged for the three of us to go out together to provide the Japanese fellow some conversation. We had to wait for a table, so we went to the Oyster Bar. Our Japanese friend decided to order some oysters. He rehearsed his speech with us, girded his loins, and went off to order oysters. He returned crushed: he had been unsuccessful. Curious as to what had gone wrong, I went up to the counter and addressed myself to the guy who sat behind it shucking oysters. He responded, in Portuguese, that he didn't speak English. It turned out that he had only arrived from the Azores three weeks previously. His cousin was the one who was supposed to take orders but had gone into the kitchen. Sometimes you just can't win.

Posted by Bill Poser at 11:34 PM

How many knots can you tie?

While at the LSA's annual meeting this past weekend, I was fortunate enough to make the acquaintance of a knot theorist from Harvard. I of course immediately asked him how many knots he could tie. He said just two, which was a bit of a disappointment, as was his inordinate fascination with that boring old shoe-tying knot that everyone above the age of three knows well. What's more, he had no idea why some knots are stronger than others. He brushed this off as a problem in physics.

This is all that I was able to learn about the inner lives of knot theorists: I had to go hear a talk about be.

Posted by Christopher Potts at 10:35 PM

In, out, up, front, back, minimal, maximal

Geoff Pullum pleads with us to tell him what incall and outcall mean, so that he can order massages from hotel rooms without wondering whether he is supposed to head across town to the massage parlor or relax and wait for the expert to ring his doorbell. I venture that the answer is that there is no answer: these directionals are ambiguous, and they are not alone in this. In support, I offer four similar examples as well as a private confusion.

Where I grew up (the Tri-State Area), order (food) in and order (food) out both describe having food delivered to one's house. This is mostly harmless. The phrases are used when everyone is hungry and nothing has been prepared. In such situations, one is unlikely to propose sending some food out of the house.

When someone says that an appointment has been pushed back, there is no telling whether it is now set to begin earlier than it was originally scheduled to begin or later than that. The phrase pushed up is equally unhelpful. In this case, we can usually recover the intended meaning by gauging whether the speaker has grown more frantic or less so.

Imagine that you are looking down a row of automobiles, each with its driver's side door facing you. A new car pulls up and stops at the end of the row that is farthest from you. Would you say that this car pulled to the front of the row or to the back of the row?

Linguists often talk about tree-structures. These generally involve a binary relation on nodes that is called the dominance relation. There is always exactly one node, the root, that is not dominated by any other node (except perhaps itself). Is the root maximal or minimal with regard to the dominance relation? If you think in terms of bottom-up derivations for sentences (if you like to begin with the words and imagine projecting structure from them), then the root is likely to be maximal in your mind. If you think top down, the root is probably your minimal element. Happily, it doesn't matter which perspective you adopt for the dominance relation. Just be consistent. (As you probably gathered from my brief description, linguists draw their trees upside down, with the root at the top of the page.)
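The point about maximal versus minimal can be made concrete with a toy tree. In this minimal sketch the tree, its node labels, and all function names are hypothetical choices for illustration; the only thing taken from the text is the dominance relation and the two ways of orienting it.

```python
# Hypothetical toy tree for "the dog barked"; labels are illustrative only.
# Each pair (a, b) means "a immediately dominates b".
edges = {("S", "NP"), ("S", "VP"), ("NP", "the"), ("NP", "dog"), ("VP", "barked")}
nodes = {n for edge in edges for n in edge}

def dominates(a, b):
    """True if a properly dominates b (transitive closure of the edges)."""
    return any(p == a and (c == b or dominates(c, b)) for p, c in edges)

def below_bottom_up(x, y):
    """Bottom-up convention: x is below y when y dominates x."""
    return dominates(y, x)

def below_top_down(x, y):
    """Top-down convention: x is below y when x dominates y."""
    return dominates(x, y)

def maximal(below):
    """Nodes that are not below any other node under the given ordering."""
    return {x for x in nodes if not any(below(x, y) for y in nodes if y != x)}

def minimal(below):
    """Nodes that have no other node below them under the given ordering."""
    return {x for x in nodes if not any(below(y, x) for y in nodes if y != x)}

print("bottom-up maximal:", maximal(below_bottom_up))
print("top-down minimal:", minimal(below_top_down))
```

Both calls pick out the same node, the root, which is the sense in which the choice of perspective does not matter as long as it is applied consistently.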

And my own private confusion: I have trouble remembering how former and latter work, because I can never remember whether one is supposed to begin counting from the beginning of the relevant phrase or backwards from where former or latter occurs.

Posted by Christopher Potts at 10:19 PM

Something you need to know about me

My colleague Armin Mester has recently put up on his door down the hall from me a New Yorker cartoon in which a man seated at a restaurant table is saying to his female companion:

There's something you need to know about me, Donna. I don't like people knowing things about me.

This is an odd enough thing to say that I giggled, without quite knowing why. Is there a hint of the liar paradox there? Is it coherent to tell someone (i.e., cause someone to know) that you don't like people knowing things about you, which entails that you don't want them knowing that you don't like people knowing things about you, which is precisely what you have just caused them to know about you? Is it merely self-contradictory? Or does it have no truth value at all, like This sentence makes a false claim?

I'm out of my philosophical depth as usual. And my philosopher partner Barbara Scholz is away in Ohio right now. Is there an epistemologist in the house? Perhaps Brian Weatherson will have a view; I guess I can watch Thoughts, arguments and rants to find out.

Posted by Geoffrey K. Pullum at 07:47 PM

Why cool remains hot

This story from the Baltimore Sun quotes Donna Jo Napoli as observing that cool is "underspecified", and paraphrases her to the effect that "the more unspecified a word is, the more staying power it has." I wonder if that's true. I'm not trying to suggest that it isn't true, it kind of makes sense, but I just don't know. Is there some way to quantify specificity, so that we could investigate the correlation between specificity and the length of time a word is in active use? For that matter, is there a good way to quantify lexicographic "staying power"? Has anyone ever studied this systematically?

Note: Donna Jo is a professor of linguistics at Swarthmore as well as a widely-read children's author -- here is her academic CV. She's largely responsible for the fact that Swarthmore, last I checked, had more linguistics majors in proportion to its undergraduate population than any other American college or university. So if Donna Jo says that underspecified words stay around longer, I'm paying attention. I just wonder, is all.

[Update: since the cited article requires registration on a site that promises to send registrants spam, here's the full passage quoting Prof. Napoli:

Of the word cool, "I can say one thing about it; it has not stood still," says Donna Jo Napoli, a Swarthmore College professor of linguistics.

When she was growing up, cool meant "wow!" says Napoli, also the author of young adult novels and the mother of five. Today, cool is used more often to mean, "OK, I'm fine with that," Napoli says. In other words, 'I'm cool with that.'

As Napoli suggests, cool gets around. There's "way cool," "cool beans," "That's cool," "Keep your cool" and "too cool."

Cool is an example of an "underspecified word," Napoli says. The less specific a word, the more meanings it can have. Assassinate is an example of a "highly determined" word, one that can't be used in too many contexts, she says. The more unspecified a word is, the more staying power it has, she says.]


Posted by Mark Liberman at 03:06 PM

Incall and outcall

I want to confess something. It's kind of embarrassing. It involves hotels and making phone calls and getting naked... But it's time I was open about this. I'll share it with you. Though I would understand if those who have narrow views concerning the personal services industry might prefer not to read anything on this subject.

It is remarkably stressful to fly thousands of miles to a strange city, ride in a strange taxi from a strange airport to a strange hotel, and sit tensely for hours on strange hotel chairs at conferences listening to papers that are also sometimes rather strange. Often, when I get back to the hotel room from a day of conference attendance, I would really like to unwind by having a trained expert run their warm-oiled fingers over my naked body and untense the muscles that the day's stresses have knotted up. I felt the need in Boston after hours of sitting in sometimes frigid rooms at the LSA (Linguistic Society of America), for example, and also at the MLA (Modern Language Association) back in December in San Diego. So I often browse the advertisements for massage specialists that I see in local papers and the yellow pages. But I don't place that call to make the arrangements.

And I'll tell you why. It's not about any hesitancy concerning fragrant oils being applied to my naked torso by a total stranger. I'm cool with that. The hesitancy is linguistic. (This is Language Log, isn't it?) It's this: I have quite simply never been able to figure out, despite an intimate and extensive acquaintance with the syntax and semantics and word formation principles of English, what incall and outcall mean. I am perfectly able to guess what the two meanings are -- one of them means that the masseur or masseuse sits in an upstairs room above a neon sign with scented candles and fragrant oils playing quiet nose-flute music on a stereo and waits for you to take a taxi to where they are, and the other means that the masseur or masseuse loads a folding massage table into a station wagon and grabs a travel case of fragrant oils and a boom box and some nose flute cassettes and brings their equipment and manual skills to you wherever you are. But which one means which?

The problem is that two directions and two people's points of view are involved. In one scenario the client calls in -- first, by telephone to find out where the room is, and then by climbing the stairs and entering the room with the scented candles -- so perhaps that would be incall. But in that same scenario the client also has to call out on the phone from the hotel to find where it is, and go out to that address, so in a sense it could also be thought of as outcall: you call out and then you go out.

On the other hand, in the second scenario, with the station wagon, the massage practitioner comes in, hauling the massage table and travel case and boom box into your room, so perhaps that would be incall. Yet from a different perspective, to do it they have to go out when you call out to order their services, and they travel in the station wagon out to wherever the client is; so perhaps that would mean it was outcall.

I simply don't see it. Either could mean either. I don't want anyone to just tell me, you understand: if I can't see how it follows by some sort of linguistic principles, I will just forget it again. I want to see which meaning goes with which word, I want to understand it, to grok it. This is my language. I also don't want to have to place a call to someone who only does whatever incall is and ask for what actually counts as outcall, or the converse. That would be as embarrassing as uttering an eggcorn. I am supposed to know this language; I'm a native speaker. So that's why I never make arrangements for a massage when I'm away on a trip. I think about it, but then I just founder on the semantic puzzle all over again, and fall asleep wondering which meaning goes with which word.

Posted by Geoffrey K. Pullum at 02:55 PM

New middle school euphemisms?

Prentiss Riddle writes in with a story about new vocabulary creation in a Texas middle school:

I have no current data to offer you on cliquonymics, but I have an urban legend with a linguistic hook to pitch you.

His note continues:

A friend's 12-year-old daughter just finished two semesters of middle school here in Austin. Her peers are thoroughly immersed in the net and put a lot of time and energy into chat and e-mail.

They are under the impression that there is a full-time staff member at their school whose job it is to intercept messages containing profanity. Sounds like an awfully busy person! My guess is that their school really uses NetNanny or some equivalent, and in the kids' minds a piece of software has become an omniscient superhuman. Do you suppose this belief is held at other schools as well?

Here's the linguistic hook: According to my friend's daughter, the kids have responded by creating their own synonyms for the standard obscene lexicon so their messages can slip past the censors.

I'd *love* to have a glossary of that jargon! But of course I was too responsible an adult to quiz a 12-year-old on the subject of dirty words. And perhaps I'd have been disappointed anyway -- maybe it's just l33tsp33k.

Still, I love the idea of a junior-high guerrilla linguistic movement eluding the forces of parental control. :-)

Good luck with your clique collection.

Thanks, Prentiss. But maybe it's less like guerrillas eluding authoritarian control, and more like the ancient Slavs tiptoeing around the dangerous Bear Spirit by calling him the honey-eater? Magical thinking is surprisingly widespread in the linguistic culture of elementary-school kids these days -- maybe middle schoolers are not immune either.

Posted by Mark Liberman at 06:29 AM

Pete Rose and sorry statements of the third kind

Just before I left for the LSA meeting in Boston there was much discussion in the press and on radio and TV shows about whether Pete Rose's long-awaited apology (for betting on baseball while he was a baseball manager, and lying in his teeth about it for over a decade) would earn him sympathy with the public. I heard one radio show where they got hold of a professor who had written articles about public apologies and what makes them work (being sincere, showing understanding of what had been done wrong, expressing remorse, doing it on Larry King Live, etc.). Well, as far as the much-quoted passage from his book is concerned, the simple fact is that Pete Rose hasn't apologized at all. People aren't being sufficiently sensitive to the grammar of the adjective sorry.

It should be clear that an apology has to be in the first person, and in the present tense. But it is not enough to utter something in the first person that has sorry as the head of an adjective phrase predicative complement. The word sorry is used in three ways.

First, sorry can be used with a complement having the form of what The Cambridge Grammar calls a content clause:


(1) I'm sorry that the political situation in the Holy Land is still mired in violence, because I wanted to go to Bethlehem at Christmas.

If I utter (1), I am not apologizing; I have never caused or defended any of the violence in the Middle East. It's not my fault. I just regret that the situation persists. This use can constitute an apology (as Jonathan Wright reminded me when he read the first version of this post), but only when the content clause subject is first person as well: I'm sorry I hit you is an apology, but I'm sorry you were hit is not, so watch for that subject.

Second, sorry can be used with a preposition phrase headed by for with a complement noun phrase denoting a sentient creature:


(2) I'm sorry for that poor little kitten, which seems to have figured out how to climb up a tree without having any idea how to get down.

If I utter (2), I am not apologizing; I never suggested to the stupid kitten that it should climb fifty feet up into a beech tree. I'm just expressing sympathy, as a fellow mammal, for its present plight.

And third, sorry can be used with a preposition phrase headed by for where the preposition has as its complement a subjectless gerund-participial clause or a noun phrase denoting an act:


I'm sorry for doing what I did; I behaved like an utter pig, and you have a right to be angry.


I'm sorry for my actions last night; I should never have acted that way and I want you to forgive me.

Only this third kind of use can constitute an apology, as opposed to a statement of regret about the truth of a proposition or a statement of sympathy for a fellow creature.

Now, here is the passage from Pete Rose's book (reprinted in an excerpt in Sports Illustrated) that people have been carelessly referring to as containing an apology:

"I'm sure that I'm supposed to act all sorry or sad or guilty now that I've accepted that I've done something wrong. But you see, I'm just not built that way. So let's leave it like this: I'm sorry it happened and I'm sorry for all the people, fans and family it hurt. Let's move on."

The first sentence ("I'm sure that I'm supposed to act all sorry...") couldn't possibly be construed as apologetic. And in the last sentence he clearly and explicitly employs only the first and second types of use for sorry: he regrets that the incident occurred without describing the incident with a first person singular subject (compare with (1)), and he has sympathetic feelings for those hurt (compare with (2)). Beware of thinking that a sentence beginning with I'm sorry is an apology. It need not be. If it's like the quote just given, it may be closer to an intransigent refusal to apologize. If a genuine apology in writing is a precondition for getting back into baseball, Pete Rose is showing no signs of being eligible to get back in.

He actually came a lot closer in a December 12 interview with Primetime Thursday on ABC News, parts of which were also aired on "Good Morning America". He said:

"I am terribly sorry for my actions and for my bad judgment in ever wagering on baseball, and I deeply regret waiting so many years to come clean."

That's a sorry of the third kind, and it has the form of a direct apology. And he also said:

"I would like to apologize to the fans for abusing their trust."

You can perform an action with words by stating that you would like to perform it: if you are legitimately at the microphone and you say "And now I would like to introduce Professor Noam Chomsky", and Professor Chomsky promptly steps up to that microphone and begins to lecture, you will be understood as having introduced him, even though what you literally said was only that you would like to. That's known as an indirect speech act, and it does work.

Overall, one waits with interest to see if Rose's mealy-mouthed mixture of direct apologies, indirect apologies, and clear avoidances of apology is going to count as enough in anyone's view to allow him to get that Hall of Fame induction he is yearning for. I'd bet against it.

Posted by Geoffrey K. Pullum at 12:13 AM

January 12, 2004

Grammarless Riau?

Language Hat and his commenters have some interesting things to say about David Gil's work on Riau.

According to the stuff of Gil's that I've read, Riau is spoken within view of Singapore's highrises, so the opportunity for more thorough study is certainly there.

John McWhorter and I had a conversation about this about a month ago, following up on his post on the subject, and I read a number of Gil's papers at the time. The thing that bothers me most about the Riau business is the fact that it's the stigmatized end of a continuum with standard Malay/Indonesian [see this and this for Gil's description of the situation], and Gil makes it clear that it's really hard to get speakers not to move up into a more formal register.

[Update 1/18/2004: here's what Gil writes about it:

Right away, I was struck by how different the local language was from the Standard Malay / Indonesian that I had read about in the linguistic literature. So I set out to investigate the language, by eliciting data from native speakers. But this turned out to be a virtually impossible task: the interference from the standard language was much too strong. If I asked speakers how to say something in colloquial Indonesian, they would invariably provide sentences in the formal language. If I then confronted speakers with sentences that they themselves had uttered, they would deny having produced them, and then offer to "correct" the sentences by translating them into the formal language. Similar problems occur in many or all languages; however, the extent of the phenomenon differs considerably from one speech community to another and here it was about as difficult as it gets.]


When I've worked with language variants of that kind, there's often been a parallel difficulty -- once you persuade a speaker that it's really OK for him or her not to (try to) speak the standard language, it may be equally hard to prevent this "license" from being interpreted as an instruction to agree that any meaning can be paired with any form, since this seems to be what the field worker wants to hear. Gil seems serious and careful, but that may not be enough. [He discusses methods of observing "naturalistic" data by sitting around and jotting down what he hears when it seems interesting, but the claims about the many meanings of a given two-word phrase presumably come from another source than this.]

My own belief is that in cases like these, where judgments are so skewed by the social context, corpus-based methods are an essential part of any solution to the research problems. Given the number of speakers and the location, a good-sized corpus of Riau would not be hard to collect (and publish!), but I don't know that anyone is working on it.

Posted by Mark Liberman at 09:02 PM

Quantifying alternative translations

A few months ago, I verified a conjecture about degrees of grammaticality by comparing Google hits. Yesterday, Geraint Jennings pointed out that the "flights" of drinks offered on upscale restaurant menus are a calque of French "volée", which has been borrowed directly as volley. In uses like "volée de coups", it seemed to me, volley is not as likely a translation as barrage, and in fact for the particular case of punches and kicks, the most idiomatic English collective noun seemed to me to be (the snow word!) flurry. I wondered if these guesses are really valid, so I did a quick Google check:

flight of  
volley of  
barrage of  
flurry of  


Posted by Mark Liberman at 08:21 PM

The ant, the spider and the bee

Last Friday, at the 2004 LSA Annual Meeting, Morris Halle gave an invited plenary address "In Honor of the 80th Anniversary of the Founding of the Linguistic Society." His title was "Moving On."

Maggie Reynolds told me Friday afternoon that more than 1100 people had registered for the meeting at that point, and I think just about all of them were in the room. Morris -- who has given many speeches in his 80 years -- commented at the start of his talk that this was the largest audience he had ever addressed.

I'll have more to say about the content of his talk later on -- and also about some of the "state of the art" presentations, including Noam Chomsky's, as well as Ray Jackendoff's presidential address, and the special session on "Modeling Sociophonetic Variation", the special session on "Constructions", and perhaps a few other things. However, I'm very pressed for time this week, and these posts will require some thought and will take more time than I'm likely to have over the next few days.

So instead I'll start with a few personal anecdotes, which are easy and quick. The first story is especially easy because most of it is just a quote that I can cut and paste from the net.

I took courses in linguistics as an undergraduate, but then I was away from intellectual life entirely during three years in the army. When I showed up for graduate school in 1972, I felt like a fish out of water. Since I'd taken the basic courses as an undergraduate, Morris decided to start me out in the second-year program. However, the formalisms that I'd learned in 1968 were out of date, and I understood only vaguely what the new ideas were and how they had been motivated. Worse, I still had an undergraduate's instinctive sense of theory as a bit of God's truth revealed, and so it was disconcerting to find that the theory I'd been taught wasn't considered true anymore.

Morris explained to me, not for the last time, his view that theories and formalisms are best seen as tools for exploring nature, making it possible to ask and answer descriptive questions in a systematic and incrementally more revealing way. To make the point, he quoted a passage from Francis Bacon. I can still remember the content of the quote well enough, 32 years later, to find it on the web. It's Aphorism 95 from Bacon's 1620 work The New Organon, or True Directions Concerning the Interpretation of Nature:

Those who have handled sciences have been either men of experiment or men of dogmas. The men of experiment are like the ant, they only collect and use; the reasoners resemble spiders, who make cobwebs out of their own substance. But the bee takes a middle course: it gathers its material from the flowers of the garden and of the field, but transforms and digests it by a power of its own. Not unlike this is the true business of philosophy; for it neither relies solely or chiefly on the powers of the mind, nor does it take the matter which it gathers from natural history and mechanical experiments and lay it up in the memory whole, as it finds it, but lays it up in the understanding altered and digested. Therefore from a closer and purer league between these two faculties, the experimental and the rational (such as has never yet been made), much may be hoped.

Or in the original Latin:

Qui tractaverunt scientias aut empirici aut dogmatici fuerunt. Empirici, formicae more, congerunt tantum, et utuntur: rationales, aranearum more, telas ex se conficiunt: apis vero ratio media est, quae materiam ex floribus horti et agri elicit; sed tamen eam propria facultate vertit et digerit. Neque absimile philosophiae verum opificium est; quod nec mentis viribus tantum aut praecipue nititur, neque ex historia naturali et mechanicis experimentis praebitam materiam, in memoria integram, sed in intellectu mutatam et subactam, reponit. Itaque ex harum facultatum (experimentalis scilicet et rationalis) arctiore et sanctiore foedere (quod adhuc factum non est) bene sperandum est.

It's hard to beat this as a recipe for rational inquiry. It also makes a good backdrop for my reactions to the Boston LSA meeting.

Posted by Mark Liberman at 07:21 PM

Rosanne: woman of mystery

Cyberspace makes for some weird situations regarding knowledge of other human beings. Take the case of the mysterious "Rosanne" of another language and linguistics blog, The X-bar. Mark Liberman and I (and several other Language Loggers including Geoff Nunberg and Chris Potts) were at the LSA meeting over the weekend in Boston, and, we now know, so was "Rosanne". We must have seen her. We were at Chomsky's lecture, and Jackendoff's, and so was she. We saw women linguists with infants in arms, and we know that one of them was "Rosanne" (she mentioned in one of her posts that she had her baby with her: Chomsky smiled at the baby). How should I truthfully answer if anyone ever asks me, "Do you know who Rosanne is?" I know her nom de net, and some of her opinions and ambitions and interests, and her probable real first name, and her location over last weekend. I know Mark and I were in the same room with her, several times. And yet in a sense I have no idea who she is, and I have no way to find out. She saw Mark and me and could identify us (we had our full real names on badges on our chests), and we probably saw her, and her badge (you had to wear your badge to get into the big-name lectures), but that doesn't enable us to identify her. Despite being out in the open amid a thousand people and wearing a name badge, she was able to watch us from a position of complete privacy, like a mountain lion hidden in the undergrowth. Woman of mystery.

Posted by Geoffrey K. Pullum at 04:11 PM

Reports on the LSA meeting

Rosanne at the X-Bar has several nice posts about the just-ended LSA meeting in Boston.

Quite a few of the Language Log regulars were there, but so far we've only managed to report on peripheral things, like the ADS "word of the year" process and the drinks menu at Legal Sea Foods. I expect that we'll do better over the next few days, as time permits, but for now I'll simply remind our readers that we stand ready, as always, to refund your subscription fees in full.

Posted by Mark Liberman at 10:29 AM

January 11, 2004

The Legal Treatment of Quantifiers

Geoff Pullum and I had dinner Saturday night at Legal Sea Foods with a couple of other friends. The menu of desserts and after-dinner drinks featured "Tasting Flights" of various types of fancy alcohol, such as port, cognac, single-malt scotch and so forth. Each "flight" consisted of three different instances of one type (e.g. three single-malts from Skye, or three Martell cognacs), and the menu promised "Three 1 oz. pours of each".

All of us at the table agreed, in our professional capacity as native speakers of English (two British, two American), that this unambiguously promises nine ounces of drink. While this quantity seems more likely to lead to a crash than a flight, at least in a single customer already fortified with a glass or two during dinner, we felt that it would work out well if somehow shared among the four of us. Needless to say, our server maintained that the correct interpretation of this menu item is a total of three ounces of drink, one ounce of each kind.

It's clear that pragmatics was on the server's side, even if semantics was on ours. As it turned out, she was a speech therapy student at a local university. Therefore she was pleased to learn that we were attending the Linguistic Society of America meeting, and that Geoff is co-author of the Cambridge Grammar of the English Language, but she knew enough not to fall for an appeal to linguistic authority. So Geoff had a single glass of Laphroaig, party animal that he is, and I had a cup of coffee.

However, I continue to think that we were right. Here's a pragmatically identical example that helps make the point. The website for the '21' Club offers a similar "flight" of fancy alcoholic beverages:

"Three-ounce servings of First Growth Bordeaux will be available for $45.00 a glass, or $80.00 for a flight of the two wines being offered (three ounces of each will be served)."

The '21' Club has got its pragmatics in tune with its semantics -- I read this offer as promising that you get 6 ounces of wine for your eighty bucks, and I'd be happy to bet anybody $80 that I'm right (loser pays). Legal Sea Foods needs to trade up to a better semanticist-in-residence, who would advise them to promise that "a one-ounce pour of each of the three selections will be served", or something similar, rather than promising "three 1 oz. pours of each." Perhaps this is the menu-writer's equivalent of misleading journalistic concision, but there was plenty of white space on the menu...

And by the way, who decided that a serving of several different beverages of a certain class should be called a "flight"? The closest thing in the OED's entry for flight seems to be sense 8:

A collection or flock of beings or things flying in or passing through the air together: a. of birds or insects. Also the special term for a company of doves, swallows, and various other birds.

but since when are scotches and ports classified as birds or insects? And does this term apply only to fancy alcohol, or would (say) a small diet Coke and a small diet Pepsi be a "Cola flight"?

[Update: Geraint Jennings writes to point out:

"flight" as in "flight of drinks" - a calque from French "volée", I'd have said, via the art of the sommelier. Perhaps using the borrowed "volley" might give a better idea of the effect!

Tonnerre de Brest! But of course! "Flight" has a gentle, refined character that is somehow missing from volley, much less barrage, which seems like a better translation in phrases like "volée de coups". Legal Sea Foods is a no-nonsense kind of place. When I first ate there, the restaurant part consisted of a wooden picnic table at a fish store, where you could eat your take-out fried clams. Even now, their "tasting flights" are a bargain compared to 21's -- the prices were about $12-$25 for three drinks, not $80 for two. So maybe they should switch to "tasting volley" or even "tasting barrage", and show the power of reverse snobbery.

I'm reminded of an interaction at dinner in an upscale Los Angeles restaurant with Jean-Roger Vergnaud. The waiter delivered a long, poetic description of a wine that Jean-Roger had chosen, including the phrase "with a hint of earth in the nose." Jean-Roger paused for a carefully calculated moment, and then pointed to another choice. "And what about this one? Does it also have dirt on its nose?"]

Posted by Mark Liberman at 06:37 PM

LTSN Subject Centre for Language, Linguistics and Area Studies

This looks like it leads to all sorts of useful things, though I haven't had time to explore it much.

The British Academy Portal also looks interesting.

[Links via Damien Hall]

Posted by Mark Liberman at 05:37 PM

Contemporary cliquonymics?

The whole emo thing reminds me that I'm way out of touch with today's terminology for American middle-school and high-school cliques. My older sons graduated from high school in the mid-80s, and my younger son is in the second grade, where the groups are starting to find their identities but don't yet have names.

Please write to me and let me know what the groups are like and what they're called in your part of the world. If you know any web sites devoted to cliquonyms, or scholarly works on cliquonymics, let me know about that too. I'll also ask some of the freshmen here at Penn, when I see them at Tuesday's study break. I'll post a summary of what I've learned, probably some time next week.

I already know Penny Eckert's classic Jocks and Burnouts, so you don't have to send me that reference. But Penny's book was published in 1989 and reflects even earlier field work, and in this kind of culture, it's clear that 15 years is a long time.

Posted by Mark Liberman at 01:18 PM

January 10, 2004

History of emo

The word emo may have been new to me, but it's been around for years. This is not a surprise, since subculture words usually percolate for a while -- sometimes a long while -- before outsiders learn about them. I've gotten quite a few messages filling me in on the history.

T. Carter Ross writes:

Thanks for the Language Log post about "Emo." However, calling it a new word may be a bit relative: It definitely broke into wider consciousness with Dashboard Confessional's *Places You Have Come to Fear the Most* in March 2001; the single "Screaming Infidelities" was all over modern rock radio and MTV and it crossed over to top 40/CHR and some adult contemporary stations. And now you can't turn on the radio without hearing Coldplay.

Well, I can, but you knew that already. T. Carter goes on to say:

The style of music, however, can be traced back to Hüsker Dü, who pioneered the hardcore sound mixed with confessional lyrics, but it didn't really gel into emo until the mid-1980s with Rites of Spring.

Here -- http://www.jimdero.com/OtherWritings/Other%20emo.htm -- is an article from 1999 talking about the term and the music. And here -- http://www.angelfire.com/emo/origin/ -- is a pretty well cited history of the genre.

Thanks again for the interesting article,
T. Carter Ross

I'm grateful to Mr. Ross for cluing me in.

Several others wrote about this term, including Kristina Spurgin:

Even within the emo subculture there appears to be no agreement as to what, or who, is or is not emo (from the music and band discussion boards I have seen). I have known about the word for several years now and still have not been able to satisfactorily clarify for myself exactly what it is supposed to indicate.

On his webpage, Gregory Williams writes:

Heehee... look at the linguists (and journalists, indirectly at the NYT) grappling with the discovery of "Emo."

I find it all mildly amusing that this is happening over a term that is common in my world, and that has been around for (I believe) nearly fifteen years.

We're always happy to provide entertainment to our readers, Gregory.

Posted by Mark Liberman at 11:35 PM


Emily Nussbaum has a nice piece on teen weblogs in today's NYT magazine. She talks about the role that blogging is coming to play in the emotional and social life of American adolescents:

For many of the suburban students I met, online journals are associated with the ''emo'' crowd -- a sarcastic term for emotional, and a tag for a musical genre mingling thrash-punk with confessionalism. The emo kids tend to be the artsy loners and punks, but as I spent more time lurking in journals and talking to the kids who wrote them, I began to realize that these threads led out much farther into the high school, into pretty much every clique.

On a sunny fall day, M. and his friends were hanging out in front of a local toy store, shooting photos of one another with digital cameras, when a group of three girls sashayed by. They sported tank tops, identical hairbands and identical shiny hair. I walked over to them and asked if they have LiveJournals. ''No,'' one said. ''We have Xangas.''

They were all 15, around the same age as M. and his friends. But the two groups had never read the other's posts. M.'s crowd was emo (or at least emo-ish; like ''politically correct,'' ''emo'' is a word people rarely apply to themselves). These girls were part of the athletic crowd. There was little overlap, online or off. But the girls were fully familiar with the online etiquette M. described: they instant-messaged compulsively; they gossiped online.

These nicknamed cliques have been a stable feature of the American adolescent scene for a long time, though the names, symbols and prototypical characteristics are always changing. Greasers, burnouts, yutzes; collegiates, socials, jocks; and dozens more over changing time and space. And now one that's new to me: emo.

I'm not the only clueless one -- the real word mavens seem to have missed it too (though there was an ADS list posting about jamband whose citation included emo). It wasn't on the list from yourDictionary.com, nor did anybody at the ADS "word of the year" discussion bring it up. But it sounds like a winner to me. Google has 1,010,000 "emo" pages in its current index, starting with this one. Some reference the township of Emo, Ontario; some deal with the comedian Emo Philips; others have something to do with an exhibition of European Machine Tools, or the Emo Oil company. But based on sampling ten page-fuls or so, it looks like more than half of them are "emo the music genre" or "emo the teen social group." By comparison, goth (which is well known even to people like me, and has been around a lot longer) only has a bit over 2 million hits.

Nussbaum says that emo is "sarcastic" and "a word that people rarely apply to themselves." However, that doesn't seem to be true across the board. The musical reference seems to be used by fans, not by detractors. And Ebay has more than 14,000 items for sale that mention "emo" in the title: "OLD PEOPLE SMELL FUNNY EMO VINTAGE T-SHIRT L"; "Extra Thick Hemp Choker Necklace EMO Goth"; "Vintage Emo Punk Indie WORKSHIRT 'Nadine' XL"; "Punk Emo Retro Skater Hawaiian Shirt XL", "emo grrl Vintage Borg era Fila Tennis Skirt", and on and on. This isn't sarcastic description from the outside, this is people marketing stuff using the label that they expect their buyers to self-identify with.

So let's sum it up: emo is an evocative, descriptive new word for a popular musical genre, also used to label an adolescent subculture or perhaps a family of similar subcultures. It's got at least half a million google hits so far (and as far as I can tell, google doesn't index the 913,000 active LiveJournal weblogs and similar sites). It'll doubtless fade eventually, just like greaser and burnout did. But I bet it grows for a while first.

[Update: see this post for some information about the 15-year-long history of emo.]

[Update: Nicholas Widdows writes to correct my (mistaken) assertion that google does not index LiveJournal:

Oh but they do. A quick check for Googlewhacks on my own even tells me the date of their last spidering: between 30 December ("shotputting mug") and 2 January (*"shotputting memorization").

Apologies -- I checked strings from a few LiveJournal posts that are a few weeks old, and didn't find them. Maybe google doesn't index all of LiveJournal? I'll say more about this if I learn more...

Ask, and ye shall receive:

I just wanted to mention that the reason not all of LiveJournal is indexed by Google is that it's possible to configure one's individual LJ account so that search engines ignore it.

Some people don't want to be found.

I hope that helps,

--Naomi Parkhurst, a happy reader of Language Log

That makes excellent sense, and I should have guessed it. Several others wrote in with the same information. For instance, David Elworthy wrote:

In LanguageLog (which is great, BTW), you say:

> Maybe google doesn't index all of LiveJournal?

It's a choice made by each LiveJournal user. There is a setting to "Block robots/spiders from indexing your journal"; so it's not that google doesn't index it, it's that search engines in general don't. I leave this setting checked for my journal (http://www.livejournal.com/users/xxxxxxxx/), mostly because I largely post drivel that posterity is better off without.

I looked through a dozen or so of David's posts, and my considered opinion is that he's wrong -- he has interesting things to say, none of which seem likely to be embarrassing now or in the future, and he should let others find his stuff via search engines! But I've x-ed out his journal title in deference to his choice.

And Kian writes to point out that parental googling is a key issue:

Livejournal includes an option that keeps spiders from indexing your website. Many people young enough to be highly competent with the internet have parents who know you can use google to spy on your kids.

I have to say I am a huge fan of languagelog. I'm a 3rd year linguistics major at UCSD, and languagelog is daily reading for me. Anyway, hope this helps out.

Patrick Hall at (the terrific computational linguistics weblog) fieldmethods.net writes that:

For searching blogs, you might try daypop.com:


Lots of hits for "emo" if you set the search to 'Weblogs.' I'm not sure if Livejournal posts are indexed there, either, but then, Livejournal has often been called "the antimatter of the blogging world."

Patrick Hall

Jessica Skrebes writes with further clarification:

I recently came across your Language Log posting, perhaps not surprisingly through a link from LiveJournal. Simply in the interest of further clarifying the relationship between Google and LiveJournal, I thought I'd offer my own experience. Although it is possible to check a box asking spiders not to index your site, LJ offers the disclaimer that they do not necessarily do so, as I discovered when I googled my own user name. Additionally, even if the spiders ignore your site, should you post in a friend's journal which google has listed, your name becomes traceable. While I have no trouble with people finding my journal, it's unfortunately difficult to ensure that it remains hidden, should this be one's wish. Thanks for the definitions though, I thoroughly enjoyed both of your articles.

A significant percentage of the visitors to Language Log come by way of referrals from LiveJournal links, mostly RSS syndication in people's "friends views" -- recently it's been 10-15%. When I first noticed this a couple of months ago, I looked around LiveJournal a bit, and I was really pleased to find that quite a few LiveJournal users found Language Log interesting enough to syndicate it and comment on it. I was also impressed by the scale, complexity and intensity of the personal expression and social interaction at LiveJournal and similar sites. It's past time for this phenomenon to get the kind of media attention that Nussbaum's article represents.

Posted by Mark Liberman at 01:31 PM

And the last shall also be first

The American Dialect Society recently selected Metrosexual as its Word of the year. Hardly more than a week earlier, the same word topped Lake Superior State University's yearly "List of Banished Words" (or to give its longer name, the "List of Words Banished from the Queen's English for Mis-Use, Over-Use and General Uselessness"), which came out on Dec. 31, 2003. The LSSU list has been widely cited: {Lake Superior metrosexual} gets more than a thousand google hits, including many journalistic sites (since the AP newswire picked it up) and many weblogs. I saw the LSSU list (via blogdex) when it came out, but I had forgotten that metrosexual was at the top.

Here's a free clue suggestion for the nice folks at ADS. Networked digital communication is an important, even central part of the life of a growing number of American teens and young adults -- this article in today's NYT magazine cites a survey to the effect that

"...there are expected to be 10 million blogs by the end of 2004. In the news media, the blog explosion has been portrayed as a transformation of the industry, a thousand minipundits blooming. But the vast majority of bloggers are teens and young adults. Ninety percent of those with blogs are between 13 and 29 years old; a full 51 percent are between 13 and 19 ..."

This is having a significant effect on the social networks by means of which new vocabulary (and other kinds of language change) spread, and the effect is going to keep on growing for a while. Maybe you should get out in more ...

[Update: Let me try to say this again in a way that may be less likely to offend. The ADS community is a national treasure, a deep reservoir of linguistic scholarship. Some of them (like Grant Barrett) are among the most net-savvy people I've met recently, while others have very different kinds of skills and knowledge. Many of them are interested in questions that have nothing to do with the ways in which English may be changing now.

However, networked digital communication is becoming part of the fabric of adolescent social life in the U.S. (and around the world). This will have significant consequences for what happens to the English language over the next few decades, and it also gives us an unprecedented opportunity to study language variation and change across social networks. Therefore, attention should be paid not only by those who care about the developing facts of English, but also by those who care about cultural evolution more generally. And as a minor footnote, those who take part in a "words of the year" exercise, even a whimsical one, really should get out into the net a bit more.]

Posted by Mark Liberman at 10:22 AM

ADS word of the year is metrosexual

The American Dialect Society held a couple of sessions yesterday to nominate and select winners in various made-up lexicographical categories. The whole list, with some discussion, can be found at the ADS web site (though this link will probably change later).

I want to emphasize that no one should take this too seriously -- it's basically a publicity stunt on the part of the ADS, though it's one that many of the participants clearly enjoy.

I missed the voting, which was yesterday afternoon, because I got into a discussion with an old friend here at the Linguistic Society of America annual meeting. I did go to the ADS word-of-the-year nominations session. I only knew one of the regular participants (Larry Horn, who gave a great talk on "lexical pragmatics" Thursday evening at the LSA), but was made to feel welcome.

One thing that surprised me at the ADS "Word of the Year" nominations session was that very few of the participants had ever heard of the term fisking. I nominated it but there was no uptake. Only one of the 30 or so people in the room indicated any familiarity with the word at all, and that was Grant Barrett, the webmaster of the ADS site. He argued that the word is limited to a small circle of ("like 23") warbloggers, who use it in a self-conscious way intended to spread it, rather than as a natural part of their vocabulary, and that it was unlikely to spread outside that narrow group or even to last as an item of subculture vocabulary. Given that no one else in the room seemed even to have heard of the word, I let it drop.

I checked later, and wrote a note to Grant that read in part:

The term "fisking" gets 33,800 google hits. I checked the first 60 and found 50 different sites. The 20th page (200-210) still has 9 out of 10 that are not among the earlier 50. I'm sure that things start to repeat more after a while, but I'd be willing to wager the price of a good dinner that there are more than 10,000 google-indexed sites where the word is used.

There's only 1 hit in google's "news" index, which does indicate that there's not much uptake yet outside of the blogosphere.

But when you've got a word in active use by an active subculture of tens of thousands of people -- with an audience of millions -- then I think it's pretty sure to last.

So my own personal suggestion for word of the year is fisking. It provides a name for a new (or at least newly-prominent) form, the interlinear critique. This got started as something people did in email and became commonplace in newsgroups and bulletin boards, but there has never been a name for it in the past.

The core usage of the term among bloggers has been for political criticism from the right, but there is plenty of evidence that it is generalizing politically, and is also being used outside of politics and for non-textual forms of criticism. In a few minutes of searching, I found someone who writes about how "AL FRANKEN DELIVERS a mild fisking to aphorism-happy commencement speakers", someone else who uses the term to describe how "Travis Nelson takes Joe Morgan to task" for a column on baseball, and another case where someone writes

I've had a song called "Astley in the Noose" stuck in my head all day. Yes, it's not just a line in a Pop Will Eat Itself song, it's a not so gentle fisking of Rick Astley.

Metrosexual is somewhat ahead at present, with 57,100 google hits to fisking's current 35,400 (up 1,600 from yesterday!), but we'll see ...

[Update: if we add the 2,770 for fisked and half the 8,910 for to fisk (sampling suggests it should be closer to 80%, but never mind), we get more than 42,000 and rising. On the other hand, we need to add in the 14,900 hits for metrosexuals. Yes, this is foolish, but I'll check back on it from time to time anyhow.]
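For anyone who wants to re-run the tally as the counts drift, here is a minimal sketch of the arithmetic in that update. The hit counts are the ones quoted in this post; the function name `family_total` and the variable names are made up for illustration, and the 50% and 80% shares for "to fisk" are just the two guesses mentioned above, not measured values.

```python
# Google hit counts as quoted in the post (snapshots from January 2004).
fisking = 35_400
fisked = 2_770
to_fisk = 8_910        # noisy phrase query: only some hits are the verb
metrosexual = 57_100
metrosexuals = 14_900


def family_total(base, inflected, phrase, phrase_share):
    """Add a base count, an inflected form, and an assumed share of a noisy phrase query."""
    return base + inflected + round(phrase * phrase_share)


# Conservative reading: count half of the "to fisk" hits.
fisk_low = family_total(fisking, fisked, to_fisk, 0.5)
# Reading suggested by sampling: count about 80% of them.
fisk_high = family_total(fisking, fisked, to_fisk, 0.8)
# Metrosexual plus its plural.
metro_total = metrosexual + metrosexuals

print("fisk family (50%):", fisk_low)
print("fisk family (80%):", fisk_high)
print("metrosexual family:", metro_total)
```

On either assumption the fisk family stays well behind metrosexual plus metrosexuals, which is all the comparison was meant to show.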

Posted by Mark Liberman at 07:26 AM

January 09, 2004

Bletchley Park in the Lateral Interparietal Cortex

If you're a fan of Alan Turing, and interested in the work at Bletchley Park on the Enigma decryption, and also interested in models of perception and models of learning, then this paper will bring it all together for you -- Banburismus and the Brain: Decoding the Relationship between Sensory Stimuli, Decisions, and Reward, by Joshua Gold and Michael Shadlen.

I've felt for a long time that the "Banburismus" stuff deserves more attention, both on its own terms and in its relationship to the birth of information theory. One reason for the neglect is that its details were classified until a few years ago. Also, Turing never had a chance to write about the general principles of Banburismus because he was busy with other things in the few years before his death, and previous descriptions of the work either focus on the cryptanalytic details or present the general ideas without enough mathematical substance to tell you how to apply them. Gold and Shadlen do a great job of explaining the general principles and why they're important, while also grounding the ideas in enough specifics for you to implement them if you want to. And they also show how to use this perspective to frame research questions about the neuroscience of perception!

There is still an untold story about the relationship between Turing's Banburismus work and the birth of Shannon's information theory. Most of the story will probably never be told, since all the people involved are dead, and there are probably no relevant documents. Turing spent a few months at Bell Labs in 1942-43, as the British representative on a team working out a method for encrypted voice communication, which was implemented in a system that Churchill and Roosevelt used for transatlantic conversations. Given what Turing and Shannon were like, I bet that the Banburismus stuff came up. I don't mean to take anything away from Shannon -- he was clearly one of the most important and original thinkers of the 20th century -- but it's nice to think that Turing had a hand in the origins of the mathematics of information as well as the mathematics of computation.

It's too bad that Gold and Shadlen's paper didn't come out before Neal Stephenson wrote Cryptonomicon, since its theme would be an excellent source of dramatic metaphors. Stephenson's novels tend to nucleate around odd bits of intellectual history: Wilkins's "philosophical character" in Quicksilver; Julian Jaynes's "bicameral mind" theories (about which more later...) in The Big U and Snow Crash; the Enigma decryption in Cryptonomicon.

There's another indirect literary spin-off from Turing's visit to Bell Labs. Stalin was very jealous of the vocoder, and set mathematicians and electrical engineers to work in a prison-camp laboratory to try to create one for him. One of the prisoners working on this project was Aleksandr Isaevich Solzhenitsyn, and his experiences there later became the back-drop to his novel The First Circle.

Posted by Mark Liberman at 08:40 AM

January 08, 2004

A short sharp slap for Dennis Overbye

I'd like to take a minute of Language Log time to slap Dennis Overbye real hard upside the head, if that's all right.

But first, a cordial word to the many good friends of mine who sent me the opening paragraphs of Dennis's article "Falling Physics, When the Weather Outside is Frightful", which appeared in the Science Times section of the New York Times (12/23/03, p. D3) and which Bill Poser recently commented on, all thinking I would be delighted with it: stop sending me this article, you idiots. All of you. Stop it.

And now to Dennis. Those who disapprove of violent punishment may choose not to watch this.

Dennis, your article about the physics of snowflakes begins with some boring crap about weather that turns one more time to the tired old nonsense about the Eskimos and their legendary snow vocabulary, only this time it's about New Yorkers, and all their snow words are unprintable, ha ha hee hee; oh, stop it, Dennis, I am laughing so-o-o uncontrollably (not!).

But it's worse than that. Your limp and worthless joke about having many words for snow that are all obscene expletives turns out not even to be original. A correspondent points out to me that this passage appears in Terry Pratchett's 6th Discworld novel, Wyrd Sisters:

The idea that Winter could actually be enjoyable would never have occurred to Ramtop people, who had eighteen different words for snow. All of them, unfortunately, unprintable.

So you didn't even make it up, Dennis. Whether you knew it or not, the stupid boring introductory paragraph that so many of my dear friends misguidedly mailed to me wasn't even original in concept.

And as for the original Eskimo version to which it obliquely alludes, this drivel about many words for snow has appeared in the Times so many times before. It was in an editorial on February 9, 1984 (Laura Martin pointed out that its claims were exaggerated in American Anthropologist in 1986, but nobody listened). Jane E. Brody used it on February 9, 1988 (I wrote to point out to her how silly this was but she paid no attention). It turned up in the Magazine on August 18, 1991 (mathematics professor Jim Lepowsky wrote to protest on August 19). My book The Great Eskimo Vocabulary Hoax came out that year, with its title essay publicizing Laura Martin's work, but Jane Brody didn't read it; she used the old chestnut yet again on March 23, 1993 (and the patient Jim Lepowsky wrote in to complain again on March 24)... But no one listens. The unstoppable flood of snow-word blather blunders brainlessly on.

Dennis, I want to make a suggestion to you about your use of hackneyed phrases in kit form to launch articles, and it's this: get a life. Think up some novel stuff. Don't be an indolent hack, use your left brain. Don't just make trips up the well-worn staircase to the attic full of dusty phrasal bric-a-brac that journalists keep returning to time after time after time.

Thwack! I hope that hurt.

That's it. I'm done. You can get up now. I'm off to the annual meeting of the Linguistic Society of America in frigid Boston. You should come too. Get some serious ideas about language. Next Tuesday I'll buy the Times for the Science Times section, and I'll look to see if there's anything by you. Don't let me see any crap about snow.

Posted by Geoffrey K. Pullum at 12:10 AM

January 07, 2004

Transexual, transsexual, and restricted Google searches

Natalja Schmidt wrote a thoughtful note to me from Germany on New Year's Day about my post Beware corpus fetishists, in which I discussed the curious fact (originally pointed out by Mark Liberman, citing this) that the incorrect spelling transexual is significantly more common on the web than the correct spelling transsexual. What she says merits some thought. Let me quote her in full.

Says Natalja:

Unfortunately you didn't say what settings you used for your Google search. I'm sure you are aware of the fact that the results can vary considerably depending on the Google configuration, especially language and country.

Per default, Google searches the entire web, ignoring what country the page is from or what language it's written in. In this case the results for transexual indeed outnumber the ones with the correct spelling transsexual, although I get a different number (correct spelling: 2.24 million, incorrect spelling: 2.83 million).

Obviously, sources in foreign languages must be excluded from the results. In Spanish, for example, "transexual", not "transsexual" is the correct spelling (And you get lots of Spanish pages if you use the default search settings). Another problem are foreign speakers of English (such as myself) who make many mistakes a native speaker would never make. These results shouldn't count either I think.

Unfortunately, even if you narrow down the search to English pages (i.e. pages in English) located in the US, the results are not much better, mainly because of the fact that English is an international language and because the "American" TLDs .com, .net, .org etc. are not only used by Americans. Another option are UK or Australian domains, since they are rarely used by foreigners. So I tried searching for UK/Australia pages. The number of incorrect spellings for UK and Australia was very small compared with the correct ones and compared with the results of web-wide searches, i.e. not restricted by language or country.

I guess what I'm trying to say here is that the "reservoir of error" in Google is not as giant as you think after all (if you know how to use it).

These sensible words of Natalja's are well worth keeping in mind, and for the most part I do not dispute them -- except for that very slight hint of an implication ("if you know how to use it") that I am a meathead web surfing moron who doesn't know how to Google his way out of a wet paper bag and believes fgrep -i on large text samples gives you a direct line to God; and I forgive her for any such suggestion, since there are so many meatheads out there, and who knows, I could easily have been one of them, rather than the sophisticated guru-class data wrangler and enemy of corpus fetishism (not to mention sexy super fun wild and crazy guy) that I actually am.

One can indeed tell Google to limit its searches in all sorts of ways to make the results more useful. There are some nice books on using Google in a sophisticated way; the wonderful O'Reilly Associates publishes a couple of them, the more technical of them called Google Hacks; I chose Google Pocket Guide by Tara Calishain, Rael Dornfest, and DJ Adams, which supplies enough. The results Natalja got with her restricted searches were these:

Jan 1, 2004, Google search results for transsexual / transexual

Any language or country    2,830,000
English, any country
English, USA
English, UK
English, Australia
Let me say a word first about the results I originally cited, and how I got them. I tried out a quick and simple way (perhaps too quick and simple). I went to GoogleDuel, a student experiment site that compares two given words or phrases with regard to their occurrences on Google, typed in the two words, and hit Go. No prizes for sophistication there.

Now, Natalja has used her human intelligence to figure out that if you restrict yourself to British and Australian sites, the correct spelling does come up as more frequent. But of course, she needed to know in advance what the correct spelling was, and she needed to use some of her general knowledge (like the number of Spanish pages, and the frequency of non-US-controlled .com sites) in order to come to her conclusion. When you use the web as a corpus (and that idea is the theme for the whole of the latest issue of the journal Computational Linguistics published by the Association for Computational Linguistics), you have to use it with care, and intelligence, and caution. And that, precisely the point I was originally making, is a point on which I think we agree.

Posted by Geoffrey K. Pullum at 01:07 PM

Eggcorns everywhere

Google has 62 hits for "eggcorn". We're number one, but most of the rest are authentic eggcorn mistakes, such as United Design Turtles' "Turtle Eating Eggcorn" figurine for $61.00.

This AQFL list message quotes the proverb that "[e]ven a blind squirrel will find an eggcorn once in a while."

In this chat log, Erica deals in a mature way with the embarrassment that results when Lindsay discovers her secret:

Lindsay: Shut UP!
Aj Loves Mindy
Erica: Someone hit her w/ an eggcorn??? wimp
Lindsay: eggcorn?? You mean acorn!
Erica: no, eggcorn
Lindsay: what the @#$*?!
Erica: EGGCORN @#$*?!
Lindsay: You are crazy
Lindsay: SHUT UP!!!!!! You're a Freak
Erica: EGGCORN!!!! I get knocked "Eggcorn" but I get up again eggcorn you ain't never gonna keep my eggcorn
Lindsay: AHHHHHH!
Erica: AHHHeggcornHHHH!
Lindsay: Dashing through the snow....
Erica: On a one horse open Eggcorn!
Lindsay: Shut up!

Finally, Thomas Irven art gallery has an actual cherry wood meta-eggcorn in the form of a lathe-turned box that "simulates a transition from an acorn to an egg."

Posted by Mark Liberman at 10:41 AM

Brenda Wineapple hates colons

Anita Samen says that book titles need colonoscopies. I think she might mean "colonectomies".

Douglas Armato says "we're sort of intermittently vigilant."

Kate Douglas Torrey says "it could be worse."

William Germano says that "[w]hat the colon does in black tie the semicolon does in khakis."

Willis G. Regier says that it's a reversion to the 18th-century practice that gave birth to Travels Into Several Remote Nations of the World. In Four Parts. By Lemuel Gulliver, First a Surgeon, and Then a Captain of Several Ships.

Read all about it: film at 11:00.

Posted by Mark Liberman at 07:29 AM

And the bead goes on

Yesterday I wrote about people who pronounce vowels the same way as vows. I'm not one of them, but like many English speakers, I've taken a step or two myself down the slippery slope towards turning syllable-final /l/ into a vowel -- what linguists call vocalization. The /l/ at the end of bell is still phonetically a lateral consonant for me, pronounced with the blade of my tongue in contact with the roof of my mouth. However, the /l/ in belfry has gone over to the vowel side, so to speak. If you were to record me saying belfry and play the first syllable back very slowly, it would sound like "beh-oh". When I say belfry, my tongue never makes contact with the roof of my mouth at all.

The fact that some people say vowels like vows doesn't in itself explain why they come to the strange conclusion that "wedding vows" are "wedding vowels". No one in Google's ken has written about "a hoarse of another color", perhaps because confusing an adjective for a homophonous noun is rare. Of course, we don't always owe an explanation for such mistakes, especially idiosyncratic ones. Egg corns and mondegreens and other lexical reshapings are sometimes pretty random: when someone hears "the girl with kaleidoscope eyes" as "the girl with colitis goes by", I think we need to chalk it up to neural noise and move on.

However, "wedding vowels" is one of the cases where we can tell a pretty convincing story, at least after the fact. Using the word vowels to refer to a ritual promise looks like another example of synecdoche, the practice of referring to objects in terms of their salient parts (like jocks for athletes or hands for sailors) or their salient materials (like steel for a sword). If letters can stand for writing in "arts and letters", why shouldn't vowels stand for speaking in "wedding vowels"?

Well, because the expression is really "wedding vows." But the theory that it's "wedding vowels", while mistaken, is arguably common because it's poetically as well as phonetically and syntactically apt.

Another common poetic mistake is the substitution of beat for bead in the expression "get a bead on" or "draw a bead on". In Monday's New York Times, sportswriter Thomas George quotes Denver coach Mike Shanahan as saying about Peyton Manning:

"That was a great game plan and it was executed as well as I've ever seen. We came into a hornet's nest. Once he gets a beat on you, he is hard to stop."

I'm sure that Mike Shanahan is one of the great majority of North Americans for whom "gets a bead on" and "gets a beat on" are pronounced in exactly the same way, due to voicing and flapping of the word-final /t/ in "beat" before the initial vowel of "on". So the theory that Shanahan said beat and not bead came from the sportswriter, who spelled it, and not from the coach, who spoke it.

The original version of this idiom involves the word bead, for which the OED gives this sense:

d. The small metal knob which forms the front sight of a gun; esp. in the phrase (of U.S. origin) to draw a bead upon: to take aim at.

and credits the first citation for this sense to John James Audubon, who was as familiar with drawing beads as drawing birds:

1831 AUDUBON Ornith. Biogr. I. 294 He raised his piece until the bead (that being the name given by the Kentuckians to the sight) of the barrel was brought to a line with the spot he intended to hit. 1841 CATLIN N. Amer. Ind. (1844) I. x. 77, I made several attempts to get near enough to ‘draw a bead’ upon one of them. 1844 MARRYAT Settlers II. 206 ‘Now, John,’ said Malachi; ‘get your bead well on him.’ 1875 URE Dict. Arts II. 391 The front sight is that known as the bead-sight, which consists of a small steel needle, with a little head upon it like the head of an ordinary pin, enclosed in a steel tube. In aiming with this sight, the eye is directed..to the bead in the tube. 1919 Chambers's Jrnl. June 399/1 I'd got a lovely bead on her with one of my own torpedoes. 1929 G. MITCHELL Myst. Butcher's Shop xii. 132 You've got a bead on your man all right.

The commonest theory about this idiom still has bead: in Google's current index, the various forms of "draw(s)/drew/drawn a X on" and "get(s)/got/gotten a X on" have 14,995 hits for X=bead vs. 640 hits for X=beat. But Americans don't spend as much time looking at things over a bead sight as they used to -- even those who regularly use a rifle for hunting probably have a telescopic sight -- so this metaphor is getting old and stale.
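For anyone who wants to repeat this kind of count, here is a minimal Python sketch of the bookkeeping. It assumes that the "various forms" are just the eight verb forms spelled out above, each searched as a quoted phrase; the function names are mine, and the hit totals are the ones quoted in this post rather than anything the code fetches.

```python
# Verb forms named in the post; the quoted-phrase format is an assumption
# about how the searches were typed in.
VERB_FORMS = {
    "draw": ["draw", "draws", "drew", "drawn"],
    "get": ["get", "gets", "got", "gotten"],
}


def queries(noun):
    """Build one quoted search phrase per verb form for the given noun."""
    return [
        f'"{form} a {noun} on"'
        for forms in VERB_FORMS.values()
        for form in forms
    ]


def share(count, other):
    """Fraction of the combined hits that belongs to `count`."""
    return count / (count + other)


# Totals quoted in the post (summed over all forms), not fetched here.
BEAD_HITS, BEAT_HITS = 14995, 640

for phrase in queries("beat"):
    print(phrase)
print(f"beat share of all hits: {share(BEAT_HITS, BEAD_HITS):.1%}")
```

On those totals, beat accounts for roughly one hit in twenty-five, which is small but not negligible for a reshaped idiom.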

The sportswriters seem to have stepped in with a fresh idea, making a new idiom out of an old one. In sports, the idea of getting a (musical) beat ahead of someone else makes sense -- marching to a different and faster drummer, so to speak. And it seems to be in sports where a significant fraction of the "get a beat on" examples come up:

"But, when Chance Mock's pass was slightly under-thrown, freshman Aaron Ross closed in quickly, got a beat on the pass and lunged to make a TD-saving breakup."

"They wanted the deep plays, the big plays early in the game. Fortunately, they didn't get them. We really had to sit back and see what was happening. Then we got a beat on what they were trying to do. We just tried to get after them."

"They're definitely the hardest team to prepare for in the NFL because they run so many different types of plays. Once you think you've got a beat on it, they'll change the whole playbook the next week. "

"Clark said his Eagles 'never really got a beat on' Shenandoah’s wing-T offense and that was most apparent during the Hornets’ opening drive."

"He’s got a beat on a sweet gig teaching youth hockey in Kiev."

But there are non-sports examples as well:

"Law's picture ends up in a lot of newspapers; Nazi intelligence gets a beat on him, and they send out their own master marksman (Ed Harris) to pick him off."

"Signature Move: Flying the Blackhawk BELOW the tree line through the streets of DC Lost Village straffing soldiers with the help of a gunner taking out as many Opposition soldiers as possible before a stinger gets a beat on me."

"Buffy seems rather lukewarm with the whole thing, but Spike says he's got a beat on two vamps in a warehouse who are probably responsible for the train incident."

"Getting a beat on someone" has another poetic resonance that may be inspiring some of these writers: you could interpret beat as "an edge" or "a competitive advantage," a nominalization of the verbal sense "to defeat [someone]." The "faster rhythm" and the "competitive advantage" interpretations both work better with "get a beat" than with "draw a beat", and the pattern of co-occurrences is consistent with this:


Especially for journalists, there might be yet another association, with beat as a regular assignment and thus an area of special competence. I guess we could ask Thomas George and other sportswriters what they think they meant when they wrote "...get a beat on..." But I doubt that we can depend on journalists to be any better than poets are at explaining their ambiguities. And in the end, what matters to the development of the language is less what they meant to write than what we manage to read.

Posted by Mark Liberman at 04:43 AM

January 06, 2004

Schwarzenegger's State of the State speech: two comments

As I write this, I am listening to the first "State of the State" speech by Arnold Schwarzenegger, the new governor of the State in which I live, and newest member of the Board of Regents of the university for which I work (in my day job at the University of California, Santa Cruz). I have two linguistic remarks to make about the speech. One is phonetic and one is semantic.

First, after this I don't want to hear anyone shooting their mouth off with nonsense about him speaking broken English. (During the campaign Gray Davis accused the governor of not even being able to pronounce the name of California correctly, as if slight lowering and backing of the vowel in the first syllable and raising and fronting in the second were some kind of phonetic mangling. That was a real low point in American politics. Gray Davis actually got my vote -- because I didn't think that removing a sitting governor after 11 months in office was a good precedent -- but he didn't deserve it. He's a phonetic idiot, and the comment made him sound like a bigot.) The fact is that virtually no English-speaking Americans ever learn to speak a foreign language as well as Schwarzenegger speaks Standard American English. I would strangle a kitten to be able to give a speech in any foreign language with the fluency and clarity that Schwarzenegger has in my language.

Yes, he has a light (Austrian-)German accent: voiced dental stops for voiced interdental fricatives ("de" for "the", "dis" for "this"), occasional [s] sounds at the ends of words that would normally end in a [z] sound, and things like that. [Added later: on second and third hearings, there was also an incidence of nasality lingering from the [m] of matters through the voiced intervocalic flap, making "I will not make matters worse" sound like "I will not make manners worse." But this may have been an articulation slip unrelated to accent. No one seems to have picked up on it or misunderstood it. I don't believe it was an eggcorn, and anyway, those are very common even among native speakers.] But this is spoken English of high quality, indubitably fit for public speaking. Anyone who can deliver a speech that well should be proud of their communication skills. Any further attempts by politicians and journalists to accuse Schwarzenegger of being a pidgin-speaking inarticulate foreigner will do nothing but exhibit the ignorance of the accusers.

Second, the movie metaphors do come through, but they're well judged for the audience. The first one I heard came at the 15th minute, where he said he didn't want to move boxes around in the organizational chart of government, he wanted to blow them up. There was applause, of course. Always a good idea to have a big explosion in the third reel. We like our special effects in California. It's one of our local industries.

Posted by Geoffrey K. Pullum at 08:45 PM

Suspicion of charges

This morning my local radio station reported on someone who had been arrested "on suspicion of gang-related charges". Now that's an example of a currently very common sort of linguistic mistake that I actually do object to and think should be corrected. The charges are a fact -- ask the sheriff. The suspicion is that the charges might be true (but that's what the courts are for: they will start from the assumption that the charges are false and let the prosecution attempt to show otherwise). News sources, concerned (very properly) to protect the rights of the accused, are overdoing it to the point of getting the truth conditions wrong, as has often been noted before. Linguists may sometimes appear to be (and are often accused of being) protective of all sorts of usages that other people call "errors". I offer this case as a reminder that it's not that simple. I don't regard use of a prescriptively condemned but colloquially widespread syntactic construction as linguistically culpable; but I do blame radio news scriptwriters for putting together a sentence that does not even state correctly whether an arrested person has been charged or not. So don't say that linguists never seem to treat anything that occurs as wrong. I don't want there to be a suspicion of those charges.

Posted by Geoffrey K. Pullum at 11:45 AM

Public Service Announcement: Wedding Vows are not Wedding Vowels

As I've mentioned in this space before, I occasionally check our server logs to see who is visiting us and what they're looking for. These logs show me the URL from which a visitor was referred to our site. About 30 to 40 times a day, the referring URL is something like


In other words, 30-40 people a day are finding our site because they are asking Google or Yahoo! or some other search engine to tell them about "wedding vowels" or "renewing wedding vowels" or "alternative wedding vowels" or the like. I'm convinced that nearly all of these people are planning to get married, or planning to renew their commitment to an existing marriage, not exploring funny word substitutions.

Their searches lead them to a Language Log post by Geoffrey Pullum citing "wedding vowels" as an example of a certain kind of linguistic error, or a jokey discussion by me that mentions one such search.

I wish these people well in their quest, and to help them on their way, I've edited our wedding vowels posts by adding the following announcement, right at the top:

Public Service Announcement: If you've come here because you're interested in solemn promises of faithful attachment in marriage, and you've searched for "wedding vowels", you really should make this search for "wedding vows" instead. A vow is "a solemn engagement, undertaking, or resolve, to achieve something or to act in a certain way." A vowel is "a speech sound produced by the passage of air through the vocal tract with relatively little obstruction, or the corresponding letter of the alphabet", usually contrasted with consonant. Your vows will need to contain both vowels and consonants. I wish you all the best in your ceremony and in your life together!

There is a linguistic point here on which I'm willing to be entirely prescriptive: people who think that a marriage ceremony involves the exchange of vowels are making a mistake. (Well, vowels are part of their ritual statements, but you know what I mean.) There are many dialects of English that fully vocalize syllable-final /l/, turning it into a high back off-glide, and for speakers of these dialects, vows and vowels have merged phonologically. They've become homophones. However, that doesn't make "wedding vowels" a legitimate variant. For /l/-vocalizers, the distinction between vows and vowels is like the distinction between beats and beets -- an arbitrary convention of spelling that they need to learn. Even if /l/-vocalizing became as widespread as the merger of hoarse and horse is -- and that may be where things are headed -- this wouldn't change. I don't make any distinction in pronunciation between hoarse and horse, but if I write about "riding a hoarse", I'm making a joke or a mistake.

Standard English spelling really is prescribed. It's a set of artificial social conventions that change only very slowly. The resulting system has many problems, especially for learners; there are a few regional differences (e.g. -our vs. -or); there are some corners of the culture such as hip-hop lyrics and instant messaging that manage to develop their own conventions; but basically we're stuck with it. This is not a necessary condition -- Elizabethan spelling was not standardized, and writers and their readers got along fine. However, things are different now. English spelling is frozen, and it would take the social equivalent of a hydrogen bomb to make any big changes.

Pronunciation, on the other hand, continues to be a matter in which local speech communities are free to go their own way. In some societies, there are standard ways of talking that are defined in terms of the practice of elite communities -- the Queen's English, the language of the court. But in modern America, there are many potential models, and by no means any popular consensus that we should have a single pronunciation standard, much less any agreement about whose pronunciations should be privileged. I don't personally see any reason to change this -- our welter of accents works as well for us as the variety of Elizabethan spellings did for Shakespeare, even though it can sometimes lead to miscommunication.

There may be a few people who pronounce vowels and vows differently but get confused about which is which. For them, using vowels when they mean vows is just a malapropism, like using epitaph for epithet. Here the politics are somewhat different. If a malapropism becomes common enough, the meaning of the words might simply change, as word meanings do all the time. This appears to be in the process of happening for fulsome in the sense of abundant, fortuitous in the sense of fortunate, and infer in the sense of imply. In the early stages of such a change, it's just a sporadic mistake, and sometimes it never goes any further than that. In the middle stages, it gets to be a sort of battle over what the conventions should be. Things get confused because prescriptivists are often bad historians -- they don't distinguish between variant usages that are innovations, like fortuitous, and those that are hold-outs, like notorious. Whatever the historical details, this is just a struggle over a kind of social convention that often changes, sometimes fairly rapidly. There may be a quick winner and loser, or the struggle may go on for centuries. The one thing that's certain is that trying to keep word meanings fixed over time is not a matter of principle. It's not even possible, and it wouldn't be a good thing if it were. We each have our own opinions about how words should be used -- it gives me the willies, personally, when someone uses infer to mean imply -- but as linguists, we have no dog in that fight.

This seems to lead to a contradiction. Spelling is frozen, but meanings are not. So we can't decide to spell vows as vowels, but we could perhaps decide that vowels means vows? Well, I don't believe that either change will become more than a sporadic error. But there's no contradiction in any case, because there's no fundamental principle involved. Anglophone society could decide to change its spelling conventions -- it's just a fact of life that this hasn't happened much over the past couple of hundred years, and doesn't seem likely to happen much now. Regions of the anglosphere can also decide to change their conventions about word senses -- and the fact is that this happens all the time.

[Update 1/26/2004: A reader has pointed out that the word avowal is no doubt part of the pattern that results in the vow/vowel confusion. (myl)]

Posted by Mark Liberman at 07:35 AM

Secret Annual Cabal - Reminder

In a previous post, Geoff Pullum rebutted the ridiculous idea that professional linguists hold "secret annual cabals" and pointed out that anyone can attend the Annual Meeting of the Linguistic Society of America, an organization that anyone can join. In case there is anyone out there who still believes such nonsense, or who missed the meeting announcement, let me take this opportunity to remind everyone that this year's Annual Meeting will take place in Boston starting this Thursday. Information is available at the LSA web site.

Posted by Bill Poser at 12:23 AM

Baby signing

Judith Berck writes in the NYT about teaching babies to sign before they can talk. The piece opens with an anecdote about a baby from Beaverton OR, presents a few sentences from interviews with Elizabeth Bates (last fall, before she died), and then features the "Babysigns" research of Acredolo, Goodwyn and others, who claim that early use of signing with (hearing) children leads to "faster verbal language development", and even "an advantage of 12 I.Q. points". Babysigns was an industrial-scale enterprise by 1997, and so it's a bit surprising that the NYT writes about this as if it were a new development, some six years after the authors were on Oprah. The article also cites Joseph Garcia, another popular success whose work on signing with infants goes back to 1987. There are new publications, as always, but basically this is a brief review article presented with a misleadingly newsy sort of flavor.

Let me say first that the Babysigns phenomenon is basically all to the good, in my opinion. Here's some research on language that's very popular, that engages some interesting ideas, that is surely doing no one any harm and may be providing some real benefits. It's also a Good Thing that a major newspaper is publishing a piece about it.

However, it's hard not to wish for more from an article like this: some kind of historical sense about the work, and some slightly deeper engagement (if only through hyperlinks...) with the material. For example, the article quotes Bates as saying that "[r]ecent work in neuroscience has shown that the areas in the brain that control the mouth and speech and the areas that control the hands and gestures overlap a great deal and develop together." That's all -- no indication of whose recent work it is, where one can go to learn more, etc.

As another example, consider the relation of the Babysigns results to the work by Tomasello et al. on the role of "episodes of joint attention specifically focused on topics of immediate interest to the child" in speeding language acquisition, and the suggestion that "the effect of symbolic gesturing on verbal development is ... mediated at least in part by increases in the infant's effectiveness at initiating joint attention". This is not hard stuff to understand, it's really thought-provoking, and it's something parents and other caregivers should know about. But it's not mentioned at all in the NYT piece.

Posted by Mark Liberman at 12:07 AM

January 05, 2004

Snow in New Yorkish

The efforts of anthropologists and linguists starting with Laura Martin and including our own Geoff Pullum to debunk the claim that Eskimos have an inordinate number of words for snow appear to have had an impact. The theme is evidently too attractive to be discarded, so Eskimos have now been replaced with New Yorkers.

A piece by Dennis Overbye in the New York Times ("Falling Physics, When the Weather Outside Is Frightful", New York Times Dec. 23, 2003) begins:

According to legend, New Yorkers have hundreds of names for snow, depending on whether it is the stuff under the spinning tire of a car trying to escape being plowed in, the puffy or sticky mound on your snow shovel just as you begin to ponder the statistics of heart attacks, the streaks flying like sprites across an airport runway or the missiles stinging your face as you trudge up an urban canyon under a load of packages, the goop of suspicious integrity lying in wait as you step off the curb.
Unfortunately most of these linguistic riches are unprintable here.

For some actual facts on snow words in Eskimo, check out Tony Woodbury's discussion of Central Alaskan Yup'ik and Stuart Derby's list for West Greenlandic. In case anyone is interested, I've written a short piece on The Solid Phase of Water in Carrier [PDF file], which discusses snow and ice and such in a very different language spoken in a cold place. I don't know of any reliable information on words for snow in the language spoken by the aborigines of New York. What language that might be is unclear, judging from the mixture of languages and cuisines in the restaurant in this photograph, which I took on my last expedition to New York.

Signs on the wall of a sushi restaurant in New York City

[The Hebrew says "Shalom Kosher".]

Posted by Bill Poser at 08:15 PM

Psycholinguistics career options

People sometimes ask me what they can do after graduate school in Cognitive Science.

Posted by Mark Liberman at 04:13 PM

and uh -- then what?

There's a piece in the 1/3/2004 NYT, featuring recent research on disfluencies by Liz Shriberg, Herb Clark, Jean Fox Tree and others. This is an area where a lot of good work has been done over the past decade or so. Predictably, the writer is most impressed by Nicholas Christenfeld's 1991 finding that "humanities professors say you know and uh 4.85 times per minute, social scientists 3.84 and natural science professors 1.39 times", and that "drinking alcohol reduces ums." (Christenfeld seems to have a flair for catchy research -- he's also known for studying whether a machine can tickle.)

One of the things that I like about disfluency research is that it has produced some exemplary collaborations between psycholinguists and engineers, especially in the work of Andreas Stolcke and Liz Shriberg. As an example of how this interplay works, I'll describe one of their early papers, "Statistical language modeling for speech disfluencies", Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, vol. 1, pp. 405-408, 1996 (HTML, PDF). They test the hypothesis that conversational transcripts would be more coherent (from an information-theoretic point of view) if disfluencies such as filled pauses (ums and uhs) were removed.

They trained a trigram model on 1.8M words of Switchboard transcripts, and tested on 17.5K words of held-out transcripts, comparing a model in which the filled pauses were edited out with one in which they were left in place. They did this in two different ways, in one case dividing the conversations into "phrases" based on the occurrence of pauses, and in the other case dividing the conversation into "phrases" on the basis of linguistic content. The initial and final "phrase boundaries" (however they are defined) function like words in the sequence, so that after the final word of a phrase, the thing to be predicted is the phrase end, rather than the first word of the next phrase. Likewise, the first word of a phrase is predicted as following a phrase boundary, rather than as following the final word of the previous phrase.
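To make the setup concrete, here is a minimal sketch in Python of how phrase boundaries can be treated as tokens when collecting trigram contexts, with and without filled pauses. The boundary markers `<s>` and `</s>`, the two helper functions, the choice to pad with two start markers, and the example phrase are all invented for illustration; this is not Stolcke and Shriberg's code, and it only collects prediction events rather than training a model.

```python
FILLED_PAUSES = {"uh", "um"}  # assumed inventory of filled pauses

def edit_out_filled_pauses(words):
    """Return the phrase with filled pauses removed (the 'edited' condition)."""
    return [w for w in words if w not in FILLED_PAUSES]

def trigram_events(words, start="<s>", end="</s>"):
    """List (context, target) pairs for one phrase.

    Boundary markers behave like words: the first word is predicted from
    the start marker, and the end marker is itself a target to predict.
    """
    padded = [start, start] + list(words) + [end]
    return [((padded[i - 2], padded[i - 1]), padded[i])
            for i in range(2, len(padded))]

# Hypothetical phrase, not taken from Switchboard.
phrase = "i went to the uh store".split()
unedited = trigram_events(phrase)
edited = trigram_events(edit_out_filled_pauses(phrase))
```

In the unedited events, "store" is predicted from the context ("the", "uh"); in the edited events it is predicted from ("to", "the"). That difference in context is exactly what the two models are being compared on.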

In both cases, the measure of coherence is the local perplexity, which is 2 raised to the power of the local entropy. This is a way of quantifying how predictable the next word is. It's simple to calculate this, given a statistical model of word sequences. Let's say we want the perplexity of the word immediately following (all the examples of) uh in the test data. For each such word w_i we estimate its conditional probability p_i (given the previous two words in the text), and across all the uhs (about 500 in their test), we average the quantity -log2(p_i). This average is an estimate of the local entropy e (with respect to the statistical model), and the local perplexity is just 2^e. For those who aren't familiar with this measure, it may help to note that if N different words are possible at a given point, and all of them are equally likely, then the perplexity at that point is N. The information-theoretic perplexity is just a way of keeping track of the degree of uncertainty -- the effective "branching factor" -- when the alternatives are not equally likely.
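The calculation just described can be sketched in a few lines of Python. The function name and the probability values are made up for illustration; in the real experiment the probabilities would come from the trigram model, one for each word that follows an uh in the test data.

```python
import math

def local_perplexity(probs):
    """Perplexity = 2 ** (average of -log2(p)) over the given probabilities."""
    if not probs:
        raise ValueError("need at least one probability")
    entropy = sum(-math.log2(p) for p in probs) / len(probs)
    return 2 ** entropy

# Sanity check: if 8 words are equally likely at every position,
# each has probability 1/8 and the perplexity comes out as 8.
uniform = local_perplexity([1 / 8] * 5)

# Invented, unequal probabilities: the average of -log2(p) is
# (1 + 2 + 3 + 3) / 4 = 2.25, so the perplexity is 2 ** 2.25.
skewed = local_perplexity([0.5, 0.25, 0.125, 0.125])
```

The uniform case reproduces the "N equally likely words" rule of thumb, and the skewed case shows how the same formula yields an effective branching factor (a bit under 5 here) when the alternatives are not equally likely.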

Stolcke and Shriberg's test was to compare the results of statistical language models prepared with the uhs and ums left in as 'words', with the results of otherwise-identical models with the uhs and ums left out. On the hypothesis that the uhs and ums are not really part of the message, the model should make better predictions if we leave them out. However, using the acoustic segmentation, Stolcke and Shriberg found that the perplexity immediately after the uh or um was significantly increased in the "edited" model, not decreased:


In contrast, if they divided the conversation up on linguistic grounds (i.e. based on the syntax and semantics), and looked only at phrase-medial filled pauses, the edited model was a better predictor (i.e. gave lower perplexity):


You should be able to see what happened. When the phrasing was pause-based, the uhs and ums were often phrase-final. So when you see an uh, you have a good chance to predict a (pause-based) final phrase boundary right after it. If you edit out the uh, you lose that predictive ability. But if you divide phrases on the basis of linguistic structure, the uh will generally not be phrase-final, and the word following the uh will usually be pretty high entropy -- after all, the speaker is emitting an uh before dredging it up -- and you'll have a slightly better chance to predict it based on the preceding two words than based on the uh and one preceding word.

This is a good example of why purely mechanical applications of statistical analysis procedures can be misleading. When Andreas and Liz first did this work (at the Johns Hopkins summer workshop in 1995), they initially thought that the pause-phrasing results showed that disfluencies are really carrying information about the word sequence. However, being smart, sensible and careful researchers, they went on to look more closely at the situation, with the results that you can read in the cited paper.

There has been a lot of work over the years suggesting that disfluencies are often really communicative choices rather than system failures. I have a favorite anecdote about this. Former New York mayor Ed Koch has (or used to have?) a radio talk show, which I would sometimes listen to in the car when I lived in northern New Jersey, back in the neolithic era. Though highly verbal and even glib, Ed is a big um-and-uh-er, to the point that he would often introduce himself by saying "This is Ed uh Koch." Since it's not credible that he was having trouble remembering his own last name, I concluded that he often used a filled pause as a sort of emphatic particle.

Ideas like this would have made it easy to interpret the first (pause-based) results that Andreas and Liz found as confirming that filled pauses are communicatively significant. They are, no doubt about it, but not in the sense that they help a trigram model to predict the words that follow them. As Dick Hamming used to say, "beware of finding what you're looking for." (I haven't been able to find a web link for this aphorism, but you can find some other good advice from Hamming here.) Liz and Andreas were (and are) really interested in the foundational questions about this problem, and so they didn't just go for the quick score, but probed their results carefully, re-did the analysis in other ways, and made a solid contribution rather than a flashier but more ephemeral one.

Andreas, Liz and others have gone on to learn a lot more about the science of disfluency as well as about how to solve the engineering problems involved in recognizing and understanding disfluent speech. It's too bad that (as far as I know) linguists who study syntax, semantics and pragmatics have not been involved in this enterprise to any significant extent.

Posted by Mark Liberman at 10:16 AM


Take a look at Teresa Dowlatshahi's Shoecabbage, if you haven't already seen it. It's fascinating that a nationally syndicated cartoon can be created out of an illustrated exploration of accidental phonetic similarities between words of other languages and words of English. Don't get me wrong, I'm happy to see linguistic content get featured in the popular press for any reason at all, and it's especially nice to see Dowlatshahi's slightly eccentric 25-year-long quest for cross-language homophones turn into a popular success. Let's just say that this is one of many real-life events that I would find implausible in a novel.

My favorite (in the current 14-day window accessible without paying $9.95 to uComics.com) is this one.

But I'll be really surprised and impressed when there's a popular syndicated cartoon whose captions are given as interlinear text.

Posted by Mark Liberman at 07:18 AM

New words from Silicon Valley

The San Jose Mercury News had a competition to make up new tech-related words for the vocabulary of Silicon Valley English, the constraint being that they had to differ by only one letter from a real word or lexicalized phrase. The results are out (Sunday, January 4, 2004), and some of the top-rated entries are quite cute. Among my favorites are these, some of which one can almost imagine catching on, despite how difficult it is to coin words that catch on:

banalysis: the minutiae to which television news viewers are subjected each time a major news story breaks. [Bruce Levin, Santa Cruz]

deaficit: a budget problem no one wants to hear about. [Michael and Janet Singer, Santa Cruz]

downloafing: surfing the net when you should be working. [Michael Vaughn, San Jose]

denture capitalist: financier whose agreements lack real teeth. [Ken Braly, San Jose]

egosystem: the self-sustaining collection of yes-men and sycophants who orbit around sports stars, celebrities, and various executives. [Katie Fitzgerald, Felton]

For more, look here.
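For the mechanically minded, the contest's one-letter constraint is easy to check. The following Python sketch tests whether a coinage is exactly one substitution, insertion, or deletion away from a real word. The function name is hypothetical, and the base word paired with each coinage is a guess at what the entrant had in mind, not something stated in the contest results.

```python
def differs_by_one_letter(coined, real):
    """True if one substitution, insertion, or deletion turns `real` into `coined`."""
    if coined == real:
        return False
    short, long_ = sorted((coined, real), key=len)
    if len(long_) - len(short) > 1:
        return False
    # Skip the prefix the two strings share.
    i = 0
    while i < len(short) and short[i] == long_[i]:
        i += 1
    if len(short) == len(long_):
        # Same length: the remainder after the mismatch must agree (substitution).
        return short[i + 1:] == long_[i + 1:]
    # Lengths differ by one: dropping one letter from the longer must align them.
    return short[i:] == long_[i + 1:]

# Presumed base forms for the entries quoted above (assumptions).
pairs = [
    ("banalysis", "analysis"),
    ("deaficit", "deficit"),
    ("downloafing", "downloading"),
    ("denture capitalist", "venture capitalist"),
    ("egosystem", "ecosystem"),
]
results = [differs_by_one_letter(coined, real) for coined, real in pairs]
```

Under those assumed base forms, the first two entries are insertions and the other three are substitutions.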

Posted by Geoffrey K. Pullum at 01:19 AM

January 04, 2004

Ordinary language philosophy of language: not a good idea

Once upon a time there was something called ordinary language philosophy. It is generally taken to have been born in the later Wittgenstein as instanced by his Philosophical Investigations, which came out in the 1950s (though the ideas had been taught at Cambridge and Oxford earlier). The idea was that philosophy could be done on the basis of ordinary language. For example, you could do epistemology (philosophy of knowledge) by simply investigating the truth conditions of English sentences with the verb know, and the fundamental questions about the bases of ethics were to be resolved by reflection on the ways in which we use such words as ought. The idea was that the only true philosophical insights were already embedded in the plain common sense that our language has incorporated during its evolution over centuries of use: "The proper function of philosophy is to map out the logical geography of our conceptual schemes", as The Philosophy Pages puts it in describing the views of ordinary language philosopher Gilbert Ryle.

Maybe ordinary language philosophy is faintly plausible for some areas (it did not really survive; it was already under attack by the 1960s); but it seems to me that it would be particularly ridiculous when applied to the philosophy of language itself. The folk wisdom about language that seems to be embedded in English, the phrases that the general public uses to talk about language, suggest that philosophy of language done by mapping out the geography of the conceptual schemes revealed in the way we ordinarily talk about language use would be a complete crock. I can give two very simple examples.

The first stems from the expression to call someone names, which refers to the use of insulting epithets. If I say to you, "Dickhead! You forgot to get milk!", you would say that I had called you a name for forgetting the milk. But of course, this misses the distinction between names and descriptions. If your name is Dickhead and I call you that, it is not insulting. What is insulting is for me to implicitly describe you as a dickhead by using that form of address. A name is exactly what the insulting description dickhead isn't.

The second example is much broader but just as simple. People talk about language as if words express propositions and bear truth values. They don't; sentences do. But hundreds of phrases illustrate that the lay mind as recorded in everyday idiom believes otherwise:

I don't believe a word of it.
I'd like to have a word with you.
What's the word on this reorganization plan?
Give me your word that the check is in the mail.
Some harsh words were said last night.

etc., etc., etc. -- it's a useful exercise to try lengthening the list. Words are of course not true or false; they don't convey messages or state promises; and even in the case of harsh or obscene language, words are not the point. You haven't said a bad word if you say

Some people believe that if you curse a man you can damn him to hell.

But you have if you say

Some people believe that we should be grateful to Jack but I say damn him to hell.

In the first case you are just giving an inoffensive description of a superstitious or religious belief; in the second case you are swearing; but the relevant part (the last four words) is the same in each case. Swearing is not about use of bad words, it's about deploying them in utterances having conventional understandings as oaths, imprecations, or tabooed expostulations, God damn it.

The theory of language that seems to be implicit in everyday English is disastrously naive and stupid, and if elevated to philosophical dogma through ordinary language philosophy it would have reduced the philosophy of language to absurdity.

Which may be why no one ever seems to have tried it.

[Note added later: Cool philosophy dude Brian Weatherson has informed me that the point about idioms like I want a word with you has been made elsewhere: in "The Cult of Common Usage" by Bertrand Russell, in the British Journal for the Philosophy of Science. In 1953, actually. (Hey, don't give me that pitying look! I'm so-o-o supposed to know about, like, every freaking paper in every freaking philosophy journal since the coronation of Queen Elizabeth the freaking second, is that it? All right. So shoot me.)]

Posted by Geoffrey K. Pullum at 08:10 PM

Divine ambiguity

Geoff Pullum notes that Pat Robertson has attributed to the Almighty an English idiom stereotypically associated with adolescent American females, namely the hedging discourse particle like: "It's going to be, like, a blowout election in 2004".

Geoff suggests two alternative hermeneutic approaches to the sociolinguistics of this revelation: "[t]his may indicate, surprisingly, that God uses a younger-generation dialect in his communications with the older generation, or it may indicate a preference for communicating with people in their native dialect." I'd like to suggest that there is textual evidence favoring a subtly different view: a sort of linguistic transubstantiation whereby the Lord's phrasing is ambiguous between the language patterns of the older and the younger generations, with the same meaning in both construals.

In this 1998 interview with Robert Duvall, Pat Robertson himself uses an apparently similar hedging like:

PAT: Let me ask you about this movie. It is really a powerful piece. Who did you have as a model? It's like you modeled after somebody or was it a composite?

However, this is not really the version of hedging like used by the young woman quoted in Muffy Siegel's paper "Like: The Discourse Particle and Semantics" (J. of Semantics 19(1), Feb. 2002):

"... her and her, like, five buddies did, like, paint their hair a really fake-looking, like, purple color."

The crucial difference is that the likes in Muffy's examples can be omitted without injury to the basic syntactic framework of the sentence

"... her and her five buddies did paint their hair a really fake-looking purple color."

whereas in Pat Robertson's phrase, the like serves to introduce the clause "you modeled after somebody", making it suitable for use as a complement to is. Without this like, the sentence falls apart:

"*It's you modeled after somebody ..."

Words that serve this sort of function can typically introduce either a clause or a noun phrase:

It was after I arrived.
It was after my arrival.

It was like the scales fell from my eyes.
It was like a revelation.

This pre-nominal use of like is syntactically dispensable, in the sense that if you leave it out, the sentence is still OK, although the meaning changes somewhat, in the direction of being more forceful and unqualified:

It was a revelation.

To "be X" is a much stronger statement than to "be like X". So you can always weaken a statement of the form "Y is X" (where X and Y are noun phrases) by sticking in a like in front of X. Of course, you could always weaken the statement in other ways too, by inserting any one of various words and phrases with adverbial force:

It was practically a revelation.
It was more or less a revelation.
It was, if you will, a revelation.
It was almost a revelation.
It was sort of a revelation.

Presumably this is how like was first bleached semantically into a mere hedge (in some uses), and then re-interpreted syntactically as a particle that can be inserted almost anywhere to "signal a possible slight mismatch between words and meaning". The semantic bleaching has certainly been around long enough for Pat Robertson to be familiar with it: the OED cites "1500-20 DUNBAR Poems xix. 19 Yon man is lyke out of his mynd."

On this analysis, God's phrase

"It's going to be like a blowout election in 2004."

is a prototype of syntactic change: the same word sequence can be interpreted in one way by Robertson's generation, and in a different way by the generation of his grandchildren. And yet, miraculously, both interpretations mean the same thing: the 2004 presidential election will have some but perhaps not all the characteristics of a blowout.

Posted by Mark Liberman at 04:57 PM

January 03, 2004

Exclusive: God uses "like" as hedge

My friend Tom Hukari in Canada points out this recent news item in USA Today:

Norfolk, Va. -- Religious broadcaster Pat Robertson said yesterday he believes God told him U.S. President George W. Bush will be re-elected in a November "blowout."

"I think George Bush is going to win in a walk," Mr. Robertson said on his 700 Club program on the Virginia-Beach-based Christian Broadcasting Network, which he founded. "I really believe I'm hearing from the Lord it's going to be like a blowout election in 2004."

I read what Robertson is hearing from the Lord not as a rather odd claim about similarity (that something unspecified is going to be similar to a blowout election in 2004) but as a claim that a blowout election will occur, with a hedge on the NP a blowout election. So assuming Pat's report accurately reflects what God saith unto him, it seems the Lord may have said "It's going to be, like, a blowout election in 2004, Pat." (I've added the commas that are normally used around the hedging like. The commas would not be discernible in the almighty's speech, of course, and were not included in USA Today, which is a tertiary source.)

One might have thought that God would be a user of the alternate educated-English phrase if you will, but apparently not. God seems to use like in this sort of hedging function. This may indicate, surprisingly, that God uses a younger-generation dialect in his communications with the older generation, or it may indicate a preference for communicating with people in their native dialect. We should be grateful to Pat, because linguists don't often get to do this sort of dialect philology on the speech of the Lord, even at second remove.

[Revised January 4th, 2004.]

Posted by Geoffrey K. Pullum at 07:15 PM

Lolita's revenge

I suspect that somewhere, Volodya is enjoying this:

"It's like if Jeffrey Dahmer started a band with Ed Gein, Charles Manson, Elizabeth Bathory, Ozzy Osborne, and Vladimir Nabokov, except instead of playing music they just played EVIL."

I wonder if anyone on LiveJournal will ever form a fantasy rock band that includes Roman Jakobson?

Posted by Mark Liberman at 06:38 PM

Dwarves vs. dwarfs

John O'Neil wrote to point out an error in a recent post:

It's a minor sidelight to your recent post on Language Log on the "Theology of Phonology" to consider the plural of "dwarf". Without discussion, you firmly come down on the side of "dwarves", as does everyone who I have ever known actually to employ the plural of "dwarf".

Actually, John is far too kind. If I'd thought for a second, I'd have realized that Disney has "Snow White and the Seven Dwarfs", astronomers talk and write about "white dwarfs", and so on. Google has 615,000 "dwarfs" to 424,000 "dwarves". Dwarf should definitely go in the category of final-f words with variable plurals.

John goes on to write:

However, there is an urban legend, which might actually be true, that
"dwarves" was an invention of Tolkien's in "Lord of the Rings", as an
analogy to "elves" and with the desire not to slight the dwarven race
by giving them a more regular plural form. Before that, the story goes,
it was "dwarfs" since Middle English, and before that "dwarrows".

We can't completely blame Tolkien. The OED has both in the first half of the 19th century:

1818 W. TAYLOR in Monthly Mag. XLVI. 26 The history of Laurin, king of the dwarves. 1834 LYTTON Pilgrims of Rhine xxvi, The aged King of the Dwarfs that preside over the dull realms of lead.

A quick scan does suggest that google hits for "dwarves" tend more towards the fantasy realm, whereas the hits for "dwarfs" include the Disney animation, astronomy sites, small humans and plants, and the odd rock band. There are some fantasy usages for "dwarfs", but it looks like astronomers never use "dwarves".

The OED cites a bewildering variety of spellings from various periods:

duer, dweor, dweorh, dwæruh, dweru, dwer, dwere, dwergh, dwargh, dwarghe, duergh, dwerk, duerch, duerche, dorche, droich, dweruf, dwerf, dwerfe, dwerff, dwerffe, dwrfe, dwarfe, dwarff, dwarffe, dwearf, dwarf, duerwe, durwe, dwarw, dwerwh, dwerwhe, dwerwe, dwerowe, duorow, dwery, duery, dueri

I don't know enough about the history of English spelling to be able to figure out what range of sound patterns lie behind that list.

[Update: Bill Poser points out another issue -- spelling and sound don't necessarily correlate here...

For me the plural of <dwarf> is [dworvz], no two ways about it.
I consider [dworfs] outright error, even in other people's speech.
(Of course I acknowledge that there may be other dialects. What I
mean is that I will not accept [dworfs] as a possible variant within
what I consider my own dialect of English. This contrasts with, e.g.,
[rufs]. I myself have both [rufs] and [rUvz] and would ceteris paribus
consider someone who has either one to be a speaker of my own dialect.)

When I read your most recent post, at first I didn't get it. The reason
is that I read <dwarfs> as [dworvz]. For me, the <fs> spelling doesn't
necessarily indicate that the word is to be pronounced [fs]. In some
cases, I consider both spellings acceptable, e.g. <dwarfs> or <dwarves>.
In others, I use only one spelling but still have both pronunciations.
I write only <roofs>, *<rooves>, but still say both [rufs] and [rUvz].

He's right. I agree with his judgments about "roofs" -- that's the only way to spell it, but I can pronounce it either way, likewise with the correlation between vowel quality and consonant voicing. But I have a different pattern with "dwarfs" -- because I can pronounce the plural either way, and because there are two spellings available, I guess I've always assumed that "dwarfs" was only pronounced with [f] and "dwarves" was only pronounced with [v].]

[Update #2: Daniel Ezra Johnson writes to point out that Tolkien discusses the plural of "dwarf" at some length in Appendix F of LOTR:

"It may be observed that in this book as in The Hobbit the form dwarves is used, although the dictionaries tell us that the plural of dwarf is dwarfs. It should be dwarrows (or dwerrows), if singular and plural had each gone its own way down the years, as have man and men, or goose and geese. But we no longer speak of a dwarf as often as we do of a man, or even of a goose, and memories have not been fresh enough among Men to keep hold of a special plural for a race now abandoned to folk-tales, where at least a shadow of truth is preserved, or at last to nonsense-stories in which they have become mere figures of fun. But in the Third Age something of their old character and power is still glimpsed, if already a little dimmed: these are the descendants of the Naugrim of the Elder Days, in whose hearts still burns the ancient fire of Aule the Smith, and the embers smoulder of their long grudge against the Elves; in whose hands still lives the skill in works of stone that none have surpassed.

It is to mark this that I have ventured to use the form dwarves, and so remove them a little, perhaps, from the sillier tales of these latter days. Dwarrows would have been better; but I have used that form only in the name Dwarrowdelf, to represent the name of Moria in the Common Speech: Phurunargian. For that meant 'Dwarf-delving', and yet was already a word of antique form. But Moria is an Elvish name, and given without love..."

Daniel adds that "this is told from a perspective within tolkien's mythology, and i'm not sure if he's really making the claim either that he invented the form "dwarves", or that in real English the expected plural would have been "dwarrows" (and i'm not enough of a historical linguist to answer that, but it seems at least partially backed-up by some of the OED spellings you uncovered)."

Maybe someone who knows about the history of English can clarify this.

Anyhow, my original point stands. The standard relationship between singular and plural pronunciations of English nouns ending in /f/ is inconsistent and indeed variable. Eliminating all the [-vz] plurals would make the system more consistent and easier to learn, but it would be a distinctly non-standard way of talking.]

Posted by Mark Liberman at 01:41 PM

You say Nevada, I say Nevahda

President George W. Bush has a language problem. At least, people who don't like him see this as a point where he's vulnerable, and they keep the journalistic spotlight focused on it, just as people who didn't like President William J. Clinton kept the spotlight on what they saw as his vulnerabilities.

In both cases, I find that the intense scrutiny makes it hard to evaluate the issues. The focus on Clinton's "Whitewater" transactions seemed so wildly out of proportion to the facts, and so clearly motivated by political animus, that at a certain point I simply started ignoring the whole sordid business. Throw in a few tens of millions of dollars worth of high-powered investigators with subpoena powers, and you can cast a few financial shadows on anybody -- or so I reckoned.

I've started to feel the same way about Bush's linguistic miscues. You can make any public figure sound like a boob, if you record everything he says and set hundreds of hostile observers to combing the transcripts for disfluencies, malapropisms, word formation errors and examples of non-standard pronunciation or usage. It's even easier if the critics use anecdotes based on the perceptions and verbal memories of equally hostile listeners. And the whole thing has crossed some kind of line when you can make the AP wire by citing him for using a widely accepted pronunciation, like Nevada with the stressed vowel of cod instead of cad.

It's interesting to read through Slate magazine's list of Bushisms, which Jacob Weisberg has turned into a small industry over the past four years. Some of the citations are from broadcasts or other recordings that are subject to checking: "Kosovians can move back in."—CNN Inside Politics, April 9, 1999. Others appear to be journalistic anecdotes of uncertain authority: "Keep good relations with the Grecians."—Quoted in the Economist, June 12, 1999; "If the East Timorians decide to revolt, I'm sure I'll have a statement."—Quoted by Maureen Dowd in the New York Times, June 16, 1999.

It's possible that W. applied a culpable consistency in the derivation of ethnonyms. It's also possible that he made one mistake of that kind, replacing Kosovars with Kosovians, and some journalists started kicking it around over drinks -- "wow, I wonder if he thinks the Greeks are the Grecians" -- "I bet he says Grecians" -- "I heard that he said 'we need to keep good relations with the Grecians'" ... Anyone who thinks this couldn't happen needs to pay some attention to what journalists do to quotes even in friendly contexts, or how completely false stories -- like the notion that Bush was pictured holding a plastic turkey in Iraq last Thanksgiving -- get created, picked up and discussed even in the case of fully recorded events.

In many of the other cases, the cited examples seem well within the range of expected human error. Which of us could stand up to a similar level of linguistic scrutiny? Robert Beard, the CEO of yourDictionary.com, is a highly educated man and a trained linguist. He writes clearly and forcefully, and he's won many teaching awards, so I'm confident that he speaks well, though I've never met him. Given his training and his career choices, I'm sure that his English word knowledge and spelling abilities are far above the norm. Still, his four-paragraph note to me about presidential pronunciation problems contained three potentially embarrassing typographical errors. The first error was a switch of their and there, which he caught and corrected when I asked him for permission to post the note on this site. The other two errors were missed in his no doubt cursory proof-reading, and I didn't notice them either before I posted what he wrote. He has "spectogram" for spectrogram, and he cited the president's "agregious solecisms" when he meant to write egregious solecisms. I'm absolutely certain that Bob knows how to spell spectrogram and egregious. These were slips of the fingers, though perhaps slips guided by sound patterns, as such things often are. In another context -- in a note from George Bush, for example -- a hostile observer might take such slips as evidence of linguistic ineptness.

Bonus dormitat Homerus. Let's accept that W is no Homer, and move on.

Since that's not likely to happen, I have another idea. I'll buy dinner for Jacob Weisberg, if he'll let me record a couple of hours of convivial conversation about speech and language, and then examine the transcripts carefully for Weisbergisms ...

Posted by Mark Liberman at 08:33 AM

Mispronunciation and autodidacts

There always seems to be a tacit assumption hovering in the background to journalistic interest in mispronunciations (see e.g. the list of Bush's mispronunciations put out by yourDictionary.com, recently discussed by Mark Liberman), and it is this: people who mispronounce English words are taken to be in some way culpable. If not morally blameworthy (if you were a good and trustworthy person you would take the trouble to get things right), then at the very least they are taken to be slovenly or unintelligent. But Barbara Scholz reminds me of a very simple point that we should not forget. A person who mispronounces some words may be more deserving of our respect, rather than less. Mispronunciations are a characteristic feature of the speech of autodidacts -- people who have had to teach themselves.

Not everyone has a family background that gave them the advantages of hearing words like genre and hegemony and autodidact passed around with the silver butter dish at the dinner table. Some nonetheless buckle down and improve themselves by reading books, and when they come upon a new word for the first time in print, they guess a plausible pronunciation. The spelling system of English gives precious little reliable guidance on pronunciation to people who've seen a word but not yet heard it. (Why wouldn't precious and specious rhyme, for example?) And even dictionaries, with their often idiosyncratic pronunciation keys, don't always make it easy to figure out what the word is supposed to sound like. Calling someone ignorant is only reasonable when it concerns something they were supposed to know and could have known. So don't laugh at someone who mispronounces words until you know a bit more about their origins, not only regional (a point that Mark makes) but also with respect to class and family educational level. Don't mock someone until you've walked a mile in the shoes they wore on the long road of lexical acquisition.

Note added later: Just to be clear, let me state that I am not in any way suggesting that the above might apply to George W. Bush. Mostly he gets lampooned for regionalisms that are not really properly called errors at all, as Mark noted. But take the case of "Anzar" for "Aznar". It seems to me perfectly reasonable to hold the opinion that a Yale graduate from a highly privileged background who grew up as the son of a legislator and later a President, and who purports to speak Spanish, and who holds the title of President of the United States himself, should be assiduous in learning the names of world leaders, at the very least from allied Spanish-speaking countries. One might even take the view that if he were a good and trustworthy person he would have taken the trouble to get it right. Or then again, maybe not; Mark is quite right: it is really very hard to make judgments on this sort of thing in such a polarized world, where every syllable the President utters is tracked and scrutinized for goofs and blunders, but your conversations, and my lectures, and Robert Beard's emails, are usually not.

Posted by Geoffrey K. Pullum at 01:41 AM

January 02, 2004

"Nucular" solecism traced to 200 B.C.

Inspired by Jim Bisso's ardor in tracing "more unique" to Plautus, I've discovered that the same author is responsible for the whole "nucular" flap. Well, maybe this is a bit unfair, since he is just the earliest extant source for the original form of the base word nucleus, in which there was in fact an extra u between the c and the l ...

According to the OED, the etymology of nucleus is

< classical Latin nucleus (also nuculeus) kernel, inner part, in post-classical Latin also core of a comet

According to Lewis and Short:

nu^cleus (nuculeus ), i, m. [for nuculeus, from nux] , a little nut.

e nuce nuculeum qui esse vult, frangit nucem, he who would eat the kernel of a nut breaks the nut, i. e. he who desires an advantage should not shun the labor of earning it, Plaut. Curc. 1, 1, 55: nuculeum amisi, retinui pigneri putamina, I have lost the kernel and kept the shell, id. Capt. 3, 4, 122 .--

I suggest the last quotation as a motto for language moralists everywhere.

Posted by Mark Liberman at 05:56 PM

"Same procedure as every year, James"

Traditional end-of-year video experiences: in the U.S., It's a Wonderful Life; in Germany, Dinner for One.

Both are heart-warming and sentimental, both involve reliving the past, but in very different ways. It's silly (but tempting) to generalize about cultural differences from two samples with N=1 ...

There are two bits of linguistic relevance.

First, Dinner for One is doubly ritualistic. Within the sketch, the conversation conveys no new information to the participants -- "Same procedure as last year, Miss Sophie?" "Same procedure as every year, James". In the real world, the sketch also conveys no new information to most of its viewers, since it has apparently been shown repeatedly over the last few days of every year for the past 40 years, and Jürgen Meier-Beer, NDR's head of light entertainment, is quoted as saying that "one in every two viewers in our area will watch it at some point on New Year's Eve."

Second, it's extraordinary that this end-of-year tradition in Germany -- which is hardly known in the UK, and I think even more completely unknown in the U.S. -- is in English!

[I learned about Dinner for One from Margaret Marks at the always-interesting Transblawg.]

Posted by Mark Liberman at 10:26 AM

The theology of phonology

In a previous post, I quoted a note from Robert Beard in which he came out four-square as a language moralist, and identified what is "proper" and "right" in pronunciation with what is "consistent".

Now, I'm in favor of language standards. Many of my colleagues consider me dangerously right-wing on this question. However, I think it's unwise to use ethical metaphors to justify arbitrary cultural norms. If a man should wear a necktie in court, it's not because there is something intrinsically immoral about an open collar.

In contrast, Prof. Beard argues eloquently that the standards he is defending are not arbitrary social conventions, but rather consequences of basic linguistic principles. In particular, he suggests that "proper" pronunciation is not a matter of how well-spoken people talk, but rather a question of what is "consistent", by which he means something like "characterized by regularity in the relations among the forms, sounds, and meanings of words". Alas, if morality requires consistency in this sense, then we are all deep-dyed linguistic sinners, every one of us.

Here's what he wrote:

We do not consider language a democratic process here at yourDictionary. So, even if the majority of US citizens pronounce "nucleus" [nyu-klee-us] and "nuclear" [nyu-ku-lar], it doesn't make it phonologically right, which we take to mean simply "consistent." Generally, we simply point out the inconsistency and tell our visitors they may be consistent or talk like the folks around them, whichever pleases them.

One of our most popular projects on our website is out "100 Most Often Mispronounced Words" which include both "nuclear" and "jewelry." It is popular because the educated people who visit our site are convinced that there are proper and improper ways to pronounce words and they, by and large, prefer the former.

There are plausible arguments for enforcing consistency in syntax, though it can be tricky to decide what the principles should be. In semantics, the truth should certainly be something we can calculate without taking a poll. But in morphophonemics -- the relationship between the form and sound of words -- the idea that standards are determined by fundamental laws is a surprising one. To see why, let's take a simple example from standard English.

The plural of loaf is loaves, the plural of thief is thieves. However, the plural of oaf is not oaves, and the plural of chief is not chieves.

Quite a few words ending in /f/ work like loaf, voicing the final /f/ in the plural: calf, dwarf, half, hoof, knife, leaf, life, loaf, scarf, self, sheaf, shelf, thief, wolf.

A somewhat larger number of words work like oaf, letting the final /f/ stand unchanged in the plural: belief, chief, clef, cliff, coif, cuff, gaff, goof, handkerchief, kerf, midriff, muff, oaf, pontiff, proof, puff, reef, relief, riff, ruff, sheriff, skiff, sniff, snuff, standoff, stiff, tariff, tiff, whiff. As far as I know, all words where final /f/ is spelled /ph/ or /gh/ also fail to voice the final consonant of the stem in the plural: epitaph, glyph, graph, morph, nymph, seraph, sylph, triumph, etc.; and cough, laugh, rough, tough, trough.

Some words are variable: in my speech, hoof, roof, beef, turf, and wharf sometimes pluralize thiefishly and sometimes chiefishly. In a few cases, it depends on what you mean. If a staff is a stick, its plural is staves, but if a staff is a set of employees, its plural is staffs.
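
For readers who like to see such things spelled out, here is a minimal Python sketch of what the lists above amount to. The word list is copied from this post; the function name `plural_of_f_noun` and the "check the list first, otherwise add -s" design are my own illustrative assumptions, not a claim about how speakers actually do it. The variable words, and meaning-dependent cases like staff, are deliberately left out.

```python
# Illustrative sketch only. The set below is the loaf-type list from the post;
# the function name and the lookup-then-default design are assumptions.

# Nouns that voice final /f/ in the plural (loaf -> loaves).
VOICING_NOUNS = {
    "calf", "dwarf", "half", "hoof", "knife", "leaf", "life",
    "loaf", "scarf", "self", "sheaf", "shelf", "thief", "wolf",
}


def plural_of_f_noun(noun: str) -> str:
    """Return the plural of a noun ending in the sound /f/.

    Nothing in the spelling or sound of the word predicts the outcome,
    so membership in the listed set is the only thing consulted.
    """
    if noun in VOICING_NOUNS:
        if noun.endswith("fe"):
            # knife -> knives, life -> lives
            return noun[:-2] + "ves"
        # loaf -> loaves, thief -> thieves
        return noun[:-1] + "ves"
    # Everything else keeps its /f/: oaf -> oafs, chief -> chiefs, cough -> coughs
    return noun + "s"


if __name__ == "__main__":
    for word in ["loaf", "oaf", "thief", "chief", "knife", "cough"]:
        print(word, "->", plural_of_f_noun(word))
```

The point of the sketch is that the only way to get loaf and oaf to come out differently is to list one of them by name.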

Any fair-minded observer will agree, I think, that we have here an inconsistent relationship between word structure and word pronunciation. But is this a moral problem? Should you call for Sancho Panza, mount Rocinante and ride off to restore consistency to the plural of English nouns ending in /f/?

Well, I don't see any volunteers, at yourDictionary or elsewhere. The most obvious reason is that this sort of partial inconsistency -- what Mark Seidenberg calls quasi-regularity -- is ubiquitous in English and in every other language. Enforcing regularity in morphophonemics is like trying to clean sand off the beach:

The Walrus and the Carpenter
Were walking close at hand;
They wept like anything to see
Such quantities of sand:
"If this were only cleared away,"
They said, "it would be grand!"

"If seven maids with seven mops
Swept it for half a year,
Do you suppose," the Walrus said,
"That they could get it clear?"
"I doubt it," said the Carpenter,
And shed a bitter tear.

There is an interesting and important controversy among psycholinguists about where this quasi-regularity comes from and what it means. James McClelland, Mark Seidenberg and others think that quasi-regularity arises because the (partial) regularities are emergent properties of connectionist networks; Steven Pinker, Michael Ullman and others think that quasi-regularity arises because there are two distinct and competing brain mechanisms whose functions overlap, one a (temporal/parietal-lobe) semantic memory system for looking things up, and the other a (frontal-lobe and basal ganglia) procedural memory system for figuring things out.

In both theories, human speech is east of morphophonemic Eden. There are forces leading to regularization and forces leading to exceptionality. If you think that consistency is next to godliness, both theories -- like the facts of language -- force you to confront phonological original sin. And with respect to the morality of inconsistent pronunciation, let him who is without sin cast the first stone.

Posted by Mark Liberman at 08:56 AM

January 01, 2004

The politics of pronunciation

I wrote to Robert Beard, CEO of yourDictionary.com, to draw his attention to the recent post in which I was critical of yourDictionary's list of alleged presidential mispronunciations. He was kind enough to send a thoughtful response, which I've quoted in full below, with his permission. His note struck me as a particularly clear presentation of some widely-held views on the politics of pronunciation.

Don't worry, you aren't giving us a hard time. Quoting Merriam-Webster as a lexical authority is considered an act of desperation at yourDictionary, since they constantly rake the gutters for changes that have been noted this week with no concern as to whether they will be there next week. Their new editions remove as many words that have arisen in the past 10 years as they add. We admittedly stretched too far for "Nevada" but all we have said about it is that it made the news as a mispronounced word, which seems to be the case.

We do not consider language a democratic process here at yourDictionary. So, even if the majority of US citizens pronounce "nucleus" [nyu-klee-us] and "nuclear" [nyu-ku-lar], it doesn't make it phonologically right, which we take to mean simply "consistent." Generally, we simply point out the inconsistency and tell our visitors they may be consistent or talk like the folks around them, whichever pleases them.

One of our most popular projects on our website is out "100 Most Often Mispronounced Words" which include both "nuclear" and "jewelry." It is popular because the educated people who visit our site are convinced that there are proper and improper ways to pronounce words and they, by and large, prefer the former. We tell them what is consistent with the facts of language (without showing them sound spectograms) and explain regional dialectalisms as such.

However, it is a fact that outside the given region, the use of regionalisms can be economically and politically costly. If the only price President Bush has to pay for the agregious solecisms he is known for is the tongue-in-cheek sparring he gets from us at the end of the year, he should be a happy guy.


I'm impressed. These are strong opinions, strongly stated. Merriam-Webster is upbraided for gutter lexicography; linguistic democracy is firmly rejected; a bright line is drawn between "proper" and "improper" pronunciation, with morphophonemic consistency as a requirement for propriety; and regional variants are placed firmly on the "improper" side of the boundary.

I'm not competent to evaluate M-W's practices, but after some reflection, I think I disagree with all the rest of it.

In brief, my opinions are as follows. Standards depend on usage. The key question is "whose usage?", and there is more than one reasonable answer. It's a bad idea to use metaphors drawn from ethics, law and medicine in talking about linguistic norms: non-standard speech is neither improper, lawless nor degenerate, it's just non-standard. Morphophonemic consistency is at best partial, as a matter of historical fact across languages (standard and otherwise), and so it's not appropriate to try to turn it into a matter of principle. Regional standards ought to be given an appropriate level of respect, for reasons of social as well as political pluralism.

These are opinions, not facts, except perhaps for the question of morphophonemic consistency, about which I'll say more in another post. Reasonable people hold a variety of opinions on these matters. There are interesting parallels to other areas of political philosophy -- but at least no one is suggesting a constitutional amendment to defend traditional morphophonemic values in the pronunciation of nuclear. . .

Posted by Mark Liberman at 11:54 PM


I like Chinese character idioms and other people seem to like them too, so here's one for the new year. The last day of the year in Japanese is /o:misoka/, which is literally "big 30th day". The analysis is /o:/ "big", /miso/ "thirty" and /ka/ "day", where /miso/ can be decomposed into /mi/ "three" and /so/, an allomorph of /to:/ "ten". /miso/ is the archaic native Japanese word for 30. At one time, /misoka/ was the term for the 30th of any month, and the 31st was called /o:misoka/. Nowadays, one says /sanzyu:itiniti/, with all of the components borrowed from Chinese, for the 31st of other months.

/o:misoka/ is written like this: 大晦日. The first character means "big" and has the native reading /o:/. The last character means "day" and has /ka/ as one of its native readings. It's the middle character 晦 that is an idiom. The number thirty is usually written like this: 三十, with two characters, "three" followed by "ten". There is also a single, shorthand, character meaning "thirty": 卅. So the middle character in /o:misoka/ isn't the character for "thirty".

Actually, the character 晦 has the basic meaning "darkness". Its Sino-Japanese reading is /kai/ as in 晦朔 /kaisaku/ "the last day of one month together with the first day of the next". Its native reading is /kura/, which shows up in the verb /kuramasu/ 晦ます "to disappear, give the slip to". It is also read /tugomori/ "dark of the moon" whence "end of the month". So the characters make sense. The old word for the 30th literally means "dark day", "end of the month day". But the semantic decomposition reflected by the Chinese characters doesn't correspond to the morpho-phonological decomposition of the word.

Posted by Bill Poser at 01:37 AM