October 31, 2004

Marx: red or blue?

Karl Marx is traditionally a red. But recently, things have gotten kind of swapped around in the U.S., so that the red states are the ones that vote Republican. Anyhow, this post is not really about Karl Marx at all, it's about a method for estimating media bias by modeling citation frequencies, featured in some recent work by Tim Groseclose and Jeff Milyo. There was a lot of discussion of this work back in August, and I've seen several mentions recently, as the question of election coverage is debated.

You can follow the links to get the details, but G&M's own description of their basic method is as follows:

To compute our measure, we count the times that a media outlet cites various think tanks. We compare this with the times that members of Congress cite the same think tanks in their speeches on the floor of the House and Senate. By comparing the citation patterns we can construct an ADA score for each media outlet.

As a simplified example, imagine that there were only two think tanks, one liberal and one conservative. Suppose that the New York Times cited the liberal think tank twice as often as the conservative one. Our method asks: What is the estimated ADA score of a member of Congress who exhibits the same frequency (2:1) in his or her speeches? This is the score that our method would assign to the New York Times. [924K .pdf here]

Their full mathematical model assumes that every citer and every citee can be assigned a numerical position on a single political dimension (which they interpret as left-right), and that the "utility" that a citer receives from making a citation is the product of the citer's and the citee's coefficients plus a measure of the citee's overall authoritativeness plus an error term. Then the probability of citation choices can be predicted from these political and authoritativeness coefficients by a "multinomial logit". G&M start from an estimate of citers' politics (in this case, the ADA rankings of members of congress), use that to estimate the politics of citees (in this case, political think-tanks), and then work backwards to estimate the politics of another class of citers (in this case, media outlets).
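To make the "working backwards" step concrete, here is a minimal Python sketch of the simplified two-think-tank case. It is not G&M's estimation code: the function name and the coefficient values (equally authoritative sources with political coefficients of +1 and -1) are assumptions chosen for illustration. With only two sources, the log of the citation ratio is linear in the citer's coefficient, so that coefficient can be read off directly.

```python
import math

def implied_citer_position(ratio, a_lib, b_lib, a_con, b_con):
    """Political coefficient c of a citer who cites the liberal source
    `ratio` times as often as the conservative one, under a two-source
    logit where utility = a + b*c (plus noise).

    log(ratio) = (a_lib - a_con) + (b_lib - b_con) * c, solved for c.
    """
    return (math.log(ratio) - (a_lib - a_con)) / (b_lib - b_con)

# Hypothetical coefficients: equally authoritative sources, at +1 and -1.
c = implied_citer_position(2.0, a_lib=0.0, b_lib=1.0, a_con=0.0, b_con=-1.0)

# Forward check: the citation odds implied by c should come back as 2:1.
odds = math.exp((0.0 + 1.0 * c) - (0.0 + -1.0 * c))
print(round(c, 4), round(odds, 4))
```

Under these made-up coefficients, a 2:1 citer lands modestly on the same side as the source it favors. The real procedure estimates many coefficients jointly, which this sketch does not attempt.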

I'm not a big fan of the idea that political opinions should be reduced to a coordinate on a single dimension, but even if we grant that point, the logic of their first step troubles me. I often cite people I don't agree with at all, and I think others do too. For example, in this post, I obviously felt that I gained "utility" by quoting Goebbels and Hitler, whose overall political opinions are very different from mine; on other occasions, Language Loggers have cited Stalin, Thomas Jefferson, Benjamin Franklin, Thomas Aquinas, Friedrich Hayek, and so on. But as I read G&M's model, it says that we derive maximum "utility" from citing the most extreme sources whose political polarity is the same as ours. This is hard for me to square, in common-sense terms, with my intuitions about my own behavior, or with (what I think are) the observed facts of human discourse.

In particular, I suspect that the one-dimensional politics of a given source can't as a rule be well estimated from the political distribution of its citers. In order to explore this point -- in an entirely unscientific but still empirical way -- I decided to look at blogospheric citation of Karl Marx. I asked Technorati this morning over breakfast, and found 345 mentions of "Marx" in blogs over the past seven days. Ignoring Groucho (and any other non-Karls), I took the first dozen English-language citing sites as my highly unscientific sample.

There was one clearly left-wing site:

PolemicBlog: "That's an egalitarian dream even Marx never contemplated."

one site that is mostly about things like music and software, but seems to express somewhat left-of-center politics:

eric's site: "capitalism the way marx and engels said it would be. the world turns a blind eye on genocide to keep the wheels of commerce greased with high quality petroleum products."

one anti-collectivist but apparently also anti-Bush site:

H. Duthel: The bond between all citizens of the state, their common political will, is the result of a forced act of volition on the part of each individual who, in order to reach his or her goal of private advantage, also participates in an abstract and general will. "The separation of bourgeois society and the political state necessarily appears as a separation of the political member of bourgeois society, the citizen, from bourgeois society, his own actual, empirical reality, because as an idealist of the state he is a being who is completely distinct, different from, and opposed to his own reality" (Marx, Critique of Hegel's `Philosophy of Right', Cambridge University Press 1970, p.79).

one hyper-individualistic site that explicitly refuses to be assigned a coordinate in the left-right dimension:

hobopoet: "Liberals say we should end employment discrimination. I say we should end employment. Conservatives support right-to-work laws. Following Karl Marx's wayward son-in-law Paul Lafargue I support the right to be lazy. Leftists favor full employment. Like the surrealists -- except that I'm not kidding -- I favor full unemployment."

and two others with vaguer politics yet:

short black: "Jake then passed off some of his communist hot air dressed as wisdom. 'Well, Marx wouldn't have existed if it wasn't for Engels. Engels paid for him to work. If it wasn't for that, Marx couldn't have done what he did. What you need is some rich guy who thinks you're really smart, some rich guy who thinks he's smart. Pay for you to do your writing.'"
kirei: "Ja, I’m off to read Marx."

two sites that define their politics mainly in religious terms, but would probably count as right-wing in most people's categorization:

poachedfrog.com (Christian, pro-Bush): "Rousseau, Byron, Shelley, Hugo, even Freud and Marx were the standard bearers of this new secular notion that man is fully capable of self-redemption, that man is his own master, and that man is perfectible given the appropriate political environment, an environment devoid of kings and religion."
Anti-MDP (Islamist -- "We respect the rule of law and Islam is what we believe superior.") "[Bismarck] said that the best strategy for the Bolsheviks or the Communists was to prevent the Tsar from modernising the country. In this he agreed with Karl Marx. The best way to defeat a government is to not give it the space to reform."

and four politically-oriented sites that seem clearly to the right of center:

Desert Rat Ramblings: John Kerry demands "tax fairness for Americans." This is his euphemism for fleecing Americans who pay the highest taxes. What he calls "tax fairness," Karl Marx called wealth redistribution.
The World Through Juan's Eyes: The communists are crawling out of the woodwork in support of sKerry. ... "......as critics from Marx to Chomsky have pointed out ..."
Indiana Observer: Karl Marx coined the word “capitalism” in the mid 1800s, though in his “Communist Manifesto” he never really defines it.
Stambord: That other great manifesto, the Marx and Engels one, is of course also on the reading list.

But here is (the right-hand side of) G&M's equation describing the probability that citer m selects source j:

exp(aj + bj*cm) / Σk exp(ak + bk*cm)

Only the numerator of this expression really matters to us here -- the sum in the denominator is the same for all sources, and is just a normalizing factor to ensure that the probabilities for each citer sum to 1.

In the numerator, aj is a measure of the authoritativeness or popularity of source j -- the bigger aj is, the more source j will be cited (by everyone). The terms bj and cm are the political coefficients for source j and citer m. (The subscripted a, b and c terms in the denominator have similar meanings -- the denominator means "sum over all sources for citer m".)

The G&M model assumes that the political center is 0 -- they happen to have "left" corresponding to positive values, as I understand their paper, so I'll do the same. Let's take Karl Marx's political coefficient bj to be (say) +10, and set his authoritativeness arbitrarily at 100. Then if citer m has political coefficient cm, his or her probability of citing Karl Marx will be proportional to exp(100 + 10*cm).

For a citer whose political coefficient cm is similar to Karl Marx's, this expression evaluates to exp(100 + 10*10) = exp(200) or about 7.2*10^86. For someone whose political coefficient is the opposite of Marx's (a political clone of Friedrich Hayek?), this expression evaluates to exp(100 + -10*10) = exp(0) = 1.

In each case, the resulting quantity will be normalized by the sum of the similar expressions for all the other possible citees, in order to get a predicted probability. But if the rest of the situation is reasonable, then as we move along the political spectrum from right to left, the probability of citing Karl Marx is predicted to increase more or less monotonically.
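As a rough check on that prediction, here is a small Python sketch of the normalized probabilities. Marx's coefficients are the ones chosen above; the two other sources ("centrist" and "Hayek") and their coefficients are hypothetical stand-ins for "the rest of the situation", and the function name is my own, not anything from G&M's paper.

```python
import math

def citation_probs(c_m, sources):
    """Multinomial-logit citation probabilities for a citer at c_m.

    `sources` maps a name to (a_j, b_j): authoritativeness and political
    coefficient. Returns a dict of probabilities summing to 1.
    """
    utils = {name: a + b * c_m for name, (a, b) in sources.items()}
    top = max(utils.values())  # subtract the max before exp() for stability
    weights = {name: math.exp(u - top) for name, u in utils.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# Marx's values are the ones used above; the other two are assumptions.
SOURCES = {
    "Marx": (100.0, 10.0),
    "centrist": (100.0, 0.0),
    "Hayek": (100.0, -10.0),
}

# Walk from right (negative) to left (positive), as in the text.
positions = [-1.0, -0.5, 0.0, 0.5, 1.0]
marx = [citation_probs(c, SOURCES)["Marx"] for c in positions]
for c, p in zip(positions, marx):
    print(f"c_m = {c:+.1f}  P(cite Marx) = {p:.6f}")
```

The sketch prints a narrow range of cm because, with coefficients this large, the probabilities are already indistinguishable from 0 or 1 in floating point well before cm reaches -10 or +10. With Marx as the leftmost source in the set, the printed probability rises at every step. In this functional form the log of the probability is a linear term in cm minus a convex normalizer, so as far as I can see it allows at most a single peak, never two peaks with a dip between them.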

The thing is, that's not my impression of how the world works; and I'll take the results of my little excursion into Technorati as a crude validation of my impression. These days, the people who are most likely to cite Karl Marx are right wingers. Or maybe there's a bimodal distribution, with the extreme left and right both more likely to cite him than people in the center are. Anyhow, I'm asserting here that the G&M model makes predictions in this case that are qualitatively wrong, not just quantitatively out of whack.

I'm fond, myself, of the scientific proverb (adapted from Picasso) that "a model is a lie that leads us to the truth." So the fact that G&M's model makes some qualitatively counterfactual predictions is not necessarily a reason to reject it. Depending on the real relation between the politics of citers and citees, and the empirical distribution of citers and citees in political space, their model might be leading us towards the truth, or it might not.

One difficult question is what the rhetorical content of "citing" a source is. The implication of G&M's model is that citing X is a sign of political agreement with X, and thus the rhetorical context would be something like "As X showed, it's true that P." But sometimes people go out of their way to find support from those whose views they don't share: "Even X admitted that P". And there are other rhetorical frames entirely: "The evil ones have no shame: X just proposed that P"; or "When my opponent suggests that P, she is echoing the ideas of X"; or just "Here's something new: X said that P".

G&M discuss the question of what counts as a citation, and demonstrate that they're aware of the issues:

We looked for instances where the legislator cited a view or a fact stated by a member of the think tank. We then counted the sentences in the citation. [...]
Along with direct quotes, we sometimes included sentences that were not direct quotes. For instance, many of the citations were cases where a member of Congress noted “This bill is supported by think tank X.” [... ]

Sometimes a legislator or a media outlet noted an action that a think tank had taken—e.g. that it raised a certain amount of money, initiated a boycott, filed a lawsuit, elected new officers, or held its annual convention. We did not record such cases in our data set. However, sometimes in the process of describing such actions, the reporter or member of Congress would quote a member of the think tank, and the quote revealed the think tank’s views on national policy, or the quote stated a fact that is relevant to national policy. If so, we would record that quote in our data set. For instance, suppose a reporter noted “The NAACP has asked its members to boycott businesses in the state of South Carolina. `We are initiating this boycott, because we believe that it is racist to fly the Confederate Flag on the state capitol,’ a leader of the group noted.” In this instance, we would count the second sentence that the reporter wrote, but not the first.

Also, we omitted the instances where the member of Congress or journalist only cited the think tank so he or she could criticize it or explain why it was wrong. About five percent of the congressional citations and about one percent of the media citations fell into this category.

In the same spirit, we omitted cases where a journalist or legislator gave an ideological label to a think tank (e.g. “Even the left-wing Urban Institute favors this bill.”). The idea is that we only wanted cases were the legislator or journalist cited the think tank as if it were a disinterested expert on the topic at hand. About two percent of the congressional citations and about five percent of the media citations fell into this category.

But by my reading of their rules, four of the six right-of-center mentions of Karl Marx still count as citations -- look at them yourself and see what you think. It's clear of course that those sources are not in accord with Marx's ideas, but you can't necessarily tell from the immediate context, at least not in any simple way that is not circular with respect to the goal of quantifying political stances. And in the case of media citations, it's often even more obscure whether the cited position is being presented in an approving, disapproving or neutral way.

Whatever the truth about blogs citing Karl Marx, I feel that it remains problematical to try to determine the politics of a source by looking at who cites it. One of the most widely cited Language Log posts was Geoff Pullum's takedown of Dan Brown's The Da Vinci Code. A majority of the links (says my impressionistic memory) were from Catholic sites. Those folks have historical and theological beefs with Dan Brown, and so they were happy to link to Geoff's piece, which was strongly negative on linguistic and literary grounds. We can't conclude anything from this about Geoff's attendance at Mass or his feelings about any theological points whatever.

[Note from Geoff Pullum: Holy smoke! You can say that again! Don't ever read off any opinions from my citation list. I cite people I utterly despise — Strunk and White, to name but two.]

[Further note from Mark Liberman: G&M wouldn't count your references to Strunk and White as "citations" in their sense, I think, because your criticisms of them are always front and center when you cite them. However, when a religious website references your Dan Brown post approvingly, this does seem to count, by the lights of G&M's model, as evidence for your position on (say) the continuum from secular to religious. And if we ran the numbers, I think it might turn out that your estimated position is rather far out on the religious end of the scale. Independent of any direct evidence on the subject, it seems imprudent to me to draw such an inference simply because you happen to criticize effectively on linguistic grounds a writer whom many committed Christians dislike on theological grounds. ]

Posted by Mark Liberman at 04:20 PM

Femina floresiensis?

Jemima Lewis' observation that female "hobbits" are being given short shrift, recapitulated by Mark, is well taken, but the implicit opposition between Homo floresiensis and Femina floresiensis is not. It presupposes that homo means "man" as opposed to "woman", which it does not. Latin homo means "human being, person". It is gender-neutral. The word that is opposed to femina "woman" is vir "man". Homo appears to those who don't know Latin as a male-specific term because English conflates "male human being" and "human being" in man, but this is a fact about English, not about Latin.

The same is true in Greek. γυνή [gyne] "woman" is opposed to ἀνήρ [aner] (combining form [andro]) "man". ἄνθρωπος [antʰro:pos] means "human being", without reference to gender.

Update: Gene Buckley points out that one newspaper, the New York Times, did depict females in the illustration accompanying this article.

Posted by Bill Poser at 10:08 AM

F. floresiensis

Jemima Lewis in the Telegraph makes some telling points about journalistic iconography and scientific nomenclature:

The discovery of Homo floresiensis - the tiny, hairy humans who once lived on the Indonesian island of Flores - is great news for anthropologists, but disheartening for feminists. It was, after all, a woman whose 18,000-year-old skeleton, buried in a limestone cave, led scientists to this breakthrough. Yet the artists' impressions of the "Hobbit human" that appeared in all the newspapers was male. Very male, in fact, with swinging testicles, a six-pack and a slaughtered animal in his arms.

This is most unjust. By the sounds of things, Hobbit women were rather more vibrant than the men. The villagers of Flores, who swear there were little people living on the island until the 18th century, even have a feminine name for them: Ebu Gogo, meaning "grandmother who eats everything". The women feature much more heavily in local legend - especially their breasts, which were so pendulous that they had to be slung over the shoulder for ease of movement.

But the breasts are not the point. Whatever her body shape, the female Hobbit deserved to get her picture in the newspapers. She alerted scientists to the existence of a vanished race. For that, we should salute her: Femina floresiensis.

Posted by Mark Liberman at 07:02 AM

October 30, 2004

Language, thought and counting in Amazonia

In the Oct. 15 issue of Science: two different groups of Amazonians, same basic result. Peter Gordon's article on the Pirahã was pre-published online on August 19, and elicited a lot of comment at the time (Language Log postings here, here, here, here and here). His article's formal publication (in the October 15 issue) is matched with another report, this one on the Mundurukú: Pierre Pica, Cathy Lemer, Véronique Izard, and Stanislas Dehaene, "Exact and Approximate Arithmetic in an Amazonian Indigene Group", as well as a Viewpoint piece by Rochel Gelman and Randy Gallistel, "Language and the Origin of Numerical Concepts".

As Gelman and Gallistel put it,

The research findings indicate that, whether or not humans have an extensive counting list, they share with nonverbal animals a language-independent representation of number, with limited, scale-invariant precision. What causal role, then, does knowledge of the language of counting serve? We consider the strong Whorfian proposal, that of linguistic determinism; the weak Whorfian hypothesis, that language influences how we think; and that the "language of thought" maps to spoken language or symbol systems.

Gelman and Gallistel discuss the background of theories and experiments about language and thought in general, and language and numerical cognition in particular, and conclude that

reports of subjects who appear indifferent to exact numerical equality even for small numbers, and who also do not count verbally, add weight to the idea that learning a communicable number notation with exact numerical reference may play a role in the emergence of a fully formed conception of number. The challenge now is to delineate that role.

Unfortunately, if you don't have a subscription to Science, you can't read what they have to say, nor can you read Gordon's work, nor the work of Pica et al. [Update: here is a .pdf of the Gelman and Gallistel viewpoint piece, and here is a .pdf of the Pica et al. article. You may be able to get a .pdf of the Gordon article here -- if that doesn't work, try clicking through from Peter Gordon's web page. See note at the bottom of this post for more links]

Peter Gordon's stuff has been fairly extensively discussed here already, so I'll just say a few words here about the Mundurukú. Here's where they live:

(A map showing where the Pirahã live can be found here.) The language of the Mundurukú is Ethnologue code MYU, with about 2,000 speakers, fairly close to Guaraní with about 5,000,000 speakers. It has words for one, two, three and four, and "hand" is used for five or so:

In estimating quantities, according to the article,

The Mundurukú did not use their numerals in a counting sequence, nor to refer to precise quantities. They usually uttered a numeral without counting, although (if asked to do so) some of them could count very slowly and nonverbally by matching their fingers and toes to the set of dots. Our measures confirm that they selected their verbal response on the basis of an apprehension of approximate number rather than on an exact count. With the exception of the words for 1 and 2, all numerals were used in relation to a range of approximate quantities rather than to a precise number. For instance, the word for 5, which can be translated as "one hand" or "a handful," was used for 5 but also 6, 7, 8, or 9 dots. Conversely, when five dots were presented, the word for 5 was uttered on only 28% of trials, whereas the words for 4 and "few" were each used on about 15% of trials. This response pattern is comparable to the use of round numbers in Western languages, for instance when we say "10 people" when there are actually 8 or 12. We also noted the occasional use of two-word constructions (e.g., "two-three seeds"), analogous to references to approximate quantities in Western languages. Thus, the Mundurukú are different from us only in failing to count and in allowing approximate use of number words in the range 3 to 5, where Western numerals usually refer to precise quantities.

I won't give the details of their experiments here, nor the background literature against which they interpret it. A description of some of their experimental techniques can be found in this 10/21 Guardian article by Brian Butterworth. However, I'll reproduce their conclusions, omitting the footnotes:

Together, our results shed some light on the issue of the relation between language and arithmetic. They suggest that a basic distinction must be introduced between approximate and exact mental representations of number, as also suggested by earlier behavioral and brain-imaging evidence and by recent research in another Amazon group, the Pirahã. With approximate quantities, the Mundurukú do not behave qualitatively differently from the French controls. They can mentally represent very large numbers of up to 80 dots, far beyond their naming range, and do not confuse number with other variables such as size and density. They also spontaneously apply concepts of addition, subtraction, and comparison to these approximate representations. This is true even for monolingual adults and young children who never learned any formal arithmetic. These data add to previous evidence that numerical approximation is a basic competence, independent of language, and available even to preverbal infants and many animal species. We conclude that sophisticated numerical competence can be present in the absence of a well-developed lexicon of number words. This provides an important qualification of Gordon's version of Whorf's hypothesis according to which the lexicon of number words drastically limits the ability to entertain abstract number concepts.

What the Mundurukú appear to lack, however, is a procedure for fast apprehension of exact numbers beyond 3 or 4. Our results thus support the hypothesis that language plays a special role in the emergence of exact arithmetic during child development. What is the mechanism for this developmental change? It is noteworthy that the Mundurukú have number names up to 5, and yet use them approximately in naming. Thus, the availability of number names, in itself, may not suffice to promote a mental representation of exact number. More crucial, perhaps, is that the Mundurukú do not have a counting routine. Although some have a rudimentary ability to count on their fingers, it is rarely used. By requiring an exact one-to-one pairing of objects with the sequence of numerals, counting may promote a conceptual integration of approximate number representations, discrete object representations, and the verbal code. Around the age of 3, Western children exhibit an abrupt change in number processing as they suddenly realize that each count word refers to a precise quantity. This "crystallization" of discrete numbers out of an initially approximate continuum of numerical magnitudes does not seem to occur in the Mundurukú. [emphasis added]

I'll take this as confirmation that my analogy to throwing was not completely nuts (though I certainly admit that language, internal or external, is likely to be more involved in the development and deployment of skilled arithmetic than in the comparable aspects of skilled throwing). "Language" and "thought" are not the only actors in this cognitive drama.

[Update: here are some videos of Mundurukú finger and seed counting, and a description of the videos.]

[Update #2: David Nash emailed that

..from your posting the description of Munduruku fits pretty well with what I understand to be a situation in Australian (Aboriginal) language communities, or at least as it was before schooling in numeracy...

and added that Bill McGregor is writing a review of Australian number classification. ]

[Update #3: Here is an excellent page discussing research on arithmetic and the brain at Stan Dehaene's Unité de Neuroimagerie Cognitive (Cognitive Neuroimaging Unit), with many links including to the new Science paper. ]


Posted by Mark Liberman at 03:30 PM

Is the Future Tense for Losers?

Economists Rosa Karapandza and Milos Bozovic of the Universitat Pompeu Fabra in Barcelona have written a paper [pdf] in which they report that there is a statistically significant negative correlation between the use of the future tense in companies' annual reports (10-K filings with the US Securities and Exchange Commission) and their performance. They report a similar relationship between the use of the future tense by US presidential candidates in debates and the results of the election: in every election since 1960, the candidate who used the future tense more frequently lost the election. On this basis, they predict a victory for Bush.

In the case of the company reports, one can imagine that companies that are doing well focus on their accomplishments, which they describe in the past tense, while weaker companies talk about what they are going to do, in the future tense. What might account for this effect in presidential elections, if it isn't just a fluke, is not obvious to me. It doesn't seem to reflect the advantage of the incumbent: In 1960, Nixon, who was Vice-President, used the future far more than Kennedy and lost; so did incumbent President Ford, who lost to Carter in 1976. Bush Senior was the incumbent in 1992 when he used the future more than Clinton and lost. On Tuesday we'll see how this holds up.

Update 2004/10/31: The Green Bay Packers beat the Washington Redskins today. This is supposed to predict that Kerry will win the election. We'll see if football is a more accurate predictor than use of the future tense.

Posted by Bill Poser at 01:34 AM

October 29, 2004


Google's first hit for mosh is now the GNN page for Eminem's get-out-the-vote video entitled "Mosh". The same video is also now #1 among MTV's "Hot 5 Videos", though I'm not sure to what extent this reflects popularity among its intended audience as opposed to the effect of exhortations by Kos and others.

I'm not an Eminem fan, in general, but to my surprise, I liked 8 Mile. So I wasn't sure what to expect from Mosh. What I got was a mixed bag of Indymedia-style political clichés -- "blood for oil", demon cops -- combined in a video-game animation with some less obvious lines

If you don't understand don't even bother to ask
A father who has grown up with a fatherless past

and a few really puzzling ones:

Put your faith and your trust as I guide us through the fog
Till the light at the end of the tunnel...

The center of it is the mosh/march equivalence:

we gonna march through the swamp, we gonna mosh through the marsh

and the transformation of both mosh and march into a line to wait to vote -- though not until the tunnel in a foggy marsh turns into a debate in a desert storm:

and as we proceed to mosh through this desert storm
in these closing statements if they should argue
let us beg to differ as we set aside our differences
and assemble our own army to disarm this weapon of mass destruction
that we call our president for the present
and mosh for the future of our next generation to speak and be heard
Mr. President Mr. Senator
do you guys hear us?

Like a dreamland version of Grand Vote Auto, with a plot by the team of William Gibson and J.R.R. Tolkien. But did anyone else think that the grimly purposeful crowd in black hoodies was a little creepy? Like, black shirts?

I was curious about the history and etymology of the word mosh. The OED isn't sure, suggesting that it's "[App. a variant of MASH v.]" But the first two citations are

1983 Village Voice (N.Y.) 18 Jan. 30/1 Slam dancers..agree that it is ‘violence within friendship’... Besides, ‘you're so into the music and dancing that you don't think about getting moshed.’
1985 ‘STORMTROOPERS OF DEATH’ Milano Mosh (song) in Speak English or Die (record sleeve), We mosh, until we die, We mosh, until you try. You think that you can try, But can you do the Milano Mosh.

The full lyrics to Milano Mosh are here. The band Stormtroopers of Death seem to have been about as nice as you'd imagine, though maybe they were more ironic than actually fascist. The "milano" part of Milano Mosh turns out to be a reference to Billy Milano, one of the band members, not an evocation of Benito and Clara hanging in the Piazzale Loreto, as I first thought it might be. In addition to being in on the birth of mosh, S.O.D.'s other linguistic claim to fame was their song Speak English or Die, which gave its name to the (first?) mosh album.

None of that was in Marshall Mathers's mind when he wrote his lyrics to Mosh, I'm sure.

Let me be the voice in your strength and your choice
Let me simplify the rhyme just to amplify the noise

If Mosh helps energize voting among American youth, it'll be a triumph. But I'm still a little uneasy about all those black hoodies. It's almost Halloween, so a little creepiness is seasonally appropriate. Still, simplified rhymes and amplified noise are good entertainment but problematic politics.


Posted by Mark Liberman at 03:52 PM

The unspeakable

I noticed that two films have arrived in town bearing names that cannot literally be pronounced because they are not spelled with letters. One is I ♥ Huckabees, directed by David O. Russell (where that ‘♥’ should appear as a heart; under Firefox 1.0.6 on a Mac it does not, I notice). The other is What the #$*! Do We Know, a documentary by William Arntz, Betsy Chasse, and Mark Vicente. People appear to be getting around the problem typographically by writing ♥ as "(heart)", and writing #$*! as "(bleep)", and presumably pronouncing them likewise.

In related news from the orthography/phonetics interface, Language Log readers may be surprised to hear that after a reorganization, Royal Dutch / Shell changed its name yesterday. It is now called Royal Dutch Shell. The slash is gone, but phonetically the name remains the same.

Get your language news here on Language Log. We may spend a certain amount of time obsessing over piddling orthographic trifles of no importance to man or beast, but hey, the price is right, you know that.

Posted by Geoffrey K. Pullum at 01:55 PM

October 28, 2004

World Champions???

Geoff Pullum's note on the victory of the Boston Red Sox in the World Series quotes Brian Weatherson's exclamation:

Boston is Champion of the World!!!

I'm sure I'll incite the undying hatred of American baseball fans, but this isn't true. The so-called "World Series" is a competition open only to teams from the United States and Canada. There is a lot more to the world than these two countries. To become world champions, a team would have to win a competition open to all of the countries in the world.

I have always supposed that everyone knew that the World Series was misnamed, but I'm beginning to wonder. Brian Weatherson is not alone in calling the winners of the World Series the world champions. Is this some sort of Whorfian effect? Do people believe that the winners of the World Series are world champions simply because the word "world" is, inappropriately, used?

Update: Readers point out that the US champions in football, hockey, and basketball are also commonly described as "world champions". One reader points out that this is particularly unreasonable in the case of basketball as this is a sport that is played in many countries and in which the US is not dominant, as evidenced by defeats of the US team in the Athens Olympics and in the real World Championship.

For those who were wondering, no, I do not seriously entertain the Whorfian explanation. I think it's a combination of imperial arrogance and, in some sports, a long period of American dominance.

Posted by Bill Poser at 12:56 PM

The base

An unpleasant chill down the spine as I read the editors' piece "The Choice" in The New Yorker (November 1 edition, p. 37) this morning. It mentions that the Bush regime calls its staunch conservative Christian support community "the base". And they add in a parenthesis: "(in English, not Arabic)".

Translating "the base" into Arabic gives us, of course, al qa'eda. I think I might have been happier if they hadn't pointed out that parallelism.

Posted by Geoffrey K. Pullum at 12:25 PM

Moi is what moi am

A familiar case of pronoun borrowing, closer to home than the phenomena in Bahasa Malaysia, and also motivated by status calculations. Scroll down in the linked page to see the title of this post in context. As seems to be the case in Malaysian, there is some equivocation about whether the borrowings are pronouns or just nouns.

Posted by Mark Liberman at 11:58 AM

Red Sox win

You are probably thinking that even the seguemeisters of Language Log cannot find a linguistic angle to the fact that the Red Sox just won the World Series. Over at Brian Weatherson's Thoughts, Arguments and Rants, all pretense at philosophical content disappeared: "Go Sox! Go Sox!! Boston is Champion of the World!!!" wrote the philosopher proprietor and fanatical Red Sox fan. Philosophy only came back onto the agenda in a parenthesis, as Brian realized it would be morally wrong to encourage property damage: "(But don't burn down too much of Boston in celebrating this fact - we'd miss it.)"

I promise you, though, I do have a linguistic connection for Red Sox fans. If you would like to read on.

*     *     *     *     *

Regular readers will be able to name my least favorite book in the world: it is Strunk & White's The Elements of Style, a horrid little compendium of unmotivated prejudices (don't use ongoing), arbitrary stipulations (don't begin a sentence with however), and fatuous advice ("Be clear"), ridiculously out of date in its positions on appropriate choices among grammatical variants, deeply suspect in its style advice and grotesquely wrong in most of the grammatical advice it gives. (Don't make me go on; if you want an hour-long lecture on the demerits of this beastly little book, that can be arranged.) One of the things that worries me about the number of Americans who seem to treasure this little piece of trash, now in its fourth edition since it was revived in the 1950s after E. B. White undertook a light revision of an earlier version by Strunk and added a chapter on style, is that they just don't realize how absurdly old it is — that it is pretty much not even a work of the last century, but rather reflects ideas formed in the one before that.

White agreed to revise and expand Strunk's little book for the reissue in 1959 because he already revered it. Strunk had been one of his professors at Cornell. He vividly remembered Strunk positively shouting vacuous slogans like "Omit needless words!" at him, and apparently he thought it was wonderful. The interesting thing is when it was that White took that course.

Strunk had been born in 1869. That is, he was old enough to read the news when General Custer led his men to massacre at the Little Big Horn. Strunk was a grownup with a Ph.D. when Dracula was first published. By the time White was his student and had to buy the privately published precursor of what would become Strunk & White, the professor had reached the age of 50. It was 1919.

It's no wonder Strunk's view about a phrase like everyone in the community, whether they are a member of the Association or not was that it should be "corrected" to everyone in the community, whether he is a member of the Association or not: women still didn't have the vote in America, so who would care if this sort of use of he excluded them? Prohibition was newly adopted; the Model T Ford was on sale; the Treaty of Versailles was being readied for signature to formally end the First World War.

But what I'm saying about the extreme age of the outdated nonsense in Strunk & White can perhaps best be put like this: White's formative experience in Strunk's class was so long ago that the Red Sox had just won the World Series the year before.

[Click for more Language Log posts on this subject.]

Posted by Geoffrey K. Pullum at 11:43 AM

More Borrowed Pronouns

A while back we had some discussion of whether pronouns are unborrowable and can therefore be used as sure indicators of genetic affiliation. Sally Thomason pointed out some examples in which pronoun systems have changed so radically as to make their relationship undetectable through superficial inspection. I pointed out that there are a fair number of documented examples of borrowing of pronouns. Another example has now come to light in Malay. Darul Fahim has posted a discussion of the borrowing of the English pronouns I and you into Malay, which he says is taking place as a way of avoiding the need to make decisions about which of the native Malay pronouns to use based on complex considerations of relative social status.

Posted by Bill Poser at 11:39 AM

Hominin, hominid, hominoid, whatever

Reading about the Javanese hobbits in Nature, you may wonder whether hominin is a typo for hominid; and then, realizing that it isn't, you may wonder what the difference really is. Dictionaries won't help you -- at least none of the ones that I have do. Hominin isn't even in the OED or the American Heritage Dictionary or Merriam-Webster's 3rd Unabridged or the online version of Encarta. But Thomas Greiner explains it all here:

...when scientists use the word hominin today, they mean pretty much the same thing as when they used the word hominid twenty years ago. When these scientists use the word hominid, they mean pretty much the same thing as when they used the word hominoid twenty years ago.

This is one of the consequences of insisting on an distinguishing certain types of evolutionary "clades" as the well-defined taxonomic levels of class, order, family, genus, species -- just like we distinguish particular levels of government in the U.S. as state, county, city.

Recent work shows that apes are not a monophyletic group (all descended from one ancestor), so that chimps and gorillas share a more recent ancestor with humans than they do with the orangutan. That means that, on the strict taxonomic level, chimps and gorillas are hominids. There are some specialists that use the term in this way -- although it gets very confusing when they do. If chimps and gorillas are hominids, what then do we call the group that leads to humans but not to chimps and gorillas? For that, we come up with a new taxonomic level called Tribe, that lies between Family and Genus. The Tribe hominini describes all the human species that ever evolved (including the extinct ones) that excludes the chimps and gorillas.

That's us hominins, if you're keeping score at home.

Advantage, Google (lexicographically speaking).

It's not clear that insisting on these qualitative distinctions among cladistic levels is really doing enough work to be worth the trouble, whether in naming clades or in assigning scientific names to species. Kevin de Queiroz and the rest of the PhyloCode gang think that it's all a big mistake, an early-enlightenment misstep like the phlogiston theory, and ought to be tossed, leaving species (= terminal node) and clade (= non-terminal node) as the only qualitatively-distinguished categories in the evolutionary tree. At least, I think that's what they think! In any case, it's very much a minority view these days, but it seems to be gaining ground. (As it happens, I've been meaning to try to understand -- and blog about -- PhyloCode for a while now, and this is some new motivation. If the creeks don't rise, I'll get to it within a week or so).


Posted by Mark Liberman at 04:53 AM

October 27, 2004

"Appalling" discovery of hobbitoid hominins

The internets are buzzing with news about the announcement in Nature (of a discovery back in September 2003) of a new pygmy hominin species named Homo floresiensis, apparently a local island specialization of Homo erectus. These folks lived on the Indonesian island of Flores, happily hunting pygmy elephants and giant rats, until a volcano did them in about 12,000 years ago. The primary papers in Nature are "A new small-bodied hominin from the Late Pleistocene of Flores, Indonesia", by Peter Brown et al., and "Archaeology and age of a new hominin from Flores in eastern Indonesia", by M. J. Morwood et al. There's also a discussion paper "Human Evolution Writ Small" by Marta Mirazon and Robert Foley, a news piece by Rex Dalton entitled "Little Lady of Flores Forces Rethink of Human Evolution", and an interview with Brown and Morwood. (Some of this stuff will only be accessible if you have a subscription to Nature, alas).

An ignorant headline writer at The Scotsman describes the find as a "species of hobbit-sized ancestors", though Peter Brown, who led the analysis team, states quite clearly in the Nature interview that

Although it was a member of our genus, H. floresiensis is unlikely to have contributed to the gene pool of H. sapiens. So for me, its importance is not in the evolutionary story of modern humans, but in how the broad group from which modern humans evolved may have adapted and evolved to different ecosystems.

Other headlines (e.g. in the NYT) use more appropriate words like "cousins" rather than "ancestors".

There are a couple of linguistic hooks. On the substantive question of the new species' linguistic abilities, Nicholas Wade's NYT piece quotes Morwood (indirectly) as arguing for "yes":

If the stone tools were made by the little Floresians, as Dr. Morwood believes, that is striking evidence of their cognitive abilities. Dr. Morwood says they must have hunted cooperatively to bring down the pygmy elephants. To conduct such hunts, and to fabricate such complex stone tools, they almost certainly had some form of language, he said.

But Wade also quotes Robert Foley (co-author of the discussion article in Nature) expressing skepticism:

This will be a surprising finding, if true, because the little people have brains slightly smaller than a chimpanzee and similar in size to Australopithecenes, the ape-like ancestors of the human line. Dr. Foley said he would not rule out Dr. Morwood's suggestion but noted that chimpanzees hunt cooperatively without using language. Modern humans are known to have reached Australia by at least 40,000 years ago and were probably in the general neighborhood of Flores at the same time, so it is a plausible alternative that they could have been the makers of the stone tools. "I think it's a big jump" to assume the Floresians had language, Dr. Foley said. He also noted the danger of assuming the Floresians behaved like diminutive people when their nature might in fact have been quite different.

Never mind chimpanzees hunting cooperatively, what about wolves?

If there's any information about sphenoidal angles or cerebral asymmetries in the endocasts, or other possible cranial indications of adaptations for speech, I didn't see it. [Update: Brown et al.'s list of characteristics (of skeleton LB1) includes the laconic note "Cranial base flexed", and Mirazón et al. do comment that "Most of LB1's other characteristics, such as the thickness and proportions of the skull, the flexion evident at the skull base, and the shape of the teeth, are derived traits of the genus Homo."] However, as Rex Dalton's news piece in Nature explains

The hominin bones were not fossilized, but in a condition the team described as being like "mashed potatoes", a result of their age and the damp conditions. "The skeleton had the consistency of wet blotting paper, so a less experienced excavator might have trashed the find," says Richard Roberts of the University of Wollongong, Australia.

so maybe such measurements are not possible or at least not believable.

On a less substantive note, NPR has more Q&A, in which Brown is quoted as saying that

It's the most fabulous testimony to the appalling preservation potential of fossils in the geological record, and makes you wonder just how many fossils of other human species, as well as other members of the animal kingdom, lie concealed in some subterranean time capsule, patiently awaiting discovery! [emphasis added]

If the transcript is accurate, this is a very odd use of the word appalling, which for Brown has apparently been bleached of its negative connotations and just means something like "very surprising".

Well, surprising it certainly is, but appalling? I don't think so. Unless this is a covert reference to the "mashed potatoes" or "soggy blotting paper" consistency of the skeletons, which some might consider at least yucky.

[Update: the degree of encephalization (relation of brain size to body size) was surprisingly low for a descendent of H. erectus, as this figure from Brown et al. shows:

The H. floresiensis skeleton (referred to as "Liang Bua" in the figure, after the location where it was found) is more like an australopithecine in this parameter.]

[Update #2: The low degree of encephalization and the relative size of the tools (which were not found in very close association with the H. floresiensis remains) lead some anthropologists (such as Colin Groves, who is quoted in the linked article) to wonder whether the tools may have come from H. sapiens settlers, whether contemporary or later. And more facts and speculations about the discovery can be found here, here, and here.]


Posted by Mark Liberman at 10:10 PM

The Axis of Spam

America is under attack. Unwanted linguistic material, possibly hazardous, is being launched at us in vast quantities. Spam received at my email address, which I now never advertise anywhere in machine-readable form (though it is of course too late) is now vastly outnumbering genuine messages. And I have noticed that a large and increasing proportion of it is now coming from accounts in the national domains of foreign countries. To fight this we must know our enemy. Which countries are the most guilty? Who are the gravest offenders in the Axis of Spam? Here are some very rough details for the spam I have received (all of it trapped by the indispensable spamassassin) in the last few weeks.

The details have to be rough because of course many (perhaps most) headers are forged, often "From" lines and "From:" lines and "Received from" lines and "Sender" lines and "Reply-To" lines don't match, and so on. The Unix tools I have used make it easy to automate the count without any human effort, but they go wrong on some unexpected kinds of message, they will not detect duplicates, and so on. For what it is worth I have ranked the countries named below in descending order by one very crude measure: the number of times their suffix appears in material that has been correctly filtered as spam by my filtering. Here, then, is a very rough guide to the evilness of countries as spam sources, in decreasing order of evildoerhood. The number at the left is the count for that country's suffix in my sample of spam.

33 Italy (.it)
16 Japan (.jp), Germany (.de)
14 France (.fr)
13 The Czech Republic (.cz)
12 The United Kingdom (.uk)
11 China (.cn)
8 Guatemala (.gt), South Africa (.za), Spain (.es), Taiwan (.tw)
7 Portugal (.pt)
5 Austria (.at), Latvia (.lv), Poland (.pl), Sweden (.se)
4 Argentina (.ar), Australia (.au), Canada (.ca)
3 Brazil (.br), Greece (.gr), Iceland (.is), Israel (.il), Lithuania (.lt), Russia (.ru), Slovenia (.si)
2 Cocos (Keeling) Islands (.cc), Hungary (.hu), Netherlands (.nl), Niue (.nu), Panama (.pa), Peru (.pe), Romania (.ro)
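
A count of this kind can be sketched in a few lines of Python. This is not the Unix pipeline used for the table above: the function name, the regular expression, and the sample header lines are all illustrative assumptions. Like the original count, it trusts forged headers and makes no attempt to detect duplicates.

```python
import re
from collections import Counter

# Assumption: a "country suffix" is a two-letter top-level domain at the end
# of the host part of an e-mail address. Generic domains such as .com or .net
# are skipped simply because they have more than two letters.
ADDRESS_SUFFIX = re.compile(r"@[A-Za-z0-9.-]+\.([A-Za-z]{2})(?![A-Za-z0-9.-])")


def count_country_suffixes(header_lines):
    """Tally two-letter address suffixes across an iterable of header lines."""
    counts = Counter()
    for line in header_lines:
        for suffix in ADDRESS_SUFFIX.findall(line):
            counts["." + suffix.lower()] += 1
    return counts


if __name__ == "__main__":
    # Made-up header lines, purely for illustration.
    sample = [
        "From: Offers <deals@mail.example.it>",
        "Reply-To: promo@shop.example.it",
        "From: win@lotto.example.de",
        "From: someone@example.com",
    ]
    # Print suffixes in decreasing order of count, as in the list above.
    for suffix, n in count_country_suffixes(sample).most_common():
        print(n, suffix)
```

Because every matching address in every header line is counted, a single message with mismatched "From" and "Reply-To" lines can contribute to two different countries, which is one more reason to treat the numbers as very rough.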

There are the data. Make of them what you will. I cannot see much rhyme or reason in any of it — except for one extraordinary fact that John Bell pointed out to me (oddly I had missed it): the three original Axis Powers of World War II are right at the top of the list! Could spam be a continuation of World War II by other means?

Apart from that, not much that is significant. Countries using Indo-European languages are in the majority, but that's not much of a surprise considering the character of the Internet. Putative friends and allies like the U.K. appear to rank as more evil than known enemies like France. (Just kidding! A hearty welcome to all our French readers!) Tiny Latvia and Lithuania are in on the act along with huge countries like Brazil and Australia. Poor countries like Guatemala and rich countries like Japan are cheek by jowl.

I have had at least some spam from other countries that do not show up in the sample surveyed above: Chile (.cl), Denmark (.dk), Finland (.fi), India (.in), Ireland (.ie), Malaysia (.my), Micronesia (.fm), and Switzerland (.ch).

Switzerland! "Five hundred years of peace and democracy," says Harry Lime (played by Orson Welles), contemptuously, in The Third Man; "and what did that produce? The cuckoo clock." Well, that, and good chocolate, and the Vatican Guard, and also spam advertising cheap software, it seems.

Absent from the above list are countries the USA has recently made war on (Afghanistan, .af, and Iraq, .iq), but also the two countries belonging to the original Axis of Evil who have not yet been attacked (Iran, .ir, and the Democratic People's Republic of Korea, .kp). The two countries that are usually at the top of Transparency International's list for freedom from corruption (Denmark, .dk, and Finland, .fi) are sending spam, and the two countries most usually found right at the other end, the most corrupt in the world (Cameroon, .cm, and Bangladesh, .bd) are not.

A rather weird thing is the absence of Nigeria (.ng), considering that every day I am contacted several times by Nigerian widows, businessmen, and ex-aides to fallen government ministers who want my help in shifting hundreds of millions of illicit dollars into the country. But they always seem to get hold of .com or .net addresses.

Of course, right at the top of the list would be the most evil country in cyberspace: the USA, which sends most of the spam in the world, as Noam Chomsky would certainly want me to point out. But I cannot work out how many American spams I get, because .us is hardly used so far, and names in the .com and .net domains and the like can be purchased or used by non-Americans too. What I can tell you is that if yahoo.com and hotmail.com were countries, they would be the world's most evil — Yahoo! (which has several foreign subsidiaries) being the most evil, generating about 10% of all the spam I get, and Hotmail second, with about 7%.

Posted by Geoffrey K. Pullum at 04:43 PM

Scott on Saxon swine

When I first encountered the French words for farm animals such as pig (fr porc, cf pork), ox (fr boeuf, cf beef), calf (fr veau, cf veal), it seemed as though the French couldn't look at an animal without salivating. However, the fact that English vocabulary distinguishes field-names from food-names for certain animals is, crudely stated, a consequence of the Norman conquest of 1066 and the following centuries in which the French-speaking aristocracy ruled over English-speaking peasants. Recently I discovered a delightful discussion of this point of vocabulary in Walter Scott's Ivanhoe (1820), a novel set in 12th century England. We pick up a conversation between Gurth, the slave-swineherd, and the jester Wamba, in a glade in the ancient forest near Sheffield, around sunset...

"Truly," said Wamba, without stirring from the spot, "I have consulted my legs upon this matter, and they are altogether of opinion, that to carry my gay garments through these sloughs, would be an act of unfriendship to my sovereign person and royal wardrobe; wherefore, Gurth, I advise thee to call off Fangs, and leave the herd to their destiny, which, whether they meet with bands of travelling soldiers, or of outlaws, or of wandering pilgrims, can be little else than to be converted into Normans before morning, to thy no small ease and comfort."

"The swine turned Normans to my comfort!" quoth Gurth; "expound that to me, Wamba, for my brain is too dull, and my mind too vexed, to read riddles."

"Why, how call you those grunting brutes running about on their four legs?" demanded Wamba.

"Swine, fool, swine," said the herd, "every fool knows that."

"And swine is good Saxon," said the Jester; "but how call you the sow when she is flayed, and drawn, and quartered, and hung up by the heels, like a traitor?"

"Pork," answered the swine-herd.

"I am very glad every fool knows that too," said Wamba, "and pork, I think, is good Norman-French; and so when the brute lives, and is in the charge of a Saxon slave, she goes by her Saxon name; but becomes a Norman, and is called pork, when she is carried to the Castle-hall to feast among the nobles; what dost thou think of this, friend Gurth, ha?"

"It is but too true doctrine, friend Wamba, however it got into thy fool's pate."

"Nay, I can tell you more," said Wamba, in the same tone; "there is old Alderman Ox continues to hold his Saxon epithet, while he is under the charge of serfs and bondsmen such as thou, but becomes Beef, a fiery French gallant, when he arrives before the worshipful jaws that are destined to consume him. Mynheer Calf, too, becomes Monsieur de Veau in the like manner; he is Saxon when he requires tendance, and takes a Norman name when he becomes matter of enjoyment."

"By St Dunstan," answered Gurth, "thou speakest but sad truths; little is left to us but the air we breathe, and that appears to have been reserved with much hesitation, solely for the purpose of enabling us to endure the tasks they lay upon our shoulders. The finest and the fattest is for their board; the loveliest is for their couch; the best and bravest supply their foreign masters with soldiers, and whiten distant lands with their bones, leaving few here who have either will or the power to protect the unfortunate Saxon..."

[ Ivanhoe | Walter Scott | Project Gutenberg E-Text ]

Posted by Steven Bird at 07:55 AM

-ome is where the art is

I've spent the past three days at the SOFG-2004 conference, among folks who are wrestling not only with the genome, but also with the proteome, the transcriptome, the reactome, and the metabolome (or is it the metabonome?). X-ome in this sense means something like "the overall ensemble of Xs". Though it's not quite that simple, since the reactome is "a knowledgebase of biological processes", i.e. reactions. So perhaps X-ome, in the modern sense, is "a systematic digital compendium of things denoted by some word derived from the stem X".

As usual, there's a bit of historical dirt in the morphological mix. For instance, I've been confused about the distinction between metabonomics and metabolomics, which seem to be used interchangeably to refer to the ensemble of information about metabolites and their interconnections. Google tells me that

These very similar terms have arisen at about the same time in different area of bioscience research, mainly animal biochemistry and microbial/plant biochemistry respectively. Although both involve the multiparametric measurement of metabolites they are not philosophically identical as metabonomics deals with integrated, multicellular, biological systems including communicating extracellular environments and metabolomics deals with simple cell systems and, at least in terms of published data, mainly intracellular metabolite concentrations.

And to get the form metabonome, I think we have to postulate a mis-parsing of genome as ge+nome and metabolism as metabo+lism, so as to give us metabo+nome: an -omic eggcorn or folk etymology. This is the only use of an invented affix -nome that I've seen so far.

From an on-line -omes and omics glossary, I've learned that other recent coinages include behaviourome, cellome, clinome, complexome, cryptome, crystallome, ctyome, degradome, enzymome, epigenome, epitome, expressome, fluxome, foldome, functome, glycome, immunome, ionome, interactome, kinome, ligandome, localizome, metallome, methylome, morphome, nucleome, ORFeome, parasitome, peptidome, phenome, phostatome, physiome, regulome, saccharome, secretome, signalome, systeome, toponome, toxicome, translatome, transportome, vaccinome, and variome.

Not all the derivations are obvious: thus good old epitome gets overloaded because epitomics is defined as "a new field of science that studies all epitopes of the proteome in an organism. ... An epitope is a functional recognition site that binds by a specific monoclonal antibody." And my current favorite X-ome is a sort of pun:

At present, a large proportion of genes can only be described as members of the unknome -- those with currently no functional information! [Dov Greenbaum, "Interrelating Different Types of Genomic Data", Dept. of Biochemistry and Molecular Biology, Yale Univ. 2001]

Terminological oddities aside, the intrepid -omicists deserve admiration. Like the intellectual adventurers of the enlightenment, they foresee that an encyclopedic compendium of the facts of nature will reveal hidden truths. They go boldly botanizing in the nucleus and the cytoplasm and even the far reaches of the extracellular matrix, like Humboldt on Tenerife or Darwin in the Galapagos. Darwin suggested the source of their zeal when he wrote in his autobiography that "no pursuit at Cambridge was followed with nearly so much eagerness or gave me so much pleasure as collecting beetles".

According to the American Heritage Dictionary, the -ome suffix comes from Greek via New Latin -ōma, -ōmat-. The original Greek meaning of the -ome morpheme foreshadows the notion of a large ensemble of items: e.g. phyllon "leaf", phyllôma "foliage". But according to the OED, -ome spent some time in less relevant corners of biology, as an

anglicized form of -OMA (partly through influence of G. -om, F. -ome), occurring chiefly in Bot. in terms such as CAULOME, HADROME, PHYLLOME, RHIZOME, and usu. signifying a structure or group of cells forming a normal part of the anatomy, in contrast with the abnormality implied by -oma (cf. MYCETOME, an organ in insects, MYCETOMA, a fungal skin disease). It also occurs in a few obs. forms of words now written -oma, e.g. FIBROME, TUBERCULOME.

The oldest -ome word I could find is rhizome "A prostrate or subterranean root-like stem emitting roots and usually producing leaves at its apex; a rootstock." with citations from 1845, though phyllome follows not too much later:

1858 MAYNE Expos. Lex., Phylloma. Herschel terms thus..the whole of the germs destined to produce the leaves which come from the bud..when it is developed: a phyllome.

By the early 20th century, we have biome in the sense of "biotic community":

1916 F. E. CLEMENTS Plant Succession 319 Human evidence of past climates and biotic communities, or biomes, must come to be of very great value.

The first use of genome in the sense of "the sum-total of the genes in ... a set [of chromosomes]" was in 1930, where it was used to refer to one of the (haploid) sets of chromosomes in a diploid organism:

1930 Cytologia I. 14 Chromosomes from different sets (or genoms) of Triticum vulgare show affinity toward each other.

This meaning was still standard in the 1960s:

1965 A. M. SRB et al. Gen. Genetics (ed. 2) vii. 190 Among organisms with chromosomes, each species has a characteristic set of genes, or genome. In diploids a genome is found in each normal gamete. It consists of a full set of the different kinds of chromosomes.

The -omic flowering is very recent, and almost entirely limited to the life sciences. Thus the compilation of data that forms the subject matter of economics is by no means the econome, nor is the ensemble of taxonomies the taxonome. And no one has yet suggested to my eight-year-old son that his compendium of information on his favorite games could be called the Pokémome.


Posted by Mark Liberman at 01:06 AM

October 26, 2004

Puncturing etymology porn

That was the subject line of an email from Rich Alderson, commenting on a post about the history of boring and some other words. As he wrote, "Man, does *that* look like a spam subject header." The body of his note:

Just wanted to comment on your

??His balloon punctured.

Easily gotten in context:

The task set for the game show contestants was to see who could
carry a fully inflated child's balloon through a narrow corridor
lined with various sharp objects. The male contestant was much
too sure of himself, moving very aggressively, and not surprisingly
his balloon punctured first.

And while I get bored with some things much more often than others, I didn't
even blink at "quickly boring of the task".

Rich's proposed context certainly sugar-coats the pill. (Though his gender stereotypes are less inventive than his linguistic context: why not "the female contestant was much too sure of herself, moving very aggressively..."?)

I'm reminded of a series of Language Log posts in which Geoff Pullum and various readers examined the question of whether there are any strictly transitive verbs in English. Phil Resnik referred the discussion back to the recent literature on lexical syntax and semantics. And (this was my favorite part) Sally Thomason wrote about how strict transitivity almost caused a prison riot.

But (I think) all of the examples in those earlier discussions involved some kind of null complement, where an object is omitted as generic or habitual or anaphoric or otherwise unneeded. The outcome of the discussion seemed to be that pretty much any transitive verb can have its object thus annulled, if you work at it. Geoff ended with this quotation from T.H. White:

"There is nothing," said the monarch, "except the power which you pretend to seek: power to grind and power to digest, power to seek and power to find, power to await and power to claim, all power and pitilessness springing from the nape of the neck."

The examples with bore (and puncture) are different. These are causative/inchoative alternations, pairing an intransitive phrasing about something that undergoes a change of state ("chocolate melts") with a transitive phrasing about some cause that induces such a change ("heat melts chocolate"). But perhaps the conclusion is similar: this alternation can be induced in any transitive verb (that can be construed as) denoting the causation of a change of state in its object. If you work the context hard enough, at least.


Posted by Mark Liberman at 07:30 AM

October 25, 2004

Computing in Zulu

Some people seem to have difficulty with the idea that people may like to use their own language, even if it is not particularly widely used. The folks at www.translate.org.za, a site devoted to a project to translate Free Software into the eleven official languages of South Africa, quote the following from an article in the Toronto Globe and Mail:

Dwayne Bailey hears the question all the time. "Why bother translating software into isiZulu?" people ask him. "Who needs it? English is the language of global business -- you'd be better off spending your energy teaching people English." To which Mr. Bailey replies, quite simply, "Izixhobo kufuneka zisebenzele abantu, hayi abantu izixhobo. Isoftware sisixhobo ngoko ke kumele sisebenzele abantu ngolwimi lwabo lwasemzini!"
Clear enough?

Bailey answered in Zulu to make the point that English is as impenetrable for a Zulu speaker as Zulu is for an English speaker. His answer means:

Tools adapt to people not people to tools. Software is a tool, so it must adapt to people and their language.

Posted by Bill Poser at 04:50 PM

Wilgoren invents a trend

I've often criticized the bushisms industry, by questioning some of the specific complaints about George W. Bush's speech, and by pointing out comparable disfluencies, mispronunciations and malapropisms committed in public speaking and writing by others, including Kos, John Kerry, Charles Gibson, and Jacob Weisberg himself. This is a fruitless endeavor, of course, since once journalists get hold of a stereotypical association, they hang on to it like a superstitious athlete wearing a long-unwashed lucky jockstrap.

In an equally fruitless struggle, Geoff Nunberg has repeatedly debunked the stereotyping of John Kerry as a linguistic elitist. As Bob Summerby observed in the Daily Howler, this bit of groupthink surfaced again in a 10/22 NYT article, where Jodi Wilgoren claimed that

Mr. Kerry has been doing what he can to seem more down to earth. He uses more contractions and drops G's, T's, and N's, making "does not" sound like "dudnt," and "government" come out, as it might have in the Old West, "guvmint."

More contractions? What's the evidence? I've heard Kerry talking like this, in his informal mode, since I first listened to him campaigning in the primaries. Has Wilgoren gone over Kerry's speeches and made a count? Has someone else? I doubt it -- I'll bet that this is Wilgoren's vague impression, at best, and more likely it was just invented to plug into a rhetorical structure requiring something that can pass for a fact.

Wilgoren's curious description of the phenomena doesn't add to my confidence on this point. Talking about dropping G's is conventional though misleading, but what does it mean to drop T's and N's? This moves beyond ignorance into incoherence.

Why do journalists think that when it comes to matters of style, gesture, language use and so on, it's OK to fill straight news articles with unsupported impressions, careless exaggerations and plain old fabrications? Most NYT writers would think twice before claiming that some non-linguistic statistic has changed in a significant way -- casualties in Iraq, or jobs created in Ohio -- if they had no evidence other than a vague personal impression, and didn't even know how to describe the phenomenon coherently. Why is it OK to make a descriptively incoherent and factually unsupported assertion about Kerry's contractions, without even the usual figleaf of a quote from an anonymous source?

I think the answer is clear. Editors encourage journalists to make assertions about public figures' style -- how they walk, talk, dress, gesture, grimace and so on -- because this makes their stories more interesting and more accessible to readers. The factual content of these assertions is rarely if ever checked, and no journalist ever gets into trouble for selecting, exaggerating or even inventing "facts" of this kind. However, out of cowardice and laziness, journalists tend to select or invent stylistic "facts" that resonate with some aspect of the conventional wisdom -- some stereotype -- about the people they're covering. Political operatives work hard to play these resonances on behalf of their candidates and against the opposition.

Bob Summerby's piece balances criticism of Wilgoren's "silly clowning" about Kerry's contractions with retrospective analysis of Frank Bruni's 1999 invention of the Bushism concept, and Katherine Seeley's 2000 put-down of Gore for using a few uhs. Summerby's larger point is that reporters at the Times and elsewhere routinely use a wide variety of novelistic tricks to manipulate public perceptions of political candidates and other public figures. It seems to me that this is true without any doubt, and I agree with Summerby that it's a major problem (though not a new one).

But I don't agree with Summerby that "discussions of candidate speech patterns are hopelessly subjective and trivial". Speech patterns can be accurately described, both in particular cases and in statistical aggregates. While many aspects of speech patterns are politically trivial, others may be relevant to voters' choices, if only to counter the effects of (negative or positive) stereotypes on communication across regions, subcultures and classes. And people are interested in such things, so they're going to notice them and talk about them in any case.

So I'm not making the argument that description and discussion of public figures' linguistic style ought to be out of bounds in principle. But shouldn't there be some standards? We don't like the idea of journalists (or politicians) lying to us about matters of content -- why should they be free to lie about style?

We're starting to see some blogospheric fact-checking about this sort of thing, as in Summerby's widely-read complaints. Not that this is likely to have much direct impact on the guilty party in this case. An American Journalism Review article quotes Wilgoren: "I've never been on InstaPundit. I don't even know what that is."

[Update: On October 24, Maureen Dowd chimed in:

The senator is desperately trying to prove his regular-guydom. He's using more contractions and dropping G's, T's and N's, as Ms. Wilgoren points out, and he drank Budweiser with his male aides while watching a Red Sox game, when you know he was dying for an imported beer.]


[Update: this (satiric -- I trust) journalist's weblog captures the situation precisely.]


Posted by Mark Liberman at 07:34 AM

[fake [news show]] or [[fake news] show]?

Crossfire is one of my least favorite TV experiences, so I was one of the millions who enjoyed Jon Stewart's 10/15 take-down of Tucker Carlson and Paul Begala. I had nothing to add that fits this blog's theme, but now I'm glad to be able to comment on Dan Quattrone's post about it at Doing Things with Words. Dan points out that "The Daily Show is a fake news show, and there's a couple ways to parse that - as a [fake news] show, or as a fake [news show]." Dan concludes that the Daily Show is the second of these.

For lagniappe, I'll observe that this is also the internet's a priori judgment as to how the three-word phrase "fake news show" should be parsed. At least, that's the verdict of the simple-minded bigram-counting method discussed here -- "fake news" has 56,000 web hits on Google (whG), while "news show" has 563,000 whG.

If you want to apply one of the more sophisticated methods, using other pattern frequencies as well, "fake * show" has 5,210 whG, while "fake show" has 942; and the unigram frequencies are "fake" 7.29M, "news" 424M, "show" 154M; and Google claims to be indexing 4,285,199,774 pages overall. But I haven't found that other methods routinely beat adjacent bigram comparison, though perhaps in principle they should -- this is worth a discussion in itself, someday.
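For readers who want to try the adjacent bigram comparison themselves, here is a minimal sketch in Python. It assumes only what is described above: take a three-word phrase, look up the hit counts of its two adjacent bigrams, and group the more frequent pair first. The function name `bracket`, the dictionary layout and the tie-breaking rule are illustrative choices, not part of any published method; the counts are the ones quoted in this post.

```python
# Web hit counts as quoted in the post (Google, October 2004).
BIGRAM_HITS = {
    ("fake", "news"): 56_000,
    ("news", "show"): 563_000,
}


def bracket(w1, w2, w3, hits):
    """Bracket a three-word phrase by comparing its two adjacent bigrams.

    The more frequent adjacent pair is grouped first. Ties (and missing
    counts, treated as zero) fall through to the right-branching parse,
    which is an arbitrary choice made for this sketch.
    """
    left = hits.get((w1, w2), 0)   # count for "w1 w2"
    right = hits.get((w2, w3), 0)  # count for "w2 w3"
    if left > right:
        return f"[[{w1} {w2}] {w3}]"
    return f"[{w1} [{w2} {w3}]]"


print(bracket("fake", "news", "show", BIGRAM_HITS))  # [fake [news show]]
```

With the counts quoted above, "news show" outnumbers "fake news" roughly ten to one, so the sketch groups "news show" first. The wildcard and unigram counts mentioned in the last paragraph are deliberately left out; using them would require choosing a particular association measure, which the post does not specify.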

Of course, none of this has anything to do with why Crossfire is such a cancer on the body politic.


Posted by Mark Liberman at 05:59 AM

October 24, 2004

An informative tautology

Among the examples of true Bush tautologies cited by Geoff is:

"If affirmative action means what I just described, what I'm for, then I'm for it." -- George W. Bush, during the third presidential debate, St. Louis, Mo., October 18, 2000
This is actually an informative statement and doesn't deserve to be derided as a tautology. The reason that tautologies are derided is that they typically convey no information. Sometimes, however, they do convey information. In this case, what Bush is saying is that "affirmative action" means different things to different people and that he is in favor of some of them, and, implicitly, opposed to others. He's not alone in this. There are, for example, lots of people who favor such activities as recruiting of under-represented groups but oppose quotas.

In fact, there is a large class of statements that are technically tautologies but are nonetheless informative. These are statements that equate terms whose intensions are distinct but in fact have the same extension. Crudely put (since the details are the subject of on-going debate among semanticists and philosophers), the extension of an expression is the set of things of which it is true. The intension of an expression is the set of properties taken to be characteristic of it. The classic example of an informative tautology is "Hesperus (the evening star) is Phosphorus (the morning star)." This is a tautology since Hesperus and Phosphorus are both the planet Venus - the two expressions thus have the same extension. It is, nonetheless, informative, because the intensions of the morning star and the evening star are different and we learn something when we are informed that they are actually the same heavenly body.

Posted by Bill Poser at 11:33 PM

More on Pountain on Spanish

My post on Horror and Boredom in Castile motivated an email from John Lawler.

Thanks for the piece on Pountain's paper.  I'm glad to get it. In return, let me point you at the most useful book I've ever seen for a linguist with an interest in Spanish: Batchelor & Pountain's Using Spanish: A Guide to Contemporary Usage (Cambridge 1992).

It's full of little semantic graphs like the one you reproduce (though not with the historical garnish), and has word lists that you just can't find elsewhere in one place, like all the Spanish words for places worldwide and people who come from them, all the gender-changing words and their semantic differences, etc.  A veritable Pandora's box of semantic foofaraw, with everything classified by region and register.

Keep up the good work on Language Log.

Thanks, John! I'm sure that we can expect to see your blurb ("A veritable Pandora's box of semantic foofaraw") added to the advertising for the next edition :-).


Posted by Mark Liberman at 03:18 PM

True tautologies from Bush

I recently pointed out that Philip Gourevitch was wrong with his characterization of a couple of typical Bushesque phrases as tautologous. But Tom Ace reminds me that Bush really has uttered tautologies on a number of occasions. So if you want to see some genuine Bushian tautologies, read on.

"It's very important for folks to understand that when there's more trade, there's more commerce." —George W. Bush, at the Summit of the Americas in Quebec City, April 21, 2001

"If affirmative action means what I just described, what I'm for, then I'm for it." —George W. Bush, during the third presidential debate, St. Louis, Mo., October 18, 2000

". . . the past is over." —George W. Bush, after making up with John McCain, Dallas Morning News, May 10, 2000

These statements really are tautologies in the logical sense. They simply cannot be false, no matter what the state of the world. If trade increases, commerce does. If something is indeed what you're for, then you're for it. And the past is truly over, for all of us.

Still, you have to be careful. Tom included this one:

"We want anybody who can find work to be able to find work." -- George W. Bush, 60 Minutes II, CBS, December 5, 2000

It would indeed be tautologous to claim that those who can find work are able to find work (if can is taken in its dynamic "is able" sense, anyway). But the sentence above is not a tautology. If the Republican administration wants those who can find work to be able to find work, then they want a tautology to be true (and they're in luck, because they will get what they want: it is true). But it is only a contingent truth that they want that. One can imagine an administration so misguided that it wanted those who could find work to be unable to do so. They wouldn't get what they want, because it's a contradiction; but then some people want government spending to be increased in all categories while taxes are reduced and the deficit is eliminated. (Bush's speeches suggest that he actually wants that himself.) So one can certainly wish (stupidly) for a contradiction to be true, or for a tautology to be true. And if one does, it is a merely contingent truth, not a tautological truth, that those are one's wishes.

Tom also included this quote:

"Listen, Al Gore is a very tough opponent. He is the incumbent. He represents the incumbency. And a challenger is somebody who generally comes from the pack and wins, if you're going to win. And that's where I'm coming from." -- George W. Bush, Detroit, Sept. 7, 2000

But this chaotic verbal eruption too is nowhere near the logical territory of tautology. If we translate Bush's bumbling into the kind of more organized but wordy prose that people accuse Kerry of having as a native tongue, it might sound like this: "Gore is a member of the incumbent administration, hence in effect (though not currently the holder of the presidency) the incumbent in this race. The definition of a challenger is someone who comes from the pack to win (if indeed they do win), defeating a more powerful candidate already ensconced in a position of power. And I'm coming from within the pack. Hence I am a challenger, and if I win, it will be in effect the victory of a challenger over an incumbent." That at least is a coherent argument that might represent what Bush was attempting to blurt out. I see no tautologies in it.

Posted by Geoffrey K. Pullum at 02:34 PM

Here are the proofs of your grammer book

Proofs arrived for the new textbook that Rodney Huddleston and I have been writing for the past year. Always an exciting moment. But I have to confess that my heart sank when I saw the cover letter:

Dear Prof Pullum, [So far so good: that's my surname.]

Please find enclosed one set of page proofs for the book:
"A Student's Introduction to English Grammer". [Oops!]

Even though I have offered a conjectured explanation of why this misspelling is so common, it still shocked me to see it in a letter from an excellent typesetting company accompanying the proofs of a book that (I was glad to see) spelled the word correctly on the title page that followed.

Posted by Geoffrey K. Pullum at 01:34 PM

Horror and boredom in Castile

Once upon a time, Language Log was #1 for stupid ideas. On Google, that is -- where else? Now, we've been outpaced in that category by 18 other sites. Sic transit gloria googli. And if you ask Google about "boring", we're not anywhere in the top 100, despite my comments on "after boring of the task" (via Language Hat), and my etymological follow-up on the history of boring, tiring and wearying. I guess the competition is stiffer in the area of boredom than in the area of stupidity.

Of course, our goal is to have fun, not to compete in the rat race for page rank. So this is just a lead-in to telling you about a fascinating paper (brought to my attention by Mike Escárzaga): Christopher J. Pountain, "The Castilian reflexes of ABHORRERE/ABHORRESCERE: a case-study in valency".

Here's the abstract:

The purpose of this paper is to study the factors involved in the changing valency of the reflexes of ABHORRERE/ABHORRESCERE in Castilian, ie aborrir~aburrir and aborre(s)cer. I use the term 'valency' in a more restricted sense than is sometimes usual[1] to refer especially to properties of 'voice' and 'transitivity', although these properties in their turn relate to a number of combinatorial features of verbs. My main contention will be that while the semantic history of aborrir~aburrir in particular may be understood in simple terms as comprising a change in valency, the factors which bring about that change are extremely complex, involving a detailed consideration of the position of these verbs within the morphological, syntactic and semantic structures of Castilian.

The basic observation is that Latin abhorrere started out meaning "to shrink back from, have an aversion for, shudder at, abhor", but one of the Spanish descendants, aburrir, wound up meaning "to bore". So not only did the meaning change, but also the "valency" (in the sense of which verbal arguments go where). "I abhor you" turned into "you bore me".

But the space is more complicated:

and so is the history:

Read the whole thing.


Posted by Mark Liberman at 08:02 AM

October 23, 2004

Bushian tautologies? I don't think so

In his Campaign Journal article "Reality check" in the Politics Issue of The New Yorker (October 18) Philip Gourevitch twice suggests that George W. Bush's public pronouncements are characterized by tautologies. The literary term "tautology" means a redundant expression like I myself personally, but that doesn't seem to be the applicable sense in the context of what Gourevitch is saying. He appears to be invoking the logical notion of tautology. But mistakenly, as far as I can see. The reference to tautology doesn't seem justified by either of the examples given. The first is introduced thus (p.98):

Logic has never been his strong suit; in justifying his policies and actions, he prefers stonewalling (admit no error, and ignore or deny bad news) and tautology (I do what's right because it's right, and it's right because I do it).

But that latter remark is no tautology.

In logic, a tautology is a statement that depends on no state of affairs for its inescapable truth: a statement like "When you've gotta go, you've gotta go." A tautology simply can't be false.

But a claim that you do what is right because it's right is a claim about the reasons for your actions, and it could be false (if all cases of you doing what was right were actually due to inept and blundering attempts at evil).

And a claim that something is right because you do it could also be false (if there is anything that you do that is definitely not right, or if there is anything that is right for some reason other than the fact of your doing it). The conjunction of these two contingent claims is no more tautologous than either conjunct is.
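The logical notion at work here can be made concrete with a truth-table check. The sketch below is only an illustration under a strong simplifying assumption: each claim is treated as an unanalysed propositional atom, even though "because" claims are not truth-functional. It shows that a statement of the form "if p then p" comes out true under every assignment, while a conjunction of two contingent atoms does not. The helper name `is_tautology` is made up for this example.

```python
from itertools import product


def is_tautology(formula, n_vars):
    """Return True if `formula` is true under every assignment of
    truth values to its n_vars propositional variables."""
    return all(
        formula(*values)
        for values in product([True, False], repeat=n_vars)
    )


# "When you've gotta go, you've gotta go": p -> p, written as (not p) or p.
def gotta_go(p):
    return (not p) or p


# Two contingent claims, modelled (as an assumption) as bare atoms p and q,
# joined by "and".
def conjunction(p, q):
    return p and q


print(is_tautology(gotta_go, 1))     # True
print(is_tautology(conjunction, 2))  # False
```

The second result holds for any pair of contingent claims: if a conjunction were true under every assignment, each conjunct would have to be as well.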

A couple of pages later, Gourevitch refers to an episode in one of the debates where Bush "invoked, once again, the tautological imperative". Why "imperative"? An allusion to Kant's "categorical imperative", Jonathan Lundell suggests to me: I suppose the suggestion is that where Kant takes an action to be right if and only if implementing it universally would be a coherent thing to do in a viable society, Bush takes it to be right if and only if he implements it. But the allegedly tautologous quote from Bush is:

I assure them we're in Iraq because I deeply believe it is necessary and right and critical to the outcome of the war on terror.

"So we are in Iraq because he believes," says Gourevitch. But this isn't anything like a tautology either. It may very well be a true claim: Because Bush believes it is right and critical etc., he ordered American troops to invade Iraq, and they are there today. But if true, it is true only as an accident of history. It's not a tautology in any sense I can discern.

Journalists need a little education not only in linguistics but also in logic if they are going to comment on the logical or semantic aspects of what our leaders say.

Posted by Geoffrey K. Pullum at 11:42 PM

Their GO-mark'd love

I'm at the SOFG-2004 meeting ("Standards and Ontologies for Functional Genomics"). The lead-off plenary was given by Carole Goble. Her abstract, quoted in full below, may amuse some of you.

Two households, both alike in dignity,
In fair genomics, where we lay our scene,
(One, comforted by its logic's rigour,
Claims ontology for the realm of pure,
The other, with bless'ed scientist's vigour,
Acts hastily on models that endure),
When "being" drives a fly-man to blaspheme.
From forth the fatal loins of these two foes
Researchers to unlock the book of life;
Whole misadventured piteous overthrows
Can with their work bury their clans' strife.
The fruitful passage of their GO-mark'd love,
And the continuance of their studies sage,
Which, united, yield ontologies undreamed-of,
Is now the half hours' traffic of our stage;
The which if you with patient ears attend,
What here shall miss, our toil shall strive to mend.

Humanists, pleased that a professor of computer science is re-phrasing Shakespeare for her abstract, may forgive the mis-archaisms ("bless'ed"?) and the metrical misfortunes. Those in the biz will get the in-group references (e.g. who the "fly-man" is, and where the blasphemy took place). For those on the periphery of the field, it may help to know that GO is the Gene Ontology. And the Montagues are the AI and semantic web types, while the Capulets are the life scientists. Or maybe it's the other way around.


Posted by Mark Liberman at 08:59 PM

An evening with Nunberg

Those lucky enough to be in Santa Cruz for the Western Humanities Alliance conference this weekend had a chance to hear Geoff Nunberg giving a spectacular lecture entitled "The shadow cast by language upon thought" yesterday afternoon. His primary topic was the centuries-old view that if we could just shake off the shackles of our language, or (in a somewhat different version) stop using words so carelessly, we would at last be able to see reality with crystal clarity. It ties in with and presupposes the view that there are some old, pure, un-jargonized layers of our vocabulary that interfere with such direct and unmediated up-close perception less than certain contrived, dishonest, modern uses of language do.

Everyone and their dog seems to believe such twaddle. They talk and write about language in terms that suggest the mere use of a term like "senior citizens" will cloud our vision so we can't see grandma and grandpa clearly at all; if we would only revert to more direct and simple language (like "old folks", maybe? or "geezers"?) the scales would fall from our eyes and we would be able to see their frail bodies and trembling limbs plainly for the first time. I'm even more hostile to this dopey view than Nunberg is (I absolutely despise Orwell's essay "Politics and the English language", which did so much to popularize this view; Nunberg was relatively respectful toward it). His examples were consistently fascinating, his discussion of them constantly enlightening.

The humanists at the conference sat entranced as Nunberg developed his analysis, and I felt that unusual feeling where you wish an hour-long lecture would morph into a three-hour graduate seminar so it wouldn't have to end. When I stepped up to open the session and introduce the speaker it had been 4 p.m. By 6:20 p.m., the question period long over, people were still clustered around Nunberg in the partially cleared conference room, and it began to look as if our early dinner reservation was in jeopardy. Eventually Barbara and I broke into the knot of conferees who showed signs of wanting to talk to Nunberg all night, and — feeling slightly selfish — hauled him off to Oswald's bistro, there to talk philosophy and linguistics and politics and journalism with him ourselves for another couple of hours. A couple of hours when academic work and privilege and pleasure combined.

Posted by Geoffrey K. Pullum at 05:12 PM

Bush and "changing the language"

In one of the presidential debate discussions here on Language Log, Mark comments on an odd mention of "changing the language" in Bush's response to a question about health care costs:
There's just one point that still puzzles me. What did the president mean by saying that "we're changing the language"? I don't understand what that has to do with cutting costs by introducing high technology into the health care industry.
He conjectures that perhaps "changing the language" is a slogan of some kind, but that's not it.

Since I live ten or fifteen minutes from the National Library of Medicine, and have a student who works there, it only took a minute for the correct brain cells to click. Within the healthcare arena there is a large push toward language standardization in order to move toward electronic medical records. For example, NLM's Unified Medical Language System (UMLS) has been around for years as a knowledge source for biomedical natural language processing, and it incorporates terminological resources such as the International Classification of Diseases (ICD), the Physicians' Current Procedural Terminology (CPT), and, more recently, a standardized clinical terminology called SNOMED-CT. Now, the president has a high-level information technology advisory committee (whose membership even includes a computational linguist, I'm happy to say), and recently that committee provided a report with recommendations that included a discussion of language standardization and electronic medical records. So when Bush said, "we're changing the language, we want there to be, um, electronic medical records to cut down on error as well as to reduce costs", it seems clear to me that he was responding from a recollection of a briefing on their report. (From the sound of it, a pretty vague recollection, but who can say?) Puzzle solved.

Mark goes on to wonder:

As a part-time computer scientist and full-time taxpayer, I'd be happy to believe that better automation of patient records would save 20% of health care costs -- but can this really be true?

My own two cents' worth: more automation in the handling of clinical language will be a huge win, given the amount of text out there, to say nothing of the medical research literature. That's one of the reasons biomedical informatics is such a booming field. But getting people to change the way they use language is likely to be a long, slow process, and not one easily accomplished by mandate.

Posted by Philip Resnik at 01:09 PM

Two quick eggcorns

From a ZDNet editorial, "just assume" for "just as soon" [via Kai MacTane on LiveJournal's dot_pedantic]:

In fact, the many hardcore server administrators would just assume do away with a lot of the ease-of-anything frills in return for a mean, lean, simple, command-prompt driven Web, database, e-mail, directory or database application server.

As jargon suggested in a comment, maybe this was an ASR dictation error. Then again, maybe the author (David Berlind) has learned a different idiom from the rest of us. If you think about it, "just as soon" isn't exactly transparent. "...would equally early do away with ..."? I don't think so. And searching the web for "would just assume" turns up lots of other examples:

DH would just assume throw most things away, but I'm a bit of a sentimental pack rat and find that hard to do.
I guess she would just assume keep to herself, and I would just assume she keep to herself too.
As of right now if Silo had a bevel like Modo I would just assume purchase it for 1/8 the price, until Luxology does a little bit of a better job whoo'ing the maya crowd.

By email from Jesse Clark: "pier-to-pier network(ing)" in place of "peer-to-peer network(ing)". Makes sense, right? Eliminate those troublesome shipping delays, and just virtually connect one dock directly to another? But this is one of those cases where it's hard to tell whether the writer is really thinking about juxtaposing docks, or whether they're just confused about how to spell peer.

Amplicon assists with "pier-to-pier" networking.
What is Pier to Pier networking and how is it different from Client Server networking?
I am adding a Win98 SE machine on a Windows 2000 pier to pier network and am trying to add the network printer.
Apple is working on Skype compatible software for pier-to-pier networking according to reports this AM.

Posted by Mark Liberman at 09:16 AM

October 22, 2004

The answer

The mp3 for yesterday's quiz was sent in by Stefano Taschini. It was "a fragment from Monday's radio news of the Radio e Televisiun Rumantscha, the swiss broadcasting company in Romansh". I believe that it is the variety called Rumantsch grischun in this page on il territori linguistic rumantsch.

Sauvage Noble transcribed accurately and guessed correctly. So did Chris Waigl at serendipity. (Not that they entirely agree, with one another or with me. The question of what is "right" in such cases is an interesting one, which I'll pick up another day, as I'm literally running out the door to catch a train.) Ray Girvan didn't transcribe, but guessed the language correctly by email.

I'm a bit pressed for time today, so more comments and links will have to wait until tomorrow.

[Update: it's coffee break time at the NSF Cyberinfrastructure Workshop where I'm spending my day, so I'll take a minute to note that the folks over at Planet Debian think that it's the Surmiran dialect, not the Grischun dialect. I think they're right, in fact -- /pi/ vs. /pli/ looks like a clue, based on the (orthographic) renditions in the Fox and Crow passage linked above. I may have been misled by the mention of Grischun in the original recording. More on this later.

This seems to have been a popular exercise -- maybe we'll try it again in a month.]

Posted by Mark Liberman at 03:32 AM

October 21, 2004

They are a prophet

My student Nick Reynolds reports on a beautiful example of singular they found in an exchange of graffiti. Someone had scrawled this on the wall:

Vote Arnold 4 prez

— recommending a vote for Governor Arnold Schwarzenegger as President of the United States. Someone else, mindful perhaps of Schwarzenegger's ineligibility for that post, had scrawled something obscene below it about the first writer's ignorance. But a third person, mindful of how the future may resemble the world of the Terminator movies in which our governor had his greatest movie successes, added this response:

This person is not ignorant.
They are a prophet.
The machines will rule us

There are a couple of beautiful things about this particular use of the form "they".

The pronoun form "they" is anaphorically linked in the discourse to "this person". Such use of forms of "they" with singular antecedents is attested in English over hundreds of years, in writers as significant as Chaucer, Shakespeare, Milton, Austen, and Wilde. The people (like the perennially clueless Strunk and White) who assert that such usage is "wrong" simply haven't done their literary homework and don't deserve our attention.

The sequence "they are" exhibits, of course, the syntactically correct plural verb agreement. The following phrase "a prophet" is a singular predicative NP complement. This is again quite correct; we see the same thing in "Anyone who claims they are a prophet should make sure they have some actual predictions to their credit." In that case we use singular "they" because the antecedent is a quantified NP, and neither "he" nor "she" is appropriate: we intend to refer to anyone of either sex who claims to be a prophet. And to use "he or she" would be desperately clumsy ("Anyone who claims he or she is a prophet should make sure he or she has some actual predictions to his or her credit" -- gack!).

A minor point of interest about Nick's example is that the antecedent ("this person") is a definite NP; singular "they" more commonly has quantified or indefinite NP antecedents, not definite ones.

But as Nick observes, the most interesting thing about his example is that the motivation for the use of singular "they" does not come from either indeterminacy of sex (as with antecedents like "anyone") or ignorance about the sex of the referent (as in "If you have a partner, you can bring them too"), because the inscription was on the wall of a men's bathroom. Given the user population of such establishments, one can be entirely confident that the first writer was a male. That means the third writer could have put "He is a prophet." But the fact is that singular "they" is becoming completely standard, at least among younger Americans, whenever the antecedent is of a sort that could in some contexts refer to either sex. I heard a radio piece about pregnant high-schoolers in which a girl said something like "I think if someone in my class was pregnant I would be sympathetic to them." In such cases it's not the inability to assign sex to the referent that drives the selection of singular "they", it's the mere fact of the antecedent being quantified or headed by a noun like "person" that can in other contexts be used of either sex. Mere inferred sex of the referent is not sufficient to force a choice of either "he" or "she".

Posted by Geoffrey K. Pullum at 11:27 AM

Etymology porn

When I see a blurb-worthy quote on the web about Language Log, I squirrel it away in a file. Maybe someday I'll add a random blurb-o-matic to the weblog template? No, I don't think so. Anyhow, my second-favorite blurb is "Many thanks to ___ for pointing me towards Language Log. It's like etymology porn. Mmmmm."

Since Word Love is not universal, even among Language Log contributors, I should warn you that we're about to spend a few minutes poking around in odd corners of the OED. Along the way, we'll touch on some insulting 18th-century stereotypes about the French. But our main topic is the origin and progress of (the word) boring. You have been warned.

The backstory is a striking phrase from the New York Times Sunday magazine: "before boring of the task".

Current English dictionaries don't sanction an intransitive form of the verb to bore, nor the use of "of" for the source of boredom. The evidence from the net is that neither usage is very widespread. But anyone who believes in the power of grammatical consistency should find this odd:

That tires me. I'm tired of that.
*I'm tired with that.
I'm tiring of that.
??I'm tiring with that.
That wearies me. I'm wearied of that.
??I'm wearied with that.
I'm wearying of that.
??I'm wearying with that.
That bores me. *I'm bored of that.
I'm bored with that.
*I'm boring of that.
*I'm boring with that.

However exactly you assign the question marks and asterisks, this does not look like a consistent pattern. There seem to be some generalizations that have been missed -- not by the linguists, but by (most of) the speakers of English. In the original example ("before boring of the task"), Scott Anderson is just being logical, isn't he? I mean, it's not like anybody ever told me that I can't use bore intransitively. It just never occurred to me to do so. What's curious is that so few English speakers seem to have followed the same path as Anderson -- so far.

This is another good example of the quasi-regularity of language. Though I know very little about the philosophy of the law, this sort of thing reminds me of what I've read about the interplay of logic and precedent in legal reasoning.

The logic of this case depends on what you think about the grammar of transitivity and of prepositions. Depending on your views, Anderson's phrase might seem like an obvious generalization of a pattern, or an egregious failure to observe a distinction. Or both. But that's a topic for another time. This post deals with the precedents, the "case law" -- (some of) the relevant usage history of the words in question.

OK, first tiring. The verb to tire comes from OE tíorian meaning "to fail, cease (as a supply, etc.); to diminish, give out, come to an end".

By 1000 A.D., there's an extended meaning "to become weak or exhausted from exertion", and simultaneously (in the same source) a causative (transitive) version "to wear down or exhaust the strength of by exertion".

As of 1500 A.D., there's another extension, meaning "to have one's appreciation, power of attention, or patience exhausted by excess", and again the same new source also exhibits a causative (transitive) version "to weary or exhaust the patience, interest, or appreciation of (a person, etc.) by long continuance, sameness, or want of interest".

From the first citations, this sense of tire uses "of" to express its cause:

1500-20 DUNBAR Poems lxvi. 94 Of this fals failȝeand warld I tyre.
a1578 LINDESAY (Pitscottie) Chron. Scot. XXI. xi. (S.T.S.) I. 307 The quenis grace tyrit of him and pairtit witht him.

Now wearying. The verb to weary comes from OE wériȝian (intr.) and ȝewériȝian (trans.), meaning "to grow weary" (intransitive/inchoative) or "to make weary" (transitive/causative). The OED's first cited use of "of" to express the source of weariness is from 1400:

c1400 Destr. Troy 12997 Thai werit of þere werke þe wallis to kepe.

In this case, not a lot has changed in the past millennium and a half, except that we're more likely to be wearied by tedium, ennui or heartsickness than by physical fatigue.

And finally, on to boring. Oddly enough, this one is both more recent and more mysterious. Suddenly, at some point after 1750, the British began using bore as a noun to refer to "The malady of ennui, supposed to be specifically ‘French’, as ‘the spleen’ was supposed to be English; a fit of ennui or sulks; a dull time":

1766 G.J. Williams Let. 30 Dec. in Ibid. 121 Your last letter was the most cheerful that I have received from you, and..without that d__d French bore.
1767 LD. CARLISLE Let. 8 Mar. in Ibid. 150, I enclose you a packet of letters, which if they are French, the Lord deliver you from the bore.

The OED confesses that the etymology is unknown. It's usually seen as a figurative extension of the verb to bore meaning "drill", in the sense of a persistent annoyance; but this is inconsistent with the historical record, which shows that nominal uses such as those above were first. The OED goes on to suggest that

The phrase ‘French bore’ naturally suggests that the word is of French origin; bourre padding, hence (in 18th c.) triviality, bourrer to stuff, to satiate, might be thought of; but without assuming some intermediate link these words do not quite yield the required sense.

I venture to suggest that the connection with annoying drills might have started as a late 18th-century eggcorn -- tedium is due to padding, or is that drilling? Whatever.

Of course the noun bore quickly took on the various expected metonymical extensions: "one who suffers from 'bore'"; "a thing which bores or causes ennui"; "a tiresome or uncongenial person". And a denominal verb very soon appeared as well:

1768 EARL CARLISLE Let. 16 Apr. in Jesse G. Selwyn I. 291, I pity my Newmarket friends, who are to be bored by these Frenchmen.

The -ed form bored was first used in print by Byron in 1823:

1823 BYRON Juan XIII. xcv, Society is now one polished horde, Formed of two mighty tribes, the Bores and Bored.

The adjectival -ing form boring first appears in 1840:

1840 T. HOOK Fitzherbert III. iv. 66 Emily was patiently enduring..Miss Matthews's boring vanities.

And the derived noun boredom appears first in 1864.

A hundred and forty years later, the inchoative generalization of the verb has shown up in the New York Times.

I'm shocking, are you?

[Update 1: Jonathan Mayhew writes

For me, the intransitive use of "bore easily" is a minor cliché, that is, a phrase I recognize when I see it as a set phrase, even though I might not see it all that often. Google reveals that this phrase comes up often in comparisons among breeds of dogs, in astrology, and in education. Gore Vidal, Geminis, middle-school students, and Irish setters all "bore easily," apparently. Once the intransitive use is established, it is not too much of a stretch to add a preposition, whether "of" or "with."

This is a good point, but I'm not sure that it's specifically relevant here. The VERBs easily construction is one that allows many verbs to show an intransitive-like face that they don't exhibit more generally: "This kind of balloon punctures easily", but not "??His balloon punctured."]

[Update 2: Mike Escárzaga wrote

In Spanish 'to bore' is aburrir, 'boredom' aburrimiento, etc. The Real Academia dictionary cites the Latin root abhorrere. Do you think this might deserve any credence as an ancestor of the English word, too?

Maybe so. But the French equivalent abhorrer retains the Latin meaning "to shrink back from, have an aversion for, shudder at, abhor":
ABHORRER. v.a. Avoir en horreur. Les honnêtes gens abhorrent les fripons. Les Saints abhorrent l'impiété. [From the 1762 edition of the Dictionnaire de l'Académie Française]

To get "bore", the Brits in 1760 or so would have had to borrow a term from Spanish (did that meaning for aburrir exist then?), drop the initial "a" and strip the "imiento" part in making a noun, and then for some reason use it to name a stereotypical property of the French nation. This is worth looking into, though I wonder whether it isn't just as likely that aburrir reflects a later influence from English "bore". Anyhow, English abhor is part of the phonetosphere near bore -- as are boor and so on.]


Posted by Mark Liberman at 10:33 AM

Snowclone sightings

Two snowclones cited by Leonard at Crummy: the Site (News You Can Bruise).

* don't * people ... and will the real * please stand up

Leonard refers to these as instances of the class "Google memewatch", but isn't sure how to pluralize the term ("Google Memewatches (Google Memeswatch?)"). He thinks snowclone is "far too precious". Well, at least we know what the plural is.

Posted by Mark Liberman at 12:33 AM

October 20, 2004

Am I boring, or are you?

Language Hat was taken aback -- linguistically -- by a phrase in Scott Anderson's 10/17 NYT Magazine piece on Darfur:

Pulling a stack of business cards from the pocket of his white robe, he read off a dizzying list of initials — W.H.O., W.F.P., I.R.C. — before boring of the task and setting them aside.

The Hat found this flat-out ungrammatical, and I agree with him. But as he says, "if there's one thing I've learned over the years, it's not to trust my own judgments about acceptability." Comments on Hat's post were mixed, suggesting that Anderson is not alone.

There are two different oddities here. The first one is using "to bore" to mean "to become weary by lack of interest", and the second one is using "of" rather than "with" or "by" to express the wearying entity.

First things first. The usual meaning for the verb to bore is "to cause to become weary by lack of interest." That is, if I'm going on and on about stuff that has no interest at all to you, you would usually say that I'm boring, not that you are. At least that's what I think, and what Hat thinks. It's also what all the dictionaries that I've consulted say:

OED: trans. To weary by tedious conversation or simply by the failure to be interesting.
AHD: To make weary by being dull, repetitive, or tedious: The movie bored us.
Encarta: transitive verb / make somebody uninterested: to make somebody lose interest and so feel tired and annoyed / He bored us stiff with a detailed explanation of the itinerary for his vacation.

However, there's a general pattern in English of pairs of inchoative and causative meanings for verbs:

The water boiled / Kim boiled the water
The trees blew down / The wind blew down the trees
The wax melted / The heat melted the wax
The coffee spilled / I spilled the coffee

So it's not totally incoherent to invent an inchoative partner for the normal causative bore:

?The audience bored ← The lecture bored the audience

(where "the audience bored" is supposed to mean "the audience became bored").

The second problem is using "of" to express the cause of boredom. I'd naturally say or write "We were starting to get bored with the game", not "?We were starting to get bored of the game" (much less "we were boring of the game"!). However, we've found several other cases where unexpected prepositions sneak in, including things like "eligible of" instead of "eligible for".

The pattern "he|she|they|I|we bored of" gets 650 hits from Google, many of the right kind:

But when he bored of the indulgences of royal life, Gautama wandered into the world in search of understanding.
bored of the constraints of producing work for advertising, and he wanted to concentrate on producing paintings specifically with the intent of high quality.
We bored of it quickly.

A small pod of what appeared to be bottlenose dolphins accompanied the Midway for a little while on the way into Laysan until they bored of the pace of the ship...

However, "he|she|they|I|we got bored with" gets 15,600 hits, and "he|she|they|I|we became bored with" gets 2,770. And of the 650 "he|she|they|I|we bored of" hits, about 60% (on a sampling basis) are instances of "be bored of" (either of the form "are we bored of it yet?", or examples like "he's bored of that now", where Google ignores the 's).
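The sorting of "bored of" hits into real inchoative uses and disguised "be bored of" uses can be roughed out with a regular expression. This is only a sketch under my own assumptions: the pattern, the `classify` helper and its labels are hypothetical, it is not how Google matched anything, and it will miss plenty of real-world variation.

```python
import re

# Optional form of "be" before the pronoun (questions), the pronoun itself,
# an optional contracted or full form of "be" after it, then "bored of".
PATTERN = re.compile(
    r"(?P<pre>\b(?:is|are|am|was|were)\s+)?"
    r"\b(?:he|she|they|I|we)"
    r"(?P<post>'s|'re|'m|\s+(?:is|are|am|was|were))?"
    r"\s+bored\s+of\b",
    re.IGNORECASE,
)

def classify(sentence):
    """Label a sentence 'be bored of', 'inchoative bored of', or None."""
    match = PATTERN.search(sentence)
    if match is None:
        return None
    if match.group("pre") or match.group("post"):
        return "be bored of"
    return "inchoative bored of"

examples = [
    "We bored of it quickly.",
    "he's bored of that now",
    "are we bored of it yet?",
    "They were bored with the game.",
]
for text in examples:
    print(classify(text), "<-", text)
```

Only the first example comes out as inchoative; the next two are "be bored of" in disguise, and the last does not match at all because it uses "with".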

So we've got 15,600+2,770 = 18,370, compared to .4*650 = 260, or 18,370/260 = roughly 70 to 1 against Anderson and inchoative bore.
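For anyone who wants to recheck the back-of-the-envelope odds, here is a minimal sketch. It uses the hit counts exactly as quoted above (including the 2,770 figure for "became bored with"), and treats the 40% share as what is left after setting aside the roughly 60% of "bored of" hits that are really "be bored of". The variable names are mine.

```python
# Google hit counts as quoted in the post.
got_bored_with = 15_600      # "he|she|they|I|we got bored with"
became_bored_with = 2_770    # "he|she|they|I|we became bored with"
bored_of = 650               # "he|she|they|I|we bored of"

# About 60% of the sampled "bored of" hits were really "be bored of",
# so only about 40% count as inchoative "bore".
inchoative_share = 0.4

with_total = got_bored_with + became_bored_with
inchoative_bored_of = inchoative_share * bored_of
ratio = with_total / inchoative_bored_of

print(with_total)                   # 18370
print(round(inchoative_bored_of))   # 260
print(round(ratio, 1))              # 70.7
```

Either way you round it, the odds come out at about 70 to 1.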

And the pattern "he|she|they|I|we was|were boring of" gets only 78 hits, most of which are either irrelevant:

In fact, I might even go as far as saying they were boring. Of course, I didn't see much of them.

or seriously unidiomatic:

So after Tallika appeared I was tired, then we were waiting about one hour, I was boring of wait and then ecstasy of gold started...

There are a small number of real Andersonisms:

Hi, It is a long time now that I was boring of typing my block information header each time I add a new Function or Method.
Truth was she was boring of their conversation and needed an escape.
Rich and I were boring of our game of 'voices' as it was now called and so he quickly fixed his basketball hoop to the garage and we were playing one-on-one..

In fact, those three are all the half-way convincing ones that I could find. Google has two more cases of "before boring of", and eight relevant and reasonably idiomatic examples of "after boring of". So inchoative boring is even rarer than inchoative bore, and I conclude that Anderson is way out on the tail of a pattern of morphosyntactic variation. The surprising thing is that so many of Hat's commenters thought "before boring of the task" is fine, or at least acceptable. Either they're a suggestible bunch, or they're out there on the bleeding edge too...

The real story in Darfur, of course, is not the unacceptability of Scott Anderson's syntactic frame for boring, but the unacceptability of the Sudanese government's actions, and the cowardice and hypocrisy of the international community's reactions.

[Update: as several people have pointed out in email, tire and weary provide an obvious basis for analogy, since they have the causative/inchoative alternation already, as well as the use of "of" -- "long classes tire me", "I'm tiring of these battles".

Still, there's surprisingly little evidence of these analogies in operation out there on the internets. ]

Posted by Mark Liberman at 06:44 PM

No Left Turn

English no left turn sign

CBC news reports that a justice of the peace threw out Jennifer Myers' ticket for illegally turning left on the grounds that the sign was written only in English, in violation of bilingualism legislation. Whether or not this has any larger consequences remains to be seen, as the decision of a justice of the peace pertains only to the case at hand and does not set precedent. The city of Toronto plans to appeal the decision.

Ms. Myers' argument was that the no-left-turn sign had no legal effect because it was not bilingual. She could not argue that it violated her rights - she does not speak French. Some services are expensive to provide in more than one language, but in this case, there is a simple alternative:

International no left turn sign

Update: As one reader has pointed out, the languageless solution doesn't work if the ban on left turns is only on certain days of the week, or if other limitations, such as "without permit", must be expressed.

Further update: It turns out that the Toronto left turn sign in question actually was of the international variety; the problem was the use of English abbreviations for the days of the week. Here is a typical Toronto no-left-turn sign:

typical Toronto no left turn sign

Posted by Bill Poser at 01:40 PM

Language quiz

A reader sent in this mp3 as a quiz. If you know what language it is, congratulate yourself and go on with your day.

If not, then you might try to follow Roman Jakobson's (perhaps apocryphal) advice. As I heard the story, he was giving a lecture at Columbia on Eastern European folk epics. He decided to lecture in Bulgarian -- one of the couple of dozen languages that he spoke fluent Russian in. One of the audience members raised a hand and protested, in English, that most of the audience did not know Bulgarian. Jakobson's response, also in English: "You are linguist, no? So listen, and try to understand."

Even if you're not a card-carrying linguist, you might give it a try. In fact, in this case, you might be able to do better than just listening and trying to understand. You could transcribe the passage -- in IPA or in some more rough-and-ready orthography -- and identify the traits that you could use to figure out where this language comes from, given a map of what features are where in the relevant language family.

It's an interesting experience to try to transcribe a language you don't know, though the feeling is different, depending on your degree of familiarity with what you're hearing. It's one kind of challenge where the language is close enough to one you know well that you can understand most of it with careful listening and a bit of thought. For me that might be Glaswegian or Jamaican English -- or the language of this quiz, where I could get about 4/5 of the words. In that case, the main issue is to try to figure out the systematic sound patterns and function-word or other morphological substitutions that relate what you're hearing to what you know.

It's a different kind of experience to try to transcribe recorded speech in a completely unfamiliar language. One approach is to work with a native speaker who can repeat the fluent passages for you slowly, breaking them down into pieces and telling you what they mean. Another approach is just to do your best to write down what you hear, listening for repeated patterns and trying to make some structural sense of the whole thing.

If you're looking for a computer program to help you listen carefully and repeatedly to selected portions, and make time-aligned transcriptions or other notations, take a look at audacity, praat, transcriber and wavesurfer.

I'll identify the language and its source tomorrow.

Posted by Mark Liberman at 05:57 AM

October 19, 2004

More on Kerry's French

As reported earlier, I downloaded the Kerry clip from Slate, reduced the noise level, and listened to it. Having listened to it again after some food, I now concur with the news report that what he said was: "Vous êtes de Haïti? D'accord. Je vais aider les Haïtiens." The problem with the clip is not Kerry's French; it's the dreadful quality of the recording.

To begin with, there is a lot of noise, including the cheering and shouting of the crowd and probably various other sources. Secondly, the recording is badly clipped. That means that the signal level was too high for the analog-to-digital converter, which chopped off the signal when it was above or below the maximum and minimum. This causes distortion. You can see the clipping in the sound pressure waveform below. When the waveform is square against the top and bottom it has been clipped. The highlighted section at the end is the part where Kerry speaks French.

Sound pressure waveform of Kerry clip
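To make the idea of clipping concrete, here is a small sketch that clips a synthetic tone and measures how much of it ends up flattened against the limits. Everything in it is illustrative: the `clip` and `clipped_fraction` helpers, the test tone and the limit of 1.0 are my own assumptions, not the processing applied to the Kerry clip.

```python
import math

def clip(samples, limit=1.0):
    """Mimic a converter that cannot represent values beyond +/- limit."""
    return [max(-limit, min(limit, s)) for s in samples]

def clipped_fraction(samples, limit=1.0):
    """Fraction of samples at (or beyond) the converter's limits."""
    at_limit = sum(1 for s in samples if abs(s) >= limit)
    return at_limit / len(samples)

# One second of a 100 Hz sine sampled at 8 kHz, with an amplitude
# 1.5 times too large for the converter.
rate = 8000
tone = [1.5 * math.sin(2 * math.pi * 100 * n / rate) for n in range(rate)]
recorded = clip(tone)

print(max(recorded), min(recorded))
print(round(clipped_fraction(recorded), 2))
```

In this toy case about half the samples sit flat against the top or bottom, which is the squared-off look described above.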

A final factor is that the clip was in MP3 format, compressed to 64k bits per second. MP3 compression is lossy, meaning that it removes information from the signal. The amount of distortion this produces depends on the bitrate. So-called "near CD quality" MP3s use a bitrate of 128k bits/second, twice that of the clip. I don't know if the compression had much of an effect - the original recording was evidently pretty bad to begin with - but lossy compression doesn't help.
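For a rough sense of scale, the two MP3 bitrates can be compared with uncompressed CD audio. This is a sketch under an assumption of mine: it uses the standard CD parameters (44,100 samples per second, 16 bits per sample, two channels), which may not describe the original recording of this clip.

```python
# Uncompressed CD audio: 44,100 samples/s, 16 bits per sample, 2 channels.
# (Standard CD parameters; an assumption added for comparison.)
cd_bitrate = 44_100 * 16 * 2   # bits per second

# The clip's bitrate and the "near CD quality" bitrate.
for mp3_bitrate in (64_000, 128_000):
    reduction = cd_bitrate / mp3_bitrate
    print(f"{mp3_bitrate // 1000}k bits/s keeps about 1 bit in {reduction:.0f}")
```

How audible the loss is depends on the encoder and the material, so the ratio is only a crude indicator.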

If you want to listen yourself, here is my noise-reduced version of the relevant section, as linear PCM data in wav file format KerryFrench.wav [635,744 bytes] and compressed using the lossless FLAC method KerryFrench.flac [293,768 bytes]. Of course, using FLAC compression on a file already distorted by MP3 compression doesn't undo the distortion.

Posted by Bill Poser at 10:04 PM

Tout le monde aime le Log

Well, John Kerry's campaign may have been keeping a lid on his command of French, but we've got our own linguistic skeletons in the closet.

For example, Ren and Stimpy's advertisement for the All-New International Log.

ANNOUNCER: Parlez vous Francais? Non? Then you need the All-New International Log. Just tug its twig, and you'll turn your Log into a talking tree fluent in five foreign tongues! There's French!
FRENCHWOMAN: Allez-viens mon coco.
GERMAN GUY: Ras bedeut [?], strudel.
SPANISH GUY: Las cucarachas entran, pero no pueden salir.
SVEN: I am a bearded lady.
ANNOUNCER: And of course New York-ese.
NEW YORKER: 'Ey! Can't you see I'm walking here?
ANNOUNCER: Yes, Log. All nations love Log. So, hurry now to your local store and be the first in your country to have the International Log.

What rolls down stairs
Alone or in pairs...
Rolls over your neighbor's dog?
What's great for a snack
And fits on your back?
It's Log! Log! Log!
It's Lo-og, Lo-og
It's big, it's heavy
It's wood!
It's Lo-og, Lo-og
It's better than bad
It's good!!!

FRENCHWOMAN: Tout le monde aime le Log!
SVEN: Yah. It's really fun.
NEW YORKER: I got your log right here.
Everyone needs a log,
Everyone wants a log,
You're gonna love it: Log.
MR. HORSE: Yes sir, I like it!

I just discovered this, honest.


Posted by Mark Liberman at 08:35 PM

More on Phonetic Gaydar

The recent paper by Pierrehumbert et al. discussed by Mark is interesting because of what it shows about the acoustic basis for the perception of sexual orientation, but it is not the first to show that people can judge a speaker's sexual orientation from his speech. They themselves cite references from 1998 and 2000, but to my knowledge the first study to demonstrate this is a paper published in 1994 by Rudi Gaudio: "Sounding Gay: Pitch Properties in the Speech of Gay and Straight Men," American Speech 69.1.30-57. (This paper is cited by Pierrehumbert et al., but not on this point.) He investigated the hypothesis that openly gay men's speech can be distinguished from that of straight men by means of intonation, especially the "dynamism" of the fundamental frequency contour. He found that listeners could indeed reliably judge whether a speaker was gay or straight by listening to a sample of his speech.

However, he was unable to identify the acoustic basis for this phenomenon. Previous observers had suggested that the relevant factor was the "dynamism" of the fundamental frequency contour. They did not define "dynamism", so he computed a variety of plausible measures of the "dynamism" of the F0 contour. (The manual page and source code for the program he used to compute the measures of dynamism are available here: pflux.) The results he obtained suggested that intonation does play some role, but he did not obtain a clear correlation between measures of F0 dynamism and the sexual orientation of the speaker.
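To give a flavor of what "plausible measures" of F0 dynamism might look like, here is a sketch with three simple candidates: overall range, standard deviation, and mean absolute rate of change. These are my own illustrative choices, not necessarily the measures that pflux computes; the function name, the frame spacing and the toy contours are all hypothetical.

```python
import statistics

def f0_dynamism(f0_hz, frame_s=0.01):
    """Return a few candidate 'dynamism' measures for a voiced F0 track.

    f0_hz   -- F0 estimates in Hz, one per analysis frame (voiced frames only)
    frame_s -- spacing between frames in seconds (an assumed value)
    """
    # Absolute change in F0 between successive frames.
    steps = [abs(b - a) for a, b in zip(f0_hz, f0_hz[1:])]
    return {
        "range_hz": max(f0_hz) - min(f0_hz),
        "stdev_hz": statistics.stdev(f0_hz),
        "mean_abs_slope_hz_per_s": statistics.mean(steps) / frame_s,
    }

# Two made-up contours: one nearly flat, one that moves around a lot.
flat = [120, 121, 120, 119, 120, 121]
lively = [110, 140, 125, 160, 115, 150]
print(f0_dynamism(flat))
print(f0_dynamism(lively))
```

All three numbers are larger for the second contour, but they capture different things, which is part of why an undefined notion like "dynamism" is hard to test.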

Posted by Bill Poser at 05:36 PM

Kerry's French (or Haitian)

Slate Magazine reports that Kerry addressed a few words of French to a Haitian in the crowd at one of his campaign stops; only what he said in French has not yet been deciphered. They provide a sound clip that can be studied by someone better at spoken French interpretation than I am (Language Log does not provide translation services). What I hear is "Vous êtes d'Haïti? D'accord..." But the rest of it is lost in the noise of the crowd. Perhaps the bit no one has yet been able to understand reveals that Kerry's French is atrocious. That would be great news for Democrats who believe (as some people do) that to be identified as a fluent speaker of an alien tongue, especially French, could only be a vote-loser in a Presidential race as brutal as this one.

[Thanks to Mike Gillis for the reference.]

And this just in, via Grant Barrett: a Canadian news source reports that what Kerry said was: Vous êtes d'Haïti? D'accord, je vais aider les Haïtiens ("You're from Haiti? O.K., I'm going to help the Haitians"). Bill Poser downloaded the clip, reduced the noise level, and studied the track, and after a first guess that the last phrase might be Je vais vous aider dedans, changed his mind and said that he agreed with the Canadian source (as he reports here, with some further acoustic details). So the charges that Kerry can speak French may be well founded. In fact the Canadian source says that he speaks several languages ("parle plusieurs langues"). Personally I think that's good. Given the job for which he is an applicant, I'm hoping the several languages include Iraqi Arabic, Farsi (Tehrani dialect), and Korean (northern dialect).

Posted by Geoffrey K. Pullum at 03:26 PM

Phonetic gaydar

In the October 2004 issue of the Journal of the Acoustical Society of America, there's an article by J.B. Pierrehumbert, T. Bent, B. Munson, A.R. Bradlow and J.M. Bailey, entitled "The Influence of Sexual Orientation on Vowel Production" (.pdf).

The abstract:

Vowel production in gay, lesbian, bisexual (GLB), and heterosexual speakers was examined. Differences in the acoustic characteristics of vowels were found as a function of sexual orientation. Lesbian and bisexual women produced less fronted /u/ and /ɑ/ than heterosexual women. Gay men produced a more expanded vowel space than heterosexual men. However, the vowels of GLB speakers were not generally shifted toward vowel patterns typical of the opposite sex. These results are inconsistent with the conjecture that innate biological factors have a broadly feminizing influence on the speech of gay men and a broadly masculinizing influence on the speech of lesbian/bisexual women. They are consistent with the idea that innate biological factors influence GLB speech patterns indirectly by causing selective adoption of certain speech patterns characteristic of the opposite sex.

They don't say anything about the distribution of ages and other characteristics in the 103 "Chicago-area self-identified GLB and heterosexual women and men" they studied ("26 self-identified heterosexual men, 29 self-identified gay men, 16 self-identified heterosexual women, 16 self-identified lesbian women, and 16 self-identified bisexual women"). However, these people were "participating in a broad-based social psychology study of sexual orientation", so presumably that information will come out eventually.

The interesting thing was that listeners were able to get some information about speakers' sexual orientation from neutral laboratory-setting readings of phonetically-balanced reference sentences like "It's easy to tell the depth of a well". The self-identified straight male speakers were given an average rating of 3.2 on a scale of 7 (1=totally straight, 7=totally gay), while the (s-i) gay men were given an average rating of 4.6. Among the women, the (s-i) straight female speakers had an average rating of 3.2, while the (s-i) lesbian and bisexual women averaged 4.3.

The paper considers and rejects the hypothesis that the effects are due to an overall scaling of vocal-tract resonances. Adult human males are about 8% larger on average than adult females (in linear dimensions like vocal tract length), and adult male larynxes sit lower in the neck than adult female larynxes do, leading to an overall difference of about 15% in average vocal tract resonance frequencies (with a differential effect on front and back vowels). It's possible for people to control overall vocal tract length to some extent -- by retracting and protruding the lips, or by raising and lowering the larynx -- but this was not what happened in their recordings. Of course, it's logically possible that there might be average anatomical differences as well, but again, this is not what they found.
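The link between vocal tract length and resonance frequencies can be illustrated with the textbook idealization of the vocal tract as a uniform tube closed at one end, whose resonances are odd multiples of c/(4L). This is only a sketch: the tube model ignores real vocal tract shapes, and the 17.5 cm length and the speed of sound are illustrative values of mine, not figures from the paper.

```python
SPEED_OF_SOUND = 350.0  # m/s, approximate value for warm, moist air

def tube_resonances(length_m, count=3):
    """Resonances (Hz) of a uniform tube closed at one end, open at the other."""
    return [(2 * n - 1) * SPEED_OF_SOUND / (4 * length_m)
            for n in range(1, count + 1)]

longer = tube_resonances(0.175)          # illustrative 17.5 cm tube
shorter = tube_resonances(0.175 / 1.15)  # a tube about 13% shorter

# Print each pair of resonances and their ratio.
for lo, hi in zip(longer, shorter):
    print(round(lo), round(hi), round(hi / lo, 2))
```

In this idealization every resonance scales by the same factor, inversely with length. That is the kind of uniform scaling the paper looked for and did not find.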

Instead, the authors postulate that their speakers are using particular socially-evaluated ways of talking to signal aspects of their identities. One specific example they cite is the idea that LB females' further-back /u/ vowels might be connected to an earlier sociolinguistic study which found that "a back variant of /u/ was associated with membership in a group known for its 'tough' stance". The paper hints, but doesn't quite assert, that the (more diffuse) vowel differences between (self-identified) male straight and gay speakers might reflect a difference in precision of articulation, which might have other correlates as well, for example the differences in aspiration that have sometimes been found associated with sexual orientation differences among men. (An overall difference in precision of articulation between males and females is well established, as this paper mentions, and would provide a basis for development of greater precision as a norm or stereotype for gay males.)

It's important to note that the study doesn't (try to) demonstrate, and therefore doesn't claim, that the vowel differences they found are responsible for the perceptual effects that they found.

It's also worth noting, I think, that the study doesn't try to distinguish between empirical norms and stereotypes, nor to look at context effects, either in the actions of the speakers or the reactions of the listeners. Of course, that would require a different sort of experiment. Still, some clues could be gleaned from looking at this experiment's data in other ways. For example, they show us the (mean) values of vowel formants as a function of the speakers' sex and (self-identified) orientation -- but they don't show us the data broken out in terms of the listener-identified orientation. Were there particular speakers (of whatever self-identified orientation) that were particularly strongly identified by listeners as straight or gay? If so, what were their formant measurements like? And so on...

Compare (my recent discussion of) the Foulkes et al. study of the perception of children's sex from short spoken stimuli. That experiment was the opposite of this one: Foulkes et al. looked at the relationship between phonetic measurements and listeners' judgments, whereas Pierrehumbert et al. look at the relationship between phonetic measurements and speakers' identities.

In both cases, in my opinion, it would be nice to see more of the data. Journals generally have rather strict constraints on page counts, but in this day and age, there's no reason not to put most of the raw data on the web. In the case of studies like these, there may be some reasons not to publish the speech recordings (due to confidentiality and informed consent issues), but there's no reason not to publish all the raw acoustic-phonetic measurements along with the associated sociosexual metadata, and all the raw listener judgments.


Posted by Mark Liberman at 11:07 AM

You can't always get what you want

A friend once told me about an idiom that nearly ended a relationship. He was northern European, not a native speaker of English, sojourning at a university in the midwest. She was American, reading a map in the passenger's seat of the car he was driving. "OK," she said, "at the next intersection, you want to turn right."

He was furious. Internally, of course. "How does she presume to know what I want?" There were other issues here, but her idiom crystallized his sense of psychic intrusion, and he brooded about it for days.

Since the beginning of linguistic time, we've been mixing up what we want, what we need, what we can get: the balance of power between us and the world, the forces that drive us forward and hold us back. The words for these things seem to circulate through desire, power and time like the Gulf Stream through the Atlantic Ocean.

Prescriptivists insist that English may should be used only for permission, not ability, but the OED tells us that it's cognate with "a Germanic verb meaning 'to be strong or able, to have power'" -- and also with Lithuanian megti 'to like, be fond of', and Latvian megt 'to be used to, to be in the habit of.' Despite the prescriptions for will and shall, American English has bleached all the desire out of will, whose Indo-European cognates range over meanings that include not only "choose" and "wish", but also "allow", "hope" and "command"; and no varieties of modern English retain much of shall's historical meaning 'to owe, to be guilty' -- though perhaps there's still a bit of guilty debt in should.

As for want, the OED tells us that it used to mean "To be lacking or missing; not to exist; not to be forthcoming; to be deficient in quantity or degree." Think of the nursery rhyme "for want of a nail..." The earliest citations for want meaning "to desire, wish for" are from the 18th century:

1706 E. WARD Wooden World Diss. (1708) 2 All such as want to ride in Post-haste from one World to the other.
1727 A. HAMILTON New Acc. E. Ind. I. v. 52 If either want to be separated during the term limited, there must be a Commutation of Money paid by the separating Party to the other, according as they can agree.
1751 G. LAVINGTON Enthus. Meth. & Papists III. (1754) 127 Cheats mingle the Flower or Seed among the Food of those whom they want to defraud.

The old meaning for want has largely dropped out of use, except for a few expressions:

1865 ‘L. CARROLL’ Alice in Wonderland vii. (1866) 96 ‘Your hair wants cutting’, said the Hatter.

Would it have comforted my Finnish friend to know that his companion was referring, anachronistically, to his requirements rather than his desires? Maybe. This song was current at the time:

You can't always get what you want
You can't always get what you want
You can't always get what you want
But if you try sometimes you just might find
You just might find
You get what you need


Posted by Mark Liberman at 09:00 AM

Time and space again

In connection with my 10/16 post on the language of time and space, Chris at Mixing Memory emailed to draw my attention to an earlier post of his that referenced some actual research on the subject. My favorite part of what I learned is the interaction of spatio-temporal metaphors with the psychology of motivation. Given an ambiguous case ("Wednesday's event has been moved forward two days", which could mean either to Monday or to Friday), people tend to choose so as to make pleasant things come sooner and unpleasant things later. Chris implies that the spatial metaphor of approach and avoidance is playing a role, though I should have thought that wishful thinking would do the trick in a purely temporal realm. And shouldn't there be some types who naturally assume that the bad stuff is coming up quicker?

Meanwhile, (a different) Chris at serendipity adds the observation that "we leaf backwards towards the front of the book (while reading forwards to reach its back cover, or end), whereas we walk forwards towards the front of a crowd or parade."

Posted by Mark Liberman at 07:16 AM

October 18, 2004

How Shall a Thing Be Called?

Mark Isaak has an interesting site entitled Curiosities of Biological Nomenclature that explains how biological organisms are named. It covers the structure of scientific names and the rules that determine which name is regarded as official, when, as is often the case, more than one name is assigned to what turns out to be the same organism. It also gives etymologies for many organisms with interesting names, including those that are palindromes and puns. Check it out.

Posted by Bill Poser at 12:02 AM

October 17, 2004

Falafel, loofah, whatever

Francis Heaney has pointed out that the puzzling reference to falafel in Bill O'Reilly's alleged telephone courtship was certainly a substitution for "loofah". This spoils a number of excellent jokes about falaphilia, taboulehmania, etc. Or maybe it improves them?

Anyhow, whatever its psychodynamics, the substitution was a psycholinguistically normal one: same part of speech, same /l/ and /f/ flanking the stressed syllable, similar eastern-Mediterranean associations.

And the same language of origin. According to the OED and the AHD, loofah (or loofa or luffa) is from Egyptian Arabic lūfah (or perhaps Arabic lūf, singulative of lūfa) referring to a plant of the species Luffa ægyptiaca, whose pod produces a fibrous substance "used as a sponge or flesh-brush"; while falafel (or felafel) is from Arabic falāfil, plural of filfil "pepper", referring of course to falafel, which (for those not living in cities whose streets are lined with falafel-trucks), is "ground spiced chickpeas shaped into balls and fried".

Posted by Mark Liberman at 09:16 PM

Trevor's Law explained.

What's the source for slang about what's in and what's out? Trevor at kaleboel, channeling Bacchus, postulated that it's all about, well, in and out. I was puzzled about what it is, really, and so Trevor invoked the muse again to explain that

Old-style farmers spent huge amounts of time talking about reproduction ... [and] used magical rites to try to regulate the seasons. When people went to live in towns ... creativity slowly began to replace fertility in the public eye ... However, instead of phasing out the gods of biological reproduction, people kept on the sacred harlots, the corn spirit and the divine animal and upgraded them to become models ..., gardening experts and celebrity chefs. ... A corresponding, linguistic sublimation occurred, in which the holy lexicon of sex ... became the even more confusing terminology of style.

Read the whole thing. An amiable hypothesis, but I'm worried about def.

Posted by Mark Liberman at 07:10 PM

When the past piles up

In response to my post on the language of time and space, Trevor of kaleboel sent this quotation from Walter Benjamin:

A Klee painting named ‘Angelus Novus’ shows an angel looking as though he is about to move away from something he is fixedly contemplating. His eyes are staring, his mouth is open, his wings are spread. This is how one pictures the angel of history. His face is turned toward the past. Where we perceive a chain of events, he sees one single catastrophe which keeps piling wreckage and hurls it in front of his feet. The angel would like to stay, awaken the dead, and make whole what has been smashed. But a storm is blowing in from Paradise; it has got caught in his wings with such a violence that the angel can no longer close them. The storm irresistibly propels him into the future to which his back is turned, while the pile of debris before him grows skyward. This storm is what we call progress.

Trevor has also recently uncovered a plan to put Inspector Clouseau in charge of the European Justice System. No doubt Walter Benjamin is having a good laugh over this with Charles Baudelaire and Johann von Goethe, at one of heaven's many bistros.

Posted by Mark Liberman at 11:58 AM

October 16, 2004

Making the World Safe for "Democracy"


Dave Holsinger at Semantickler had a nice post a while back about the Republicans' propensity for referring to the "Democrat Party." Holsinger describes the maneuver as a back-formation on the order of aspirate from aspiration. He's right, I think, but thereon hangs a tale.

Back in 1984, William Safire did a column on the "Democrat Party" label, saying:

Who started this and when? Acting on a tip, I wrote to the man who was campaign director of Wendell Willkie's race against Franklin Delano Roosevelt. ''In the Willkie campaign of 1940,'' responded Harold Stassen, ''I emphasized that the party controlled in large measure at that time by Hague in New Jersey, Pendergast in Missouri and Kelly Nash in Chicago should not be called a 'Democratic Party.' It should be called the 'Democrat party.' . . .'' Mr. Stassen, who is only four years older than President Reagan, is remembered as a moderate Republican; his idea is still used by the most partisan members of the G.O.P. Democrats once threatened to retaliate by referring to their opponents as Publicans, but that was jettisoned. Despite the urge to clip, Democratic and Republican the parties remain.

I had always assumed that story was right, but it was written in the age BC (before corpora), and these things are easy to check now. In fact it turns out that Stassen was exaggerating. (Hard to resist saying "Stassen, stop gassin'," though I doubt if many people now will recognize the allusion.)

The fact is that "Democrat Party" was in use well before Willkie's campaign. Hoover used the phrase campaigning against Roosevelt in 1932. And back in 1923, H. Edmund Machold, the Republican Assembly Speaker of NY State, was quoted as saying:

The people of this State have chosen the Republican Party as the majority party in this House, and the representative of the opposite party, the Democrat Party, for the place of Chief Executive of the State, and have given to him the majority of the other house, the Senate. New York Times, Jan. 4, 1923.

The phrase occurs before then, but it seems to have been regarded more as a rusticism than as a partisan dig. In 1908, a wag used it in a poem accusing William Jennings Bryan of being a flip-flopper avant la lettre:

Nothin' at all to say, William; nothin' at all to say;
There ain't no Democrat Party, so go on and have your way.
Fix up th' platform to suit you; put in what planks you may choose;
You've been on all sides of everything, so you've got plenty to use.
The New York Times, July 29, 1908

But to judge from the citations in The New York Times and the Wall Street Journal, the phrase didn't really become a Republican tic until the 1950's or so. The Republicans of Lincoln's time didn't think to refer to the Democrats as the "Democrat Party." (The 6000-volume Making of America collection contains just one 19th-century use of the term in an American context, in an 1896 reference to Franklin Pierce, but that was reprinted from the Westminster Review, so doesn't mean a lot.)

Why did it take so long for the gag to catch on? Probably because until the early 20th century, opponents wouldn't have seen much partisan advantage in referring to the "Democrat party," not when the word democrat still carried some of its original force. It wouldn't have been heard as a back-formation, but rather as implying "An adherent or advocate of democracy," as the OED defines the word -- not a connection that Republicans would have been eager to stress.

Until the early years of the 20th century, in fact, it was the Democrats themselves who made that connection. William Jennings Bryan sometimes used Democracy as a synonym for the party itself, as in an address to a party conclave held in Boston in 1902:

I recognize... how much fidelity it requires to plead for Democracy in New England. Here in New England a man may be a Democrat with much credit. I am glad your committee called from the South a representative of Southern Democracy. ... Between one who is at heart an aristocrat and one who is at heart a democrat there is a great gulf fixed. The New York Times, July 25, 1902

That leaves us with the question of why this sense of democrat became thin on the ground over the course of the 20th century, to the point where Republicans could speak of the "Democrat Party" in the confidence that no one would associate the word with its original sense. Perhaps it had to do with a redefinition of the relevant oppositions. For Bryan the relevant distinction was between democrats and "aristocrats," the defenders of wealth and privilege; as he put it in his famous "Cross of Gold" speech:

The question we are to decide is, upon which side will the Democratic party fight -- upon the side of "the idle holders of idle capital," or upon the side of "the struggling masses"? ...The sympathies of the Democratic party, as shown by the platform, are on the side of the struggling masses who have ever been the foundation of the Democratic party. There are two ideas of government. There are those who believe that if you will only legislate to make the well-to-do prosperous their prosperity will leak through on those below. The Democratic idea, however, has been that if you legislate to make the masses prosperous their prosperity will find its way up through every class which rests upon them.

By mid-century, though, "small-d" democrat implies only a commitment to electoral democracy as opposed to other systems of government, and once the Cold War sets in, the word isn't much used in the American political context. A case-sensitive search on democrat in Nexis major papers between 1978 and 1980 turns up 169 hits, almost all of them referring to foreign political contexts. The remainder either refer to American historical contexts (as in a description of Benjamin West), or involve an explicit contrast with a term like fascist ("they can't tell whether I'm an overaged hippie or a fascist in democrat's clothing," a high-school principal says).

Not surprisingly, the loss of the earlier meaning of democrat has taken the phrase "economic democracy" over the side along with it, as witness its generally declining frequency in New York Times stories:

Period        Hits
1930-39      110
1940-49      162
1950-59      64
1960-69      37
1970-79      40
1980-89      54
1990-99      14
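
For anyone who wants to check the trend, here is a minimal sketch that expresses each decade's count as a fraction of the peak decade. The numbers are copied from the table above; the variable names are just illustrative, and since these are raw hit counts the sketch says nothing about changes in the paper's overall volume.

```python
# Hits for "economic democracy" in New York Times stories, by decade,
# copied from the table above.
hits = {
    "1930-39": 110,
    "1940-49": 162,
    "1950-59": 64,
    "1960-69": 37,
    "1970-79": 40,
    "1980-89": 54,
    "1990-99": 14,
}

# Find the decade with the most hits and scale every decade against it.
peak_period = max(hits, key=hits.get)
relative = {period: n / hits[peak_period] for period, n in hits.items()}

for period, n in hits.items():
    print(f"{period}  {n:4d}  {relative[period]:.0%} of peak")
```

The decline is not monotonic (there is a small bump in the 1980s), which is why "generally declining" is the right description.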

So the same shift that made the world safe for "Democrat Party" made it safe for "democracy."

Posted by Geoff Nunberg at 03:32 PM

Fear North Dakota

The Oct. 10 edition of NBC's Meet the Press was partly devoted to a sort of debate between the Colorado Senate candidates Ken Salazar and Pete Coors, moderated by Tim Russert. At one point, Coors got a little tangled up and said "North Dakota" when he meant North Korea:

MR. RUSSERT:  But if you knew there were no weapons of mass destruction, the president is saying very clearly, "Even though there are no weapons of mass destruction, knowing what I know today, I still made the right decision to go to war."  Do you agree with that?

MR. COORS:  This is a war on terror.  And this is a--we can say "weapons of mass destruction," "no weapons of mass destruction"; clearly, we should be more worried today, actually, about Iran and North Dakota than we are--North Korea than we are about Iraq, based on weapons of mass destruction.  But I think that the conditions change on an ongoing basis, and we must look at the facts that we have before us at the time we make a decision.

Coors got a certain amount of ribbing about this. M.E. Sprengelmeyer's story in the Rocky Mountain News started out

Put down your accordions and man your battle stations.

For at least a split second Sunday, the home state of Lawrence Welk was declared part of the "axis of evil."

Coors' mistake is consistent with several of the patterns that psycholinguists generally see in slips of the tongue. The substitution was a single word -- though other units like phrases, syllables and morphemes can be substituted, the commonest slips involve either single words (like "Iran and North Dakota" for "Iran and North Korea") or single phonemes (like "you have tasted the whole worm" for "you have wasted the whole term"). The substitution preserved syntactic category, and the substituted word was prosodically similar to the target -- both are three-syllable words with second-syllable stress. This didn't involve the intrusion of another word from the same discourse (like "balls on base" for "base on balls") -- it was a "non-contextual substitution" in the psycholinguistic jargon -- and as usual in such cases, the substituted word is similar to the target in meaning and in collocational properties.

However, this case does not seem to confirm the best-known psychological theory about misspeaking. There's no obvious sense in which saying Dakota instead of Korea in this debate revealed anything about Pete Coors' unconscious fears or desires. In other words, it doesn't seem to have been a Freudian slip.

Of course, Freud's analysis of Freudian slips was generally anything but obvious. In chapter one of The Psychopathology of Everyday Life, he devoted 1,200 words and a diagram to explaining why he himself once had trouble retrieving the name of the painter Signorelli. His explanation involved concerns about sexual dysfunction among the Turks of Bosnia, and a message that he had gotten a few weeks earlier while staying in the town of Trafoi, and -- well, read it for yourself, I've reproduced it at the end of this post. So maybe Pete Coors substituted Dakota for Korea because he'd been thinking about bacterial growth in sanitary napkins, and was worried about some bad polling trends that he learned about while campaigning in Cripple Creek. Or something else equally obscure and random. But I doubt it.

A half-century of research into slips of the tongue suggests that Freud's attempt to provide them with unconscious motivations was at best unnecessary. We screw up in speaking because speaking is incredibly hard. Our poor overloaded frontal lobes are trying to select packages of multi-modal associations from the other end of the cortex at a rate of three or four per second, arrange them in complex patterns, and use them to coordinate the multi-dimensional wiggling of our eating and breathing apparatus so as to modulate sound waves in a way that will cause some mostly-unknown fellow humans to experience analogous patterns of structured associations, and consequently modify their mental state in ways advantageous to us. When it comes to talking, our unconscious fears and desires are the least of our problems.

[Some more discussion of this point, with links, can be found in the notes for lecture 17 of my intro linguistics course at Penn].

[From Psychopathology of Everyday Life (1901), chapter one]:

I vainly strove to recall the name of the master who made the imposing frescoes of the "Last Judgment" in the dome of Orvieto. Instead of the lost name -- Signorelli -- two other names of artists -- Botticelli and Boltraffio -- obtruded themselves, names which my judgment immediately and definitely rejected as being incorrect. When the correct name was imparted to me by an outsider I recognized it at once without any hesitation. The examination of the influence and association paths which caused the displacement from Signorelli to Botticelli and Boltraffio led to the following results:--

(a) The reason for the escape of the name Signorelli is neither to be sought in the strangeness in itself of this name nor in the psychologic character of the connection in which it was inserted. The forgotten name was just as familiar to me as one of the substitutive names -- Botticelli -- and somewhat more familiar than the other substitute -- Boltraffio -- of the possessor of which I could hardly say more than that he belonged to the Milanese School. The connection, too, in which the forgetting of the name took place appeared to me harmless, and led to no further explanation. I journeyed by carriage with a stranger from Ragusa, Dalmatia, to a station in Herzegovina. Our conversation drifted to travelling in Italy, and I asked my companion whether he had been in Orvieto and had seen there the famous frescoes of --

(b) The forgetting of the name could not be explained until after I had recalled the theme discussed immediately before this conversation. This forgetting then made itself known as a disturbance of the newly emerging theme caused by the theme preceding it. In brief, before I asked my travelling companion if he had been in Orvieto we had been discussing the customs of the Turks living in Bosnia and Herzegovina. I had related what I heard from a colleague who was practising medicine among them, namely, that they show full confidence in the physician and complete submission to fate. When one is compelled to inform them that there is no help for the patient, they answer: "Sir (Herr), what can I say? I know that if he could be saved you would save him." In these sentences alone we can find the words and names: Bosnia, Herzegovina, and Herr (sir), which may be inserted in an association series between Signorelli, Botticelli, and Boltraffio.

(c) I assume that the stream of thoughts concerning the customs of the Turks in Bosnia, etc., was able to disturb the next thought, because I withdrew my attention from it before it came to an end. For I recalled that I wished to relate a second anecdote which was next to the first in my memory. These Turks value the sexual pleasure above all else, and at sexual disturbances merge into an utter despair which strangely contrasts with their resignation at the peril of losing their lives. One of my colleague's patients once told him: "For you know, sir (Herr), if that ceases, life no longer has any charm."

I refrained from imparting this characteristic feature because I did not wish to touch upon such a delicate theme in conversation with a stranger. But I went still further; I also deflected my attention from the continuation of the thought which might have associated itself in me with the theme "Death and Sexuality." I was at that time under the after-effects of a message which I had received a few weeks before, during a brief sojourn in Trafoi. A patient on whom I had spent much effort had ended his life on account of an incurable sexual disturbance. I know positively that this sad event, and everything connected with it, did not come to my conscious recollection on that trip in Herzegovina. However, the agreement between Trafoi and Boltraffio forces me to assume that this reminiscence was at that time brought to activity despite all the intentional deviation of my attention.

(d) I can no longer conceive the forgetting of the name Signorelli as an accidental occurrence. I must recognize in this process the influence of a motive. There were motives which actuated the interruption in the communication of my thoughts (concerning the customs of the Turks, etc.), and which later influenced me to exclude from my consciousness the thought connected with them, and which might have led to the message concerning the incident in Trafoi -- that is, I wanted to forget something, I repressed something. To be sure, I wished to forget something other than the name of the master of Orvieto; but this other thought brought about an associative connection between itself and this name, so that my act of volition missed the aim, and I forgot the one against my will, while I intentionally wished to forget the other. The disinclination to recall directed itself against the one content; the inability to remember appeared in another. The case would have been obviously simpler if this disinclination and the inability to remember had concerned the same content. The substitutive names no longer seem so thoroughly justified as they were before this explanation. They remind me (after the form of a compromise) as much of what I wished to forget as of what I wished to remember, and show me that my object to forget something was neither a perfect success nor a failure.

(e) The nature of the association formed between the lost name and the repressed theme (death and sexuality, etc.), containing the names of Bosnia, Herzegovina, and Trafoi, is also very strange. In the scheme inserted here, which originally appeared in 1898, an attempt is made to graphically represent these associations.

The name Signorelli was thus divided into two parts. One pair of syllables (elli) returned unchanged in one of the substitutions, while the other had gained, through the translation of signor (sir, Herr), many and diverse relations to the name contained in the repressed theme, but was lost through it in the reproduction. Its substitution was formed in a way to suggest that a displacement took place along the same associations -- "Herzegovina and Bosnia" -- regardless of the sense and acoustic demarcation. The names were therefore treated in this process like the written pictures of a sentence which is to be transformed into a picture-puzzle (rebus). No information was given to consciousness concerning the whole process, which, instead of the name Signorelli, was thus changed to the substitutive names. At first sight no relation is apparent between the theme that contained the name Signorelli and the repressed one which immediately preceded it.

Perhaps it is not superfluous to remark that the given explanation does not contradict the conditions of memory reproduction and forgetting assumed by other psychologists, which they seek in certain relations and dispositions. Only in certain cases have we added another motive to the factors long recognized as causative in forgetting names, and have thus laid bare the mechanism of faulty memory. The assumed dispositions are indispensable also in our case, in order to make it possible for the repressed element to associatively gain control over the desired name and take it along into the repression. Perhaps this would not have occurred in another name having more favourable conditions of reproduction. For it is quite probable that a suppressed element continually strives to assert itself in some other way, but attains this success only where it meets with suitable conditions. At other times the suppression succeeds without disturbance of function, or, as we may justly say, without symptoms.


Posted by Mark Liberman at 11:20 AM

Time is space: when fronter is farther behind

When I say "back in August", you know that I mean two months in the past, not ten months in the future. The past lies behind us, the future is ahead. We're striding through history, or maybe riding a runaway railway car, but anyhow we're facing forward. Um, ahead. That is, future time. It's obvious, right?

But if we were speaking one of the West African languages that I've studied, it would be equally clear that the equivalent of "back in August" would be a reference to the year 2005. After all, the past is relatively clear to us, so it must be in front, where we can see it, right? And the future is sneaking up on us, invisible, from behind. Obvious. To emphasize the point, I might illustrate my words with hand gestures. Talking about our grandchildren's time, I'd sweep my hand backwards, like throwing salt over my shoulder. At least, that's how it was all explained to me.

Before you start thinking about attitudes towards progress and agency, you should remember that the ancient Greeks and Romans seem to have agreed with the Africans. Thus Latin post means "Of place, behind, back, backwards", according to Lewis and Short, but "Of time, afterwards, after". We still use some Latin expressions involving these meanings: an "ante bellum house" is a before-the-war house; a "post hoc" explanation is one that's concocted after the fact. And some of the corresponding English words derive from the same metaphor: "before" is almost exclusively used for (earlier) time now, but its original meaning was spatial "in front of".

I suspect that the temporal applications of ante and post came originally from thinking about a line of march of a group, not the visual field of an individual. If you're watching a parade, you see someone in the front of the procession before you see someone who is behind them. So "in front of" is "earlier than", and "in back of" is "later than". The same thing is true if you're marching in a column yourself. The people who are in front of you get to places earlier than you do, so again "in front of" means "earlier than". And it's possible that the apparent African "past is ahead of us" metaphors have a similar origin, I don't know.

Anyhow, even when we stick to the language of space in a single culture, people disagree about how to translate among frames of reference, as we've learned here recently in discussing route nomenclature. And Neal Whitman at Literal Minded has recently written about individual differences in talking about grocery carts: which end is front? Well, as he points out, it depends on whether you think of a grocery cart as being like a car or like a refrigerator.

When you combine the description of time and space, and move across cultures and over history, things get really mixed up. Consider the etymology of English after, according to the OED:

Orig. a compar. form of af, ... with compar. suffix -ter, THER = ‘farther off, at a greater distance from the front, or from a point in front’; and hence in the Teutonic languages ‘more to the rear, behind, later.’

Is that perfectly clear? I've often thought that Jacques Derrida missed his calling -- he really should have taken up the history of word meanings.

Next, an even greater source of confusion: desire is necessity (and what you want is what I suggest).


Posted by Mark Liberman at 08:24 AM

October 15, 2004

Ceci n'est pas un Bushism

Since I dedicated so much time picking apart Kerry's pronunciation of paraplegic and quadraplegic in the second debate, I thought I'd fairly balance things out by picking apart something Bush said during the third debate last night.

Note: this is not (necessarily) a Bushism. If you've come for that sort of thing, this is the wrong place. This is Language Log, after all.

I have not yet received my complimentary copy of The Cambridge Grammar -- I had thought it was standard issue to all recent Linguistics PhD recipients -- and, keep in mind, I'm not a syntactician (but I have been known to play one in the past). So I may have much of this wrong, and I welcome corrections of my premises (and reasoning).

Right at the beginning of his closing statement, Bush found himself in the (I think very unpleasant) position of having to form a notoriously difficult type of noun phrase (NP). The ordered contents of this kind of NP are [ Det N1 of N2 and pro ], where Det is a determiner (the, a, etc.), N1 and N2 are nouns, and pro is a pronoun.

In the Oval Office, there's a painting by a friend of Laura and mine named -- by Tom Lee.

What's difficult about this type of NP is that (I think) most of us find it difficult to judge what the correct form of -- not to mention order between -- N2 and pro should be. We have no problem with identical NPs without pro, and no problem with identical NPs without N2 (Asterisks indicate ungrammaticality; your judgments may vary):

  1. a friend of Laura's / *a friend of Laura
  2. a friend of mine / *a friend of me

UPDATE 1, 10/15/2004, 10:00 PDT

Mark writes to point out that "some people might object to" my judgment on a friend of Laura:

I guess I share the judgment that with a single first name, and out of context, "friend of Laura's" is preferred. But remember "friend of Bill", abbreviated as "FOB" during the Clinton years?

Good point. I agree. Mark continues:

Because of the way that Google works, it's hard to get reliable counts, but checking a couple of pages by hand I get
"friend of Biff" 8
"friend of Biff's" 11
(and then other things like "friend of Biff Henderson's"...)
suggesting that the preference is a marginal one at best, far from meriting a star on the less-favored outcome.

And with full names, I think things go the other way:

Google gives us
"friend of Abraham Lincoln" 651 (- 40 = 611)
"friend of Abraham Lincoln's" 40

"friend of Thomas Jefferson" 452 (- 10 = 442)
"friend of Thomas Jefferson's" 10

See also UPDATE 2 just below.
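
The arithmetic in Mark's counts can be spelled out in a few lines. This is only a sketch using the numbers quoted above: the subtraction assumes (as Mark's "- 40 = 611" implies) that a search for the bare phrase also matches pages containing the possessive, while the Biff counts were made by hand and are treated as already separate. The function and variable names are mine.

```python
def possessive_share(bare_only, possessive):
    """Fraction of all hits that use the possessive form (e.g. "Biff's")."""
    return possessive / (bare_only + possessive)

# (raw hits for the bare phrase, hits for the possessive phrase,
#  whether the raw bare count also includes the possessive hits)
google_counts = {
    "Biff": (8, 11, False),              # counted by hand, already separate
    "Abraham Lincoln": (651, 40, True),  # 651 - 40 = 611 bare-only hits
    "Thomas Jefferson": (452, 10, True), # 452 - 10 = 442 bare-only hits
}

shares = {}
for name, (bare_raw, possessive, overlaps) in google_counts.items():
    # Remove the possessive hits from the bare count where they overlap.
    bare_only = bare_raw - possessive if overlaps else bare_raw
    shares[name] = possessive_share(bare_only, possessive)
    print(f"friend of {name}: {bare_only} bare, {possessive} possessive, "
          f"{shares[name]:.1%} possessive")
```

On these numbers the possessive is slightly ahead for the bare first name and a small minority for the full names, which is the pattern Mark describes.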

Put the two together, though, and they all sound neither good nor bad, just odd in a strange, unfixable way. (The question mark on the last one is because I find it more strange than the rest -- but I don't know why.)

  1. a friend of Laura and mine
  2. a friend of Laura's and mine
  3. a friend of Laura and me
  4. ? a friend of Laura's and me

UPDATE 2, 10/15/2004, 10:00 PDT

Mark's message (cited above) concludes:

This doesn't detract from your observation that the construction
"friend of Laura and mine"
mixed types in the conjunction.

I didn't (explicitly) note this observation, but then again, I was really tired when I posted this. Looking back at the NPs above, I now see that (1) -- the wording Bush used -- is just as strange to me as (4). So, the above should read:

  1. ? a friend of Laura and mine
  2. a friend of Laura's and mine
  3. a friend of Laura and me
  4. ? a friend of Laura's and me

This is in line with Mark's observation (which he kindly credits to me).

Firm judgments about the form of pronouns are elusive in all kinds of conjoined NPs -- I know that Joe Emonds once wrote about this ("Grammatically deviant prestige constructions", in A Festschrift for Sol Saporta, M. Brame, H. Contreras, and F. Newmeyer (eds.), Noit Amrofer Press, 1986). Steven Pinker also writes about it in The Language Instinct (citing the Emonds paper).

OK, if this is not intended to be a Bushism-of-the-debate or something like that, why am I bringing it up? Only to suggest that this is a piece of evidence that Bush probably wasn't as well-prepared for this closing statement as he has been (accused of being) for the other two. The complete lameness of the painting reference, the stumbling for the completely irrelevant name of his and Laura's great friend the painter, the awkward NP -- all in the very first sentence of his closing statement? Why? Are Rove, Cheney et al. trying to deflect the charge that Bush was wired?

OK, the awkward NP thing is too subtle, I admit -- unless: they've noticed the influence of blogging in the media, and in the hopes that Language Log would bring this up ...

Nah. Too complicated, and it's late. Besides, W gave us the answer himself, earlier in the debate:

I tell the people on the campaign trail, when I asked Laura to marry me, she said, "Fine, just so long as I never have to give a speech." I said, "OK, you've got a deal." Fortunately, she didn't hold me to that deal. And she's out campaigning along with our girls. And she speaks English a lot better than I do. I think people understand what she's saying.

[ Comments? ]

Posted by Eric Bakovic at 03:03 AM

October 14, 2004

Is parapalegic like nucular?

Eric Bakovic and Bob Kennedy consider this question at length over at Phonoloblog. Warning: class V phonological passages; check your helmet and flotation devices.

Posted by Mark Liberman at 01:41 PM

Putting the buggy before the horse

According to the official transcript of last night's presidential debate, George W. Bush at one point said:

Thirdly, one of the reasons why there's still high cost in medicine is because this is -- they don't use any information technology. It's like if you looked at the -- it's the equivalent of the buggy and horse days, compared to other industries here in America.

There's a substantive point here, as well as a linguistic one. Anyone who's had any contact with the American health care system recently will be surprised to be told that "they don't use any information technology" -- as the president said in another context last night, "it's kind of one of those exaggerations." I'm sure that costs could be reduced by better IT systems for dealing with patient records and the like, but is medicine really "the equivalent of the buggy and horse days, compared to other industries"? Maybe so, I don't know. What I do know, though, is that the usual idiom is "horse and buggy". And despite what he said, I'm pretty sure that George W. Bush knows that too.

Google tells the statistical tale: "horse and buggy" has 61,900 hits, while "buggy and horse" has just 470: 132 to 1. In fact, the whole expression "horse and buggy days" has 5,230 hits, to just 1 for "buggy and horse days": more than 5,000 to 1!
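For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The counts are the ones quoted above; the dictionary layout and the `ratio` helper are illustrative only, and the single hit is read as belonging to the full phrase "buggy and horse days".

```python
# Hit counts as quoted in the post (Google, October 2004).
hits = {
    "horse and buggy": 61_900,
    "buggy and horse": 470,
    "horse and buggy days": 5_230,
    "buggy and horse days": 1,  # assumption: the lone hit is for the full phrase
}

def ratio(favored, disfavored):
    """Return how many times more often the favored order occurs."""
    return hits[favored] / hits[disfavored]

short_ratio = ratio("horse and buggy", "buggy and horse")
long_ratio = ratio("horse and buggy days", "buggy and horse days")

print(f"{short_ratio:.0f} to 1")
print(f"{long_ratio:.0f} to 1")
```

Web hit counts are rough estimates, so the ratios are best read as orders of magnitude and not as precise figures.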

There's nothing incoherent about the expression "buggy and horse days", especially if you're speaking literally, as the author of the phrase's one previous citation was:

(link) I worried about mother. It was buggy and horse days, so it took more time to go and see her.

Still, "horse and buggy" is a fixed expression, a collocation, an idiom, and "buggy and horse" is not. This is not an isolated fact -- there are patterns to the typical order of words in conjoined expressions, and the key influence in this case was first described by the Indian grammarian Pāṇini about 2,500 years ago -- but that's another story. Whatever the explanation, the preference for the idiomatic order "horse and buggy" is an indisputable fact of the English language as we use it now.

Why do I think that President Bush knows this, even though he said "buggy and horse"?

Well, listen to the sound clip. Here's a more careful transcript of this passage:

uh thirdly [pause 0.339]
uh one of the reasons why there's [pause 0.365]
still high costs in- [pause 0.235]
in medicine is because [pause 0.750]
this is the- they- they- they don't use any information technology, it's like if you looked at the- [pause 0.287]
it's the equivalent of the- [pause 0.790]
of the buggy [pause 0.526]
and horse days [pause 1.364]
compared to other industries here in America, and so we've got to introduce high technology into health care, we're beginning to do it, we're changing the language, we want there to be
[pause 0.542] um [pause 0.475] {total 1.195}
electronic medical records to cut down on error as well as to reduce costs. People tell me that [pause 0.457]
when- when the uh health care field is fully [pause 1.774]
integrated with information technology it'll wring some twenty percent of the costs out of the system

Bush starts out disfluently here. In the span of about 12.3 seconds from the start of "uh thirdly" to the start of "of the buggy" -- 33 words in the official transcript -- he racks up two uhs, six repetitions or self-corrections, and five silent pauses in inappropriate places. I'm sure that he knows what he wants to say, but he's having some trouble putting it into words.
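A rough way to quantify this, sketched in Python. The pause durations are copied from the transcript above, and the 12.3-second span and 33-word count are the figures just quoted. The list holds all six pauses transcribed in that span, not only the five judged inappropriate, and the variable names are illustrative.

```python
# Silent pauses (seconds) from the transcript, from the start of
# "uh thirdly" up to the start of "of the buggy".
pauses = [0.339, 0.365, 0.235, 0.750, 0.287, 0.790]

SPAN_SECONDS = 12.3   # approximate span quoted in the post
WORDS = 33            # word count from the official transcript

total_pause = sum(pauses)
pause_share = total_pause / SPAN_SECONDS
words_per_minute = WORDS / SPAN_SECONDS * 60

print(f"total silence: {total_pause:.3f} s")
print(f"share of span: {pause_share:.1%}")
print(f"overall rate:  {words_per_minute:.0f} words per minute")
```

Because the span itself is only approximate, the share and the rate are approximate too.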

(Parenthetically, before you take this as evidence for some sort of linguistic disability, compare it to Kerry's equally disfluent answer to another medical question, discussed here, which includes a truly spectacular 7-second lexical access failure. Public speaking is hard, especially under the kind of pressure that these two men face.)

Anyhow, the president is in a certain amount of linguistic difficulty from the start of this passage. He gets as far as "it's the equivalent of the". Then he pauses for almost 4/5 of a second, he repeats "of the", and out pops the "buggy" part. It's clear that he's having trouble retrieving the idiom, and "buggy" gets over threshold first, so he uses it. But he knows it's not the right thing, because he pauses again for more than a half a second before "and horse days", and then pauses again for almost 1.4 seconds before completing the phrase "compared to other industries here in America". By now the rest of the mental file card is well activated, so he presses rapidly through several clauses without stopping.

There's just one point that still puzzles me. What did the president mean by saying that "we're changing the language"? I don't understand what that has to do with cutting costs by introducing high technology into the health care industry. I doubt that he meant replacing COBOL with Java. And I'm not interested in any snarky remarks about replacing English with Bushian.

My first thought was that a fragment of another sound bite got accidentally disinhibited enough to slip into the flow of his answer. "We're changing the language" sounds like a slogan of some kind, though not one about health care automation. However, searching the web did not turn up any evidence that this is a current White House slogan in any field at all. If you know the answer, email me (myl at cis.upenn.edu) and I'll tell the world.

[I'll indulge myself in one non-linguistic point here. As a part-time computer scientist and full-time taxpayer, I'd be happy to believe that better automation of patient records would save 20% of health care costs -- but can this really be true? If it were true, I'd expect some biomedical Wal-Mart to be out there cleaning up. I haven't seen this point discussed on any of the "fact check" sites.

More important, why is a conservative Republican president talking about introducing high technology into health care as if it were the responsibility of the executive branch of the Federal government? ("...we're beginning to do it...") Is he proposing socialized medical automation on top of a largely private-sector health industry? Given the recent IT record of the FBI, the FAA, the social security administration, etc., this would be a dismaying proposal if it were seriously meant. ]


Posted by Mark Liberman at 11:59 AM

October 13, 2004

Twenty greatest equations -- minus three?

Robert P. Crease discusses the results of an unusual popularity contest: the "twenty greatest equations ever".

I feel some affinity with this strange quest, since a few years ago I tried to persuade my colleagues that "The Ten Greatest Equations of All Time" would be a good organizing principle for a general-education course in mathematics and its applications.

However, three of my personal top-ten list are missing from Crease's top twenty: Shannon entropy, Bayes' Theorem and Euler's generalization of Fermat's little theorem. Entropy is arguably the most important new concept of the twentieth century*; Bayes' theorem is fundamental to most statistical pattern recognition, whether by computers or by animals, and Euler's totient theorem is the basis of most current public-key cryptography. Also, all three of these are easy to understand, both intrinsically and in their applications.
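To back up the claim that these three are easy to state, here is a small Python sketch of each. The function names, the brute-force totient, and all the numbers are made up for illustration; none of this comes from Crease's article.

```python
from math import gcd, log2

def shannon_entropy(probs):
    """H = -sum(p * log2(p)), in bits; zero-probability outcomes add nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)

def bayes_posterior(prior, likelihood, likelihood_given_not):
    """P(H|E) = P(E|H) P(H) / [P(E|H) P(H) + P(E|not H) P(not H)]."""
    evidence = likelihood * prior + likelihood_given_not * (1 - prior)
    return likelihood * prior / evidence

def totient(n):
    """Euler's phi: how many k in 1..n are coprime to n (brute force)."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

# Shannon: a fair coin carries exactly one bit per toss.
fair_coin_bits = shannon_entropy([0.5, 0.5])

# Bayes: a rare hypothesis (1% prior) with fairly good evidence
# (90% hit rate, 10% false-alarm rate); the numbers are invented.
posterior = bayes_posterior(prior=0.01, likelihood=0.9, likelihood_given_not=0.1)

# Euler: a**phi(n) leaves remainder 1 mod n whenever gcd(a, n) == 1.
n, a = 15, 2
euler_residue = pow(a, totient(n), n)

print(fair_coin_bits, round(posterior, 4), euler_residue)
```

The Euler check is the fact that RSA-style public-key schemes rely on, though real systems use enormous moduli and never compute the totient by brute force.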

Crease's poll had a pretty small N -- he got 120 proposals, with the top of his list getting 20 votes and the bottom just 2. That makes it all the stranger that my three were missing. I guess it's because he was surveying mainly physicists rather than psychologists or engineers, but still...

There are a lot of other great equations that are not on that list. But it's just wrong to leave those three out. I guess Fermat's little theorem isn't as fundamental as the other two, except that it allows you to bring up the question of whether P = NP and related stuff.

Are there any equations that come out of linguistics that should be included in my hypothetical course? Well, Shannon entropy is all about the information content of messages, and so it belongs to linguistics as the field should properly be defined. One other candidate would be Zipf's Law (~ Pareto's Law, Benford's Law, etc.). Of course, many of the other Great Equations have obvious linguistic applications, though I haven't been able to come up with any plausible ways to bring E=mc² to bear.

*OK, Boltzmann wrote in 1877, and the Boltzmann equation S = k ln W is one of Crease's top twenty, but the implications of Boltzmann's work were not worked out until the 20th century, and Shannon's form of the entropy equation is more general and more important...

[via Ray Girvan at the Apothecary's Drawer weblog.]


Posted by Mark Liberman at 12:00 PM

There will always be an England

At least, as long as government ministers talk like this:

The core of Derrida's thinking is that every text contains multiple meanings. To read is neither to know nor to understand, but to begin a process of exploration that is essential to comprehend oneself and society. This is, however, the sort of pretentious bullshit language a minister for Europe can only use when speaking French.

[Denis MacShane, Minister for Europe]

I suppose that "MacShane" is a Scottish name, so make that "U.K." rather than "England". From across the Atlantic, these differences don't look as large as they should. Anyhow, I haven't seen a quote like this recently from a U.S. government official of similar rank. But maybe there's something equally British about this reaction:

Who? I don't know who you are talking about? I'm in a meeting with a group of City luminaries and none of them has heard of him. I can Google him for you if you are having difficulties.

[Ivan Massow, former chairman of the Institute of Contemporary Arts]

Stereotypes in action...

[Update: Des von Bladet emails invaluable context:

MacShane is one of my favourite politicians, and (not-coincidentally) a stalwart enthusiast of the Yoorpean Onion who speaks (by his own account) good French, German, Spanish and Italian, but he wasn't especially born MacShane:


His father was a Polish officer who fought the Germans in 1939, was wounded and then got to England where he became a commando. Denis MacShane was actually born Denis Matyjaszek and kept that name "without problems" through school and Oxford University.

"It was the BBC which gently but firmly suggested that such a name might not be 'readily pronounced' in the Midlands when I joined as a trainee in Birmingham," he said.

The BBC was like that then. "I chose my mother's maiden name instead. Her family is originally from Donegal in north-east Ireland and settled in Glasgow where I was born."

UKish is a better bet than English, for sure, (English is a folk-category with no standing in law that I'm aware of, so I would not even put these in the same category given that there's an 'R' in the month) but to claim that there will always be a UK implies some claims about the future of Northern Ireland, in particular, that would raise eyebrows in some circles.


folk-classifies himself as European

Aha. As our president said, "You forgot Poland!" (Here by "you" I mean "me".) ]

[More musings on Derrida here and here. ]

Posted by Mark Liberman at 09:08 AM

Getting kids (and politicians) wrong

Yesterday I heard a fascinating talk by Paul Foulkes, of the University of York, on the development of local dialect features in the speech of children in Newcastle, England. He reported a large number of interesting results that I won't discuss in this post. Instead, I want to speculate about one tantalizing fact that he mentioned in passing: when he asked a large number of adults, both local and non-local, to guess the sex of two-to-four-year-old children by listening to short utterances, about two thirds of the judgments were wrong.

Here's the background. Some characteristic features of the local Newcastle dialect are gender-associated in young adults: glottalized variants of medial /p/, /t/, /k/ are found more in males than females, while pre-aspirated variants of final /p/, /t/, /k/ are commoner in females than in males. This differentiation begins to develop in children from about the age of three. Part of the reason is that mothers' speech to their children is differentiated in the appropriate way by sex.

Foulkes and his colleagues wanted to look at the perception of these features, by local and non-local listeners. Since it's hard -- maybe impossible -- to tell the sex of children at the age of 2-4 from their voice alone, they figured that these gender-associated dialect features might be used by local listeners. And they were.

The experiment was designed using a selection of naturally-occurring isolated-word stimuli, recorded during the original studies of the development of this structured variation. So the stimuli were not well controlled, as Paul was at pains to explain. That is, half of them overall came from girls and half from boys; the stimuli with medial glottalized stops were likewise half from girls and half from boys; the stimuli with pre-aspirated final stops were ditto; and so on; but the interaction of these features with speech rate, loudness, pitch range, and other stereotypically gender-associated features was not controlled. It was not exactly random, either -- it was just whatever it happened to be in the real-life recordings they had made.

It's probably because of this complex and uncontrolled variation that the results of the experiment were somewhat messy. The effect they were looking for -- evidence that the local listeners were sensitive to the gender-linked features of the local dialect -- could be seen in the data, but it was a small effect and a somewhat erratic one.

The thing that most interested me, though, was something that Paul didn't focus on in his talk, because it wasn't part of the design or the planned interpretation of the experiment in question. He hadn't analyzed the effect in the experimental data, didn't have the exact figure on his slides, and only mentioned it in response to a question. So I don't want to put him on the spot with respect to any claims about this result -- for now let's just say that it's a speculation on my part about a result that might be true. If it turns out (after Paul checks) that I've gotten it wrong, I'll post a correction. However, effects of this general type do happen, and it's interesting to see why.

The way this experiment was designed, if the subjects had guessed "boy" and "girl" without any information from the stimuli, they would have been right half the time. A subject could have guessed "boy" all the time, or "girl" all the time, or "boy" and "girl" alternately, or "boy" and "girl" randomly, or "boy" through the first half of the experiment and "girl" through the second half. It doesn't matter -- they'd still be right half the time. (I think -- I didn't check for the relevant experimental design issues).
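A quick simulation makes the point concrete. This is a sketch under stated assumptions, not a model of the actual experiment: it assumes a stimulus list that is exactly half boys and half girls and is presented in random order, and the list size, run count, and strategy names are all invented.

```python
import random

def accuracy(guesses, truth):
    """Fraction of guesses that match the true labels."""
    return sum(g == t for g, t in zip(guesses, truth)) / len(truth)

def simulate(strategy, n_items=100, n_runs=2000, seed=0):
    """Mean accuracy of a stimulus-blind strategy over shuffled balanced lists."""
    rng = random.Random(seed)
    truth = ["boy"] * (n_items // 2) + ["girl"] * (n_items // 2)
    total = 0.0
    for _ in range(n_runs):
        rng.shuffle(truth)  # assumed: presentation order is random
        guesses = [strategy(i, n_items, rng) for i in range(n_items)]
        total += accuracy(guesses, truth)
    return total / n_runs

# Each strategy sees only the trial index, never the stimulus.
strategies = {
    "always boy": lambda i, n, rng: "boy",
    "always girl": lambda i, n, rng: "girl",
    "alternate": lambda i, n, rng: "boy" if i % 2 == 0 else "girl",
    "random": lambda i, n, rng: rng.choice(["boy", "girl"]),
    "halves": lambda i, n, rng: "boy" if i < n // 2 else "girl",
}

results = {name: simulate(s) for name, s in strategies.items()}
for name, acc in results.items():
    print(f"{name:12s} {acc:.3f}")
```

The constant strategies come out at exactly one half on every run. The others hover around one half, and would only drift away from it if the presentation order were correlated with the guessing pattern, which is the design issue flagged in the parenthesis above.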

So if the subjects were wrong about 2/3 of the time overall, that means that they were getting some useful information from the stimuli. It was just wrong information, information that led them to a false conclusion. What could this have been? Paul did some multiple regression analyses that showed that subjects' judgments were being influenced by features like amplitude, pitch, speech rate and voice quality. Basically (as I recall) his subjects showed the effects of the conventional stereotypes that girls' speech is softer, slower, higher in pitch, and more breathy-voiced, while boys' speech is louder, faster, lower in pitch, and more creaky-voiced. These stereotypes are not random: they correspond to an image of girls as more polite and controlled, and boys as more, well, boisterous. (Except for the pitch business, which is complicated but probably reflects an influence of adult norms along with a confounding influence of the pitch-raising effects of greater loudness and vocal effort).

Now at least for some of these features, these stereotypes are known to be wrong. For example, girls talk faster than boys do, on average, according to the experimental results I've seen cited. In the particular stimuli used in Paul's experiment, it might well happen to be true that several salient features were distributed in the anti-stereotypical way.

Notice that it's not enough for the stereotyped features to be distributed randomly. Suppose that girls and boys are equally likely to be loud or soft, but listeners think that "loud" means "boy" and "soft" means "girl", and so whenever in the experiment they hear a recording of a kid yelling, they say "boy", and whenever they hear a recording of a kid speaking softly, they say "girl". The result will still be that they're right half the time.

For them to be wrong more often than right, the basis of their response has to be worse than wrong in the sense of unconnected with the facts. It actually has to be the opposite of the facts, to some extent. There's an interesting story to be told about why humans -- who are usually such accurate statistical learners -- can (all too often) develop shared perceptual associations that are so significantly at variance with the facts. (Positive feedback due to greater salience of stereotypical features will get you random beliefs, but sometimes there's something more...)
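The argument can be put as a one-line formula. In this sketch the listener is assumed to use a single cue (loudness) and to answer "boy" for loud and "girl" for soft, on a stimulus set that is half boys and half girls. The probabilities are invented for illustration and are not Paul's data.

```python
def expected_accuracy(p_loud_given_boy, p_loud_given_girl):
    """Expected accuracy of a listener who says 'boy' for loud, 'girl' for soft,
    when half the stimuli come from boys and half from girls."""
    correct_on_boys = p_loud_given_boy        # loud boys are labeled correctly
    correct_on_girls = 1 - p_loud_given_girl  # soft girls are labeled correctly
    return 0.5 * correct_on_boys + 0.5 * correct_on_girls

# Cue unrelated to sex: the stereotype is harmless, accuracy stays at chance.
uninformative = expected_accuracy(0.5, 0.5)

# Cue distributed the way the stereotype says: accuracy rises above chance.
stereotype_true = expected_accuracy(0.8, 0.2)

# Cue distributed against the stereotype: accuracy falls below chance.
stereotype_reversed = expected_accuracy(1 / 3, 2 / 3)

print(uninformative, stereotype_true, round(stereotype_reversed, 3))
```

Under these assumptions, being wrong two thirds of the time requires girls to be loud about a third more often than boys in the stimulus set, which is a sizable reversal and not mere noise.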

The point that I want to make here is that stereotypes have a powerful influence on the perception of the "subtle style cues" in speech that so much of current political journalism is focusing on. These stereotypes can be associated with groups (like Southerners or New Englanders), or with individuals (like George W. Bush or John Kerry). And despite their powerful influence on perception, they can be quite wrong.


Posted by Mark Liberman at 08:22 AM

October 12, 2004

Policy vs. "character" -- and its linguistic correlates

This evening, Kevin Drum's Political Animal weblog has the clearest short statement that I've seen about what's wrong with political journalism in America:

POLICY vs. CHARACTER....Jon Chait has an interesting piece in The New Republic today. Ostensibly it's about flip-flopping (bottom line: Kerry does it less than you think, Bush does it more), but about halfway through he switches to media criticism:

One of the curiosities of political journalism is that reporters tend to be assiduously even-handed about matters of policy (which can revolve around disputes over objective fact) but ruthlessly judgmental on questions of character (which are inherently subjective). In fact, most reporters don't know or care much about policy. They see politics primarily through the lens of the candidates' personal traits.

Now, this is hardly an original observation — in fact, it's downright pedestrian to anyone who reads Somerby regularly — but Chait does a good job of teasing it out and explaining how an increasing fixation on "character" has made political life almost impossible for moderates from either party — but especially for Democrats.

As the 9/26 article by Alex Williams in the NYT Fashion and Style section explains, we're talking about personal traits whose objective correlates are often things like "gesture, posture, syntax and tone of voice". It might be true, as Williams claims, that "subtle style cues" like these "account for as much as 75 percent of a viewer's judgment about the electability of a candidate". Perhaps it's always been that way, I don't know. It certainly seems to me that this stuff makes up a larger fraction of journalists' campaign coverage than it used to.

Ironically, most of these "subtle style cues" could actually be investigated empirically and objectively, if they really mattered. I guess you could make a case that they do indeed matter to some extent. In predicting how someone will act in the future, your estimate of their character has to be mixed in with your opinion of their promised policies and their past record, and your estimate of their character has to depend to some extent on these subtle cues. But in this area, as Chait, Drum and Somerby explain, journalists these days mostly work from conclusion to evidence. They decide what they think each candidate's character issues are, positive or (mostly) negative, and then they pick examples of "subtle style cues" to suit -- or they just invent things.

My own opinion about this stuff is that it would probably be better if journalists focused on candidates' statements, records and promises, along with the relevant factual background, leaving evaluation of the candidates' characters up to individual voters. But that's not the world we live in. Instead, we've got a media system with powerful positive feedback loops that rapidly create shared stereotypes, fed by clever political operatives (and journalists!) busily drawing caricatures with vivid and telling anecdotes that are often out of context, misinterpreted or outright false.

In this situation, linguists' insights can certainly be helpful to political operatives and other campaign junkies. I'm not as confident that we can help ordinary voters in a positive way. But when we can clear away some of the trash obscuring the issues (and also obscuring the candidates' characters!), we should try to do so.


Posted by Mark Liberman at 09:45 PM

How to Decide Who to Vote For

Public Service Announcement: If you're still not sure how to cast your votes on Nov. 2, some good places to start looking for information might be the website for the League of Women Voters, Public Agenda's First Choice 2004, or Project Vote Smart.

Some of my colleagues here on Language Log have made a cottage industry of pointing out the errors of critics of George Bush who pick on his alleged linguistic ineptness as evidence of his stupidity and/or ignorance. What I find more irritating than the questionable evidence for his ineptness is the idea that it is reasonable to judge presidential candidates by their linguistic skills. It's true that we make judgments on this basis, but for the most part we make such judgments when we don't have much else to go on and we recognize that they aren't terribly reliable for most purposes.

In the case at hand, we have a tremendous amount of much more useful information. The candidates have made many statements about their policies. In addition, we have considerable information about their performance in previous political offices and military service. It shouldn't be at all difficult to decide between them on this basis. They have different policies on foreign affairs, the economy, civil liberties, the environment, and many other issues. They have different world views and moral values. These are the bases on which rational people should decide. I wouldn't be a linguist if I didn't think that language is interesting, but that superficial judgments of linguistic ability should enter into one's decision as to who to vote for is the height of silliness. Pace linguistics geeks and Jacques Derrida, language isn't everything.

Posted by Bill Poser at 07:38 PM

Beware linguistic and political stereotypes

Dick Hamming always used to say that you should beware of finding what you're looking for. He was talking about the dangers of accepting a mathematical model of natural phenomena just because you could adjust its parameters to fit a few experimental observations. However, his warning applies even more strongly to vaguer sorts of theories, especially those that are sanctioned as socially-accepted stereotypes. In this case, the "facts" that fit are often selected from among many that don't fit, and then exaggerated to boot. Sometimes things are just plain made up.

In today's Register, Thomas C. Greene has an opinion piece under the headline "Was Bush packing Wi-Fi in TV Debate?" Greene makes it clear that he thinks the answer is "yes", based mainly on his evaluation of the president's linguistic performance:

Wireless technology might explain why US President George W. Bush performed better than usual in the last two presidential debates with his opponent, Senator John Kerry.

Unless he's reading a well-rehearsed speech, the President is normally much given to malapropisms and incoherent syntax. [...]

Yet, during both presidential debates, he miraculously spoke in clear, organized sentences that were fairly relevant to the questions asked. He stumbled only occasionally, and then only briefly.

I think that Greene's arguments are nonsense. He's selecting and exaggerating some facts, while ignoring others that don't suit his prejudices.

We've pointed out in several posts here at Language Log that George Bush's contributions to the first two presidential debates were sometimes not particularly clear, not particularly well organized, and not particularly relevant to the questions asked. I suppose that the logical way to try to refute Greene's argument would be to show that Bush's debate performance so far has not been strikingly more coherent than his performance in other settings. But this would not really scuttle the wireless-prompting theory, because other attempts to promote it (like Dave Lindorff's 10/8 Salon article) have speculated that Bush is usually wired in the same alleged way.

The thing is, I don't believe that George Bush's public speaking is nearly as different from John Kerry's, in terms of linguistic coherence, as (many) people think.

Let's start out by noting that the arguments about coherence go both ways. Bush has been stereotyped as linguistically and cognitively inept; but Kerry has been stereotyped as distracted by details, unable to articulate the forest for parenthesizing about the trees. When Kathleen Hall Jamieson told a NYT reporter that "the language of decisiveness is subject, verb, object, end sentence", she was supplying quotes to bolster the reporter's theory that "Kerry has a tendency to ramble, when an audience wants punchiness", and that he uses too many hedges, "words and grammatical constructions that imply uncertainty or qualification".

If you think about it, the two men's different stereotypes can be applied to exactly the same behavior, giving alternative and roughly opposite explanations for the same facts. If Bush sputters or rambles, it's because he's got some sort of linguistic or cognitive deficit: he's not intellectual enough. If Kerry sputters or rambles, it's because he's trying to be too nuanced, not responding from the gut: he's too intellectual.

But roughly as often as not, the stereotypes don't fit. For example, consider this passage from Kerry's side of the second presidential debate:

And I believe ((that)) if we have the option which scientists tell us we do
of curing Parkinson's
curing diabetes
uh uh a- a-
you know
some kind of a- a- of a-
uh ((s- p- th- you know))
paraplegic, or quadraplegic, or
uh uh you know a spinal cord injury, anything
that's the nature of the human spirit.

This is hardly a paragon of linguistic facility, either syntactically or phonetically. There's that embarrassingly long (almost 7-second) delay in lexical access. If George Bush had experienced a lexical-access breakdown like this, we'd have commentary all over the "internets" about early senile dementia and the like. There's also a pronunciation issue here -- an extra schwa between [p] and [l] in paraplegic and quadraplegic, similar to the extra schwa in Bush's much-discussed "nucular" pronunciation of nuclear.

But did Kerry sputter and stumble here because he's trying to be too nuanced, or hedging, or explicating complexities? No, this is an enthusiastic projection of faith straight from the gut, and his difficulty seems to be a purely linguistic one -- he starts with a list of nouns for conditions that stem cells might cure, and then can't figure out what noun to use to describe the condition of suffering from a spinal cord injury. He apparently starts from the adjectival form paraplegic, and then can't decide what the right corresponding noun should be, or how to pronounce it. This is exactly the kind of linguistic muddle that is supposed to characterize George W. Bush. In fact, it's similar to the syntactic trap that resulted in Bush's famous "practicing their love" gaffe. But this kind of syntactic incoherence is a general human problem, and if you want to tag it as characteristic of Bush, you need to do more than to present some anecdotes in which he exhibits it.

John Kerry has gotten a stiff dose of stereotyping in the press. Geoff Nunberg has recently apologized for having repeated a false quotation originally published by the egregious Maureen Dowd, who apparently made something up that John Kerry might have said (but didn't) in order to make a point about the kind of person she thinks Kerry is. And Geoff also debunked the New York Post's gossip column, which reckoned (with no evidence, and as it happens, falsely) that Kerry's use of "sort of" (one of those hedges) "is a subtle indicator of upper-class origins or aspirations."

Those two cases both involved journalists (if we can use that word for Dowd and the Post's gossip columnist) who found what they were looking for. These "journalists" happen to have been looking -- not by chance -- for the same thing: a quote to show how John Kerry is an elitist. And of course, many commentators, journalistic and otherwise, have been looking for years for "Bushisms": evidence to confirm their prejudice that George W. Bush is a dolt. They don't have any trouble finding suitable "facts" -- but the discovery process involves a great deal of selection, exaggeration and over-interpretation.

I'm not trying to say that there are no linguistic or rhetorical differences between these two candidates. We've found plenty to say about that, and so have others, and all the commentary has barely scratched the surface. (For example, I might speculate that Bush is unwilling to fumble around for the right word for as long a span of time as others -- like Kerry -- are, and that this is part of the explanation for his occasional malapropisms.) But when you look into such things yourself, or read others' reports, be careful. If you find what you're looking for, and you care about the truth, beware.

And if you don't find what you're looking for, the explanation doesn't necessarily involve secret wireless transmissions from hidden controllers...


Posted by Mark Liberman at 02:04 PM

Trevor's Law of hip etymology

Newsweek recently printed an excerpt from John Leland's forthcoming book Hip: The History. Leland traces the word hip back to Wolof verbs meaning "to see" or "to open one's eyes", following Clarence Major. Trevor at kaleboel is very skeptical, basing his objection less on any particular facts than on a general principle of historical linguistics. The principle in question strikes me as a plausible one, with a considerable range of potential applications. And I haven't seen it expressed elsewhere, so in keeping with the tradition for such things, I propose to name it after its author: Trevor's Law. The only trouble is, I'm not quite sure what the principle is.

Here's the background. First, Leland:

Clarence Major, in his study “Juba to Jive: A Dictionary of African-American Slang”, traces the origins of hip to the Wolof verb hepi (“to see”) or hipi (“to open one’s eyes”), and dates its usage in America to the 1700s. So from the linguistic start, hip is a term of enlightenment, cultivated by slaves from the West African nations of Senegal and coastal Gambia. The slaves also brought the Wolof dega (“to understand”), source of the colloquial dig, and jev (“to disparage or talk falsely”), the root of jive.

Now, Trevor (slightly abridged):

[M]y principal complaint is of the improbability of a word like hip emanating from seeing or looking, instead of, as I think is normal with such terminology, from f*cking. The OED service I was using with great pleasure has vanished, but I do think that such words are generally knicker-born (I am not aware of what gentlemen--or prison wardresses--wear these days), and in the following fashion:

  1. Cooler (woman, late C17-early C19: low, ex the cooling of passion and bodily temperature after sexual intercourse (Partridge)) → cool (impertinent, audacious, colloquial ca 1820-1880, then standard English (also Partridge)) → cool in the modern sense.
    1. Hip (lump of plastic contributing not inconsiderably to China's export earnings) → various sexually-tinted expressions involving the biological substitute for former (see Partridge) → hips, she is|was|were (anecdotal, mid-C20 sex trade, still used ... → hip (record company sales exec, tight trews, dandruff)).
    2. Hebben (Dutch, to have and to hold; b → ~p in many parts--just ask Barry, Scheveningen's finest drummer-florist ) → (Nieuw Amsterdam) → hep, in the (not so) modern sense.
    3. If you consider this bo!!ocks with bells on, try Cecil Adams' hypothesis; I am sceptical: his hypothesis is asexual, and thus floored (no seeling here, thank you, South Asian scholars).
  2. Dig (early C19: clobber successfully (Partridge), ie related to f*ck-type terms implying violence) → dig in the modern sense.

I'm afraid that the OED is no help with respect to the specific case in question: it gives the etymology of hip and hep as "origin unknown" and "of unknown origin", respectively, and the first citations are

1904 G. V. HOBART Jim Hickey i. 15 At this rate it'll take about 629 shows to get us to Jersey City, are you hip?

1908 Sat. Even. Post 5 Dec. 17/1 What puzzles me is how you can find anybody left in the world who isn't hep.

I don't own a copy of Clarence Major's Juba to Jive, so I don't know what evidence he cites for an 18th-century origin of "hip" or "hep" in the U.S., or for its connection to Wolof. I'll try to stop by the library at some point and check it out.

But what mainly interests me here is not the history of hip, but rather the nature of Trevor's Law. Trevor says that it's "normal" for "such terminology" to "emanate" from "f*cking". He restates his principle more directly, though more euphemistically, by saying that "such words are generally knicker-born".

I believe that I know what Trevor means by "emanate" and "knicker-born". And as for "normal" and "generally", that's just a question of what the domain of quantification is and what the statistics turn out to be. What's obscure to me is what he means by "such terminology" or "such words".

He could mean "slang in general" -- but that's clearly not true, though sexual metaphors are no doubt well represented in slang. He could mean "slang terms for positively evaluated characteristics of style" -- but that wouldn't cover dig, which he mentions along with hip and cool as examples of "such words". Is it "slang terms for in-group perception or evaluation"?

This could be a very popular Law, I'm sure. If I could only figure out what it is. Sorry to be so unhip, Trevor.


Posted by Mark Liberman at 11:12 AM

Personal and intellectual history of sentence diagrams

Like Bill Poser, I recently read Kitty Burns Florey's essay on diagramming sentences, following the link from A.L.D. Unlike Bill, I was taught sentence diagramming in elementary school, so the content of Florey's analyses didn't surprise me.

I'll confess, though, that this approach to grammatical description never made much sense to me, back when I was nine or ten years old. Questions about why the lines should go one way rather than another were treated like questions about why there are three feet in a yard -- that's just how it is, kid, shut up and finish the exercise. At nine, I probably wasn't ready to understand a better answer, but in any case, diagramming sentences wasn't something that I ever did for fun.

When I studied syntax again in college and graduate school, I never really connected what I learned to my elementary-school experience. To the extent that I thought about it at all, I saw those old diagrams as an alternative notation for surface syntactic structure, some sort of informal way to encode lexical dependencies. Linguistics-course syntax was all about "why this and not that", and variant notations for the basic observations were not of much interest. And there was not a great deal of respect for earlier traditions of analysis -- the required "History of Linguistics" course at MIT was familiarly known as "Bad Guys".

In any case, syntax was still not my favorite subject. This time my problem was exactly the opposite of my problem with sentence-analysis in elementary school. In the third grade, everything seemed to be carved in stone, but in college and graduate school, the field was written on water. There was no stable description of the phenomena. The theory kept changing, not only in terms of explanations but in terms of the entities and relations of the basic descriptions. Doing syntactic analysis felt like trying to lay out a garden on an avalanche. Exciting, at least at first, but it always seemed like a gamble whether you could get any significant piece of work done before everything changed out from under you.

Anyhow, I never asked myself, before now, just where the techniques of "sentence diagramming" that I was taught in grammar school came from. Who invented them, and when?

As usual, the answer is available on the internet.

There's an extensive practical introduction to "diagramming sentences" on line at Capital Community College in Hartford, Connecticut (about 30 miles west of my elementary school, FWIW). This is part of a larger "Guide to Grammar and Writing", which gives advice on numerous topics at the "Paragraph Level" and the "Essay and Research Paper Level" as well as "Word and Sentence Level".

For a quick explanation of how sentence diagramming works, there's a powerpoint presentation, an html page on Basic Sentence Parts and Patterns, and another one on Sentence Types and Clause Configurations. Here's an index to the whole site.

The "Brief Introduction" explains that

There are other ways to represent graphically the structure of a sentence, but the most popular method is based on schemes developed by Alonzo Reed and Brainerd Kellogg over a hundred years ago.

Several works by Reed and Kellogg are available as etexts from Project Gutenberg, including Graded Lessons in English and Higher Lessons in English. A sketch of older history can be found here (at "polysyllabic.com"):

The sentence diagrams found in schoolbook grammars today are known as Reed-Kellogg diagrams. This system can be found in Alonzo Reed and Brainerd Kellogg. An Elementary English Grammar. (1878), and a simple example is shown to the right. Apparently they had an earlier incarnation, as in the introduction to this work, Reed and Kellogg remark, "We invite attention to our system of Diagrams. They have grown out of the suggestions of different teachers in the Polytechnic Institute. They were copyrighted in 1868 by A. Reed and O. H. Hall; the copyright now stands in our own name." (Reed and Kellogg, p. 5). I do not, however, have any bibliographical information for this earlier work.

Other systems existed in the 19th century too. These pages attempt to show the various older diagramming schemes. It is a work in progress, and I'm adding to it as time permits.

The earliest system I've seen is by S. W. Clark, A Practical Grammar: in which Words, Phrases, and Sentences are Classified According to their Offices, and their Relation to Each Other. Illustrated by a Complete System of Diagrams. (1847). He has a system of balloons drawn around words.

The polysyllabic.com author has scanned "Clark's entire introductory section on diagrams" and reproduced it on his/her site.

I still don't know the intellectual history of sentence diagramming before Clark, nor its influence on later developments. I suppose that most American linguists educated before 1960 or so must have learned this system, but I don't know whether there are any other connections. But I know where to start, I guess.

[The CCC grammar site is sponsored by the Capital Community College Foundation, and was written by Charles Darling, a faculty member at CCC].

[The "Polytechnic" referred to by Reed and Kellogg is Brooklyn Polytechnic, whose web page includes a picture of one or the other of this pair, though it's not clear which.]


Posted by Mark Liberman at 09:59 AM

Down the gopher poll

With all due respect to both Geoff and the gophers, I feel compelled to join Eric in defense of TAPS. And partly on linguistic grounds, no less.

I suspect that Eric's right about the reasons why "eastward" and "westward" seem so natural in the context of UCSC geography: the inhabited part of the campus is considerably top-heavy, and the time when the direction of the shuttle matters most is in getting from one college to another. East-bound loops are the quickest way to get from west campus to east campus, and vice versa for the west-bound loops. (If your goal is to get down to the base of campus, on the part of the route that involves most of the north/south and counter-nominal east/west action, then any shuttle will do—though admittedly you'll have a slower and curvier ride if you go the long way.) Furthermore, getting from college to college is probably considered by TAPS to be the "primary" function of the shuttles. In fact, their own description emphasizes the "cross-campus" aspect (which I take to mean between colleges), while at the same time exhibiting doubt that people will understand the relation between the "main campus" (at the top) and the lower section on Hagar Drive:

Loop buses run both directions through campus, at 7–minute intervals. Cross–campus trips take as little as 20 minutes on the Loop! Loop buses do not enter Quarry Plaza; Loop buses enter the East Remote parking lot when northbound (travelling uphill) on Hagar Drive.

Naturally, no one going from Crown to Kresge cares about more subtle issues like which way the gophers are facing, whether the bus is planning on turning around after they get off, or what would happen if you snoozed and stayed on for an entire loop while wearing a GPS tracking device.

Geoff does care about subtle issues, though.

It is one of the things that makes him a great scholar, after all. He wants things to be named as accurately as possible. (Maybe he worries that outsiders might think that the campus is filled with ignorant or delusional people. Maybe he worries that one morning when he is sleepy or preoccupied, they will trick him into accidentally getting on the wrong shuttle.) But I think he is being overly harsh on the TAPS folks, by asking them to uphold a superhuman standard. Humans have limited perspective, and limited brain capacity, and better things to worry about. Not only does it take enormous thought (and several Language Log posts) to settle on a name that isn't misleading in one way or another, but once you do, few will understand it.  If there's no communicative payoff anyway, why not stick to something that's intuitive, even if you can't defend (or even explain) it?

But that was the practical defense. Now for the linguistic one. In recognition of the fact that routes are, well, intrinsically relational things, and suffer from all the problems that Geoff has pointed out, it seems that they are frequently named with the following convention: pick some salient spot along the route, and name it according to where the bus/train/mule/jitney would most immediately take you if you got on at that spot. I conjecture that at UCSC, the salient spot is Science Hill (about two-thirds of the way to the west along the roughly east-west portion of McLaughlin Drive). My evidence comes from (1) the fact that this is where the greatest number of students seem to get off in the morning, and on in the afternoon, and (2) the fact that this is where Santa Cruz Metro bus schedules print their "turn-around" times. (It's also the only part of campus up there that isn't residential, so using it as a reference point avoids any claims of Orientalism among fervid partisans of east or west colleges.)

This same line of reasoning is why, when I am in downtown Boston, I must sometimes hop an "inbound" subway train to head back out to Cambridge. (I am already "in", but in this case the designated turning spot is Park Street). North/south and east/west street directions in cities almost always have this flavor, too: North becomes south at 1st St, east becomes west at Main St, and so on. We could use more subtle approaches, like finding the true geographic midpoint, or doing studies to pick the point with the greatest traffic flow, but the fact is that these defining points are conventionally determined by where the biggest buildings are, or the most opportunities to transfer, or by some accident of history. If we weren't allowed to pick arbitrary, conventionalized vantage points, then we'd never even be able to have words like clockwise (which would mean something quite different if you're the clock, or the number 6).

Previous posts have already pointed out that the east/west designation depends on a particular vantage point, and that the point is one that makes sense to human riders. (Sorry, gophers. You can eat the lines that provide internet to the campus in revenge if you want, but it won't make you any more influential, linguistically.)  Geoff, wanting absolute truth, was not swayed. But what he seems to have overlooked is that all naming is, at some level, conventional and arbitrary. As Mark points out, he should be glad that they at least picked names that reveal the opposing nature of the two Loops. And now that I work on a campus that was evidently not designed with humans in mind, I can add that he should be glad that at least it makes sense somewhere—to humans, not to gophers.

First postscript, even before Geoff has debunked:

It is worth noting that the Santa Cruz Metro also runs buses in both directions, but they sidestep the naming issue entirely. Buses 10, 12, 16, and 20 follow the westbound/counter-clockwise/outer loop, while 13, 15, and 19 follow the eastbound/clockwise/inner loop. According to the list of routes, the last three are simply called "Reverse". It's contrastive, and true at every point along the route. But also utterly uninformative.

Second postscript, also before Geoff has debunked:

I agree that the suggestion of "outer" and "inner" loops is brilliant for getting around the loops problem, but unfortunately, it would cause a different problem: there are already Core and Perimeter routes, which aren't loops (they stop and turn around). As a phonologist, I'm troubled by the potential loss of contrast between "outer" and "perimeter".


Posted by Adam Albright at 12:35 AM

October 11, 2004

Diagramming Sentences

Kitty Burns Florey has a nostalgic essay on diagramming sentences, an activity now very rare in American schools. The essay is illustrated with numerous diagrams. Florey points out that, like her, Gertrude Stein enjoyed diagramming sentences. She quotes Stein as writing:

I really do not know that anything has ever been more exciting than diagramming sentences.

Surprised though I was at Florey's and Stein's enjoyment of diagramming, I was even more struck by the analysis implicit in Florey's diagrams, which is that of a form of transformational grammar. Wh-questions, for example, are analyzed as having the wh-word in situ. The sentence What is the dog doing? is diagrammed as The dog is doing what?. Ditransitives are analyzed as containing a phonetically null preposition. The dog gave us his paw. is analyzed as containing a prepositional phrase to us in which the preposition to is phonetically null. It is hard to imagine that transformational grammar as developed by Chomsky can have had any influence on Catholic school education in the 1950s, so assuming that Florey's diagrams reflect what she learned in the sixth grade from Sister Bernadette and not more recent influence from linguistics, this must reflect a variety of traditional or folk grammar.

Posted by Bill Poser at 11:30 PM

Language rage in Spain

Miguel Ángel Moratinos, foreign minister of Spain, has asked the European Union (says The Economist: here by subscription; page 48 of the October 9th print edition) to make four of Spain's non-Spanish languages official: Basque, Catalan, Galician, and Valencian. So that would of course make the proud Basques (north central-east coast), Catalans (northeast coast), Galicians (northwest coast), and Valencians (central east coast) very pleased, wouldn't it?

No. Language politics are always a little stranger than you would think. The Catalans are furious. They regard Valencian as just a southern dialect of Catalan, so this move has actually undercut their status. Another crisis. Call in a linguist. (Not that linguists can hold out a lot of hope for exact, well motivated, and uncontroversial decisions concerning language boundaries.)

If the change is put into effect, it will be a new catastrophe for the EU's translation service. The number of translators the EU logically needs (assuming very optimistically that every translator from language A to language B can go the other way as well) goes up from the 190 conservatively calculated here on a basis of 20 languages, to a new figure of (24² – 24) ÷ 2 = 276 types of translator (you need Basque/Maltese, Galician/Latvian, Valencian/Dutch, Catalan/Estonian... and 272 others).
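
For anyone who wants to check the arithmetic: the figure is simply the number of unordered pairs among n languages, (n² – n) ÷ 2. Here is a minimal sketch in Python, under the same optimistic assumption that one translator covers both directions of a pair. The function name `translator_types` is made up for illustration and comes from nowhere official.

```python
from math import comb


def translator_types(n_languages: int) -> int:
    """Count unordered language pairs, i.e. (n*n - n) / 2.

    Assumes (optimistically) that a translator for the pair A/B
    can work in both directions, so A/B and B/A count once.
    """
    return comb(n_languages, 2)


print(translator_types(20))  # 190, the earlier figure for 20 languages
print(translator_types(24))  # 276, with the four additional languages
```

If translators could work in only one direction, the count would be the number of ordered pairs instead, which is twice as large.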

Posted by Geoffrey K. Pullum at 02:02 PM

"Sort Of": As American as Heinz Ketchup

The New York Post's Page Six gossip column made a little foray into sociolinguistic analysis recently, with an item on Kerry's choice of hedges:

THAT John Kerry had better watch his tongue -- it's starting to betray his elitist leanings. The other day, he said, "This president sort of wandered back." Language mavens say the use of "sort of" as an adverb is a subtle indicator of upper-class origins or aspirations. You won't catch any good ol' boys in those vital swing states saying "sort of."

Not hardly. "Sort of" was country before country was cool -- in fact, before "cool" was cool.

The Post's writer didn't indicate which "language mavens" he had in mind, but the idea that "sort of" is an upper-class affectation came as news to Paul Kay, whose 1984 paper on "sort of" is still the locus classicus for the topic. And in fact "sort of" is a staple of country lyrics, though often spelled as "sorta." For example:

I went to the country just the other day/To see my Uncle Bill and sorta pass the time away. "Everything's Okay," Hank Williams

Across the alley from the Alamo lived a pinto pony and a Navajo/ Who sang a sort of Indian Hideho. "Across the alley from the Alamo," Bob Wills

And his eyes turn sorta misty as his heart begins to glow... "Ballad of the Alamo," Marty Robbins

You put me through some changes, Sorta like a Waring blender "Poor Pitiful Me," Terri Clark/Linda Ronstadt

Well I've been sorta worried About Santa Claus this year. "Santa's Gonna Come in a Pickup Truck," Alan Jackson

It was misty in the canyon, the light was sorta dim. "Rider on the Rim," Red Steagall

But the pride sorta died when the man got weary eyed. "The Christmas Trail," Don Edwards

One boy sorta snickered when the roll was read. "Vidalia," Sammy Kershaw

Yet I always sort of missed her / Since that last sad night I kissed her "Spanish is the Loving Tongue," Numerous artists

It could leave you with the suspicion that the Page Six crowd doesn't actually know who Bob Wills was.

True, Kerry tends to say "sort of" more than "sorta," but if that's the variable the Page Six writer had in mind, you could charge Bob Dole and Ronald Reagan with having elitist leanings, too. But it's Kerry's misfortune that anything he does is liable to be tainted as effete. Windsurfing may be "the poor man's sailing," but Kerry has managed to singlehandedly turn it into a rich boy's diversion.

Posted by Geoff Nunberg at 11:36 AM

The question of the question and the question of the place

In juxtaposition with Jacques Derrida's death, we here at Language Log have undertaken a post-structuralist analysis of route direction nomenclature. As a result, the Transportation and Parking Services department at the University of California at Santa Cruz has already changed the naming system for its campus buses from East vs. West to Clockwise vs. Counterclockwise, although the US Department of Transportation is still stonewalling on the issue of Beltway signage.

Meanwhile, I imagine that Jacques Derrida has been disputing the inscriptional fallacy with the Recording Angel. Since Raduriel surely reads Language Log, perhaps the issue of loop direction naming has come up in their discussions. Jacques' initial reaction was no doubt dismissive: "But of course: difference is never in itself a sensible plenitude." If Raduriel then introduced the point-of-view problem -- topologists vs. cartographers, students vs. gophers, those in the loop vs. the outsiders -- the discussion will have become more animated, with Jacques quoting from one of his recent essays:

The very form of this question concerning a question -- namely "where?, in what place can a question take place?" -- supposes that between the question and the place, between the question of the question and the question of the place, there be a sort of implicit contract, a supposed affinity, as if a question should always be first authorized by a place, legitimated in advance by a determined space that makes it both rightful and meaningful, thus making it possible and by the same token necessary, both legitimate and inevitable.

According to the French idiom -- and already the usage of this idiom, the effective authority of this idiom, brings us back to the question of the cosmopolitical and would by itself enjoin us to ask this question -- one would say that there are places where there are grounds for asking this question.

Score one for the gophers.

Meanwhile, back at Language Log, our plain duty is to deconstruct chapter two of Of Grammatology, the one entitled Linguistics and Grammatology, on behalf of Derrida's many fans and anti-fans among our readers.

My earlier post on Derrida's 9/11 essay has stimulated as much email as anything else I've ever written. Most of the correspondents have either praised me for showing that Derrida's writings are nonsense, or attacked me for suggesting the same thing. Only Chris Waigl noticed what I actually said, in the end: that Derrida's ideas on 9/11 have content and are wrong. It's an appropriate tribute to his life and work, to believe the same thing about his views on language. If I were to discuss them, that's where I'd end.

[Signage via Reality Control via Nathan's Notebook.]


Posted by Mark Liberman at 08:55 AM

Westward on the eastbound shuttle; or, what a long strange trip that would be

In his first post on the shuttle loops at UCSC, Geoff states:

The idea that you can distinguish a clockwise from a counter-clockwise circular loop by saying that one goes to the west and the other doesn't is more than just wrong, it's a screamingly obvious geometrical impossibility.

And in his second post, he concludes:

There is no guaranteed, unambiguous, intuitive way to say of a shuttle running a loop that it is in general running westward or eastward.

Even before Geoff conceded a wrinkle in his proposal to rename the shuttles "Clockwise" (formerly "Eastbound") and "Counterclockwise" (formerly "Westbound"), I thought back to my shuttle-riding days as an undergrad at UCSC 10-15 years ago. I don't recall myself or anyone else ever even stopping to think about the names of the "Westbound" vs. "Eastbound" shuttles. They just made sense, in exactly the way that Geoff's report of John Cowan's steering wheel analogy makes sense.* I offer here five more arguments that the "Westbound" and "Eastbound" shuttle names make sense -- possibly more sense than the alternatives that Geoff has positively entertained.

Argument 1. The Colleges.

UCSC is divided into 10 colleges (in my day, only 8). If you want to get from a college on the east side of campus (Crown, Merrill, Cowell, Stevenson) to a college on the west side (Kresge, Porter, Eight, Oakes), a Westbound shuttle takes significantly less time than an Eastbound one, which will go far south before coming back north to where the action is. (Ditto in the opposite direction, mutatis mutandis.)

Four potential objections, and how to counter them:

  1. What about going from one college to another on the same side of campus? On the east side there's no benefit to taking a shuttle from college to college -- it doesn't go up/down the hills that comprise most of the distance between them. On the west side, Kresge to/from Eight/Oakes is a hike worthy of a shuttle ride -- Oakes to Kresge, in particular, is largely uphill. Here we have a potential problem, since Kresge is further west than Oakes, but you take the Eastbound shuttle to get from Oakes to Kresge. But it's obvious that Kresge lies along the shortest shuttle path between Oakes and the indisputably east-side colleges, Oakes is closer to UCSC's West Entrance, and so on. No problem.
  2. The two new colleges (Nine and Ten) are kind of in the middle. First of all, you don't shuttle between Nine and Ten. Otherwise, you have the east colleges to the east of you and the west colleges to the west of you. No problem.
  3. Do students really identify with the colleges? Over time, the college system at UCSC has lost some of its significance (centralization of administration, establishment of departments), but each college still has its own dorms, dining hall, classrooms and offices, general education requirements, architecture, ambience, academic and social reputation, and so on. Judging from my own experience at UCSC and from what I see of the similar college system we have at UCSD, most undergrads do identify pretty strongly with their colleges.
  4. Not everyone is shuttling from college to college; there are major classroom buildings, libraries, the health center, the bookstore, student services, etc. You have to learn where these places are and how to get to them at some point. Not all of them are accessible with this shuttle anyway. Will "Clockwise" vs. "Counterclockwise" or "Outer Loop" vs. "Inner Loop" really help? Get used to it.

Argument 2. Top heaviness.

While UCSC's Main Entrance is at the southern ("bottom") end of the loop, the overwhelming bulk of campus buildings themselves are on the northern ("top") end, facilitating the understanding of the steering wheel analogy. (In case you haven't yet, see the map.)

Argument 3. West = Left = Counterclockwise, East = Right = Clockwise.

John Cowan noted that "left" and "right" are conceptually linked with "counterclockwise" and "clockwise", respectively, and with "west" and "east", respectively. By transitivity, Cowan argues, there's a conceptual link between, e.g., "clockwise" and "east". But there is an even more direct link in the case of the shuttle loop, because it isn't circular -- there are turns. Every major turn made in the westbound/counterclockwise direction is a left, and every major turn made in the eastbound/clockwise direction is a right.

Of course, this assumes that you allow (as Geoff may not) that there's a conceptual link between e.g. "east" and "right" in the first place. Cowan says this is "based on the notion that north = top"; instead of "top" I would say "straight ahead", but it amounts to the same thing.
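
The claim about turns can be checked mechanically for any loop whose corner coordinates are known. What follows is only a sketch: the coordinates are invented (they are not UCSC's actual route), x is taken to increase eastward and y northward, and the loop is assumed to be roughly convex, so that all the major turns bend the same way. The function name `turn_directions` is hypothetical.

```python
def turn_directions(points):
    """Classify each corner of a closed loop as a 'left' or 'right' turn.

    points: (x, y) corners in the order they are driven, with x growing
    eastward and y growing northward. The sign of the 2-D cross product
    of the incoming and outgoing segments gives the turn direction.
    Exactly straight corners (cross product 0) are lumped in with 'right'.
    """
    n = len(points)
    turns = []
    for i in range(n):
        ax, ay = points[i - 1]          # previous corner (wraps around)
        bx, by = points[i]              # corner where the turn happens
        cx, cy = points[(i + 1) % n]    # next corner (wraps around)
        cross = (bx - ax) * (cy - by) - (by - ay) * (cx - bx)
        turns.append("left" if cross > 0 else "right")
    return turns


# Invented convex loop, listed counterclockwise as seen on a north-up map.
loop_ccw = [(0, 0), (4, 0), (5, 3), (2, 5), (-1, 3)]
loop_cw = list(reversed(loop_ccw))

print(set(turn_directions(loop_ccw)))  # {'left'}
print(set(turn_directions(loop_cw)))   # {'right'}
```

On a loop with dents in it, some corners would come out the other way, which is presumably why the argument is limited to "major" turns.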

Argument 4. External evidence.

The steering wheel isn't the only possible basis for Cowan's analogy. Key locks are another. I don't think an instruction to turn a key left or right is very confusing; right is clockwise, left is counterclockwise. Even if you think that a prototypical lock is on one side or the other of a door and that we may thus conceptualize a prototypical key-turn as "away" (= unlocked) or "toward" (= locked) the door jamb, you still need "top" as a reference point for this distinction, on which see Argument 3 just above. (And note how disorienting it is when someone has installed a lock upside down.)

Argument 5. The proposed alternatives are more confusing.

Of course, your judgment/experience may differ, but I have to stop and think about "clockwise" and "counterclockwise" -- not because I picture myself as a gopher, but because I really do feel that I have to conjure up a clock face and map it on to whatever it is (the shuttle loop, in this case). Understanding "westbound" and "eastbound" correctly is effortless by comparison.

Peter Maydell's suggestion (as reported in the "Note added later" at the bottom of Geoff's third post) that "mentioning three stops along the route will unambiguously identify it" also requires more processing than I think is necessary in the case of "westbound" vs. "eastbound". Bertilo Wennergren's "Inner Loop" vs. "Outer Loop" distinction is interesting, but again I think that the conceptual links leading to an understanding of "westbound" and "eastbound" are less complex than those needed to map knowledge of driving rules onto campus shuttle routes. (And why put folks from left-hand-side-driving societies, not to mention 18-year-olds who have probably had a fake ID longer than they've had a driver license, at a disadvantage?)

So, would I advocate keeping the shuttle names as-is (or "as-are")? Not that I have a stake in it -- this issue is entirely between Geoff and TAPS, as far as I'm concerned -- but yes, on the basis of the above, I think I would. Geoff is right, of course, that the above alternatives to "westbound" and "eastbound" have the distinct advantage of actually being accurate, but I would say that arriving at that realization involves more conceptual processing -- a claim I base entirely on personal introspection and intuitive conviction.

Whatever names are used, though, are probably going to be internalized by regular shuttle bus riders as typical unanalyzed proper names. New and irregular shuttle bus riders will have to puzzle things out a little no matter what, so any nonarbitrary naming system is as good as another. Institutional inertia seems like a good enough reason to leave well enough alone.

Update: October 11 2004 9:00am PDT

Geoff writes to note a serious problem:

The Loop is new. In your day it made sense to talk about eastbound and westbound: they stopped and turned around at the West Remote and at the Main Entrance. The Loop doesn't: it goes out the gate and round and back in, marking out a complete closed curve that never comes back the way it went. An Outer Loop shuttle NEVER at any time travels on the inner side of the road loop. The shuttles you remember always did: after going from Science Hill to the West Remote via College VIII they came back along the same piece of road in the opposite direction from College VIII and Science Hill. So although you have a useful observation or two, you've sort of missed the point.

Geoff's right. My memories of being an undergrad (or, of riding the campus shuttles as an undergrad) have deceived me. Still, I am comforted by the fact that none of my five arguments were dependent on this deception -- only my opening statement that I didn't recall being confused by the shuttle names. So, while it's true that I have absolutely no way of really knowing whether or not I would be (or rather, would have been) confused by the names of the Loop shuttles when confronted with them in real life, I'm fairly confident I wouldn't have been -- my intuitions about them at a distance were correct, and my five arguments remain unassailed (for now, anyway).

I do admit a potentially serious problem for at least one of my arguments: individual variation. My wife Karen and I once debated about how clear an instruction to turn a key "to the left" is -- the steering wheel analogy didn't help much there. (By the way, I argued that "left" means "counterclockwise"; Karen argued that "left" is simply ambiguous.)

More recently, Karen and I installed some ceiling fans in our house. Most contemporary ceiling fans have a switch that toggles between clockwise and counterclockwise motion; the counterclockwise direction pushes air down (for getting a breeze in warmer months) and the clockwise direction draws air up (which in turn pushes the warmer air down in colder months). (Which direction achieves which effect technically depends on the angle of the fan blades, but "counterclockwise in summer" and "clockwise in winter" appears to be an industry standard.)

When we were done with the installation and the time came to set the switch, Karen and I temporarily disagreed on which setting constituted counterclockwise motion. I took the person's eye view of the situation, looking up at the installed ceiling fan and imagining that it was a clock face. Karen took the bird's eye view, looking down from the ceiling. My interpretation, of course, was the one that provided the breeze and thus prevailed.

Although I'm still thrown by Karen's bird's eye view perspective on the ceiling fan, I think that individual differences in perspective are not likely in the case of the shuttle loop -- Geoff's gopher's eye view problem notwithstanding. But it's still intuitively clear to me that thinking about "clockwise" and "counterclockwise" (or "inner" and "outer") is more demanding than thinking about "eastbound" and "westbound". It's just too bad that won't work in the case of the ceiling fan.

* The careful reader may have noticed that, just as carefully, Geoff neglected to note that John Cowan's common-sense theory makes the right prediction while Fernando Pereira's topology-intensive alternative makes the wrong one.

Posted by Eric Bakovic at 03:13 AM

Faulty Intelligence

I did a little interview on the Friday presidential debates on a local TV news program the other day, and in an effort to be "balanced," I noted some of the linguistic deficiencies of each candidate. To make a point about Kerry's difficulties in striking a demotic note, I mentioned the quote that Maureen Dowd ascribed to him, "Who among us does not like NASCAR?," observing that sentences that begin "Who among us does not like..." were more appropriately ended with something like "Placido Domingo."

"Wrong!," said my partner Michelle after we had watched the interview. "Kerry never said that." And indeed, he never did, as Bob Somerby pointed out in The Daily Howler last week. What Kerry said was "There isn't one of us here who doesn't like NASCAR..."

I suppose I could blame Dowd (who herself said she was told of the quote by someone else), or for that matter Frank Rich or Tim Egan or other Times writers who repeated the story. Or I could blame the Boston Globe reporter who has covered Kerry who originally told me about the quote.

But the fact is that the quote always sounded too apt to be true, like that quote ascribed to Bush about how the trouble with the French is that they have no word for "entrepreneur" — another discredited line.

But I repeated the line because I had a nice bit of shtick to go with it, which always got a laugh in media interviews, and because I figured that if Dowd and the rest had vouched for it, I was off the hook.

Still, I believe that if a person makes an error based on faulty intelligence that the person should have received more skeptically, except the person had ulterior motives for wanting it to be true and figured he could get away with saying it because he'd heard it from ostensibly credible sources and anyway, everybody else was saying it, then when the story is revealed to be bogus, the person ought to own up to his mistake. So, I'm sorry about that.

Posted by Geoff Nunberg at 12:16 AM

October 10, 2004

Sex in private

When can a sexual act be regarded as having taken place "in private" as the public decency laws in most places require? I do not ask for purposes relating to any activities of my own, you understand, good heavens no. As always with Language Log posts, a linguistic point has center stage.

An Italian judge recently ruled that having sex in the toilet at a bar does not breach public decency laws, so long as the door to the stall is shut. (The story has been spread around the world by Reuters.) A Swiss couple were accused of committing obscene acts after the owner of a bar in the northern Italian town of Como caught them having sex in the john. State prosecutors demanded a six-month prison term for the male defendant and a five-month term for his partner. (Why the one month difference? Who knows. Perhaps she was a naive innocent who showed instant remorse and he was an evil seducer who accused the arresting officer of being a sexless fascist meathead.) Anyway, in this case what seems to me like intuitive common sense prevailed: Judge Luciano Storaci threw out the case, saying public decency was not offended because the door was closed.

This seems so obviously right that one wonders how the interior of a toilet cubicle when the door is shut could ever be considered other than private. How has it been possible for so many people to be successfully prosecuted for the offense of engaging in sexual acts in surroundings that are not only unobservable to passing members of the public but are protected by physical arrangements whose whole purpose is to ensure privacy for excretory acts? If the sex does not count as being in private, how on earth can it make sense for more normal uses of a public toilet not to yield prosecutions for defecating in public?

I'm not a lawyer, but it seems to me that the answer is likely to lie in the interpretation of public decency statutes that in their references to sex use the highly polysemous word private. Webster's Third lists its several senses in this order:

1a : intended for or restricted to the use of a particular person or group or class of persons : not freely available to the public
  b : belonging to or concerning an individual person, company, or interest
  c (1) : restricted to the individual or arising independently of others
  (2) :carried on by an individual independently rather than under institutional or organizational direction or support
  (3) :being educated by independent study, under the direction of a tutor, or in a private school
  d (1) :affecting an individual or small group : RESTRICTED, PERSONAL
  (2) :affecting the interests of a particular person, class or group of persons, or locality : not general in effect
  e :of, relating to, or receiving hospital service in which the patient has more privileges than a semiprivate or ward patient (as in having his own doctor, a room to himself, and extended visiting hours)
2 a (1) :not invested with or engaged in public office or employment
  (2) :not related to or dependent on one's official position : PERSONAL
  b :of military personnel : of the lowest rank : having attained no title of rank or distinction
  c (1) :manufactured, made, or issued by other than government means
  (2) :issued by private not public authority but acceptable as money either because of intrinsic value or exchange value guaranteed by issuer
  d of clothing : CIVILIAN -- used especially by the Salvation Army
3a :sequestered from company or observation : withdrawn from public notice
  b :free from the company of others : ALONE
  c :not known publicly or carried on in public : not open : SECRET; especially : intended only for the persons involved -- compare CONFIDENTIAL
  d : having knowledge not publicly available : holding a confidential relationship to something
  e :obsolete : peculiar to a particular person
  f :being or considered unsuitable for public mention, use, or display -- used especially of the genital organs

The space inside your toilet cubicle is certainly private in sense 3b, "free from the company of others"; but not in sense 1a, "not freely available to the public", since any member (or couple of members) of the public can use the same facility as soon as the previous person has finished doing whatever the hell it is that they're doing in there. The details of exactly what is meant by being in private or in public would have to be spelled out in very careful detail in any relevant law. And of course there are thousands of such laws, specific to states, provinces, counties, or cities, and they won't be all that carefully written, because (a) laws generally aren't very carefully written anyway, and (b) laws governing any kind of behavior that someone could think lewd are likely to get even less scrutiny than a tax cut, since no legislator who wants to get re-elected is going to stand up and quibble if that would appear to put him in the position of arguing for the likes of couples who want to be free to slip into the john for a quick screw. And (some Italian reader correct me if I'm wrong) Judge Luciano Storaci probably doesn't have to run for election. An American county court judge who has to gain public support for re-election to keep his or her job might not have been as inclined as Judge Storaci to dismiss a public indecency case.

Posted by Geoffrey K. Pullum at 04:59 PM

Angry at the guy asking?

From liveblogging the second presidential debate at NRO's The Corner:

WHY DOES BUSH... [Jonah Goldberg]
Sound like he's angry at the guy asking about making drugs cheaper?

Posted at 09:43 PM

Jonah is referring to the exchange that starts this way:

HORSTMAN: Mr. President,
why did you block the reimportation of safer and inexpensive drugs from Canada
which would have cut 40 to 60 percent off of the cost?

BUSH: I haven't yet.
(( Just)) want to make sure they're safe. When a drug comes in from Canada, I want to make sure it cures you and doesn't kill you.
And uh that's why the FDA and that's why the surgeon general
are looking very carefully to make sure it can be done in a safe way.
I've got an obligation to make sure our government does everything we can to protect you.
And um
And what ((I-)) my worry is and- is that, you know, it looks like it's from Canada, and it might be from a third world.
And we've just got to make sure, before somebody thinks they're buying a product,
uh that uh
that it works. And that's- that's- that's why we're doing what we're doing.

To me, in this part of the exchange, Bush doesn't sound like he's angry. Listen to this sample and see what you think. In this direct answer to Horstman, I think the president sounds earnest and friendly.

However, after John Kerry's 90-second rebuttal, Bush breaks in without waiting for the moderator's instruction, and now he sounds vehement, even angry:

If- if- If they're safe, they're coming.
I want to remind you
that it wasn't just my administration that made the decision on safety.
President Clinton did the same thing
((and)) we have an obligation to protect you.
Now, he talks about Medicare.
He's been in the United States Senate twenty years.
Show me one accomplishment toward Medicare
that he accomplished.
I been in Washington DC three and a half years
and led the congress to reform Medicare so our seniors have got a medern-
modern health care system.
That's what leadership is all about.

Again, listen to this sample and make your own evaluation. To me -- and to Jonah Goldberg -- it sounds ranting and angry. Presumably Bush is angry at Kerry, but he's addressed someone as "you" who can't be Kerry: "we have an obligation to protect you"; "show me one accomplishment ... that he accomplished." In these phrases, "you" must be either the audience or Horstman, so it's plausible for Goldberg to feel that Bush sounds "angry at the guy asking".

There's another whole set of questions about what makes someone sound "earnest" or "angry" or whatever. And there's a long litany of answers, hundreds if not thousands of papers, monographs and books full of details, which unfortunately can be summed up in one phrase: we don't really know.

That's a bit too pessimistic -- we know a lot of facts, some of which are even true, and people have a lot of theories, some of which even make sense. But it's interesting to contrast the emotional and attitudinal content of speech with its lexical content.

When we ask what words someone said on a particular (recorded) occasion, the answer is a subjective one. If people disagree, the best we can really do is to appeal to the perceptions of a panel of unbiased native speaker/hearers. No one would ever say "well, we can't agree about what this person said, let's look at the output of this speech recognition program."

However, the lexical perceptions of unbiased native speaker/hearers are pretty consistent. If transcription conventions are shared, independent careful transcripts of a passage of conversational speech are likely to disagree about no more than 4-5% of the lexical tokens. And many of these disagreements will be fairly inconsequential things -- was that "uh" or "a"? Was it "Cathy" or "Kathy"?
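For readers who want to try this on their own transcripts, here is a minimal sketch of one way to put a number on the disagreement. It assumes that disagreement is measured as a word-level edit distance (substitutions, insertions and deletions) divided by the length of the longer transcript; that is my assumption for illustration, not necessarily how the 4-5% figure was arrived at, and the function names are made up.

```python
def token_edit_distance(a, b):
    """Levenshtein distance between two lists of tokens."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # token missing from b
                            curr[j - 1] + 1,      # extra token in b
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def disagreement_rate(transcript_a, transcript_b):
    """Fraction of lexical tokens on which two transcripts differ."""
    a = transcript_a.lower().split()
    b = transcript_b.lower().split()
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return token_edit_distance(a, b) / longest


# Two transcribers who differ only on "uh" vs. "a":
rate = disagreement_rate("And uh that's why the FDA",
                         "And a that's why the FDA")
```

Splitting on whitespace is the crudest possible tokenization; shared transcription conventions would have to be applied before a comparison like this means much.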

This is a phenomenon we might call "word constancy". We have an extraordinary shared subjective consciousness of words. On a plausible definition of "word", roughly similar to what dictionary makers would use to decide what a dictionary entry is, average American high school graduates have a passive vocabulary of at least 40,000 items, not counting lexicalized phrases, proper names, acronyms and some other sizeable categories. When I teach introductory linguistics, I have my students administer a similar sort of evaluation to themselves -- based on random samples from a dictionary headword list -- and the numbers that come back average about 65,000. There's a lot more to be said about this question, but any way you look at it, the members of a speech community inhabit a common lexical world made up of tens to hundreds of thousands of word-like categories.
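The arithmetic behind the headword-sampling exercise is simple enough to sketch. The version below assumes a plain random sample and a yes/no judgment for each word; `headwords` and `knows_word` are hypothetical stand-ins for a real dictionary list and a real person's answers, and the classroom procedure may well differ in its details.

```python
import random


def estimate_vocabulary(headwords, knows_word, sample_size, seed=0):
    """Scale the share of known words in a random sample up to the full list."""
    rng = random.Random(seed)
    sample = rng.sample(headwords, sample_size)
    known = sum(1 for word in sample if knows_word(word))
    return round(len(headwords) * known / sample_size)


# A toy "dictionary" of 1,000 headwords, of which the subject knows 650.
toy_headwords = [f"w{i}" for i in range(1000)]
toy_estimate = estimate_vocabulary(
    toy_headwords,
    knows_word=lambda w: int(w[1:]) < 650,
    sample_size=1000,  # sampling every headword makes the estimate exact
)
```

With a real headword list, the same proportion is simply multiplied by the size of that list, so a sample of a few hundred words gives an estimate whose precision depends on the sample size.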

Lexical knowledge is not the consequence of literacy, but its essential condition. You can demonstrate word constancy in young children long before they've learned to read. For example, play a game with some four- or five-year-olds, where you name a word ("baseball", "sweater", whatever) and offer a prize to the first kid to raise a hand when you use it again. You'll get a lot more arguments about whose hand went up first than arguments about whether you actually said "baseball".

Now try the same game with emotional or attitudinal loadings of speech. "The first one to raise a hand when I speak in an angry voice (a happy voice, a sad voice...) gets a prize." You can even start by demonstrating what you mean.

This time, the arguments will be of a very different kind. First, the number of distinct emotions that most people are willing to try to differentiate isn't very large -- half a dozen or a dozen or so. Second, the degree of individual confidence and joint agreement is not very high at best, and gets lower as the set of distinctions is increased.

"I win, that was a happy voice". "I think he sounded like he was just in a hurry, not happy". "Maybe he was a little happy, I'm not sure". "It sounded like an ordinary voice to me, that's how he talks all the time." It gets worse as you multiply the categories to cover more of the complex cognitive structure of emotion -- happiness because of personal accomplishment vs. pride in the accomplishment of a loved one vs. satisfaction from the misfortune of an enemy vs. unexpected luck; unselfconscious joy vs. suppressed glee; and so on. Psychologists' attempts to study the perception of emotion in speech tend to involve more like 6 categories than 60,000, and even so, the degree of intersubjective agreement is more like 50% than like 95%. (Of course, the details vary enormously -- I'll discuss some specific studies in a future post).

Now, there are particular questions about particular utterances where most people can agree. Here's another sound clip. Does the speaker seem relaxed, or upset? Submissive, or aggressive? Force such choices and most people will agree on the outcome. But there's a third problem that becomes clear at this point. How you describe an instance of emotional expression depends not only on your evaluation of someone's state of mind, but also on your opinions about them and their circumstances.

There are lots of ordinary-language terms for someone who feels negative-valence arousal directed towards others: aggrieved, angry, annoyed, bitter, enraged, furious, huffy, irate, incensed, irritated, outraged, pissed, upset, wrathful, and a couple of dozen others. These involve not only different degrees of arousal ("annoyed" vs. "enraged") and shades of emotion ("bitter", "outraged"), but also different evaluations of the person in question ("huffy", "wrathful"). There are also more neutral words for aroused emotional states of an aggressive character: fierce, impassioned, vehement.

I don't think that anyone will consider that George Bush's tone of voice, in that last clip, was "relaxed" or "soothing" or "amused" or "seductive". But whether you think that George Bush was "huffy" or "impassioned", "irritated" or "vehement", probably depends at least in part on what you think about him and his policies.

Though it's true that Jonah Goldberg, who is certainly a strong supporter of George Bush and (most of) his policies, perceived him as being "angry" in this sequence.

I'll try to say a few things in some later posts concerning what we do know about the expression of emotion in speech. It's a symptom of the poor state of our understanding, both of emotion and its expression, that it's going to be hard to do this briefly.


Posted by Mark Liberman at 11:47 AM

The gopher's eye view

The ground on which they built the Santa Cruz campus of the University of California is honeycombed by the activities of ground squirrels, pocket gophers, and various other burrowing mammals — running across a UCSC meadow is a good way to break an ankle. And below the tunnel-ridden soil is a sort of limestone sponge, riddled with caves, tunnels, sink holes, and water channels.

You're probably wondering how this is going to be linked to a linguistic topic, aren't you? Trust me. A link to a linguistic issue is coming, in two deft strokes. You just have to decide to read on.

Recently I argued that the signs on UCSC shuttles in early October 2004 talking about "westbound" loop shuttles having bike racks on the fronts of the buses are utterly meaningless. Thereafter, two suggestions were emailed to me that claimed to map a particular shuttle loop direction naturally onto an eastbound or westbound direction, but I pointed out that they contradicted each other. My claim was that "clockwise" and "counterclockwise" were the terms to use, and Transportation And Parking Services (TAPS) at UCSC has decided that I am right.

Don Olivier, who actually runs Linux on his digital wristwatch, has written me to suggest that in fact I am wrong. Consider again a shuttle route going up Hagar Drive, turning west at Coolidge Drive after Quarry Plaza, heading out of the West Entrance, and down Empire Grade to the Main Entrance (see the interactive aerial photograph here). It looks counterclockwise, I observed. But that is because I am tacitly taking the view from above ground. Suppose I take the viewpoint of a gopher located under the ground near the McHenry Library in the center of the campus, looking up. Now the same shuttle appears to be travelling clockwise. The clockwise / counterclockwise opposition is relative to observer viewpoint.
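The gopher's point can be checked with a little coordinate geometry. The sketch below uses the standard signed-area (shoelace) formula to decide which way a closed route runs, and models the view from underground as a mirror image of the map. The coordinates are invented for illustration and have nothing to do with the real campus.

```python
def signed_area(points):
    """Shoelace formula: positive for counterclockwise loops, negative for clockwise."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return total / 2.0


def direction(points):
    """Name the sense of travel around a closed route given as (x, y) points."""
    return "counterclockwise" if signed_area(points) > 0 else "clockwise"


def seen_from_below(points):
    """An observer looking up from underground sees the map mirrored left to right."""
    return [(-x, y) for x, y in points]


# A made-up square route, with x pointing east and y pointing north.
route = [(0, 0), (1, 0), (1, 1), (0, 1)]
above = direction(route)
below = direction(seen_from_below(route))
```

The same sequence of stops comes out counterclockwise for the bird and clockwise for the gopher, which is all the objection amounts to.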

Paradoxically, there appears to be no general way to name the direction of travel of a loop shuttle. Leftward/rightward and westward/eastward are self-evidently useless because a loop must involve just as much travel to the left (or west) as to the right (or east), and just as much to the top (or north) as to the bottom (or south), if it is ever to get back to where it once was and thus continue in a loop. But the clockwise/counterclockwise opposition also fails if we cannot agree on where the observer is to be stationed — in three dimensions — relative to the set of points on the loop. (You might, foolishly, have spent a second or two thinking that one could distinguish the two routes by explaining that the one with the bike racks on the fronts of the buses visits Quarry Plaza on Hagar Drive before it gets to the Cowell Health Center on Coolidge Drive. But of course both routes do that. Think about it.)

Now I know that some people will say that the problem is solved by the contingent fact that students (especially those bringing bikes up the hill to the central campus) spend virtually all their time above ground, and that gophers virtually never travel by shuttle bus. But that is just the sort of anti-intellectual observation that we practitioners of lofty theorizing despise. There is a deep issue here. We sit in our campus shuttles travelling a three-dimensional path on the uneven surface of a spheroidal rotating planet orbiting a minor star that drifts in three dimensions in a galaxy drifting along with many other galaxies in a galaxy cluster in a rapidly expanding universe of possibly many dimensions... And the nonsensical reference to westbound loop shuttles in the notice TAPS put up should remind us that they face a serious problem: there is no general way to refer to the direction of travel in a directed loop that does not rely on a known orientation relative to the geography of the loop that can be agreed on by all observers.

God may be able to see the whole universe all at once without any perspectival bias; but if so, He will be unable to tell us unambiguously how to determine whether there will be bike racks on the fronts of our shuttle buses without referring to where we are now and which way we are looking. That is a deep and cosmic fact. And you read it here on Language Log.

Note added later (October 10, 4:15pm Eastern time):

Or at least I thought it was when I hit the Save button. But one can always be mistaken. And I appear to have been wrong once again. What I said was perhaps a deep and cosmic fact if you take shuttle loops to be one-dimensional paths with no referenceable named points; but those assumptions do not hold here, and they are crucial. Here are the two suggestions I received this morning from several readers:

Peter Maydell (and Sweth Chandramouli independently just a few hours later) pointed out that mentioning three stops along the route will unambiguously identify it. So the route with the bike racks could be called (rather longwindedly) the Quarry Plaza - Cowell Health Center - West Entrance route, the other being the West Entrance - Cowell Health Center - Quarry Plaza route. He notes that the London Underground train system does this with the Circle Line: "This is a Circle line train for Liverpool Street via King's Cross," they say, tacitly lining up the station where you are, the King's Cross station, and the Liverpool Street station in such a way as to unambiguously identify the direction of travel as clockwise (counterclockwise if you are on another underground train deeper in the earth and looking up). I missed this one by not extending the reasoning about stop sequence far enough: two points on a closed curve do not identify a direction on it, but three or more do.
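The three-stop idea is easy to state as a procedure. In the sketch below a route is just a list of stops in the order a bus visits them, wrapping around at the end; the stop list is a simplified assumption with four of the places mentioned above, not the real timetable, and the function name is made up.

```python
def runs_via(route, start, via, end):
    """True if a bus on this cyclic route, leaving `start`, reaches `via` before `end`."""
    n = len(route)
    i_start = route.index(start)
    hops_to_via = (route.index(via) - i_start) % n
    hops_to_end = (route.index(end) - i_start) % n
    return hops_to_via < hops_to_end


# Simplified, hypothetical stop order for the route with the bike racks.
bike_rack_route = ["Quarry Plaza", "Cowell Health Center",
                   "West Entrance", "Main Entrance"]
other_route = list(reversed(bike_rack_route))

label = ("Quarry Plaza", "Cowell Health Center", "West Entrance")
fits_bike_rack_route = runs_via(bike_rack_route, *label)
fits_other_route = runs_via(other_route, *label)
```

Only one of the two directions fits the three-stop label, whereas a label with just two stops would fit both, since every bus on a loop eventually gets from any stop to any other.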

Bertilo Wennergren (and again, Sweth Chandramouli independently just a few hours later, and Vardibidian a couple of hours after that, and Anders McCarthy from Seoul after that) pointed out that one of the shuttle routes will have buses travelling a larger loop than the other, because (this being the USA) buses travel on the right hand side of the road. This makes what I was calling the counterclockwise route (clockwise for gophers) trace out a larger loop, strictly outside the other one all the way round. So the route with the bike racks on the buses could be called the Outer Loop shuttle, the other being the Inner Loop. When an Outer Loop shuttle passes an Inner Loop shuttle, the Outer Loop bus is always on a path that is further out from any arbitrary point (say, the McHenry Library) inside the loop than the Inner Loop shuttle is.

Those two ideas are the best I have met with yet, and Bertilo's seems neater: the Outer Loop and the Inner Loop. Hugo Quené reports that they already do this in Paris, calling the two directions on the Périphérique the inner (intérieur) and outer (extérieur) directions. I missed this possibility because I was thinking of the loop route as a one-dimensional line enclosing a two-dimensional area on the surface of a three-dimensional planet, an oversimplifying assumption that makes what I said true (unless you can name particular points on the loop; see the previous paragraph). But a road is in fact a two-dimensional strip: it has width, and buses on it have positions on it relative to the two edges. TAPS might consider adopting that one. Though right now they are busily implementing the clockwise/counterclockwise idea and making new signs. It's always a mistake to move too fast on such things. You have to let the pajamarati of the blogosphere complete their analysis first.

Posted by Geoffrey K. Pullum at 01:24 AM

October 09, 2004

His refusal to disgrace herself

Chinese has no case distinctions or gender distinctions in the inflectional paradigm of its third person singular pronoun. In fact there is pretty much no inflection at all (you can make an argument that a small number of phenomena might be treated that way, but it would be fairly iconoclastic to say that Chinese was even modestly inflectional). The word ta1 (the 1 is a tone indication) does duty for "him", "her", and "it" (interestingly, they're different in writing, but identical in speech; see the note at the end of this post). My student Matthew Thomas Davis points out to me a paragraph in which this problem really comes through. It's from this page put up by the Xinhua news agency, and the opening paragraph says:

New York Times reporter faces up to 18 months in jail in CIA leak case

www.chinaview.cn 2004-10-09 02:33:45

WASHINGTON, Oct. 8 (Xinhuanet) -- A New York Times reporter is facing up to 18 months in prison after a federal judge held him in contempt of court on Thursday for refusing to name her source to prosecutors investigating the disclosure of the identity of a covert Central Intelligence Agency (CIA) agent, media reports said Friday.

A difficult error to spot if you're a Chinese copy editor, accustomed to mentally translating him and her to ta1. After all, ta1 translates back into English as "either him or his or her or it or its", so where's the mistake?

John Cowan writes to me to point out that the story of how ta1 came to have three different characters in the writing system ("hanzi", he calls them) is really quite strange, and involves foreign influence on Chinese:

In the beginning Mandarin had the spoken pronoun ta1, which referred to persons and was written by a single hanzi. Then the Chinese caught on that "progressive" (i.e. European) languages distinguished between "he" and "she", and introduced this distinction into writing by modifying the hanzi for ta1 when it referred to females. This happened somewhere around 1911. Much later, probably in the 1960s, ta1 was extended to mean "it" as well, primarily as a product of translation from Russian and other languages (the normal colloquial Mandarin for "it" is zero, although ta1 can be used if there's a syntactic demand for it). As a result, another novel hanzi was introduced.

Linguistics is always even stranger than you would think.

Posted by Geoffrey K. Pullum at 06:43 PM

Gibson scores a "Bushism", with an assist to Kerry

Talking is probably the most complex and difficult kind of skilled motor activity that any of us ever engages in. For a phonetician like me, it's a wonder that anyone can make it through a sentence. To get through a speech or debate without a major breakdown of planning or execution would be extraordinary. So it's no surprise that the participants in last night's debate made a few little slips here and there.

By reputation, George W. Bush is a less skilled verbal athlete than most politicians. However, he's not the one who committed the most striking speech error of the second presidential debate. Nor was the guilty party John Kerry, though he was a sort of accessory before the fact. The booby prize, in this case, belongs to Charles Gibson, the moderator.

This is doubly surprising. In the first place, Gibson did much less talking than either of the candidates. He didn't even ask the questions, he just introduced the questioners from the audience, and added a couple of follow-ups. In the second place, Gibson is a professional talking head. A kind of occupational Darwinism ensures that such people are way up on the upper tail of the curve of verbal facility.

And yet.

One of Gibson's few follow-ups dealt with the likely timing of the next terror attack within the U.S.:

I- I want to extend for a minute, Senator, and I want- I'm curious about something you said. You said it's not when, but if. You think it's inevitable? Because the sense of security is a very basic thing with everybody in this country.

Now, I have to say that even if Gibson hadn't swapped if and when, this would have been one of the sillier questions that have been asked in these debates. I won't insult your intelligence by explaining why. Let's just say that a plausible off-the-record answer from either candidate would have been "duh!?"

But of course Gibson did swap if and when.

Kerry kindly corrected Gibson by repetition, and then tossed in a couple of suitable sound bites.

Well, the president and his experts have told America that it's not a question of if; it's a question of when. And I accept what the president has said. These terrorists are serious, they're deadly, and they know nothing except trying to kill.

I understand that. That's why I will never stop at anything to hunt down and kill the terrorists.

But you heard the president just say to you that we've added money.

Folks, the test is not if you've added money; the test is that you've done everything possible to make America secure. He chose a tax cut for wealthy Americans over the things that I listed to you.

(More tests, by the way. Maybe Kerry would do better to avoid that word for a while, don't you think?).

Kerry himself started to make the same if/when speech error earlier in the debate, and then corrected himself. This probably helped prime Gibson's gaffe.

Kerry was answering a question from Ann Bronsing:

Senator Kerry, we have been fortunate that there have been no further terrorist attacks on American soil since 9/11. Why do you think this is? And if elected, what will you do to assure our safety?

Kerry's answer (in my transcription).

Thank you very much, Ann. [pause 0.657]
Um I've asked uh in my security briefings why that is, and I can't go into all the d- answers et cetera, but let me say this to you. [pause 1.243]
This president, and his administration, have told you, and all of us, [pause 0.807]
it's not a question of when, [pause 0.574]
it's a question of- [pause 0.312]
excuse me, not a question of if. [pause 0.680]
It's a question of when. [pause 0.513]

We've been told that. [pause 0.723]
The when, I can't tell you. [pause 0.676]

This set the stage for Gibson to get his ifs and whens crossed, a bit more than three minutes later.

This wasn't Kerry's only disfluency by any means. Although I haven't checked systematically, my impression is that he was roughly as disfluent as Bush in this debate. For example, shortly after his if/when confusion, he said (in my transcription)

And I'm gonna [pause 0.258]
put ((in)) place a better homeland security. [pause 0.239]
Effort. [pause 0.259]
Look at it. [pause 0.256]
95% of our containers coming into this country are not inspected today. [pause 0.769]
When you get on an airplane, your car- your [pause 0.421]
bag is- is- is [pause 0.184]
((b-)) x-rayed [pause 0.494]
but the cargo hold isn't x-rayed. Do you feel safer? [pause 0.686]

(387 KB .wav file here). This is implicitly repetitive as well as disfluent. If Kerry hadn't needed to add "effort" as an afterthought, and hadn't gotten tangled up in the luggage, this whole stretch would have been a more-or-less verbatim repetition of sound bites from the first debate.

There's nothing wrong with that, but let's not get the idea that Kerry is some kind of robotic talking machine who never stumbles, substitutes, misorganizes or repeats himself.


Posted by Mark Liberman at 11:40 AM

decisions remember yet Europe: ladies gentlemen left behind

Last week, I did a simple word-frequency analysis of the first presidential debate, looking for the words whose frequencies were most different between the two candidates. This morning, I fetched the transcript of last night's (second) debate from the official site, and ran it through the same programs as before.

Without further ado, here's the top of the list:

Word Bush count Kerry count    Word Bush count Kerry count
al qaida

(Again, let me say that my overall evaluation of the debate was consistent with what seems to be the conventional wisdom: Bush did much better than before, and Kerry was OK too. Score it a draw.)

One fact that's obvious in this table is that Bush was much less verbally repetitive this time -- in the second debate, Kerry repeated more words and phrases more often than Bush did.

Some of the candidates' themes are also clear in these simple-minded word counts. For example, in the first debate Kerry used spend or spending 3 times, Bush twice; but in the second debate, Bush used spend or spending 12 times, to none for Kerry. On the other hand, in the second debate, Kerry used kid, kids, child, children, young 27 times, to 3 for Bush.

The total word counts show a fairly consistent pattern:

        Debate1   Debate2
Bush      6,135     6,893
Kerry     7,136     7,717

Kerry used 12% more words than Bush this time, as opposed to 16% more in the first debate.

Both men used more words in the second debate than they did in the first one (Bush 12%, Kerry 8%). This might have been because they spoke faster (perhaps due to the more informal atmosphere), but I'm not certain that the number of questions and one-minute-extensions was exactly the same in the two debates.
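The percentages quoted here can be checked directly from the word-count table. The following is a minimal Python sketch of that arithmetic; the names `counts` and `pct_more` are illustrative only, and rounding to the nearest whole percent is an assumption about how the figures were reported.

```python
# Word counts copied from the table above.
counts = {
    "Bush": {"Debate1": 6135, "Debate2": 6893},
    "Kerry": {"Debate1": 7136, "Debate2": 7717},
}


def pct_more(a, b):
    """Percentage by which a exceeds b, rounded to the nearest whole percent."""
    return round(100 * (a - b) / b)


# How many more words Kerry used than Bush, per debate.
kerry_vs_bush = {
    debate: pct_more(counts["Kerry"][debate], counts["Bush"][debate])
    for debate in ("Debate1", "Debate2")
}

# How much each candidate's word count grew from the first debate to the second.
growth = {
    who: pct_more(c["Debate2"], c["Debate1"]) for who, c in counts.items()
}

print(kerry_vs_bush)  # {'Debate1': 16, 'Debate2': 12}
print(growth)         # {'Bush': 12, 'Kerry': 8}
```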


Posted by Mark Liberman at 09:28 AM

October 08, 2004

A Milli Vanilli president?

An article by Dave Lindorff in Salon today, "Bush's mystery bulge", seems designed to bolster the rumors that in last week's debate, "President Bush was literally channeling Karl Rove" via a hidden earpiece. Technically, Lindorff just reports on the rumors, but the article has been widely cited as adding to their credibility.

The "evidence" cited includes Bush's "peculiar" pauses ("On several occasions, the president simply stopped speaking for an uncomfortably long time and stared ahead with an odd expression on his face."), the now-famous "let me finish" aside, and a bulge under Bush's coat, in back, thought to represent an electronics module of some sort.

Last Sunday I compared the pausing patterns of Bush and Kerry, and on Monday, I discussed the "let me finish" passage. At that time, I was skeptical of the "secret audio prompter" theory as an explanation for the "let me finish" aside, and argued that the aside seemed more plausible, in context, as a remark addressed to Lehrer. However, given that these rumors are now seeping into "big media", I thought I'd give the question another look (and listen). This prompted me to make some additional phonetic measurements, which turned out to bear on the question in a way that I didn't expect. It's by no means determinative, but it's enough to lead me to a tentative conclusion about these charges.

The main theory that Lindorff reports (taken from blogger Joseph Cannon among others) is this:

[T]he president and his handlers may have turned to a technique often used by television reporters on remote stand-ups. A reporter tapes a story and, while on camera, plays it back into an earpiece, repeating lines just after hearing them, managing to sound spontaneous and error free.

There are reports of several earlier news clips of presidential speeches where the audio is said somehow to have picked up the audio prompting as well as the live speaking, for example this CNN clip. Attempts to use this technique might certainly result in unusual pausing patterns, or other oddities of presentation. However, I have remained rather skeptical that the president used such a technique during last week's debate. At least, many of his answers seem spectacularly unlikely to have been recorded in advance by the kind of process that Lindorff suggests.

Geoff Pullum cited a characteristic example a couple of days ago: Bush's answer to Lehrer's question

Do you believe the election of Senator Kerry on November the 2nd would increase the chances of the U.S. being hit by another 9/11-type terrorist attack?

which consisted of the following sequence of phrases (with linking verbiage removed):

  • I don't believe it's going to happen.
  • I've shown the American people I know how to lead.
  • I understand everybody in this country doesn't agree with the decisions I've made.
  • People out there listening know what I believe.
  • This nation of ours has got a solemn duty to defeat this ideology of hate.
  • We have a duty to protect our children and grandchildren.
  • Ten million citizens [in Afghanistan] have registered to vote.
  • They're given a chance to be free.
  • They [the Afghans] will show up at the polls.
  • Forty-one percent of those 10 million [Afghans who have registered to vote] are women.
  • It's a phenomenal statistic.

Geoff wrote that "my reading of the whole answer is that we're looking at a man in a panic who has no idea what to say to the question. He has been taught a whole slew of tough-sounding clauses to reiterate, but can think of nothing to do but hurl them around at random." Whether or not you agree with Geoff's evaluation, and whether or not you like what Bush has to say, does it seem plausible to you that Bush would have recorded this particular meandering sequence as a package to echo back during the debate? By all accounts, he has excellent speechwriters who know how to craft a rhetorical structure.

The other trouble with the "echoing prepared material" theory is that it doesn't actually explain Bush's "let me finish" aside. If he's just echoing what's coming over the air, why not go on echoing? There's another theory, about handlers giving more indirect advice through the hypothetical earphone, which fits that piece of evidence better. But now the earphone-evidence argument is turning into a whole set of subtheories, one for each observation -- sometimes the earphones were feeding pre-recorded speechlets, which accounts for the pauses; and sometimes they were feeding advice, which accounts for the "let me finish" aside; and sometimes they were feeding nothing, which accounts for the meanders. At this point, though, the theory has lost its explanatory force, as we linguists say.

This is not enough to prove that there was no earphone, or that an earphone played no role in Bush's pauses or his "let me finish" aside. But it seemed much more likely to me that his pausing reflected the normal cognitive stress of trying to select, arrange and reproduce chunks of memorized material in an appropriate sequence, and that his "let me finish" aside was meant to prevent Lehrer from going on to the next question.

So I decided to look in more detail at the timing of Bush's pauses and speech segments in the answer containing the "let me finish" aside. To start with, I reproduce below my transcript of the passage leading up to the "let me finish" business, with a couple of additional turns added, and adding duration measurements for the speech as well as the silences.

Jim Lehrer: Ninety seconds, Mr. President.
George W. Bush:
[pause 0.030]
[speech 2.156] Uh my opponent just said something amazing, he said [pause 0.465]
[speech 2.524] Osama Bin Laden uses the invasion of Iraq [pause 0.951]
[speech 2.180] as an excuse to spread hatred for America [pause 0.811]

[speech 2.833] Osama Bin Laden ((isn't)) gonna determine how we defend ourselves [pause 2.231]
[speech 1.952] Osama Bin Laden doesn't get to decide [pause 1.428]
[speech 1.233] The American people decide. [pause 0.762]
[speech 0.730] I decided [pause 0.306]
[speech 1.889] the right action was in Iraq. [pause 0.542]
[speech 2.068] My opponent calls it a mistake -- it wasn't a mistake. [pause 0.330]
[speech 1.355] He said I misled on Iraq. [pause 0.274]
[speech 4.117] I don't think he was misleading when he called Iraq a grave threat in the fall of 2002. [pause 1.303]
[speech 3.685] I don't think he was misleading when he said [pause* 0.126] that it was right to disarm Iraq [pause 0.965]
[speech 1.779] in the spring of 2003. [pause 0.609]
[speech 1.797] I don't think he misled you when he said that, [pause 0.565]
[speech 0.476] you know if you- [pause 0.372]
[speech 5.731] anyone who doubted whether the world was better off without Saddam Hussein in power sh- didn't have the judgment to be president, I don't think he was misleading. [pause 1.375]
[speech 1.128] I think what is misleading [pause 0.233]
[speech 1.239] is to say you can lead [pause 0.680]
[speech 1.096] and succeed in Iraq [pause 0.604]
[speech 1.680] if you keep changing your positions [pause 1.390]
[speech 1.335] on this war. And he has. [pause 1.256]
[speech 4.625] As the politics change, his positions change. [pause* 0.126] And that's not how a commander in chief acts. [pause 1.482]
[speech 3.273] I- t- I- uh uh w- let me finish ((here)). The intelligence I looked at [pause 0.807]
[speech 1.999] was the same intelligence my opponent looked at, [pause 1.520]
[speech 1.603] the very same intelligence. [pause 0.779]
[speech 6.879] And when I stood up there and spoke to the Congress, I was speaking off the same intelligence he looked at to make his decisions to support the authorization of force.

You can see that both the speech segments and the silent pauses vary quite a bit in duration. The question that I'm going to ask is how they co-vary: does Bush tend to pause longer before longer speech segments? If he's being fed lines in an earphone, I'd expect that he should have to listen to a longer stretch of prompt before launching into a longer phrase. Of course, if he's making it up as he goes along, it also might be taking him longer to plan a longer speech segment.

However, the correlation coefficient between his silences and the following speech segments in this passage is -0.05. This is not statistically distinguishable from zero. So there's no effect.

If there were an effect, we'd still have to decide whether this was due to prompting or just to the effects of compositional effort. But the fact that there's no effect makes it seem less likely, at least to me, that he is really being fed his lines. (I also have to say that it's hard to read this transcript and believe that the answer was composed as a whole in advance.)

However, if we ask whether Bush tends to pause longer after longer speech segments, we get a very different answer: in this case, the correlation coefficient in this passage is 0.54. That's a respectable effect.
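For readers who want to try this kind of calculation, here is a minimal Python sketch of the two correlations. The durations are copied from the transcript above, but the pairing is an assumption: each speech segment is matched with the pause printed after it, the two short internal pauses marked `pause*` are left inside their speech segments, and the 0.030-second initial pause is counted as the pause before the first segment. The name `pearson` and the variable names are illustrative, and the printed values need not reproduce the figures quoted in the text exactly if the original segmentation differed.

```python
from math import sqrt

# (speech duration, pause that follows it), in seconds, from the transcript above.
# The last speech segment has no measured pause after it.
segments = [
    (2.156, 0.465), (2.524, 0.951), (2.180, 0.811), (2.833, 2.231),
    (1.952, 1.428), (1.233, 0.762), (0.730, 0.306), (1.889, 0.542),
    (2.068, 0.330), (1.355, 0.274), (4.117, 1.303), (3.685, 0.965),
    (1.779, 0.609), (1.797, 0.565), (0.476, 0.372), (5.731, 1.375),
    (1.128, 0.233), (1.239, 0.680), (1.096, 0.604), (1.680, 1.390),
    (1.335, 1.256), (4.625, 1.482), (3.273, 0.807), (1.999, 1.520),
    (1.603, 0.779), (6.879, None),
]
INITIAL_PAUSE = 0.030  # assumption: treated as the pause before the first segment


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


speech = [s for s, _ in segments]
pauses_after = [p for _, p in segments if p is not None]

# Each pause paired with the speech segment that FOLLOWS it.
pauses_before = [INITIAL_PAUSE] + pauses_after
r_pause_then_speech = pearson(pauses_before, speech)

# Each pause paired with the speech segment that PRECEDES it.
r_speech_then_pause = pearson(speech[: len(pauses_after)], pauses_after)

print(f"pause vs. following speech: {r_pause_then_speech:+.2f}")
print(f"pause vs. preceding speech: {r_speech_then_pause:+.2f}")
```

With only about 25 pairs, a coefficient close to zero cannot be told apart from no relationship at all, which is the sense of "not statistically distinguishable from zero" above.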

I'll confess that I didn't expect this at all. When I looked at the list of numbers, though, it kind of jumped out at me, and the correlation coefficient confirms the relationship. I have no idea why the effect holds -- maybe it's a rhetorical move, to give the audience more time to assimilate the longer phrases; maybe it's a compositional effect, with some short-term memory buffer taking longer to clear after finishing a longer phrase. It could be an accidental consequence of something about this particular answer.

Is this a general effect in public speaking? I don't know, and based on a quick literature search, I can't find any phonetic research that addresses the question. If some such research exists, I'll try to find out about it and let you know. If no one's ever looked at this simple question, it's a black eye for us phoneticians, I think.

So what's the conclusion? The rather meandering rhetorical structure of many of Bush's answers leaves me skeptical that he was getting his lines fed to him through a hidden earpiece. The lack of any correlation between pause length and the following speech segment length tends to support this doubt, in my opinion.

The positive correlation between pause duration and the preceding utterance duration is fascinating, and worth following up on. If it's a general fact about debating, it's interesting to learn that. If it's specific to George W. Bush, or to a certain class of people including him, I'd like to know why. I can't think of any way to explain it in terms of the effects of a secret audio prompter, though.

As for that bulge in the back of Bush's coat, I have no clue what it is. I imagine that we'll find out, though.

[Note: the "Milli Vanilli" line is from Lindorff's Salon piece. It's kind of unfair, since what made Milli Vanilli a sacrificial lamb for pop inauthenticity back in 1990 was lip syncing, not getting prompted.]

[Update 10/9/2004: an acquaintance who worked for some time as a producer at CNN told me that he has never heard of what Cannon describes as the "technique often used by television reporters on remote stand-ups", to have a pre-recorded version of their remarks played in their earphones as a prompt. That doesn't mean that it never happens, but at least it's not a routine method. And I haven't seen any validation of Cannon's assertion on this point by someone in a position to know. ]

[Wonkette pretty well sums it up: "Yes, we've seen the pictures. But we also watched the debate. If Bush was listening to some kind of radio signal, it was between stations." ]


Posted by Mark Liberman at 05:46 PM

Paulos vs. Dvorkin

In an October 4 column entitled The 'Innumerates' Among Us, NPR ombudsman Jeffrey A. Dvorkin wrote:

One of the rarely admitted secrets about journalists is that many of us are functional "innumerates" -- another way of saying "mathematically illiterate." Oh sure, we can add and subtract reasonably well. But with some exceptions, journalists generally don't know, understand or aren't interested in numbers. As for more complex subjects such as statistics and probability, well... many journalists would be hard pressed to tell the difference between "average" and "mean." [emphasis added]

Um, could that be because there isn't any difference?

As mathworld explains, "The quantity commonly referred to as 'the' mean of a set of values is the arithmetic mean ... also called the (unweighted) average". And if you look up average in the American Heritage Dictionary, you're told to "See arithmetic mean".

It's true that there are other kinds of mean -- geometric mean, harmonic mean, quadratic mean and so on. And it's true (as Ray Girvan pointed out to me) that statisticians sometimes use "average" just to mean "measure of central tendency", which could be mean, mode, median or midrange. But I don't think that's what Dvorkin had in mind. He might have meant "the difference between mean and median", I guess, since that comes up for journalists a lot in things like the difference between "mean income" and "median income". Anyhow, the bold-face material from Dvorkin's column, quoted above, seems to be a real-world example of a self-annihilating sentence.
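To make the distinctions concrete, here is a small Python sketch that computes each of the measures named above for one list of numbers. The data are made up purely for illustration, and the code relies on Python's standard `statistics` module (`geometric_mean` requires Python 3.8 or later); the quadratic mean and midrange are computed by hand because that module has no function for them.

```python
import statistics
from math import sqrt

values = [2, 4, 4, 8]  # made-up numbers, chosen so the results come out round

measures = {
    # "the" mean, i.e. the (unweighted) average
    "arithmetic mean": statistics.mean(values),            # 4.5
    "geometric mean": statistics.geometric_mean(values),   # about 4.0
    "harmonic mean": statistics.harmonic_mean(values),     # about 3.56
    "quadratic mean": sqrt(sum(v * v for v in values) / len(values)),  # 5.0
    # other measures of central tendency
    "median": statistics.median(values),                   # 4.0
    "mode": statistics.mode(values),                       # 4
    "midrange": (min(values) + max(values)) / 2,           # 5.0
}

for name, value in measures.items():
    print(f"{name:16s} {value:.3f}")
```

Even on four numbers the measures disagree with one another, which is why "mean income" and "median income" can tell different stories, whereas "average" and "mean" in the ordinary sense are the same calculation.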

Perhaps journalists should be given a sort of global (i.e. comprehensive) education, giving them command not only of means and modes and medians, but also of anaphors and antecedents, the difference between pausing and punctuation, description of pronunciation, basic grammar, and the use of dictionaries, among other relevant topics. And why stop at journalists?

Here, perhaps, John Allen Paulos and Geoff Pullum can agree. Let's establish universal competence in basic matters dealing with numbers as well as letters. Then John can get back to watching TV, while Geoff heads for Vegas and further adventures in field lexicography.

[Update: I originally had "Dworkin" throughout this post, but of course it's "Dvorkin", as Ray Girvan pointed out to me. Sorry for the mistake. I've been referred to as "Lieberman" often enough to know what this feels like from the other end. ]


Posted by Mark Liberman at 03:33 AM

October 07, 2004

The self-styled grammarian: no respect

I noticed recently that John Allen Paulos opens his well-known book Innumeracy: Mathematical Illiteracy and Its Consequences (Hill & Wang, 1988) thus:

Innumeracy, an inability to deal comfortably with the fundamental notions of number and chance, plagues far too many otherwise knowledgeable citizens. The same people who cringe when words such as "imply" and "infer" are confused react without a trace of embarrassment to even the most egregious of numerical solecisms. I remember once listening to someone at a party drone on about the difference between "continually" and "continuously". Later that evening we were watching the news, and the TV weathercaster announced that there was a 50 percent chance of rain for Saturday and a 50 percent chance for Sunday, and concluded that there was therefore a 100 percent chance of rain that weekend. The remark went right by the self-styled grammarian, and even after I explained the mistake to him, he wasn't nearly as indignant as he would have been had the weathercaster left a dangling participle.

Oh, yes! That's us grammarians, stuffy old bores who drone on about lexical differences and can't tell when to add percentages and when to divide them by the number you first thought of! But hey, at least when we grammarians go to a party it doesn't involve first standing around distinguishing adverbs and then sitting down to watch the TV news! We have fun! There is such a thing as a grammarian who is a super fun wild and crazy guy, O.K., Mr Snooty Math Guy!

Us grammarians: we don't get no respect. I went out and bought a laptop. I chose an Apple. It had a worm in it...

Posted by Geoffrey K. Pullum at 01:46 PM

Loopy defenses of the shuttle bus sign

I have just a small afterword on the strange case of the shuttle bus signs at UC Santa Cruz that describe the counter-clockwise direction loop shuttle as "westbound" when by the nature of loops it must go east just as much as it does west. And I must caution the reader who might be inclined to go on: Warning — high level of nerdiness.

John Cowan writes confidently from a perspective informed by naive geography and an analogy with the everyday cognitive/motor experience of using a steering wheel:

Conceptually, left is treated as counter-clockwise and right as clockwise; consider the way we talk about steering wheels (we turn them left, meaning that points near the *top* move leftward). Likewise, west is normally mapped onto left and right onto east, based on the notion that north = top. So what we need to know to interpret the phrase "westbound shuttle" is what counts as the top of the loop. Fortunately, the loop is in three dimensions, and so the top is ... the top, the point of highest elevation.

So he is telling me that I should see the shuttle route as a steering wheel with the top at the north (up the hill, roughly at Crown College), so that west is left, and regard a shuttle as westbound if it is going the way a left turn would move the wheel. But meanwhile the delightfully, brilliantly nerdy Fernando Pereira of the University of Pennsylvania (how nerdy? Fernando's shirt buttons are microchips and he runs Debian Linux on his microwave oven) writes to say:

Consider a single directed loop on the Earth's surface that does not cross the Equator. Consider a homotopy between it and the Equator in which each intermediate loop is also on the Earth's surface, and no such loop crosses the Equator. If it maps the direction of the loop to the westward direction on the Equator, the original loop is westward, otherwise the original loop is eastward.

What he means (and I did have to ask him: I can handle rocket science but I need a little help with topology) is that if you stretch the line of a directed loop shuttle route in the northern hemisphere as much as necessary and move it down to collapse it onto the equator, you can call the shuttle route westward if it collapses to an east-west equatorial route (one that proceeds through the time zones the same way the sunrise does) and eastward otherwise.

And what I now have to point out to you is this. I have carefully considered what these two defenses of UCSC's Transportation and Parking Services (TAPS) department entail, and I have to tell you that they predict differently. One identifies the direction of the shuttles that have the bike racks as westbound, and the other identifies those shuttles as eastbound. They contradict each other!

So I rest my case. It's not me, it's TAPS. There is no guaranteed, unambiguous, intuitive way to say of a shuttle running a loop that it is in general running westward or eastward. (It's very much like the way there is no unambiguous common-sense basis for the interpretation of the terms incall and outcall as used by massage services.) The shuttle must make eastward progress at some point and westward progress at some other point if it is ever to come back to its starting point. So the guys who wrote the signs are completely nuts. (They're probably relatives of the guys who paint the signs on the road that say ONLY LANE BIKE.) Language Log has spoken.

And in fact cognizant and duly empowered officials at TAPS have already accepted that this is true (everyone who is anyone reads Language Log), and they are going to change the signs. Language Log has once again been a positive force for the good of humankind.

Posted by Geoffrey K. Pullum at 01:02 PM

Statistical misspelling

Tenser, said the Tensor ("now with a new twitchy logo"), points to a puzzle: how to put an extra "a" in "Michelangelo"? The problem starts with an AP wire story about how a new mural outside the Livermore CA library misspelled the names of 11 historical figures. According to a more extensive treatment by Lisa White in the Contra Costa Times, the artist misspelled 9 names, but rather than just give a list of the names and their renditions in the mural, the article gives some clues in the style of a puzzle page: "'Shakespeare' is missing the second 'a,' 'Michelangelo' includes an extra 'a' and 'Einstein' has one 'n' too few", adding later that "Alquilar also misspelled van Gogh, adding a rogue 'u' to the Dutch painter's name".

As TstT explains:

So, that's "Shakespere" for sure, either "Einstei" or more likely "Eistein", and probably "van Gough", but where did she squeeze in an extra "a" in "Michelangelo"? I suppose "Michaelangelo" is the best bet.

I agree. In fact, I calculate that the probability of this being the correct misspelling is approximately .97, based on the following Google counts:

Spelling WhG
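The general shape of such a calculation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it enumerates every way of inserting one extra "a" into "Michelangelo", and it assumes that the probability of each candidate is taken to be its share of the total hit count across the candidates. The function names are hypothetical, and no counts are supplied here; real search counts would have to be passed in.

```python
def insert_letter_variants(word, letter):
    """All distinct strings formed by inserting `letter` once into `word`."""
    return sorted({word[:i] + letter + word[i:] for i in range(len(word) + 1)})


def share_of_counts(counts):
    """Turn a {spelling: hit count} mapping into {spelling: share of total}.

    Assumption: a candidate's probability is proportional to its hit count.
    """
    total = sum(counts.values())
    return {spelling: n / total for spelling, n in counts.items()}


candidates = insert_letter_variants("Michelangelo", "a")
for spelling in candidates:
    print(spelling)
```

Inserting the "a" immediately before or immediately after the existing "a" yields the same string, so thirteen insertion points give twelve distinct candidates, "Michaelangelo" among them.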

Mesdames et messieurs, les jeux sont faits.

Can someone in the Livermore area tell us how the game comes out? (A digital photo would be nice, taken before the mural gets fixed...)

And does anyone have a model of serial order errors in spelling (or more likely, typing) that explains the apparent pattern of "leakage" from the focus of error in the table above?

[Update: here is the artist's web page on the Livermore work, but none of the problematic names are visible. ]

[Update: Charles Belov emailed to observe that I left out "Michelangelo". Oops. Fixed now. As always, in case of less than full satisfaction, the Language Log marketing department will cheerfully refund your subscription fees in full. ]

[Update 10/8/2004: wolfangel emailed:

The list of misspellings used: http://tinyurl.com/64594.

It was Michaelangelo, which I accidentally misspell that way more often than I would like to admit (in my defense, I'm related to and friends with Michaels but no Michels). ]


[Update 10/11/2004: more on this story, in an article by Leslie Fulbright in the SF Chronicle of 10/8. It seems that the artist may not correct the misspellings after all, since the public reaction has been so "nasty".]


Posted by Mark Liberman at 10:06 AM

"It's hard to know where to start"

I feel like Dick Cheney in Tuesday's Vice Presidential Debate. He kept beginning his answers with "it's hard to know where to start." OK, in fact he did it only twice:

Well, Gwen, it's hard to know where to start; there are so many inaccuracies there.
Well, Gwen -- I'm sorry, it's hard to know where to start.

Still, it's one of the things that stuck in my mind from the debate. Another piece of evidence that small n-gram (word-sequence) counts can be psychologically significant. But the reason that I feel like Dick Cheney -- and it's not a good feeling -- is because of what Jane Perrone wrote in the Guardian's Newsblog yesterday evening, under the heading "Language Matters".

Here's Perrone's entry:

Bloggers are a resourceful bunch: they like nothing better than to "fact check [insert name of candidate or journalist]'s ass".

MIT Media Lab graduate student Cameron Marlow has done wonders with Perl to create a tool to help bloggers analyse transcripts of the presidential debates. Just plug in a well-worn phrase - say, "war on terror" - and up pops a phrase count (Bush 11, Kerry 7, for the record).

Marlow lists the candidates' top 25 phrases during the first Bush-Kerry clash, and repeats the exercise for last night's vice-presidential debate: Cheney's top three phases were Saddam Hussein (11), fact of the matter (10) and United States (10), while Edwards' were John Kerry (36), American people (28) and tax cuts (16).

For more analysis of the candidates linguistic skills, see Language Log, which finds that John Kerry's sentences are, on average, 17.7% longer than George Bush's. Language Log's sober analysis is that, of four reasons for the statistic: "First, Kerry might have talked faster. Second, he might have used shorter pauses. Third, he might have paused less often. Fourth, he might have used intrinsically shorter words", the second is the key factor, sidestepping the well-worn debate over whether Bush is stupid, as evinced by this piece in Slate.

Perhaps I should update the old expression, and conclude that "I don't care what they say about me, as long as they spell my URL correctly." (Another connection to Dick Cheney, who sent millions of viewers to George Soros' blog by talking about factcheck.com instead of factcheck.org). However, Perrone wasn't talking about me, she was talking about linguistic analysis.

Now the whole point of Language Log, aside from having fun, is to encourage people to think and talk about language. But without being excessively pedantic, we'd also like to encourage people to think and talk about language in a way that's sensible, factual and logical.

I'm having trouble getting to the point here, because I can't understand how Perrone, an intelligent and accomplished journalist, could have got things so wrong. On her own fascinating horticultural blog, she would never confuse a comparison of beetroot weights with an analysis of the causes of caterpillar infestation. Yet in her short "Language Matters" paragraph, she manages to mix up two aspects of linguistic analysis that are just as different.

She combines quotes from two of my posts about the first presidential debate: one in which I compare the candidates in terms of the average length of their sentences in words, measured from the official transcript; and another in which I examine the reasons for a difference in the candidates' overall rate of speech, and show that the key factor was a difference in pause length, measured in an audio recording. Now, these are logically different things. You can talk slow or fast in long sentences, and slow or fast in short sentences. You can pause more or less often, for shorter or longer amounts of time, independent of how many words you put in your sentences. And my explanation for Kerry's overall faster speech rate -- that Bush's pauses were similar in number but much longer -- had nothing to do with the relative length of their sentences.

I measured these very, very simple things -- sentence length in words, overall word count per unit time, duration of silent pauses -- for two reasons. First, these things are really easy to measure. You don't have to parse the sentences or measure vowel formants or anything time consuming, so the empirical part of the research just took a few minutes. And second, these things are really easy to understand. When Geoff explains about "fronted negative adjuncts" and "long sequences of supplements and appositives", you've got to keep your wits about you. But how hard can it be to understand the count of words in a sentence, or the duration of a silence in seconds?

Too hard, apparently. Well, looking over those posts, I can see that it's my fault. I never said explicitly that there's a difference between counting words and sentences on the one hand, and counting speech time and silence time on the other. I never defined my terms: words, sentences, seconds, speech, silence. Seriously, it takes some sophistication of thought to keep these things straight. Pauses are not periods. Cabbages are not compost. I have to remember to explain that stuff.

As for Perrone's indirect swipe at President Bush's intelligence, she would have done better not to combine that with a display of intellectual carelessness on her own part.

[Update 10/8/2004: Jane Perrone has posted a gracious correction on the Guardian's newsblog site:

I concur with Dan Gillmor when he says in the introduction to his book We the Media: "I take it for granted, for example, that my readers know more than I do - and this is a liberating, not threatening, fact of journalistic life."

So I'm glad that Professor Mark Liberman of Language Log called me on my sloppy summation of his analysis of the Bush-Kerry debate. My apologies to Mark: socks are being pulled up as I type.

I feel less like Dick Cheney already. And next time, I'll try to write more clearly. ]


Posted by Mark Liberman at 06:39 AM

October 06, 2004

Un système où tout se tient, and east is west

Geoff Pullum has described a campus bus loop where counterclockwise buses are called "westbound" and clockwise ones "eastbound", or maybe vice versa, he's not sure. The uncertainty bothers him. But I can tell you that it's not always a comfort to know.

I'm now sitting just below the "r" in "Spruce St." in the map fragment on your right, about a half mile west of Interstate 76. Nobody around here calls it I-76, though -- it's the Schuylkill (pronounced "skookle") Expressway. Since north is up as usual, you can plainly see that the Schuylkill runs from southwest to northeast through this section of Philadelphia. Reasonably enough, some of the local access signs even offer you a choice between the "north" and "south" directions, with "north" here meaning "northeast". But even-numbered routes in the U.S. are east-west routes, logically speaking, and I-76 was at some point nominally amalgamated with the Pennsylvania Turnpike, which runs east-west through the state. Though I don't know the political history, my guess is that this had something to do with who pays for what. Anyhow, to maintain consistency with the rest of I-76, if you leave Penn in a geographically northeastern direction, the official signs tell you that you're going "west", while if you head in a geographically southwestern orientation, you're officially going "east".

More complex versions of this problem afflict the ring roads that partially or completely encircle many American cities. You'd think that on a circular road, the signs would say things like "Clockwise to Braintree" or "Counterclockwise to Silver Spring", but not so. Instead, compass directions seem to be assigned by some random and historically unstable mixture of local tradition, true geographical orientation, continuity considerations and even-odd number-direction parity.

This reminds me of a story that Roman Jakobson liked to tell. It seems that a certain African culture used "talking drums", with the traditional mapping of syllable count and timing onto drumbeats, using high and low pitched strokes for syllables with high and low lexical tone, respectively. Messages from the center of the village were sent by drumming on a large hollow log, one side of which sounded a low-pitched note while the other sounded a higher pitch. But then the log cracked, with the result that striking the previously lower-pitched part now actually yielded a higher pitch than the other part. What did the drummers do? Well, they went on drumming as before, except that now linguistic high tones mapped onto a lower-pitched drum sound, while linguistic low tones mapped onto a higher-pitched drum sound.

Jakobson liked this story because it illustrates the structuralist prejudice that it's only the system of contrasts that matters, not the content. But I'm almost certain that the story was false, all the same. Not the talking drum part, that sort of thing was and is widespread in Africa. The thing is, though, if a drum had broken, I'll bet that the drummers would have fixed it or replaced it, or at worst turned it around.

Here in the U.S., we're not so sensible. The federal and state departments of transportation are hotbeds of unreconstructed structuralists. A road only goes two ways, right? All integers are even or odd, right? A plane surface, or the surface of a sphere, is two dimensional, right? Do you see where this is going? I mean, this is a system of oppositions! Clockwise/counterclockwise, east/west, north/south, even/odd, what's the difference? You're fretting about mere implementation details.

Roman Jakobson's good friend, that arch-structuralist aristocrat Nikolai Sergeevich Trubetzkoy, famously said that phonetics is to phonology as numismatics is to economics. Well, I'm a phonetician, myself, and it bothers me every time I drive northeast on I-76 West.

[Update: Emily Bender emails to point out that they do everything on a grander scale in California, even confusion:

I can beat your driving northeast on I-76 West story, by a bit: In the San Francisco Bay Area, there is a stretch of road that is both I-580 East and I-80 West (in one direction) and I-580 West and I-80 East (in the other). It actually runs North-South, of course.

(To locals, it would never be called I-580 or I-80, though. Just 580 or 80.)

Again, I blame the structuralists.


[And in defense of the midwestern forces of disorder, Daniel Drucker emailed a pointer to this picture of a north-south road segment (near Burlingame, Kansas) which is simultaneously 56 East and 31 West (or the other way around in the other direction, of course). He also diagrams the road graph that gives rise to this state of affairs:

<- 31 W _____ _____ 56 E ->
<- 56 W _____|_____ 31 E ->

I believe that I've seen clusters of road signs that encompass three of the four compass directions at once for the same stretch of road, though I can't cite an exact reference much less provide a picture. Has anyone seen a cluster that covers all four compass directions? That would be something to treasure, like the legendary intersection where four one-way streets converge, all pointing inwards. ]

[Update: Q_pheevr introduces some nifty terminology, and a picture of a three-direction road-sign cluster (421 north, 66 south, 80 west). Q even points to a picture of a four-direction cluster, from Kentucky -- but it's at an intersection, which is cheating. I'm still waiting for a cluster in which routes nominally in all four compass directions are oriented in the same physical direction on the same stretch of physical highway. This is clearly a graph-theoretic possibility, and I'm betting it happens a few times across this great land of ours.]

[Update 10/8/2004: Joshua Guenter (Editor of Pronunciation at Merriam-Webster) emailed that

There is, believe it or not, a single example of cardinal direction-sensitive highway numbering in the United States. I refer to U.S. Highway 101, which runs most of the length of the Pacific Coast. From Los Angeles to around Beaver, Washington, it's labeled pretty much what you expect of highways: North or South, regardless of what cardinal direction one may be actually heading in at the time. The thing is, though, it begins to loop around the Olympic National Park in Washington, first going east, then going South to Olympia. If this highway kept the same directional labeling system of all other highways, you'd soon find yourself going South on 101 North for about 90 miles. I guess this was just too much, so the west-east section of 101 to the north of Olympic National Park is actually labeled 101 West and East while the section to the east of Olympic National Park between Gardiner and Olympia is re-labeled 101 North and South, but corresponding to the cardinal directions.

In other words, if one drove from Los Angeles to Olympia, never leaving U.S. 101, you'd be first on 101 North (for quite a bit), then on 101 East, then on 101 South, all the time staying on the same road. This is the only example of this I know. Maybe there are others?



Posted by Mark Liberman at 11:10 PM

A westward loop

The closest buildings to the Main Entrance of my campus are a mile up the slopes of Ben Lomond mountain. Shuttle buses are used to get non-motorists up the hill. Cyclists often bring their bikes up on the shuttle and then cycle home downhill at the end of the day. One shuttle bus route goes around the campus clockwise (see map here), turning right outside the Main Entrance (at the bottom of the map and the bottom of the hill), going uphill and some way westward on a road called Empire Grade, turning right on arrival at the West Entrance, and following campus roads around to the east to come downhill to the Main Entrance to turn right again. The other goes counter-clockwise, turning left outside the West Entrance to come downhill heading eastward and then left again to re-enter the campus at the Main Entrance and head uphill and then proceed westward before turning left again and travelling eastward once again. The shuttles on these two routes are labeled "The Loop". On the shuttle buses is a sign warning cyclists that the crucial bike racks on the fronts of the buses are found "only on westbound Loop shuttles."

What in heaven's name can they mean? All the clockwise loop shuttles are going west from near the Main Entrance up Empire Grade toward the West Entrance. All the counter-clockwise loop shuttles are going west from the top of Hagar Drive toward the West Entrance. All the shuttles do some northward and southward travel as well. They have to, if they are to travel a route that is a loop. The idea that you can distinguish a clockwise from a counter-clockwise circular loop by saying that one goes to the west and the other doesn't is more than just wrong, it's a screamingly obvious geometrical impossibility. How could so many intelligent people — the Transportation And Parking Services people on campus, the sign writers, the administrators, the thousands of passengers, the top syntax-and-semantics graduate student I rode with on the shuttle this morning — have failed so utterly to see that the sign is nonsense and you cannot tell by reading it which direction you should take if you hope to see bike racks on the fronts of the buses? Is it me, or is it them? It's them, right?

Posted by Geoffrey K. Pullum at 02:02 PM

Decisiveness and clause structure

I think Mark's series of linguistic analyses of the first presidential candidates' debate (see them here, here, here, here, here, here, here, here) has been really important, and I'm still pondering on some of the issues. I have spent a particularly long time on the question (raised in this one) of what Kathleen Hall Jamieson (director of the Annenberg Public Policy Center at the University of Pennsylvania) could possibly have meant when she said (if she indeed said), "The language of decisiveness is subject, verb, object, end sentence." Is there any sense that we can suck out of this apparently loopy remark?

Mark probed a little in a scripted speech of Bush's, and found only about 5.6% of the sentences were simple Subject + Verb + Object clauses with no embellishment at all. But it seems intuitively obvious that Professor Jamieson couldn't have meant this anyway: she would not have intended to claim that you will sound decisive if you say This makes sense but not if you say This is sensible. (Sensible is an adjective, so it isn't an object.) Jamieson must at least have been thinking we could allow Subject, Verb, other stuff — object if the verb is transitive or otherwise whatever complement is appropriate. But that can't be right either. Consider Winston Churchill's stirring and famous words about the Royal Air Force fighter pilots during the Battle of Britain:

Never in the field of human conflict has so much been owed by so many to so few.

That has a fronted negative adjunct and inversion of the subject and auxiliary. Surely Professor Jamieson cannot have meant that Churchill missed sounding decisive because the subject failed to precede the verb. Examples could easily be multiplied: if A crucial choice lies before us sounds decisive (it has Subject + Verb order), would anyone really say that Before us lies a crucial choice does not, just because it inverts things and has the subject last? Surely such style differences (the topic of Chapter 16 in The Cambridge Grammar, if you want to reflect more on this) do not convey a difference in whether you sound like a decisive leader.

I think the only potentially salvageable part of the Jamieson claim is that long sequences of supplements and appositives should be avoided because they might make you sound dithery. She is reported as saying that people fail to sound decisive if they "speak in sentences that contain parenthetical phrases" or if they "add a series of illustrative examples before they end the sentences." But after some study of the transcript of the first presidential debate, I don't actually think that avoiding such sequences characterizes either decisive-sounding speech or George W. Bush's style.

For a start, there is nothing indecisive-sounding about this sentence of Kerry's, with its series of illustrative examples and its succession of parenthetical phrases:

I have a better plan to be able to fight the war on terror by strengthening our military, strengthening our intelligence, by going after the financing more authoritatively, by doing what we need to do to rebuild the alliances, by reaching out to the Muslim world, which the president has almost not done, and beginning to isolate the radical Islamic Muslims, not have them isolate the United States of America.

Yet there can be plenty of indecisiveness in a stream of fairly simple clauses if they are all over the map in terms of subject matter. The opening question Jim Lehrer asked of President Bush in the first debate was this:

Do you believe the election of Senator Kerry on November the 2nd would increase the chances of the U.S. being hit by another 9/11-type terrorist attack?

This was a clear allusion to the suggestion Dick Cheney publicly made, which sounded to a lot of people like a claim that a Kerry victory would cause new and devastating terrorist attacks (I have commented elsewhere on how difficult it is to say whether Cheney really committed himself to that claim or not). Lehrer wanted Bush to get off the fence concerning whether it was part of the Republican position that just by not being Bush, a President Kerry would draw down hijacked airliners on us like a magnet. The challenge to be parried, in other words, was one about whether the Republicans hadn't been overstating the dangers of a Kerry presidency.

Bush responded with numerous short, Jamieson-compliant clauses in standard word order. But he never actually answered the question, or came close (he may not have intended to, of course), and what was more alarming was that his sentences were all over the map semantically, wildly off the topic and getting progressively more so. He made the following claims, among others, in this order, and in these words (though I omit linking remarks):

  • I don't believe it's going to happen.
  • I've shown the American people I know how to lead.
  • I understand everybody in this country doesn't agree with the decisions I've made.
  • People out there listening know what I believe.
  • This nation of ours has got a solemn duty to defeat this ideology of hate.
  • We have a duty to protect our children and grandchildren.
  • Ten million citizens [in Afghanistan] have registered to vote.
  • They're given a chance to be free.
  • They [the Afghans] will show up at the polls.
  • Forty-one percent of those 10 million [Afghans who have registered to vote] are women.
  • It's a phenomenal statistic.

Hold on a minute. Lehrer's question was basically "Will Kerry's election cause terrorist attacks in America?" How did we get from there to the claim "The percentage of women among registered Afghan voters, at 41%, is surprisingly high"? I don't really know. You can try to reconstruct it yourself by reading the transcript. But although many of the sentences were nice short Subject-Verb-Object canonical clauses (e.g., "an enemy realizes the stakes"), my reading of the whole answer is that we're looking at a man in a panic who has no idea what to say to the question. He has been taught a whole slew of tough-sounding clauses to reiterate, but can think of nothing to do but hurl them around at random. He demonstrates, with his succession of mostly Jamieson-compliant clauses, real intellectual weakness and indecisiveness when faced with a challenging question.

You might allege that I have amplified this by omitting connecting passages that made the logic clear. By all means read the raw original and make up your own mind. But to some extent it does Bush a favor to leave stuff out. Some of what I left out was ungrammatical (e.g., *That's how best it is to keep the peace). And there were transitions like this:

This nation of ours has got a solemn duty to defeat this ideology of hate. And that's what they are.

What? What is what who are? What are they? Ideologists? Haters? Baffled, we move on to see if it gets any clearer:

This is a group of killers who will not only kill here, but kill children in Russia, that'll attack unmercifully in Iraq, hoping to shake our will.

Quite apart from the content (no single "group of killers" has been involved in all of the atrocities mentioned), this rambling, flailing sentence has — under the Jamieson doctrine — a syntactically very loose sequence of parts. This is the subject, and the predicate has the form is + noun phrase. That noun phrase is about killers, and a series of loosely connected modifiers follows:

a group of killers who will not only [verb phrase #1] but [verb phrase #2]... that will [verb phrase #3]... [subjectless gerund-participial clause adjunct]...

Cut it any way you like, this is decisively not Jamieson-compliant syntax.

Neither of the current stereotypes about styles of speech seems to be true: Kerry does not engage in long-winded unstructured rambling; Bush sometimes does; and neither limits himself to the sort of syntax alluded to by Jamieson, even if we interpret her with maximum charity. But what is true is that you can ask Bush a question about overstatements in Republican campaign rhetoric and he will ramble off into an answer that starts talking about the gender balance in Afghanistan's electoral roll. That doesn't look like decisive clarity or incisive focus to me.

Posted by Geoffrey K. Pullum at 01:32 PM

Web Crow

You've probably seen this news@nature.com piece on Web Crow, which can apparently solve crossword puzzles in any language by using Google searches. It's a cute idea. There's more context at aaai.org.

Marco Gori, one of the program's developers, explains that "the idea is not to spoil the enjoyment of players". Francis Heaney's reaction:

Well, that's a relief. I was worried that computers were going to break into my house and start solving my crosswords.

Posted by Mark Liberman at 01:00 PM

Wednesday morning grammar maven

How could I have missed this tragic gem of linguistic mis-analysis, from Gregg Easterbrook's 8/24/2004 Tuesday Morning Quarterback column at nfl.com?

In the new Microsoft slogan -- "Your potential inspires us to create software to help you reach it" -- the antecedent of "it" is "software." So your potential inspires us to create software to help you reach software. This slogan must have gone through the Microsoft Word grammar-checker tool! Then again, the line does pretty much sum up Microsoft sales strategy.

So true, and yet so false!

Back on August 28, I blogged Easterbrook's discussion of new proposals for the Washington Redskins' name, and specifically his plan to change his vote from the "Potomac Drainage Basin Indigenous Persons" to the "Washington Wohnata". But except for the discussion of the Eagles' prospects, I just glanced through the rest of that (long, long) column. So I missed the paragraph titled "TMQ, Grammar Snob", tucked in between the recommendation of tough love for Randy Moss and the evaluation of the Saints' defensive line.

Let's look at that Microsoft slogan again:

"Your potential inspires us to create software to help you reach it"

This slogan certainly demonstrates that short sentences are not necessarily clear sentences. But is it really true that "the antecedent of 'it' is 'software'", as Easterbrook confidently asserts?

On the linguistic merits, Easterbrook is full of it -- and I don't mean that he's full of Easterbrook.

I can't persuade myself to understand this sentence the way Easterbrook thinks I should. Except as a sort of pencil-and-paper joke, I can only understand the antecedent of "it" as "your potential". In fact, I was unable to invent a sentence where pronoun linkage works the way Easterbrook thinks it does in the Microsoft slogan, even when I tried to choose the material to nudge the meaning in the right direction:

??Alan Turing inspired me to create an AI program to help me write it.
??She told us to build a raft to let us float on it.

(where the boldface nouns are supposed to be the antecedents for the boldface pronouns).

There's a linguistic literature about "non-coreference conditions" that's probably relevant here, depending on what you think those conditions really are, and what you think the structure of these sentences is. But to say that Easterbrook's construal violates some grammatical rule gets it backwards. Any such rule is a summary of norma loquendi: what we (tacitly) know about our language and how we use what we know in practice. I'd be happy enough to discover that some of the people, some of the time, construe such sentences the way Easterbrook says they should. But looking to Google for guidance, I can't find any examples of roughly comparable structures where the pronouns work Easterbrook-wise:

We will write, edit, co-write or ghostwrite your book - and, if you're going to self-publish, we'll also design it inside and out, and even create material to help you promote it.
I love the fact that my job allows me to interact with people in different fields and learn about what they do and create software to help them do it.
We also offer special private life coaching in which you find out your purpose in life, form your goals, create strategies to help you achieve them, and have on-going support to help you stay motivated and follow through.
Since there may be times when you want to save email messages, Messenger allows you to create folders to help you organize them.
If you have many projects to complete in a given time period, create Tasks to help you sort them.
Where I have the contributor's permission, I'll also create links to help you contact them for additional advice.
Then we'll work with you to identify the decisions that you need to make and create solutions to help you make them.
If people have rights, they will create organizations to help them defend them.
Explain to the class that some of these words are similar and ask the students to help you group them together in logical clusters.
Then, bring all of the pizzas out and ask the children to help you sort them according to size.
Once, in my father's presence, he claimed that he had flung his resignation in the King's face, and that he had controlled the voting in the Conclave, forgetting that he had asked my father to beg the King to take him back ...
He knew that the chances of them being alive at the end of his long sentence were extremely remote, and so he asked my help to request the authorities to let him visit them just once.
Pray for this work - and make a gift to help us sustain it.

As far as I can tell, the norms of English usage are 100% against Easterbrook and in favor of Microsoft. In fact, I'm skeptical that Easterbrook's linguistic intuitions really support his own assertion. I suspect that he's blindly applying a piece of false grammar-maven lore, namely the view that the antecedent of an anaphoric pronoun must be the most recent noun phrase that agrees in number and gender.
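
To see how mechanical that piece of lore is, here is a minimal sketch of it as a procedure. Everything in it is an assumption made for illustration: the noun phrases, their word positions and their number and gender tags are assigned by hand, and `NounPhrase` and `nearest_antecedent` are hypothetical names. It is not a parser, and it is not a claim about how Easterbrook actually reasoned.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NounPhrase:
    text: str
    position: int  # index of the phrase's first word in the sentence
    number: str    # "sg" or "pl" (hand-assigned)
    gender: str    # "neut" or "anim" (hand-assigned, deliberately crude)


def nearest_antecedent(pronoun: NounPhrase,
                       noun_phrases: List[NounPhrase]) -> Optional[NounPhrase]:
    """Return the closest preceding noun phrase that agrees in number and gender.

    This is the grammar-maven rule taken literally: it looks only at string
    order and agreement, and knows nothing about sentence structure.
    """
    candidates = [
        np for np in noun_phrases
        if np.position < pronoun.position
        and np.number == pronoun.number
        and np.gender == pronoun.gender
    ]
    return max(candidates, key=lambda np: np.position, default=None)


# "Your potential inspires us to create software to help you reach it"
SLOGAN_NPS = [
    NounPhrase("your potential", 0, "sg", "neut"),
    NounPhrase("us", 3, "pl", "anim"),
    NounPhrase("software", 6, "sg", "neut"),
    NounPhrase("you", 9, "sg", "anim"),
]
IT = NounPhrase("it", 11, "sg", "neut")

lore_choice = nearest_antecedent(IT, SLOGAN_NPS)
print(lore_choice.text)  # prints "software"
```

Taken literally, the rule picks "software", which is Easterbrook's reading and not the one that ordinary usage supports. The procedure has no way of noticing that a nearby noun phrase can be structurally unsuitable as an antecedent.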

Now, it's a good idea to worry about confusing pronoun linkages, but insisting that the antecedent must always be the string-wise-closest previous suitable noun phrase is nonsense. In Easterbrook's own column, there are several counterexamples that are perfectly clear and reasonable English:

"Wohnata" would take a bit of getting used to, but is no harder off the tongue than commonly spoken team names like Knickerbocker. It even works in the fight song... ["it" refers to "Wohnata", not "Knickerbocker" or "the tongue".]

Let's see, the Falcons were 3-1 with Michael-Mike Vick and 2-10 without him. What could that mean? Could the CIA figure it out? Don't answer that! Could the new national intelligence czar proposed by the 9/11 Commission figure it out? ["it" refers to "what that could mean", but not to "the CIA" or "the 9/11 Commission"]

There are lots of reasons why skipping the string-wise nearest possible antecedent might work -- but surely the fact that it's structurally unsuitable -- as in the Microsoft slogan -- is a pretty good one!

I like Easterbrook's TMQ column, and I especially like the self-deprecating TMQ logo. It fits this case particularly well.

[Easterbrook "Grammar Snob" link from Mike Pope's web log. ]

[Note that I'm no fan of that Microsoft slogan. It's aesthetically awkward -- an unmotivated little tangle of small clauses. It's psychologically inauthentic -- surely it's our money that mainly inspires Microsoft, not our potential. It's bad faith morphologized. But it's not ungrammatical. ]


Posted by Mark Liberman at 10:14 AM

October 05, 2004

Vociferously global

Maybe now John Richetti will see what Kathleen Hall Jamieson meant when she said that "words found on the SAT verbal exam should not appear in candidate's speeches." John, as an English professor, wrote that "[t]his disgracefully simple-minded ... analysis ... is an insult to me and I should think to all of us who teach writing and communications." But in last Thursday's debate, both candidates violated Kathleen's rule at least once, and for both, it was a mistake. And Kerry's vocabulary blunder was much more serious and consequential than Bush's.

Bush used at least one potential SAT verbal test item in a way that was surprising, if not completely wrong:

In Iraq, no doubt about it, it's tough. It's hard work. It's incredibly hard. You know why? Because an enemy realizes the stakes. The enemy understands a free Iraq will be a major defeat in their ideology of hatred. That's why they're fighting so vociferously.

Vociferous means "Making, given to, or marked by noisy and vehement outcry", according to the American Heritage Dictionary. Though there are certainly plenty of noisy and vehement outcries in Iraq, the problem is not the outcries but the bullets, bombs and beheadings. Bush may have meant viciously or vigorously or something like that, and substituted vociferously as a malapropism; or he may think that vociferous means something like "strong and active in an unpleasant way"; or maybe he really did mean that the Iraqi enemy is fighting in a vocally noisy way.

At best, vociferously was a distracting word choice. At worst, though, it was just another small verbal slip, annoying to some of the people who dislike Bush, and irrelevant to everyone else.

But when Kerry violated Jamieson's Rule, it was Big Trouble.

No president, through all of American history, has ever ceded, and nor would I, the right to preempt in any way necessary to protect the United States of America.

But if and when you do it, Jim, you have to do it in a way that passes the test, that passes the global test where your countrymen, your people understand fully why you're doing what you're doing and you can prove to the world that you did it for legitimate reasons.

The problem here is that global has several meanings (again from the AHD):

1. Having the shape of a globe; spherical. 2. Of, relating to, or involving the entire earth; worldwide: global war; global monetary policies. 3. Comprehensive; total: “a . . . global, generalized sense of loss” (Maggie Scarf). 4. Computer Science Of or relating to an entire program, document, or file.

So which meaning did Kerry intend? The first two meanings for global are "spherical", which doesn't make sense in the context of Kerry's sentence, and "worldwide", which does. The Republicans have jumped on this, to argue that Kerry's second sentence (in the quote given above) commits him to asking permission from the likes of Jacques Chirac before acting on the world stage. Worse, the quote seems to put international relations into the frame of academic test-taking, leaving Kerry wide open to this clever satire.

Rivka at Respectful of Otters argues, as others also have, that Kerry really meant sense #3, "comprehensive" or "total", and that the context makes this clear. I think she's probably right, if only because the alternative is to allow Kerry's second sentence to directly contradict his first one. However, Kerry immediately goes on to emphasize the importance of proving legitimacy to the world, reinforcing the "worldwide" meaning of global.

At best, global was a disastrously foolish word choice, requiring Kerry's listeners to ignore a commoner meaning whose policy implications are opposite to those of the rarer sense he had in mind. Try a Google search on global: among the first 100 hits (all that I checked), every single one means "worldwide". At worst, global was a verbal slip that expressed Kerry's true views on international action.

If Kerry meant "comprehensive test" or "total test", he should have used one of those terms. Comprehensive is a somewhat rarer word than global overall (11.5M to 29.5M web hits on Google), while total is somewhat commoner (59.3M WhG) -- but the "comprehensive, total" sense of global is very rare indeed. Definitely SAT verbal exam territory. He shoulda listened to Kathleen.

There's also a "framing" issue here, and one with genuine content, not just a matter of lexical choice. Talking about a test, of whatever kind, invokes a certain conceptual framework: something or someone is tested, some person or group evaluates the test, etc. Here what is being tested is "the way you take preemptive action", but the graders of the test are not very clear -- apparently it's both "your countrymen" and "the world". Kerry might have quoted Thomas Jefferson, whose view of the responsibilities of those who take strong action was that "a decent respect to the opinions of mankind requires that they should declare the causes which impel them" to act. Jefferson did not take the position that the rebellious colonists needed to prove anything to the rest of the world before acting, though he certainly felt that their actions were principled ones, which could and should be explained. I believe that even Kerry's supporters will agree that his confusing word choice in the debate reflects a more general unclarity -- at least of presentation and maybe of conception -- on this difficult point.


Posted by Mark Liberman at 10:27 AM

Good theory, bad practice -- or contrariwise?

Over the past month, several weblogs have posted long and thoughtful evaluations of George Lakoff's ideas about framing political discourse. Several l. and t. evaluations each, in fact.

Chris at Mixing Memory has half a dozen recent posts on Lakoff, whom he calls "one of my least favorite linguists (his work gives embodied cognition a bad name), but one of my favorite political commentators": 9/09 "Framing the convention", 9/16 "Karl Rove the Feminist Bankteller", 9/21 "Lakoff in the Blogosphere", 9/22 "Understanding Frames with an Eye Toward Using Them Better", 9/27 "Lakoff's View of Metaphors", and 10/02 "Lakoff is Everywhere!"

Meanwhile, Semantic Compositions has been arguing the other side, in posts on 9/30 "Maybe try thinking of a donkey", 9/30 (again) "What George Lakoff knows about the mind" and 10/1 "How not to test a hypothesis". [Update: and now (10/5) another installment: "Excellent, excellent".] SC thinks that Lakoff has "[taken] a worthy theory about cognition and metaphors, and [turned] it into a rather less impressive theory of political speech". In slogan form, SC calls this "good theory, bad practice". (Since Chris argues that Lakoff has taken a bad theory of cognition and turned it into good political commentary, I can't resist noting that SC is thus acting as the anti-Chris. Sorry, SC).

If you haven't already had enough, this search will net you a baker's dozen Language Log posts that mention George Lakoff, although with less length and thoughtfulness. (You'll also find one post citing Robin Lakoff, as a bonus).

And the 1996 Cognition paper by Gregory Murphy, which Chris cites as a source of qualms about Lakoff's theory of metaphor, is here: Murphy, G. 1996. "On metaphoric representation", Cognition 60: 173-204 (.pdf). Its abstract:

The article discusses claims that conceptual structure is in some part metaphorical, as identified by verbal metaphors like LOVE IS A JOURNEY. Two main interpretations of this view are discussed. In the first, a target domain is not explicitly represented but is instead understood through reference to a different domain. For example, rather than a detailed concept of love per se, one could make reference to the concept of a journey. In the second interpretation, there is a separate representation of love, but the content of that representation is influenced by the metaphor such that the love concept takes on the same structure as the journey concept. It is argued that the first interpretation is not fully coherent. The second interpretation is a possible theory of mental representation, but the article raises a number of empirical and theoretical problems for it.

Not mentioned by Chris, but also worth reading, are the response by Raymond Gibbs ("Why many concepts are metaphorical"), Cognition 61: 301-319, and the re-response by Murphy "Reasons to doubt the present evidence for metaphoric representation", Cognition 62: 99-108 (.pdf). Gibbs and Murphy manage to disagree sharply while not only remaining civil, but also learning from one another.

I may say something more about all of this later on, but I hope you enjoy the links, anyhow.


Posted by Mark Liberman at 12:16 AM

October 04, 2004

Kerry's words and Bush's silences

The secret of John Kerry's debating success, as imagined by Harry Shearer, reported by the Philadelphia Inquirer, and inspired by Kathleen Hall Jamieson.

And we counted the results. But while Kerry was learning to limit his words, it seems like nobody was working on W's silences.

[via Metafilter]

Posted by Mark Liberman at 03:27 PM

Language scholarship and language teaching

Lee Smith has an interesting article in Slate today, under the headline "The Language Gap: Why Middle Eastern linguists are hard to find, even though the government has been funding the field". One key quote:

One problem is that language instruction is typically not a high priority in academia, where other disciplines enjoy more prestige. "Universities have tended to relegate language pedagogues to the status of lecturers, who don't get the same salary or tenure rights as professors," Amy Newhall, the executive director of the Middle East Studies Association told me. Federal funds evidently haven't done much to change the calculus.

This is certainly often true, but not always. For example, my colleague Roger Allen, professor of Arabic Language and Literature at Penn, has long been a tireless promoter of language teaching, both at Penn and on the national scene. As his web page explains, Roger "is a certified Arabic proficiency tester for the American Council for the Teaching of Foreign Languages (ACTFL) and in 1986 was asked to serve as ACTFL's national Trainer of Testers. Since then he has led a large number of workshops on language teaching and learning, involving materials preparation, classroom instruction, and testing. Along with Adel Allouche he has completed a proficiency-based textbook for standard Arabic using computer-assisted instructional methods, Let's Learn Arabic [1988]." Roger teaches Arabic courses at all levels himself, as I first learned in 1990 when he asked me for help in getting instructional tapes digitized.

I'll confess, though, that it's rare these days to find people in positions like Roger's -- he's a full professor with tenure and a strong reputation as a literary scholar -- who are also committed to language teaching. This is just as true among linguists, alas, as it is among literary types. Though it's hard to separate cause and effect, this dereliction of duty is probably not unconnected to the strength, in some language-teaching circles, of attitudes similar to those of the Whole Language movement in reading instruction. I'm talking about the general idea that adult language learning should work just like child language learning does, i.e. without any component of explicit analysis. Milder forms of this disease merely forbid giving students any explicit analytic guidance, whereas stronger forms try to prevent teachers from focusing on imparting particular constructions or morphological devices, or even targeting particular vocabulary. The argument here is that children learn their first language without any conscious analysis and without any particular planning on the part of those they learn from, so ...


Posted by Mark Liberman at 10:39 AM

Crib notes and earphones

Wild rumors are flying about dirty tricks at last Thursday's debate. Some Bush partisans are complaining about an object that Kerry apparently transferred from his jacket pocket to his podium, speculating that it might have been an index card full of key facts. On the other side, some Kerry partisans are suggesting that Bush's handlers were feeding him lines through a wireless earphone or bone-conduction system.

Here at Language Log Labs, we've been working to evaluate both rumors in a professional and scientific fashion. Well, at least I put in a few minutes over my breakfast coffee.

Let's take up the Bush business first. According to Kevin Drum's Political Animal weblog at Washington Monthly,

It all started when Bush looked up halfway through an answer during Thursday's debate and snapped petulantly, "Let me finish." This is a trademark Bush line and normally wouldn't draw any comment except for one thing: no one had interrupted him. He had plenty of time left, Kerry hadn't said anything, and Jim Lehrer hadn't said anything either. So who was he talking to?

The theory making the rounds is that he was wearing an implanted earpiece of some kind and was reacting to advice from whatever handler was on the other end.

The "let me finish" aside followed a rapid sequence of disfluent false starts. Here's my transcript, with a link to an audio file of the crucial bit:

I decided [pause 0.314]
the right action was in Iraq. [pause 0.539]
My opponent calls it a mistake -- it wasn't a mistake. [pause 0.306]
He said I misled on Iraq. [pause 0.292]
I don't think he was misleading when he called Iraq a grave threat in the fall of 2002. [pause 1.28]
I don't think he was misleading when he said [pause 0.126]
that it was right to disarm Iraq [pause 0.983]
in the spring of 2003. [pause 0.625]
I don't think he misled you when he said that, [pause 0.585]
you know if you- [pause 0.407]
anyone who doubted whether the world was better off without Saddam Hussein in power sh- didn't have the judgment to be president, I don't think he was misleading. [pause 1.382]
I think what is misleading [pause 0.244]
is to say you can lead [pause 0.723]
and succeed in Iraq [pause 0.597]
if you keep changing your positions [pause 1.381]
on this war. And he has. [pause 1.252]
As the politics change, his positions change. [pause 0.126]
And that's not how a commander in chief acts. [pause 1.494]
I- t- I- uh uh w- let me finish ((here)). The intelligence I looked at [pause 0.81]

was the same intelligence my opponent looked at, [pause 1.503]
the very same intelligence. [pause 0.784]
And when I stood up there and spoke to the Congress, I was speaking off the same intelligence he looked at to make his decisions to support the authorization of force.

Listening to this, and looking at the video, I believe that one of Kevin Drum's commenters (identified as Robert Earle) has it right (except that his estimate of the preceding pause length is a half-second off):

It is pretty obvious that Bush is looking down at Jim Lehrer while talking at that moment, and while answering Bush has paused for about two seconds just before he says 'Let me finish'. Obviously Lehrer thinks Bush is done, and is just about to ask the next question, which is a question to Bush.

The immediately previous passage in Bush's response sounds very much like an ending, both in its content and in the way he delivered it. So it would make sense for Lehrer to assume he was done, and begin to ask the next question. And if Bush started his peroration just as he saw Lehrer preparing to ask the next question, that would explain both the disfluency and a sotto voce aside to Lehrer, "let me finish here".

It's also relevant that Bush commonly uses "let me finish" or similar expressions as a way to control the flow of interviews and press conferences. And the surrounding passage does not seem to have the rhetorical polish that one would expect from lines being fed in via a hidden communication system.

As for the object that Kerry removed from his jacket pocket at the start of the debate, this accusation seems to be a more serious one. Preliminary results from the Language Log Labs image analysis department tend to support the hypothesis, first advanced by some sharp-eyed bloggers, that the object was a slim, rectangular can of whup-ass.

[Earphone rumor via Prentiss Riddle]

[More on this topic here.]


Posted by Mark Liberman at 08:13 AM

October 03, 2004

This isn't rocket science

How did the phrase This isn't rocket science come to have its idiomatic meaning "This isn't all that advanced or hard to understand"? I've got a few cliché dictionaries, but they don't cover it. Why is rocket science a byword for arcane advanced science? Rocket technology is thousands of years old. Sulfur, saltpeter, and charcoal powder in a tube, light and retire. A few tests and a little trigonometry will tell you where it will land; a little calculus and some data on thrust and combustion rates and you can work out the acceleration and the trajectory and everything. It's applied basic Newtonian physics and math, but although space flight demands some advanced science, the science of firing shouldn't really be emblematic of the most difficult stuff scientists ever got into. I thought about this as I read today's New York Times exposé of how the Bush regime ignored the advice of senior researchers and went with an exaggerated version of one junior analyst's idea that Iraq was purchasing aluminum tubes for uranium-enrichment gas centrifuges. The tubes were in fact almost certainly for rocket bodies. And the Bush administration had been told that. They hushed it up, and looked me straight in the eye and lied to me about it, and that makes me angry.

In the early 1990s, Iraq had done some experimentation with building gas centrifuges. They used tubes about 300mm long, 145mm in diameter, made of hard aluminum 1.1mm thick. They did get one centrifuge to work for a while. What the US noticed a decade later, in 2000 and 2001, was that Iraq was ordering tens of thousands of tubes that were quite different: three times as long (900mm), three times as thick (3.3mm), and much narrower (only 81mm). One junior intelligence analyst thought that although they weren't at all like modern American centrifuge rotors, nonetheless they might be usable in what is known as a Zippe centrifuge. A team of scientific experts decided otherwise. So did the senior CIA leadership. The specs were all wrong for centrifuge tubes. And the Iraqis were anodizing the tubes. That protects them from the weather if they're left outside all the time like weapons systems are, but it makes them less suitable for centrifuging uranium hexafluoride gas. Was any of that too technical for you to take in? I wouldn't think so.

All of the doubts of the scientific and intelligence community were kept from the public. What we got instead was Dick Cheney and Condoleezza Rice telling us flat-out lies.

"He has reconstituted his nuclear program," said Cheney of Saddam Hussein. The CIA reports said only that there was evidence that "could mean" this or "suggested" it.

Rice spoke of "aluminum tubes that really are only suited ... for nuclear weapons programs." The truth is that the tubes were basically hopeless for centrifuging uranium isotopes but ideal for rocket bodies. The 7075-T6 hard aluminum of which they were made is not limited in application to making centrifuge rotors; it is actually used by the US for the Mark 66 air-launched 70-millimeter rocket, and the tubes needed to provide those with a combustion chamber are very similar indeed to what the Iraqis had been openly purchasing around the world. The Iraqis had been making rockets with similar tubes for years.

No one enjoys being treated like a mushroom — kept in the dark and fed only bullshit. I hated it when CBS stonewalled over Rathergate and half got away with it. (Everyone who wanted to believe that maybe there were genuine memos by an Air National Guard colonel grumbling about the young George W. Bush's undistinguished and reluctant service followed the CBS half-truth about failure to authenticate. The New Yorker has a piece about it in the October 4 issue, "Rather Knot" by Nancy Franklin, and still it talks about how "it came to light that CBS could not authenticate the documents after all." That is not what came to light. What came to light was that the documents were crude forgeries.)

But let's face it, the forged Killian memos, faked so that some damn fool opponent of Bush could try to smear him during an election campaign, fall away into triviality when compared with lying to the country about crucial intelligence information that was to be the basis for a full-scale invasion and taking down of the regime of a sovereign country. This is not Dan Rather being fooled by a silly hoax that would have been useful only for campaign embarrassment even if true. This is serious.

My very first post on Language Log concerned a half-lie that President Bush told in his first State of the Union address, and the dishonest defense that Dr Rice provided by switching verbs, making a definite lie out of a merely implied one. That was bad. It related to whether the USA had information that the CIA trusted about Iraq attempting to purchase uranium ore in Niger. The answer is no: there was no such evidence. But the British had said there was in an intelligence document of theirs (which the CIA disputed). So Bush reported on what the British had "learned" (the verb learn clearly carrying the implication that the thing learned was a fact), and Rice defended him later by saying that "the British have said that", which dishonestly switches the crucial verb to say. Basically, the Bush regime lied.

And now it emerges that they have done it again, much worse. The fictional uranium ore purchase was merely mentioned in passing. But the aluminum tubes were made the central focus of Colin Powell's presentation to the UN Security Council. They were the only physical evidence the USA even claimed to have. Those tubes "really are only suited ... for nuclear weapons programs," said the former Provost of Stanford University, lying in her teeth, and contradicting detailed expert evidence that she had access to but was suppressing. (And she is still, of course, stonewalling, saying she knew there was debate but — as National Security Adviser to the President, speaking publicly on this very topic — she was not aware of its content.) The truth is that the tubes were not suited to nuclear weapons production at all. And understanding why is no more of an excursion into advanced technology than was the typographical evidence about the forged Killian memos. Read the article in the Times and see. No calculus is needed.

This is not rocket science, I want to say. Only the idiom is the reverse of the literal truth. Our government has lied to us again, and this time it was rocket science.

Posted by Geoffrey K. Pullum at 08:28 PM

The rhetoric of silence

I've pointed out that in Thursday's debate, John Kerry's sentences were 17.7% longer than George Bush's. Since the two men had the same amount of time to speak, you expect this to mean that Kerry used fewer sentences. And he did, 468 to 476. However, that's only a 1.7% difference. Kerry accounted for most of his greater sentence length not by using fewer sentences, but by packing more words into the same amount of time. 15.8% more, 7,136 to 6,165.
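
The arithmetic behind these percentages is easy to check. The following is only a sketch that plugs in the counts quoted above; the helper name `percent_more` and the variable names are illustrative, not part of any original analysis.

```python
# Word and sentence counts quoted in the paragraph above.
kerry_words, bush_words = 7136, 6165
kerry_sentences, bush_sentences = 468, 476

def percent_more(a, b):
    """Return how much larger a is than b, as a percentage of b."""
    return 100.0 * (a - b) / b

kerry_len = kerry_words / kerry_sentences   # mean words per sentence
bush_len = bush_words / bush_sentences

sentence_length_gap = percent_more(kerry_len, bush_len)
sentence_count_gap = percent_more(bush_sentences, kerry_sentences)
word_count_gap = percent_more(kerry_words, bush_words)

print(f"Kerry's sentences were {sentence_length_gap:.1f}% longer")
print(f"Bush used {sentence_count_gap:.1f}% more sentences")
print(f"Kerry used {word_count_gap:.1f}% more words")
```

Since mean sentence length is words divided by sentences, the sentence-length gap is roughly the word-count gap plus the sentence-count gap, which is why most of it has to come from the extra words.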

How'd he do that? New LexicoTardis® technology from the Rockridge Institute? Well, there are four obvious possibilities. First, Kerry might have talked faster. Second, he might have used shorter pauses. Third, he might have paused less often. Fourth, he might have used intrinsically shorter words.

A few quick and simple measurements suggest that the second of these four was the key factor. In the section of the debate that I examined, Bush used about the same number and frequency of pauses as Kerry did, but Bush's pauses were much longer. In between the pauses, Bush actually talked faster, but the pauses were so much longer that his overall speech rate was slower. This was measurably true in the beginning of the debate, and I suspect that the pattern continued or strengthened as time went on.

At least in the beginning of the debate, some of Bush's pauses also seem to derive more from cognitive factors internal to his speaking process, than from any consideration of the effects on the listener. In contrast, Kerry's early pauses seem more often to have been calculated for rhetorical effect.

In order to look at this question, I segmented each candidate's first (two-minute) answer and first (90-second) rebuttal into periods of speech and silence. I ignored silences less than 300 milliseconds long, since there can be normal phonetic events in this range, and often such short silences are not perceived as silent pauses at all. I counted each "turn" as starting when Jim Lehrer stopped talking, and ending when the candidate stopped talking for the last time. I used the official debate transcripts as an arbiter of word counts and sentence divisions, though I noted that these are slightly wrong in several places.
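
The thresholding step described above might be sketched roughly as follows. This assumes the audio has already been labeled by hand or by some tool as a list of speech intervals; the function name, the input format, and the sample timings are all hypothetical, and the sketch looks only at silences between stretches of speech, not at any silence at the very start of a turn.

```python
MIN_PAUSE_S = 0.3  # silences shorter than this are not counted as pauses

def segment_turn(speech_intervals, min_pause=MIN_PAUSE_S):
    """Merge speech intervals separated by sub-threshold silences.

    speech_intervals: sorted list of (start, end) times in seconds, one per
    stretch of speech within a single turn.
    Returns (merged_speech, pauses), where pauses holds the durations of the
    silences that were long enough to count.
    """
    merged = [list(speech_intervals[0])]
    pauses = []
    for start, end in speech_intervals[1:]:
        gap = start - merged[-1][1]
        if gap < min_pause:
            merged[-1][1] = end          # too short: treat as continuous speech
        else:
            pauses.append(round(gap, 3))
            merged.append([start, end])
    return [tuple(m) for m in merged], pauses

# Hypothetical timings, for illustration only.
intervals = [(0.0, 1.2), (1.3, 2.5), (3.4, 5.0), (5.2, 6.0), (7.5, 9.0)]
speech, pauses = segment_turn(intervals)
print(speech)
print(pauses)
```

In this toy input the 0.1-second and 0.2-second gaps are absorbed into the surrounding speech, leaving two counted pauses.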

The basic result is striking. Across their first 210 seconds of debating, the two men were quite similar in overall number of silent pauses (57 for Bush, 60 for Kerry), and frequency of pausing (16.1 per minute for Bush, 17.3 per minute for Kerry). Both men averaged 9.7 words per silent pause. However, Bush's pauses averaged 84% longer (1.1 seconds vs. about 0.6 seconds), and so Bush spent 75% more time in silence (62.7 seconds vs. 35.9 seconds).

As a result, Bush's overall speech rate was slower (155 words per minute vs. 167 words per minute), but while the two men were actually talking, Bush talked considerably faster (220 words per minute vs. 202 words per minute).
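
The difference between overall speech rate and the rate while actually talking can be made concrete with a small sketch. The function below and its two invented speakers are assumptions for illustration; in particular, "duty factor" is taken here to mean the share of the turn spent talking, which is a guess at how the term is being used.

```python
def turn_stats(duration_s, n_words, pauses_s):
    """Summarize one speaker's turn(s).

    duration_s: total time from start to end of the turn, in seconds
    n_words:    number of words spoken
    pauses_s:   durations of the counted silent pauses, in seconds
    """
    pause_time = sum(pauses_s)
    speech_time = duration_s - pause_time
    return {
        "pauses_per_minute": 60.0 * len(pauses_s) / duration_s,
        "words_per_pause": n_words / len(pauses_s),
        "mean_pause_s": pause_time / len(pauses_s),
        "duty_factor": speech_time / duration_s,       # share of time spent talking
        "overall_wpm": 60.0 * n_words / duration_s,    # pauses included
        "speaking_wpm": 60.0 * n_words / speech_time,  # pauses excluded
    }

# Hypothetical speakers: same duration and pause count, different pause lengths.
long_pauser = turn_stats(120.0, 300, [1.5] * 20)   # 30 s of silence
short_pauser = turn_stats(120.0, 320, [0.5] * 20)  # 10 s of silence

for name, stats in (("long pauser", long_pauser), ("short pauser", short_pauser)):
    print(name, round(stats["overall_wpm"], 1), round(stats["speaking_wpm"], 1))
```

With these made-up inputs the long pauser is slower overall but faster between pauses, which is the same pattern reported for Bush relative to Kerry.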

Now, as with most other sorts of measurements, pauses are neither intrinsically good nor intrinsically bad in a debate (within reasonable limits, of course). For example, John Kerry's longest pause (1.268 seconds) was his first (answer-internal) one, and it was a rhetorically effective gesture.

Jim Lehrer: Do you believe you could do a better job than President Bush in preventing another 9/11-type terrorist attack on the United States?

John Kerry: [pause 0.278] Yes, I do. [pause 1.268] But before I answer further, let me thank you for moderating. [pause 0.588] I want to thank the University of Miami [pause 0.564] for hosting us. And I know the president will join me [pause 0.831] in ...

This 1.268-second pause underlined the gravitas and simplicity of his answer, and also set off his interpolated welcoming remarks from his answer proper. (Here's a sound file of the passage above).

George Bush's longest pause was also one of his first ones, but it was by no means such a success [audio link].

Jim Lehrer: Mr. President, you have a ninety-second rebuttal.

George W. Bush: [pause* 0.055] uh uh I- [pause* 0.165] I, too, thank the University of Miami, and [pause 0.454] and uh [pause 2.116] and say our prayers are with [speeds up] the good people of this state, who've suffered a lot. [pause 1.304] um [pause 1.507] September the eleventh {sigh} [pause 1.212] changed how America must look at the world. ...

The "pauses" marked with asterisks were not counted as pauses in my tabulation, as they were below the .3-second threshold. I've put them in here to underline the first mistake that the President made here. Rather than pausing for a short time to get ready to speak, he jumped in as soon as the moderator finished. The 55 millisecond separation is not noticeable as a pause. But what he jumps in with is the rapid disfluency "uh uh I-", followed by another sub-threshold silence of 165 msec., followed by his actual opening. The whole disfluent beginning only amounts to about 650 msec., and it would have been better to stay silent for that period of time. Kerry's first rebuttal begins with a 929-msec. silence, and that sounds fine, much better than filling the pause with uh-uh-ing.

After that shaky start, the president's continuation is worse. He says "I, too, thank the University of Miami, and", which is fine as far as it goes, but then he kind of goes off line for a while. He pauses for almost half a second, temporizing with "and uh" (which was elided from the official transcript), and then pauses for more than two seconds before continuing, very rapidly, "and say our prayers are with the good people of this state, who've suffered a lot." The whole "[pause] and uh [pause]" sequence lasts 3.322 seconds, which is really a lot of dead air in a formulaic opening.

I agree with Jay Nordlinger that Bush is capable of doing much better than he did in this first debate. In his stump speeches, he's used to using long pauses to give time for audience reaction. That may be why he tends to use longer pauses in general -- but it doesn't explain his apparent distraction and disfluency at the very start of his first turn in this debate.

This distraction apparently continued as the president turned to the set piece with which he had probably intended to open the debate, starting "September the eleventh changed how America must look at the world." It's a good passage, in my opinion. The delivery was weak, though, starting with the strange exasperated sigh at the end of "eleventh", and continuing with the even odder timing of the second sentence:

And since that day, our nation [slows down] has... been.. on... a... [pause 0.599] multi-pronged strategy to keep our country safer.

which I guess might originally have been written as something like "my administration has been implementing a multi-pronged strategy to keep our nation safer..." If so, W recovered pretty well from substituting the wrong noun phrase in subject position, but he should not have been behind that particular eight ball to start with.

Overall, it seems almost as if W was rattled by having to interpolate some words of thanks, as Kerry did, before launching into his first set of prepared remarks (yes, I know it was a rebuttal, but ...).

For those who want to delve into some deeper phonetic wonkery here, the following table summarizes the measurements I made. Note that I did this all rather quickly, so there is probably a mistake or two; but I'm pretty confident that the overall conclusion is correct. At least it's correct with respect to the first 420 seconds of the debate -- obviously things changed somewhat over the course of the encounter, and it'd be interesting to see how. But I'm pretty sure that the basic result will hold up -- Bush used longer pauses, and didn't always use them effectively.

[Table: per-turn measurements; the cell values did not survive. Columns: Duration, Sentences, Words, Pauses, Pause time, Speech time, Duty Factor, WPM, Words per pause, Pauses per minute, MSec. per pause. Rows: Answer #1 and Rebuttal #1 for each candidate, then Bush overall and Kerry overall.]

For those who want even more detail, here are the histograms of pause durations (remember this is just in the first 2-minute answer and the first 90-second response from each candidate):


I'd like to repeat my earlier comment about the growing focus on political style as opposed to content: "As a linguist, I reckon it's good for business. As a citizen, I think it's bad for the country."

There's nothing wrong with paying attention to the phonetics of rhetorical effectiveness. But this is the proper study of linguists and (advisors to) politicians, not voters at large -- except insofar as it may help to avoid being manipulated. So the rest of you should go read some policy statements and discuss them with your friends and neighbors.


Posted by Mark Liberman at 01:07 PM

October 02, 2004

Getting it wrong

Mark's post this morning about word counts from the Thursday presidential debate initially made me wonder how the word "wrong" escaped Mark's list, given my impression that Bush repeated the phrase "wrong war, wrong place, wrong time", like, a hundred times. So I did my own search of the transcript (using the Find function in my browser and counting them off by hand; I'm less sophisticated than Mark in this regard and probably a hundred others) and found the reason: Kerry used the word "wrong" 11 times to Bush's 26, which means the ratio was almost 2.5 to 1 -- way too low to make Mark's list.

21 of Bush's 26 uses of the word "wrong" (81%) were in the context of the "wrong war, wrong place, wrong time" phrase he repeated 7 times during the debate (sometimes with "at the" instead of the commas, sometimes with "wrong place" and "wrong time" reversed, etc.). Clearly, the Bush team thought it would be a great idea for him to make repeated reference to Kerry's statement in early September that the invasion of Iraq was "the wrong war in the wrong place at the wrong time" and to contrast this with the fact that Kerry "voted to authorize the use of force", that Kerry and Bush made this decision based on "the same intelligence", and so on. You know, the flip-flop thing. The mixed message/signal thing.
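
A hand count like this can also be done with a few lines of code. The sketch below assumes each turn sits on one line in the form "SPEAKER: text", which is a guess rather than the layout of the official transcript, and the three sample lines are abridged stand-ins rather than exact quotations.

```python
import re
from collections import Counter

def count_word_by_speaker(transcript, word):
    """Count whole-word, case-insensitive occurrences of `word` per speaker.

    Assumes each turn is one line of the form "SPEAKER: text"; that format
    is an assumption, not the layout of the official transcript.
    """
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    counts = Counter()
    for line in transcript.splitlines():
        speaker, sep, text = line.partition(":")
        if sep:
            counts[speaker.strip()] += len(pattern.findall(text))
    return counts

# Abridged sample lines, for illustration only.
sample = """BUSH: He says it's the wrong war at the wrong time at the wrong place.
KERRY: I think that's wrong, and I think we can do better.
BUSH: Mixed messages send the wrong signals to our troops."""

counts = count_word_by_speaker(sample, "wrong")
print(counts["BUSH"], counts["KERRY"])
```

Run over the full transcript, the same function would give the per-speaker totals directly, though deciding which uses belong to the repeated three-part phrase would still need a closer look.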

But was this strategy effective? According to George Lakoff, simply saying a word or phrase, whether you're just quoting it or even flat-out denying it, does a good job of reinforcing the word or phrase itself -- perhaps a better job than whatever it is that you're really trying to communicate. As Lakoff put it in his recent interview on NOW with Bill Moyers: "It's like Richard Nixon getting up there and saying, 'I am not a crook,' and people think of him as a crook."

I wonder: how many viewers of the debate now have more significant doubts about the war in Iraq given these 7 repetitions of the relevant phrase?

  1. First of all, what my opponent wants you to forget is that he voted to authorize the use of force and now says it's the wrong war at the wrong time at the wrong place.
  2. I don't see how you can lead this country to succeed in Iraq if you say wrong war, wrong time, wrong place.
  3. My opponent says help is on the way, but what kind of message does it say to our troops in harm's way, "wrong war, wrong place, wrong time"?
  4. So what's the message going to be: "Please join us in Iraq. We're a grand diversion. Join us for a war that is the wrong war at the wrong place at the wrong time?"
  5. They're not going to follow somebody who says, "This is the wrong war at the wrong place at the wrong time."
  6. They're not going to follow somebody who says this is the wrong war at the wrong place at the wrong time.
  7. And if I were to ever say, "This is the wrong war at the wrong time at the wrong place," the troops would wonder, how can I follow this guy?

These quotes are listed here in their order of appearance in the transcript. Notice how the very first one sends the apparently intended message of the Bush camp pretty strongly, but each subsequent one seems weaker than the last. By the last one, I'm thinking: Bush just can't admit he was wrong, and he's willing to sacrifice more American lives just so that he doesn't have to admit that he was wrong.

It helps that this thought in my head is basically the same message that Kerry was hammering home in one way or another in all of his 11 uses of the word "wrong":

  1. I'll never give a veto to any country over our security. But I also know how to lead those alliances. This president has left them in shatters across the globe, and we're now 90 percent of the casualties in Iraq and 90 percent of the costs. I think that's wrong, and I think we can do better.
  2. The president relied on Afghan warlords and he outsourced that job too. That's wrong.
  3. I've met kids in Ohio, parents in Wisconsin places, Iowa, where they're going out on the Internet to get the state-of-the-art body gear to send to their kids. Some of them got them for a birthday present. I think that's wrong.
  4. I believe that when you know something's going wrong, you make it right.
  5. When I came back from that war I saw that it was wrong.
  6. There was a right way to disarm him and a wrong way.
  7. And the president chose the wrong way.
  8. Now, if you break it, you made a mistake. It's the wrong thing to do. But you own it. And then you've got to fix it and do something with it.
  9. And you have to do that by beginning to not back off of the Fallujahs and other places, and send the wrong message to the terrorists.
  10. I'm interested in working with our nations and do a lot of it. But I'm not going to make decisions that I think are wrong for America.
  11. But this issue of certainty. It's one thing to be certain, but you can be certain and be wrong.

The first 2 of Bush's 5 other uses of "wrong" were, in my opinion, seriously off-message by comparison:

  1. I won't hold it against him that he went to Yale. There's nothing wrong with that.
  2. My opponent is for joining the International Criminal Court. I just think trying to be popular, kind of, in the global sense, if it's not in our best interest makes no sense. I'm interested in working with our nations and do a lot of it. But I'm not going to make decisions that I think are wrong for America.

The last 3, one right after the other, appear to be back on track:

  1. Mixed messages send the wrong signals to our troops.
  2. Mixed messages send the wrong signals to our allies.
  3. Mixed messages send the wrong signals to the Iraqi citizens.

Debate-viewing swing voters, I think, should be having serious doubts about a second Bush term. But that's already been said.


Posted by Eric Bakovic at 02:56 PM

What "a hundred times" means

Animals are exquisitely sensitive to deviations from the expected frequency of events that matter to them. For us humans, I believe, this statistical sensitivity applies to all sorts of linguistic events, which often strike us as unusually common at counts as low as two or three, spread over hours or days of talk, or hundreds of pages of text. In a post last spring, I gave some evidence from my own reactions to dialogue in a novel, but I'm hardly an unbiased observer. So I'm happy to be able to cite some additional evidence from a journalistic source.

Jay Nordlinger, chronicling his impressions of Thursday's debate, noted Bush's repetitions:

Bush said, "We're makin' progress" a hundred times — that seemed a little desperate. He also said "mixed messages" a hundred times — I was wishing that he would mix his message. He said, "It's hard work," or, "It's tough," a hundred times. In fact, Bush reminded me of Dan Quayle in the 1988 debate, when the Hoosier repeated a couple of talking points over and over, to some chuckles from the audience (if I recall correctly).

Staying on message is one thing; robotic repetition — when there are oceans of material available — is another.

The actual counts for these phrases in Thursday's debate are given below:

[Table: Bush count and Kerry count for each of the phrases "making progress", "mixed messages", "it's hard work", and "it's tough"; the counts themselves did not survive.]

Now, if I were really one of those bloggers who "shriek 'gotcha!' at tiny factual errors in articles written on short deadlines by people who actually have to leave the house to do their work", as NPR critic John Powers put it, I'd be all over Jay Nordlinger. "Ha!" I'd say, "over at the National Review, I guess that their math is a little fuzzy these days. They seem to think that 100 equals 3. No wonder the budget deficit doesn't bother them..."

But I'm not, so I won't. I understand perfectly well that when Nordlinger said "a hundred times", he really meant "a lot", or more accurately "often enough that I noticed it and it annoyed me".

What's interesting is that "often enough that I noticed it and it annoyed me" turned out to be three times, in the case of Nordlinger's reaction to the phrase "making progress" in Bush's debate performance. In just the same way, "often enough that I noticed it and saw it as characteristic of a particular speaker" turned out to be three times, in the case of my reaction to the phrase "and yet" in Max Barry's novel.

It's not that our math is fuzzy, but that our linguistic reactions are sharp. Yours too, I bet.


Posted by Mark Liberman at 11:31 AM

hard opponent must hope: today even respect issues

2004 is the 40th anniversary of Gerard Salton's SMART system for full-text information retrieval, or at least of the earliest documentation of it that I've seen. One of the key insights of this system was that the content of a document can be surprisingly well approximated by nothing more than the frequency counts of the words in it. This insight is still a fundamental part of the document retrieval systems that we all use today, and in this post, I'm going to apply it to the transcript of Thursday's presidential debate.

A simple but compelling demonstration of this idea emerged as a side-effect of a collaboration, about 17 years ago, between some Bell Labs researchers in my (then) department and the HarperCollins publishing company. We were using what would now be called "data mining", to find new terms for inclusion in the 5th edition of Roget's Thesaurus. As part of that effort, HarperCollins gave us the typographer's tapes for most of the books they published one year. We used statistical techniques to find frequent words and phrases that were missing from the 4th edition, and the 5th edition's editor, Robert Chapman, then looked those lists over to decide what to add.

Ron Hardin, one of my colleagues at Bell Labs (and the original author of festoon), took the texts of these 400-odd books, and implemented a simple but elegant little hack. He counted the frequency of all the words in each book, and then sorted them according to the ratio between their frequency in that book and their frequency in the overall set. The top of each book's list gave a pretty good idea of what the book was about:

    "College: the Undergraduate Experience": undergraduate faculty campus student college academic curriculum freshman classroom professor

    "Earth and other Ethics": moral considerateness bison whale governance utilitarianism ethic entity preference utilitarian

    "When Your Parents Grow Old": diabetes elderly appendix geriatric directory hospice arthritis parent dental rehabilitation

    "Madhur Jaffrey's Cookbook": peel teaspoon tablespoon fry finely salt pepper cumin freshly ginger
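For anyone who wants to try this at home, here is a minimal Python sketch of the frequency-ratio idea as described above. It is not Ron Hardin's original code, which I am not reproducing here; the function names, the tokenizer, and the `min_count` cutoff (which keeps one-off words from dominating the ranking) are all assumptions made for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and pull out simple word tokens (an assumed tokenizer)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def distinctive_words(docs, name, top=10, min_count=2):
    """Rank the words of docs[name] by the ratio between their relative
    frequency in that document and in the whole collection.

    docs maps a document name to its text. min_count is a hypothetical
    cutoff, not part of the method as described in the post.
    """
    counts = {key: Counter(tokenize(text)) for key, text in docs.items()}
    overall = Counter()
    for c in counts.values():
        overall.update(c)
    total_all = sum(overall.values())
    doc = counts[name]
    total_doc = sum(doc.values())
    ratios = {
        word: (n / total_doc) / (overall[word] / total_all)
        for word, n in doc.items()
        if n >= min_count
    }
    # Highest ratio first; ties are broken alphabetically
    return sorted(ratios, key=lambda w: (-ratios[w], w))[:top]
```

Calling `distinctive_words(docs, "bush")` on a dictionary holding each candidate's concatenated contributions would give one ranked list per speaker. With only two documents the "overall set" is small, so the ratios are much coarser than they would be across 400-odd books.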

If we treat each candidate's (concatenated) contributions to the debate as a document, and apply a similar metric, here's what we get:

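[Table: the most distinctive words for each candidate under this metric, with Bush count and Kerry count columns; the entries are not preserved here.]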

We can see the greater repetitiveness of Bush's language -- although he used about 14% fewer words (6,135 to 7,136), he repeated some of those words much more often.

If I had done "stemming", some of the results would have been even more striking. For instance, we have

[Table: Bush count and Kerry count for forms of "hope"; the entries are not preserved here.]

In this particular case, Kerry's avoidance of the word is just as interesting as Bush's overuse of it. Bill Clinton was the "man from Hope" -- but lexically, John Kerry had no hope at all on Thursday evening. I wonder if that was a conscious attempt to avoid association with Clinton's theme? or perhaps the result of a desire to sound strong by avoiding any mention of hypothetical states of affairs? Maybe his handlers told him something like "not hopes, but policies".

If they didn't, perhaps they should have, given how weak Bush's use of this word turned out to be. Four times, he referred to the hopes of enemies: "their hope is that we grow weary and leave"; "hoping to shake our will"; "hoping that the world would turn a blind eye". He repeated a stock phrase "the hopes and aspirations" (of foreigners who want democracy) three times. He used the word "hope" twice to mock Kerry's proposed reliance on alliances ("...the hope that somehow resolutions and failed institutions will make this world a more peaceful place"; "...let's, you know, hope to talk him out"). However, the rest of the hopes were expressions of his own desire to experience unreal states, mostly in respect to better days in Iraq or problems of nuclear proliferation:

"I hope it's as soon as possible"; "I would hope I never have to"; "I was hopeful diplomacy would work"; "I would hope never to have to use force"; "I would hope we never have to"; "I certainly hope so"; "I hope we can do the same thing"; "I hope we can do it"; "I was hoping diplomacy would work"; "I went there hoping that..."

I share all of these hopes, as it happens, but his way of talking about them made him seem weak. Al Qaeda hopes to shake our will: never happen. Arab reformers hope for democracy: lots of luck, folks, it looks like a long road. Kerry hopes that the U.N., NATO and the Arab nations will bail us out in Iraq: yeah, right. And what does Bush offer us, lexically at least, when asked about bringing U.S. troops home from Iraq, or dealing with nuclear weapons in North Korea and Iran? His hopes. Um, uh, wait a minute...

I don't think that most of Bush's other lexical themes worked for him, either. His 20-fold repetition of hard, for example, was mainly about "hard work" (11 times), "how hard it is" (3 times), "working hard" (twice) and similar senses. This reminded me of the argument that I get from a student who has flat-out failed an exam, but wants to persuade me that due to ceaseless toil and dedication, they deserve a better grade. I'll confess that I'm a sucker for this kind of argument, but really, there's a difference between diligence and accomplishment.

Kerry's top two words -- today and even -- surprised me. At first I thought they would show how abstract he can be. But in fact both of these were part of attack modes that worked pretty well. Today was all about facing the unpleasant realities on the ground:

And so, today, we are 90 percent of the casualties and 90 percent of the cost...
And you go visit some of those kids in the hospitals today who were maimed because they don't have the armament.
And today, there are four to seven nuclear weapons in the hands of North Korea.
We've got a backdoor draft taking place in America today...
Now, there are terrorists trying to get their hands on that stuff today.

And even adds bite to accusations, without seeming mean-spirited:

Iraq was not even close to the center of the war on terror before the president invaded it.
They avoided even the advice of their own general.
Even the administration has admitted they haven't done the training...

Kerry used "sort of" six times, five of them being used to soften a criticism of his opponent -- and Bush used this sequence not at all. At first I interpreted this in terms of the "Kerry is weak and wishy-washy" meme, but on balance I think it adds politeness -- which many people appreciate -- without really removing the sting from his criticism:

What I think troubles a lot of people in our country is that the president has just sort of described one kind of mistake.
He cut it off, sort of arbitrarily.
Now, that, I think, is one of the most serious, sort of, reversals or mixed messages that you could possibly send.
And there, again, he sort of slid by the question.
But let me talk about something that the president just sort of finished up with.

In the end, what I've been doing here is a very limited and superficial form of analysis. It models the meaning of a text as the statistics of a "bag of words", a model that deserves the adjective Fred Jelinek is fond of applying to the (similarly simple and effective) linguistic models used in speech recognition technology: moronic.

Still, there may be some aspects of impression-formation that are not much smarter. So if political consultants are not doing this kind of analysis already, perhaps they should be.

[Update: Cameron Marlow at overstated does something similar with the debate transcripts, except that he uses an algorithm to "parse the document and extract the noun phrases", and he ranks the results by frequency for each candidate separately, rather than looking for the most different things. He also offers access to his software. (You could have mine too, except that I just wrote simple ad hoc unix scripts...)

Similar phrase-counting was done at Amy's Robot. ]


Posted by Mark Liberman at 05:59 AM

October 01, 2004

Does size matter?

Well, John Kerry's is 17.7% longer than George Bush's. We're talking about average sentence length, of course.

Yesterday I posted about the question of whether it's true, as (the NYT says that) Kathleen Hall Jamieson said, that "The language of decisiveness is subject, verb, object, end sentence." Let's pass over that issue in silence today, and try interpreting Kathleen's suggestion more charitably. Maybe the way to seem authoritative and decisive is to use simple sentence structures, of whatever kind. I'm not sure how to frame that hypothesis in a testable way, so I thought I'd look at last night's presidential debate in terms of a simple proxy measure, namely sentence length.

"Before I answer further", as Kerry put it, let me say that my overall impressions of the debate were in tune with what has emerged as the conventional wisdom. First, the debate was much better than I expected, maybe not "the best presidential debate in decades" (as conservative pundit Jonah Goldberg suggested), but not at all the boring montage of dueling campaign ads that I thought it would be.

And I think that Kerry did very well, both in content and in form. I agree with Jay Nordlinger, managing editor of the (conservative) National Review:

I thought Kerry did very, very well; and I thought Bush did poorly — much worse than he is capable of doing. Listen: If I were just a normal guy — not Joe Political Junkie — I would vote for Kerry. On the basis of that debate, I would. If I were just a normal, fairly conservative, war-supporting guy: I would vote for Kerry. On the basis of that debate.

In terms of self-presentation, Kerry seemed "succinct and sharp", as Mickey Kaus put it; "calm [and] authoritative", (Andrew Sullivan). In contrast, Bush seemed "smug and contemptuous" (Kaus); "snippy and peevish" (Sullivan); "lost" and "tense or impatient or peeved or even a bit miffed that he even had to be up there on the stage with Kerry." (Josh Marshall). If I were playing poker with George W. Bush, I'd watch for those multiple rapid-fire eye blinks, and I'm pretty sure I'd know what they meant.

But was there anything (simple) about the form of their language -- as opposed to its content -- that corresponds to those judgments? The most obvious simple thing to check is sentence length, and there was certainly a difference in that respect between the two men last night. It went the way you'd expect; but I don't think it made any real difference.

I took the debate transcript from the www.debates.org web site, and ran a few simple programs over it. (With transcripts of spoken material, there are always questions about what constitutes a sentence -- I didn't second-guess the transcribers, but just accepted the sentence divisions implied by their choice of punctuation).

Bush's contributions totalled 6,165 words in 476 sentences, for an average sentence length of 12.95. Kerry's side was 7,136 words in 468 sentences, for an average sentence length of 15.25. Thus Kerry's average sentence was about 17.7% longer.
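The "few simple programs" were ad hoc scripts, so what follows is only a rough Python sketch of the same measurement, not the code actually used. It assumes that sentence boundaries are exactly the transcribers' periods, question marks and exclamation points, and that words are whitespace-separated; the function name is made up for the example. The last lines simply re-check the arithmetic from the totals quoted above.

```python
import re

def sentence_stats(text):
    """Return (word count, sentence count, average words per sentence).

    Assumes a sentence ends at '.', '?' or '!' followed by whitespace,
    i.e. the transcript's punctuation is taken at face value.
    """
    sentences = [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]
    words = sum(len(s.split()) for s in sentences)
    average = words / len(sentences) if sentences else 0.0
    return words, len(sentences), average

# Re-checking the reported figures from the word and sentence totals
bush_avg = 6165 / 476
kerry_avg = 7136 / 468
print(round(bush_avg, 2), round(kerry_avg, 2))       # 12.95 15.25
print(round(100 * (kerry_avg / bush_avg - 1), 1))    # 17.7
```

A different choice of sentence splitter or tokenizer would shift the totals a little, which is one reason not to read too much into the second decimal place.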

Both men's longest sentences were in the range of 60-70 words. Within the range of sentences from one to twenty words in length, there's one striking feature: Bush used many more 3-word sentences (26) than Kerry did (11).
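Tabulating sentences by length takes only a few more lines. The sketch below is again an illustration under the same assumptions about sentence boundaries, with hypothetical function names; the `sample` string is a made-up snippet built from phrases in the debate, not a stretch of the transcript.

```python
import re
from collections import Counter

def split_sentences(text):
    """Split on whitespace that follows '.', '?' or '!' (an assumed rule)."""
    return [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]

def length_histogram(text):
    """Map each sentence length in words to the number of such sentences."""
    return Counter(len(s.split()) for s in split_sentences(text))

def sentences_of_length(text, n):
    """Return the n-word sentences in alphabetical order, repeats included."""
    return sorted(s for s in split_sentences(text) if len(s.split()) == n)

# Illustrative input only
sample = "Libya has disarmed. It's hard work. I understand how hard it is. It's hard work."
print(length_histogram(sample)[3])
print(sentences_of_length(sample, 3))
```

Keeping the repeats in the sorted list is deliberate, since repetition is exactly what is being looked for.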

Here are Bush's 26 three-worders (in alphabetical order):

And I do. And I'm optimistic. And he has. And we will. Every life matters. He changes positions. I know that. It's hard work. It's hard work. It's hard work. It's incredibly hard. Let me finish. Libya has disarmed. Look at Libya. Missy understood that. Thank you, sir. That's my job. That's totally absurd. We tried diplomacy. We will succeed. We're communicating better. We're making progress. We're making progress. Why should he? Yes, let me... You know why?

A few of these are even SVO, but whatever their form, I don't think that these short, punchy phrases are really contributing to a sense of authoritativeness or decisiveness. On the contrary, many of them register with me as helping to create the impression of someone who was "snippy", "peevish", "impatient", and "miffed that he even had to be up there".

That's not because short phrases are intrinsically either authoritative and decisive, or snippy and peevish. I hope it won't shock anyone if I venture to suggest that authoritative and decisive material sounds authoritative and decisive, whether its sentences are short or long, while snippy and peevish phrases seem -- you guessed it -- snippy and peevish, whatever their length or their exact grammatical analysis. That's because "decisive", "peevish" and so on are really characteristics of people and their actions, attitudes and intentions, not characteristics of sentences and their form or size at all.

In the other direction, Kerry's eighth-longest sentence was this one:

The center is Afghanistan, where, incidentally, there were more Americans killed last year than the year before; where the opium production is 75 percent of the world's opium production; where 40 to 60 percent of the economy of Afghanistan is based on opium; where the elections have been postponed three times.

This sentence shows that John Kerry is one of those "[p]eople who speak in sentences that contain parenthetical phrases, people who begin a sentence and then deflect to add a series of illustrative examples before they end the sentences", to quote Kathleen Hall Jamieson again. Nevertheless, this 51-word sentence was part of one of Kerry's more rhetorically effective passages, in my view.

Maybe certain properties of sentences are correlated with certain properties of the people that use them. No doubt we make those associations all the time. But be careful: every one of these linguistic coins has two evaluative sides. "Staying on message" can also be "repeating the same thing over and over". "Strong, simple language" can also be "brusque, contemptuous sound bites". "Beginning a sentence and then deflecting to add a series of illustrative examples" can also be "hammering home the point with a succession of facts".

[More linguistic analysis of debate #1 here, here, here, here, here, and here.]


Posted by Mark Liberman at 04:11 PM

Herring Communication

According to papers entitled "Sounds produced by herring (Clupea harengus) bubble release" by Magnus Wahlberg and Håkan Westerberg in the journal Aquatic Living Resources (2003, 16.271-275), and "Pacific and Atlantic Herring Produce Burst Pulse Sounds" by Ben Wilson, Robert S. Batty and Lawrence M. Dill in Biology Letters (2003, 271.S95-S97) herring communicate distress by farting. The papers are perfectly serious and come complete with spectrograms of the sounds in question and an acoustic model of their generation, but it sounds so funny that last night the work was awarded the 2004 IgNobel Prize in Biology.

[Note that Language Log anticipated the IgNobel awards by proposing the key lexical innovation at issue (F.R.T., for "Fast Repetitive Tick") for Word of the Year status, last December.]

Posted by Bill Poser at 09:52 AM