October 07, 2004

"It's hard to know where to start"

I feel like Dick Cheney in Tuesday's Vice Presidential Debate. He kept beginning his answers with "it's hard to know where to start." OK, in fact he did it only twice:

Well, Gwen, it's hard to know where to start; there are so many inaccuracies there.
Well, Gwen -- I'm sorry, it's hard to know where to start.

Still, it's one of the things that stuck in my mind from the debate: another piece of evidence that small n-gram (word-sequence) counts can be psychologically significant. But the reason that I feel like Dick Cheney -- and it's not a good feeling -- is what Jane Perrone wrote in the Guardian's Newsblog yesterday evening, under the heading "Language Matters".
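For readers who want to try this kind of n-gram count themselves, here is a minimal sketch in Python. It is an illustration only, not the tool or script used by anyone mentioned in this post; the function name `ngram_counts` and the simple tokenizing pattern are assumptions made for the example. It counts the seven-word sequence in the two Cheney lines quoted above.

```python
import re
from collections import Counter


def ngram_counts(text, n):
    """Count every run of n consecutive words in text, ignoring case.

    Tokenization is deliberately crude (letters and apostrophes only);
    this is an assumption for the sketch, not a linguistic claim.
    """
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


# The two Cheney lines quoted above.
cheney = (
    "Well, Gwen, it's hard to know where to start; "
    "there are so many inaccuracies there. "
    "Well, Gwen -- I'm sorry, it's hard to know where to start."
)

phrase = tuple("it's hard to know where to start".split())
counts = ngram_counts(cheney, len(phrase))
print(counts[phrase])  # prints 2
```

A count of two is tiny, which is the point: even a very small n-gram count can be what a listener remembers.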

Here's Perrone's entry:

Bloggers are a resourceful bunch: they like nothing better than to "fact check [insert name of candidate or journalist]'s ass".

MIT Media Lab graduate student Cameron Marlow has done wonders with Perl to create a tool to help bloggers analyse transcripts of the presidential debates. Just plug in a well-worn phrase - say, "war on terror" - and up pops a phrase count (Bush 11, Kerry 7, for the record).

Marlow lists the candidates' top 25 phrases during the first Bush-Kerry clash, and repeats the exercise for last night's vice-presidential debate: Cheney's top three phases were Saddam Hussein (11), fact of the matter (10) and United States (10), while Edwards' were John Kerry (36), American people (28) and tax cuts (16).

For more analysis of the candidates linguistic skills, see Language Log, which finds that John Kerry's sentences are, on average, 17.7% longer than George Bush's. Language Log's sober analysis is that, of four reasons for the statistic: "First, Kerry might have talked faster. Second, he might have used shorter pauses. Third, he might have paused less often. Fourth, he might have used intrinsically shorter words", the second is the key factor, sidestepping the well-worn debate over whether Bush is stupid, as evinced by this piece in Slate.

Perhaps I should update the old expression, and conclude that "I don't care what they say about me, as long as they spell my URL correctly." (Another connection to Dick Cheney, who sent millions of viewers to George Soros' blog by talking about factcheck.com instead of factcheck.org). However, Perrone wasn't talking about me; she was talking about linguistic analysis.

Now the whole point of Language Log, aside from having fun, is to encourage people to think and talk about language. But without being excessively pedantic, we'd also like to encourage people to think and talk about language in a way that's sensible, factual and logical.

I'm having trouble getting to the point here, because I can't understand how Perrone, an intelligent and accomplished journalist, could have got things so wrong. On her own fascinating horticultural blog, she would never confuse a comparison of beetroot weights with an analysis of the causes of caterpillar infestation. Yet in her short "Language Matters" paragraph, she manages to mix up two aspects of linguistic analysis that are just as different.

She combines quotes from two of my posts about the first presidential debate: one in which I compare the candidates in terms of the average length of their sentences in words, measured from the official transcript; and another in which I examine the reasons for a difference in the candidates' overall rate of speech, and show that the key factor was a difference in pause length, measured in an audio recording. Now, these are logically different things. You can talk slow or fast in long sentences, and slow or fast in short sentences. You can pause more or less often, for shorter or longer amounts of time, independent of how many words you put in your sentences. And my explanation for Kerry's overall faster speech rate -- that Bush's pauses were similar in number but much longer -- had nothing to do with the relative length of their sentences.

I measured these very, very simple things -- sentence length in words, overall word count per unit time, duration of silent pauses -- for two reasons. First, these things are really easy to measure. You don't have to parse the sentences or measure vowel formants or anything time-consuming, so the empirical part of the research just took a few minutes. And second, these things are really easy to understand. When Geoff explains about "fronted negative adjuncts" and "long sequences of supplements and appositives", you've got to keep your wits about you. But how hard can it be to understand the count of words in a sentence, or the duration of a silence in seconds?
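To make the distinction concrete, here is a minimal Python sketch of the three measurements. Everything in it is hypothetical: the toy transcript, the two invented speakers "A" and "B", their timings, and the function names are assumptions for illustration, not the debate data or the procedure actually used. It shows that two speakers can deliver exactly the same sentences (so the same sentence length in words) and still differ in overall speech rate purely because of pause length.

```python
import re


def mean_sentence_length(transcript):
    """Average number of words per sentence, measured from text alone."""
    sentences = [s for s in re.split(r"[.!?]+", transcript) if s.strip()]
    return sum(len(s.split()) for s in sentences) / len(sentences)


def words_per_minute(word_count, speech_seconds, pauses):
    """Overall rate: words divided by total time (speech plus silence)."""
    total_seconds = speech_seconds + sum(pauses)
    return 60.0 * word_count / total_seconds


def mean_pause(pauses):
    """Average duration of a silent pause, in seconds."""
    return sum(pauses) / len(pauses)


# Hypothetical transcript: 3 sentences, 12 words.
transcript = "The cat sat. The dog ran off. It rained all day long."
word_count = len(transcript.split())

# Hypothetical timings, as if measured from audio: same time spent
# speaking, same number of pauses, but B's pauses are longer.
pauses_a = [0.5, 0.5]
pauses_b = [1.5, 1.5]

rate_a = words_per_minute(word_count, 6.0, pauses_a)
rate_b = words_per_minute(word_count, 6.0, pauses_b)

print(mean_sentence_length(transcript))        # same for both speakers
print(mean_pause(pauses_a), mean_pause(pauses_b))
print(round(rate_a, 1), round(rate_b, 1))      # A is faster overall
```

The first function needs only a transcript; the other two need only timings from a recording. Neither kind of number can be derived from the other.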

Too hard, apparently. Well, looking over those posts, I can see that it's my fault. I never said explicitly that there's a difference between counting words and sentences on the one hand, and counting speech time and silence time on the other. I never defined my terms: words, sentences, seconds, speech, silence. Seriously, it takes some sophistication of thought to keep these things straight. Pauses are not periods. Cabbages are not compost. I have to remember to explain that stuff.

As for Perrone's indirect swipe at President Bush's intelligence, she would have done better not to combine that with a display of intellectual carelessness on her own part.

[Update 10/8/2004: Jane Perrone has posted a gracious correction on the Guardian's newsblog site:

I concur with Dan Gillmor when he says in the introduction to his book We the Media: "I take it for granted, for example, that my readers know more than I do - and this is a liberating, not threatening, fact of journalistic life."

So I'm glad that Professor Mark Liberman of Language Log called me on my sloppy summation of his analysis of the Bush-Kerry debate. My apologies to Mark: socks are being pulled up as I type.

I feel less like Dick Cheney already. And next time, I'll try to write more clearly.]


Posted by Mark Liberman at October 7, 2004 06:39 AM