January 14, 2004

Google sociolinguistics

Following the revelation that some heartless test designers have arranged for 50% of the scores to fall below the median, a recent article in the Daily Northwestern by Mindy Hagen says that "[i]n his weekly radio address last Saturday, Bush refuted that the law sets unreasonable standards."

This took me aback, not politically but syntactically. For me, you can refute something but not refute that such-and-such is so. I checked a few dictionaries, and learned that there's a controversy about whether refute must mean "disprove", or can be used more loosely to mean just "deny" or "repudiate". However, nothing is said about the question of sentential complements, though all the (about 20) example sentences in the four on-line dictionaries I checked had noun-phrase objects.

Not being a bigot, I checked with Google. The patterns "refuted that the", "refutes that the", "refute that the" get a total of 1,969 hits, showing that Ms. Hagen's usage is not an isolated case. The patterns "refuted the", "refutes the", "refute the" get 225,800 hits, suggesting that sentential complements for refute remain relatively rare. By comparison, "claimed that the" gets 497,000 hits, against 580,000 for "claimed the".
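To make the comparison concrete, here is a minimal Python sketch that turns the hit counts quoted above into proportions. The dictionary layout and the function name `that_share` are illustrative choices, not part of any search API, and the "VERB the" pattern is only a rough proxy for noun-phrase objects (it would also match a that-less clause such as "claimed the economy is ...").

```python
# Hit counts as quoted in the post (Google, January 2004).
# "that_clause" = hits for VERB + "that the"; "noun_phrase" = hits for VERB + "the".
hits = {
    "refute": {"that_clause": 1_969, "noun_phrase": 225_800},
    "claim": {"that_clause": 497_000, "noun_phrase": 580_000},
}


def that_share(counts):
    """Return the fraction of the two patterns' combined hits that use 'that the'."""
    total = counts["that_clause"] + counts["noun_phrase"]
    return counts["that_clause"] / total


for verb, counts in hits.items():
    print(f"{verb}: {that_share(counts):.1%} of hits use 'that the'")
```

On these figures the that-clause pattern accounts for about 0.9% of the refute hits and about 46.1% of the claim hits, which is the sense in which the construction is "relatively rare" for refute.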

Not being a corpus fetishist either, I took a look at the examples and their sources. Some of the sentential complement examples are in national publications, like this review in Salon Magazine of Christina Hoff Sommers' The War against Boys: "Sommers seeks to refute that the 'girl crisis' ever existed." So I'm convinced that this construction has a linguistic toe-hold. But is it coming or going? Is it going to be the norm in another generation, or is it just a sporadic idiosyncrasy?

On the first four pages of Google's examples of "refuted that the", 17 of the 40 citations are from South Asia (India, Pakistan, etc.). This suggests that in "Indian English" and related varieties, sentential complements for refute have already become standard.

Several of the other citations on the first few pages are from (American) religious discussions -- perhaps this is just because refute is a relatively high-frequency word in religious debates, but maybe there are more interesting reasons.

A few of the other citations that I looked at are from other college newspapers. The Daily Illini writes: "While economics professor Fred Gottheil admitted that the nation is experiencing an economic dip, he refuted that the economy is in a recession." An article in the Grinnell paper says that "Leffler refuted that the U.S. has ignored Europe because there remained incomparable shared values and interests between the two giants." Is this an indication of "age grading"? Are younger writers more likely to use sentential complements with refute?

If I cared enough, or had enough time on my hands, I could probably answer these questions. I'd have to read through a large sample of the thousands of available citations, categorize them according to the apparent age and background of the writer and the topic of the passage, and relate such variables to the frequency of linguistic variants. If web search indices included (even automatically-derived and approximately-correct) metadata of this type, I could evaluate such hypotheses with much less work. Actually, I should say: "When web search indices include automatically-derived and approximately-correct metadata of this type, ..."
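The bookkeeping such a study would need can be sketched in a few lines of Python. The records below are just the four that-clause citations quoted in this post; the field names, the `source_type` labels, and the helper `cross_tabulate` are hypothetical, and the labels are a hand classification rather than metadata from any search index.

```python
from collections import Counter, defaultdict

# The four that-clause citations quoted in this post, labeled by hand.
# Field names and category labels are illustrative assumptions.
citations = [
    {"source": "Daily Northwestern", "source_type": "college newspaper", "variant": "that-clause"},
    {"source": "Salon", "source_type": "national magazine", "variant": "that-clause"},
    {"source": "Daily Illini", "source_type": "college newspaper", "variant": "that-clause"},
    {"source": "Grinnell paper", "source_type": "college newspaper", "variant": "that-clause"},
]


def cross_tabulate(records, factor, outcome="variant"):
    """Count how often each outcome value occurs within each value of the factor."""
    table = defaultdict(Counter)
    for record in records:
        table[record[factor]][record[outcome]] += 1
    return {category: dict(counts) for category, counts in table.items()}


table = cross_tabulate(citations, "source_type")
for category, counts in table.items():
    print(category, counts)
```

Four hand-picked citations say nothing about frequency, of course. A real test would need a large sample that includes noun-phrase examples drawn the same way, with one such table per variable of interest (age, region, topic).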

Sometime soon I hope that Philip Resnik will tell us about his Linguist's Search Engine -- I will, if he doesn't :-)... -- and you'll see that we are not talking about an imaginary day far in the future. The independent variables that Philip's system happens to focus on are lexical and grammatical, but there's no reason that genre, author characteristics and so on could not be introduced into a service of this kind.

Posted by Mark Liberman at January 14, 2004 08:53 AM