July 04, 2005

That's American

Suppose you wanted to track changes in the relative usages of syntactic variants by writers in, oh say, the past three or four decades.  You would, of course, take advantage of tagged corpora (with part-of-speech information) that have become available in the past half-century and compare rates of occurrence in earlier and later corpora.

This is just what Geoffrey Leech and Nicholas Smith (of Lancaster University) have been doing recently, looking at a collection of variables in both American and British English and using corpora that sample texts from 1961 and 1991-92.  Overall, they find significant increases in several colloquial variants at the expense of their more formal alternatives, with more extreme changes in American English than in British.  On one variable, relative that vs. which, the change in American English writing has been enormous; the change in British English is in the same direction, in favor of that, but is very much smaller.

The obvious interpretation is that the forces of prescription have been winning big in the U.S., at least with respect to this one variable.  (Meanwhile, stranded prepositions, contracted auxiliaries, and contracted negatives, among other variables, have risen in frequency as against their more formal alternatives -- this despite the strictures of the advice literature on standard written English.)  American writers are, apparently, conforming more and more to what the advice books say: use that rather than which in restrictive relatives (the That Rule, a recurrent theme in the halls of Language Log Plaza, most recently discussed here).

But then a passing remark by Anne Fadiman in her valedictory editor's column in the American Scholar reminded me that there's another factor at work here, and it's hard to assess its role in these corpus statistics:  what appears in the corpora is not, exactly, what people wrote; instead, it's what got published, and in the U.S. there's an almost religious attachment to the That Rule in the editorial establishment, which intervenes between the writer's original text and the version that appears in print.

Already published is an article by Leech, "Recent grammatical change in English: data, description, theory", in K. Aijmer & B. Altenberg (eds.), Advances in Corpus Linguistics (Papers from the 23rd International Conference on English Language Research on Computerized Corpora, Göteborg 22-26 May 2002), Amsterdam: Rodopi (2004), 61-81.  Still in press is Leech & Smith, "Recent grammatical change in written English, 1961-1992: some preliminary findings of a comparison of American and British English", in Antoinette Renouf & Andrew Kehoe (eds.), The Changing Face of Corpus Linguistics, Amsterdam: Rodopi.  My summary here relies on further discussion by Leech in e-mail to Rodney Huddleston and to me.

Though restrictive vs. non-restrictive relatives are not entirely factored out, Leech did some recent quick calculations that factored out prepositional relatives (obviously a potentially important consideration, given the decline in fronted prepositions vs. stranded prepositions), and came up with, in the U.S. data, a decrease of 41.5% in the frequency of non-prepositional which relatives and a corresponding increase of 48.5% in the frequency of non-prepositional that relatives.  This is pretty stunning, and exceeds the changes in British English by roughly a FACTOR of 5.
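For readers who want to see the arithmetic behind percentages like these, here is a minimal sketch of how such a comparison might be computed.  It is not Leech's procedure, which I haven't seen in detail; the function names are mine, and the counts are hypothetical placeholders, not figures from any real corpus.  The idea is just to normalize raw counts by corpus size and then take the relative change from the earlier corpus to the later one.

```python
def per_million(count, corpus_size):
    """Normalize a raw count to occurrences per million words."""
    return count * 1_000_000 / corpus_size


def percent_change(old_rate, new_rate):
    """Relative change from old_rate to new_rate, in percent."""
    return (new_rate - old_rate) / old_rate * 100


def changes(early, late, variants=("which", "that")):
    """Percent change in normalized frequency for each variant."""
    return {
        v: percent_change(
            per_million(early[v], early["size"]),
            per_million(late[v], late["size"]),
        )
        for v in variants
    }


# Hypothetical counts, purely for illustration; NOT Leech & Smith's data.
EARLY = {"size": 1_000_000, "which": 2000, "that": 3000}
LATE = {"size": 1_000_000, "which": 1500, "that": 3600}

result = changes(EARLY, LATE)
for variant, change in result.items():
    print(f"{variant}: {change:+.1f}%")
```

Normalizing first matters whenever the two corpora differ in size; when they are the same size, as in the made-up numbers here, the normalization simply drops out.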

Now, this effect is in the direction of "colloquialization", given that restrictive which is less frequent in spoken vs. written English.  But the effect is, in Leech & Smith's words, "dramatic", way beyond simple American colloquialization.  Leech & Smith conclude: "This preference [for restrictive that], amounting to an increasing taboo against which as a restrictive relativizer, is now built into grammar checking software, and we can expect it to be making even greater headway at present than in the early 1990s."

But.  But.  What we're looking at here is what comes out of the publishing enterprise.  We don't know what went into it.  Here Anne Fadiman's passing reference to copyediting suddenly becomes relevant:

"Letter from the Editor: The Thanksgiving Table", American Scholar, Autumn 2004, p. 9:

I also read through many of the folders in my twenty-two linear feet of SCHOLAR-related files.  One of them was labeled "Checking."  It contained research material faxed by Jeanie [Stipicevic, managing editor] and Sandra [Costich, associate editor], who not only format our pages and enforce the sacred distinction between which and that but also check our pieces for accuracy.

Oh dear, "the sacred distinction".  And it comes up in connection with the formatting of pages, a matter of the mechanics of publication.  As I've noted here before, U.S. publishing establishments (even those that are arms of British publishing establishments) tend to view the That Rule as a mechanical stipulation, like spellings in -or (rather than -our), rejection of the serial comma, and placing commas and periods inside (rather than outside) closing quotation marks.  It is enshrined in house style sheets, in the very influential Chicago Manual of Style, and in Microsoft Word's grammar checker.  I find this bizarre, but there it is.

What's important here is not that all these sources of advice subscribe to the That Rule -- after all, real-life writers happily, regularly, and systematically fail to adhere to the proscriptions in the manuals, as should be clear from studies like Leech's -- but that those who mediate between what people write and what gets published subscribe to the rule.  Who knows what people actually write?  Whatever you type in, Microsoft Word or a copyeditor will silently alter it, at least if you're in the U.S.  It's out of your hands.

So what do corpus linguists make of the results?

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at July 4, 2005 09:36 PM