September 19, 2004

Which vs that? I have numbers!

A user who signs in as eub comments here on the topic of my recent remarks about which vs that in relative clauses:

On the "that"/"which" rantlet: okay, but it would be more fun to see stats on how often these Canonical Texts use each one in a relative and a restrictive way (and in what circumstances?), rather than flagging a single ("which", restrictive) example from each text.

and Jason (jcreed) agrees with his remarks ("Yeah, I was thinking the same thing. I imagine that would take much more effort than just pulling up some project gutenberg texts and grepping a few times"). I feel my honor on the point of being besmirched by people who imply (don't get me angry, guys! you wouldn't like me when I'm angry!) that perhaps I am too lazy to get the calculator out of the drawer and do some honest counting.

But in fact those who think they might be interested in the statistics on using which vs that in integrated ("restrictive") relative clauses don't have to wait for me to tap on the numeric keypad for an hour or two; they can find the figures in print, in the Longman Grammar of Spoken and Written English by Douglas Biber and colleagues. I confess to not using this book very much, because it has only a rather hazy theoretical basis (it uses a sort of confused amalgam of early and late Quirk terminology and does nothing to improve on earlier descriptions or clear up residual Quirkian confusedness), but I do use it for this sort of thing. For what it is worth (not very much, IMHO), here is an example of what they provide: some figures (a restatement of what Biber et al. gives on page 616, in approximate numbers of occurrences per million words) for relative clauses in American and British newspapers:

                                     AmE news   BrE news
integrated relatives with which          800       2600
integrated relatives with that          3400       2200
supplementary relatives with which      1400       1400
supplementary relatives with that          0          0

The only striking figure is the last one: you virtually don't get supplementary relatives with that any more — they occur very occasionally, but the ones Biber et al. cite in their text (page 615) look to me like integrated relatives that happen to be set off by commas; the true supplementary ones can be identified unambiguously when the head noun is one that (like a proper name or a uniquely referring definite NP) doesn't take integrated relatives at all, and although I have seen examples of that sort (here's one: His heart, that had lifted at the sight of Joanna, had become suddenly heavy), they are extremely rare.

But as regards the choice between that and which in integrated relatives, which is what eub was wondering about, although there is a clear frequency difference between the dialect groups, it's obvious that both that and which are grammatical in integrated relatives in both dialect groups, in accord with my earlier discussions.
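To put the dialect difference for integrated relatives in proportional terms, here is a rough Python sketch that takes the approximate per-million-word figures restated from Biber et al. (page 616) and computes what share of integrated relatives use that versus which in each newspaper sample. The names integrated and share_of_that are illustrative choices, not anything from the book, and the percentages are only as precise as the rounded figures they start from.

```python
# Approximate occurrences per million words in newspaper text, restated
# from Biber et al. (page 616). Only integrated ("restrictive") relatives
# are included; supplementary relatives are left out of this comparison.
integrated = {
    "AmE news": {"which": 800, "that": 3400},
    "BrE news": {"which": 2600, "that": 2200},
}


def share_of_that(counts: dict[str, int]) -> float:
    """Fraction of integrated relatives (which + that) introduced by 'that'."""
    return counts["that"] / (counts["that"] + counts["which"])


for variety, counts in integrated.items():
    that_share = share_of_that(counts)
    # Print both shares as whole-number percentages.
    print(f"{variety}: that = {that_share:.0%}, which = {1 - that_share:.0%}")
```

On these figures, that introduces roughly 81% of integrated relatives in the American news sample and roughly 46% in the British one.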

As for why you get the relativizer word that you get in each case, anyone is as capable as I am of reading a few books and noting the thats and whiches and forming semantic hypotheses, but I'm not going to do it, because it is my belief that it would amount to looking for a meaning difference that isn't there. Biber et al. speculatively attribute the difference to a cultural style thing: a greater "willingness to use a form with colloquial associations" among Americans. This is a speculation I would not endorse. One might just as well attribute it to a greater willingness on the part of Americans to accept (unwisely) the pronouncements in Strunk and White's The Elements of Style, that pox-ridden little pocketbook of pointless pontifications.

Posted by Geoffrey K. Pullum at September 19, 2004 06:51 PM