A few days ago, I rashly took up a syntactic challenge issued by Geoff Pullum.
Here's the backstory.
First, Geoff took Sidney Goldberg to task for promulgating falsehoods about English grammar, and criticized the National Review for publishing his uninformed pontifications without any linguistic fact-checking. One of the three (out of three) wrong grammatical points in Goldberg's screed was an alleged distinction between which and that. Geoff demolished the notion that "integrated" relative clauses (also known as "restrictive" relatives) require that (and prohibit which) by observing that in six classic novels, the first integrated relative using which occurs on average about 3% of the way into the book.
Second, I drew Geoff's attention to a comment on a livejournal blog that said "okay, but it would be more fun to see stats on how often these Canonical Texts use each one in a ... restrictive way (and in what circumstances?), rather than flagging a single ... example from each text." Geoff responded with statistics from journalistic text, by Doug Biber and others, nailing his point beyond any reasonable doubt.
So far so good. But Geoff went on to argue that there's no point in "noting the thats and whiches and forming semantic hypotheses", because that "would amount to looking for a meaning difference that isn't there". Now, Geoff is a syntactician, and co-author of the monumental Cambridge Grammar of the English Language. I'm merely a phonetician who occasionally dabbles in practical text analysis. But my prejudice in such matters is that optional variants usually do have interestingly different distributions, and that "meaning" is usually part of the story, at least in a weak sense of the word.
There are two uncontroversial semantically-relevant distinctions between that and which in relative clauses in standard English. First, which can't be used with what CGEL calls "personal" referents -- "*the people which speak English" is not standard English. Second, that can't be used in "supplementary" (or "non-restrictive") relative clauses -- "her head, that was covered with a floppy straw hat" is unlikely if not impossible in contemporary standard English.
So I decided to look for what you might call ripples or echoes of those two distinctions, in contexts where that and which are both fully grammatical. I started by looking for evidence that the personal/non-personal distinction might have a non-trivial influence on the choice among that, which and who. I found several contexts where that is used much less often for "personal" referents than we would expect, based on the ratio of uses of who vs. which and similar considerations. This suggests, at least, that perhaps that has come to be tinged with a bit of "non-personal" meaning. I might venture (on no evidence whatsoever) to predict that this is an unstable situation, and that over time, we might find this tinge deepening and becoming categorical. At least, that's the sort of thing that sometimes happens in the history of syntax.
In this post, I'm going to take up the second idea, namely that perhaps which is tinged with a bit of "supplementarity", even in the context of integrated relatives. The idea here is to look at categories of "integrated" relative clauses that are in some sense more or less tightly "integrated", and see whether the difference in degree of integration affects the probability of using which (or who) vs. that.
Here's the idea that I started with. There are some kinds of relative clauses in which a quantifier or other operator binds the relative especially tightly to the interpretation of the syntactic head, e.g. "the only thing that trumps fear is greed". In contexts like this, which seems much less natural to me than that, though which still seems fully grammatical. Similar phrases without only seem somehow to bind the relative clause less tightly, and in consequence to be more amenable to which, e.g. "the thing that is really hard is giving up on being perfect."
Now, I can't offer any plausible logical analysis to cash in this intuitive impression of "binding more/less tightly". But it's easy enough to check the prediction about the relative probability of which and that in these contexts:
| | thing | things | total | place | places | total | grand total |
| the only __ that | 944,000 | 82,800 | 1,026,800 | 61,100 | 5,890 | 66,990 | 1,093,790 |
| the only __ which | 38,500 | 3,280 | 41,780 | 1,980 | 295 | 2,275 | 44,055 |
| that/which ratio | 24.5 | 25.2 | 24.6 | 30.9 | 20.0 | 29.4 | 24.8 |
| the __ that | 658,000 | 2,300,000 | 2,958,000 | 210,000 | 120,000 | 330,000 | 3,288,000 |
| the __ which | 66,300 | 201,000 | 267,300 | 70,200 | 12,500 | 82,700 | 350,000 |
| that/which ratio | 9.9 | 11.4 | 11.1 | 3.0 | 9.6 | 4.0 | 9.4 |
The table above shows counts for the words thing(s) and place(s) in the contexts "the only __ that/which" and "the __ that/which". (Note that with very few exceptions, all of the relative clauses found would count as "integrated" by anyone's standard, so these results cannot be explained directly by the integrated/supplementary distinction.) Across these cases, the ratio of that to which is 24.8 when only is present, and 9.4 when it isn't. Q.E.D.
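For anyone who wants to check the arithmetic, here is a minimal Python sketch that recomputes the pooled that/which ratios from the counts in the table. The dictionary layout and the names `counts`, `ALL_NOUNS` and `ratio` are illustrative choices, not part of the original searches.

```python
# Counts copied from the table above; the layout and names are illustrative.
counts = {
    "the only __": {
        "that":  {"thing": 944_000, "things": 82_800, "place": 61_100, "places": 5_890},
        "which": {"thing": 38_500, "things": 3_280, "place": 1_980, "places": 295},
    },
    "the __": {
        "that":  {"thing": 658_000, "things": 2_300_000, "place": 210_000, "places": 120_000},
        "which": {"thing": 66_300, "things": 201_000, "place": 70_200, "places": 12_500},
    },
}

ALL_NOUNS = ["thing", "things", "place", "places"]


def ratio(context, nouns=ALL_NOUNS):
    """Pooled that/which ratio: sum the counts over the nouns, then divide."""
    that_total = sum(counts[context]["that"][n] for n in nouns)
    which_total = sum(counts[context]["which"][n] for n in nouns)
    return that_total / which_total


for context in counts:
    print(f"{context:12s} that/which = {ratio(context):.1f}")
```

Run as written, this prints pooled ratios of 24.8 with only and 9.4 without it, the same figures as the grand-total column. The pooling sums the counts first and divides afterwards, so the frequent nouns weigh more than they would in an average of the per-noun ratios.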
This difference seems to be something particular about that vs. which. The personal relative pronoun who doesn't seem to be affected nearly as much:
| | people | group | category |
| the only __ that | 83,500 | 15,200 | 2,320 |
| the only __ who | 381,000 | 2,590 | 10 |
| the only __ which | 320 | 1,640 | 301 |
| that/who ratio | 0.22 | 5.9 | 232 |
| that/which ratio | 260.9 | 9.3 | 7.7 |
| the __ that | 1,710,000 | 635,000 | 101,000 |
| the __ who | 7,740,000 | 118,000 | 585 |
| the __ which | 63,900 | 210,000 | 82,900 |
| that/who ratio | 0.22 | 5.4 | 173 |
| that/which ratio | 26.8 | 3.0 | 1.2 |
Nouns like people, group and category can have personal as well as non-personal referents, and so occur in reasonable numbers with who as well as which and that, as the above table shows. But the that/who ratio is only slightly increased by the presence of only (by between 0 and 34% in these examples), while the that/which ratio is much more strongly affected: computed from the counts above, it is roughly 3 times larger for group (15,200/1,640 ≈ 9.3 with only vs. 635,000/210,000 ≈ 3.0 without), about 6 times larger for category, and nearly 10 times larger for people.
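The size of the only effect can be recomputed directly from the raw counts rather than from the rounded ratios. The following is a rough Python sketch under that assumption; the keys `"only"` and `"plain"` and the helper name `only_factor` are made up for illustration.

```python
# Counts copied from the table above. "only" = "the only __ X",
# "plain" = "the __ X". Key names are illustrative.
counts = {
    "people": {
        "only":  {"that": 83_500, "who": 381_000, "which": 320},
        "plain": {"that": 1_710_000, "who": 7_740_000, "which": 63_900},
    },
    "group": {
        "only":  {"that": 15_200, "who": 2_590, "which": 1_640},
        "plain": {"that": 635_000, "who": 118_000, "which": 210_000},
    },
    "category": {
        "only":  {"that": 2_320, "who": 10, "which": 301},
        "plain": {"that": 101_000, "who": 585, "which": 82_900},
    },
}


def only_factor(noun, other):
    """How many times larger the that/<other> ratio is with 'only' than without."""
    with_only = counts[noun]["only"]["that"] / counts[noun]["only"][other]
    without = counts[noun]["plain"]["that"] / counts[noun]["plain"][other]
    return with_only / without


for noun in counts:
    print(
        f"{noun:9s} that/who x{only_factor(noun, 'who'):.2f}"
        f"   that/which x{only_factor(noun, 'which'):.2f}"
    )
```

A factor of 1.00 means only makes no difference to the ratio; a factor of 1.34 corresponds to a 34% increase. Note that the count of 10 for "the only category who" is tiny, so the that/who factor for category is especially noisy.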
The table below summarizes the effects of only on the that/which ratio of five different cases:
| | thing(s) | place(s) | people | group | category |
| the only __: that/which ratio | 24.6 | 29.4 | 260.9 | 9.3 | 7.7 |
| the __: that/which ratio | 11.1 | 4.0 | 26.8 | 3.0 | 1.2 |
As crude support for the idea that other sorts of quantification of the head have a similar effect, compare the following two tables. The first one looks at a variety of quantifiers with things as head and a definite article present, where the that/which ratios vary from 17.2 to 41.6:
| | that | which | that/which ratio |
| the only things | 82,500 | 3,280 | 25.2 |
| all of the things | 63,700 | 1,530 | 41.6 |
| all the things | 299,000 | 15,700 | 19.0 |
| some of the things | 217,000 | 7,960 | 27.3 |
| few of the things | 24,100 | 633 | 38.1 |
| the few things | 29,100 | 761 | 38.2 |
| the three things | 10,100 | 588 | 17.2 |
Now we look at "the things" (without additional quantification) as the NP in a variety of prepositional phrases, where the that/which ratios vary from 2.7 to 13.2:
| | that | which | that/which ratio |
| for the things | 53,500 | 9,400 | 5.7 |
| to the things | 42,800 | 7,990 | 5.4 |
| from the things | 18,800 | 6,860 | 2.7 |
| with the things | 24,800 | 1,880 | 13.2 |
| by the things | 21,400 | 6,930 | 3.1 |
| because of the things | 3,860 | 498 | 7.8 |
| without the things | 910 | 86 | 10.6 |
Again, nearly all of the examples in both tables are integrated relative clauses. But I think it's fairly clear that quantification of the head tends to predispose the choice away from which and towards that. At a minimum, I'd submit that this is a "semantic difference" that influences the choice between the two words, in contexts where both are fully grammatical. I hypothesize (without any evidence) that the influence arises because of some kind of psychological gradient of integration, where the process of interpreting the quantifier somehow binds the relative clause more tightly to its head, at least in processing terms, and therefore biases the choice away from which and toward that.
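One simple way to put the two "things" tables side by side is to compare the spread of their that/which ratios. The Python sketch below does only that, using the counts from the two tables; the names `quantified`, `prepositional` and `summarize` are hypothetical, and the median is an added summary that does not appear in the tables themselves.

```python
from statistics import median

# (that, which) counts copied from the two "things" tables above.
quantified = {
    "the only things": (82_500, 3_280),
    "all of the things": (63_700, 1_530),
    "all the things": (299_000, 15_700),
    "some of the things": (217_000, 7_960),
    "few of the things": (24_100, 633),
    "the few things": (29_100, 761),
    "the three things": (10_100, 588),
}
prepositional = {
    "for the things": (53_500, 9_400),
    "to the things": (42_800, 7_990),
    "from the things": (18_800, 6_860),
    "with the things": (24_800, 1_880),
    "by the things": (21_400, 6_930),
    "because of the things": (3_860, 498),
    "without the things": (910, 86),
}


def summarize(table):
    """Return (min, median, max) of the that/which ratios in one table."""
    ratios = [that / which for that, which in table.values()]
    return min(ratios), median(ratios), max(ratios)


for name, table in [("quantified", quantified), ("prepositional", prepositional)]:
    low, mid, high = summarize(table)
    print(f"{name:14s} min {low:.1f}   median {mid:.1f}   max {high:.1f}")
```

With these counts the two ranges do not overlap: the highest ratio among the prepositional contexts (13.2) is below the lowest ratio among the quantified ones (17.2). This is a description of fourteen search counts, not a significance test.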
Posted by Mark Liberman at September 23, 2004 11:36 PM