September 23, 2004

Which vs. that: integration gradation

A few days ago, I rashly took up a syntactic challenge issued by Geoff Pullum.

Here's the backstory.

First, Geoff took Sidney Goldberg to task for promulgating falsehoods about English grammar, and criticized the National Review for publishing his uninformed pontifications without any linguistic fact-checking. One of the three (out of three) wrong grammatical points in Goldberg's screed was an alleged distinction between which and that. Geoff demolished the notion that "integrated" relative clauses (also known as "restrictive" relatives) require that (and prohibit which) by observing that in six classic novels, the first integrated relative using which occurs on average about 3% of the way into the book.

Second, I drew Geoff's attention to a comment on a livejournal blog that said "okay, but it would be more fun to see stats on how often these Canonical Texts use each one in a ... restrictive way (and in what circumstances?), rather than flagging a single ... example from each text." Geoff responded with statistics from journalistic text, by Doug Biber and others, nailing his point beyond any reasonable doubt.

So far so good. But Geoff went on to argue that there's no point in "noting the thats and whiches and forming semantic hypotheses", because that "would amount to looking for a meaning difference that isn't there". Now, Geoff is a syntactician, and co-author of the monumental Cambridge Grammar of the English Language. I'm merely a phonetician who occasionally dabbles in practical text analysis. But my prejudice in such matters is that optional variants usually do have interestingly different distributions, and that "meaning" is usually part of the story, at least in a weak sense of the word.

There are two uncontroversial semantically-relevant distinctions between that and which in relative clauses in standard English. First, which can't be used with what CGEL calls "personal" referents -- "*the people which speak English" is not standard English. Second, that can't be used in "supplementary" (or "non-restrictive") relative clauses -- "her head, that was covered with a floppy straw hat" is unlikely if not impossible in contemporary standard English.

So I decided to look for what you might call ripples or echoes of those two distinctions, in contexts where that and which are both fully grammatical. I started by looking for evidence that the personal/non-personal distinction might have a non-trivial influence on the choice among that, which and who. I found several contexts where that is used much less often for "personal" referents than we would expect, based on the ratio of uses of who vs. which and similar considerations. This suggests, at least, that perhaps that has come to be tinged with a bit of "non-personal" meaning. I might venture (on no evidence whatsoever) to predict that this is an unstable situation, and that over time, we might find this tinge deepening and becoming categorical. At least, that's the sort of thing that sometimes happens in the history of syntax.

In this post, I'm going to take up the second idea, namely that perhaps which is tinged with a bit of "supplementarity", even in the context of integrated relatives. The idea here is to look at categories of "integrated" relative clauses that are in some sense more or less tightly "integrated", and see whether the difference in degree of integration affects the probability of using which (or who) vs. that.

Here's the idea that I started with. There are some kinds of relative clauses in which a quantifier or other operator binds the relative especially tightly to the interpretation of the syntactic head, e.g. "the only thing that trumps fear is greed". In contexts like this, which seems much less natural to me than that, though which still seems fully grammatical. Similar phrases without only seem somehow to bind the relative clause less tightly, and in consequence to be more amenable to which, e.g. "the thing that is really hard is giving up on being perfect."

Now, I can't offer any plausible logical analysis to cash in this intuitive impression of "binding more/less tightly". But it's easy enough to check the prediction about the relative probability of which and that in these contexts:

 
| | thing | things | thing(s) total | place | places | place(s) total | grand total |
|---|---|---|---|---|---|---|---|
| the only __ that | 944,000 | 82,800 | 1,026,800 | 61,100 | 5,890 | 66,990 | 1,093,790 |
| the only __ which | 38,500 | 3,280 | 41,780 | 1,980 | 295 | 2,275 | 44,055 |
| that/which ratio | 24.5 | 25.2 | 24.6 | 30.9 | 20.0 | 29.4 | 24.8 |
| the __ that | 658,000 | 2,300,000 | 2,958,000 | 210,000 | 120,000 | 330,000 | 3,288,000 |
| the __ which | 66,300 | 201,000 | 267,300 | 70,200 | 12,500 | 82,700 | 350,000 |
| that/which ratio | 9.9 | 11.4 | 11.1 | 3.0 | 9.6 | 4.0 | 9.4 |

The table above shows counts for the words thing(s) and place(s) in the contexts "the only __ that/which" and "the __ that/which". (Note that with very few exceptions, all of the relative clauses found would count as "integrated" by anyone's standard, so these results cannot be explained directly by the integrated/supplementary distinction.) Across these cases, the pooled ratio of that to which is 24.8 when only is present, and 9.4 when it isn't. Q.E.D.
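
For readers who want to check the arithmetic, here is a minimal Python sketch that recomputes the two pooled that/which ratios from the counts in the table. The dictionary layout and the helper name `pooled_ratio` are illustrative assumptions, not part of the original analysis.

```python
# (that, which) counts copied from the table above, keyed by frame and head noun.
counts = {
    "the only __": {
        "thing": (944_000, 38_500),
        "things": (82_800, 3_280),
        "place": (61_100, 1_980),
        "places": (5_890, 295),
    },
    "the __": {
        "thing": (658_000, 66_300),
        "things": (2_300_000, 201_000),
        "place": (210_000, 70_200),
        "places": (120_000, 12_500),
    },
}


def pooled_ratio(cells):
    """Ratio of the summed 'that' counts to the summed 'which' counts."""
    that_total = sum(that for that, _ in cells.values())
    which_total = sum(which for _, which in cells.values())
    return that_total / which_total


ratios = {frame: pooled_ratio(cells) for frame, cells in counts.items()}
for frame, ratio in ratios.items():
    print(f"{frame:12s} that/which = {ratio:.1f}")
```

Pooling the counts before dividing gives the "grand total" column, which is not the same thing as averaging the four per-noun ratios.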

This difference seems to be something particular about that vs. which. The other personal relative pronoun, who, doesn't seem to be affected nearly as much:

 
| | people | group | category |
|---|---|---|---|
| the only __ that | 83,500 | 15,200 | 2,320 |
| the only __ who | 381,000 | 2,590 | 10 |
| the only __ which | 320 | 1,640 | 301 |
| that/who ratio | 0.22 | 5.9 | 232 |
| that/which ratio | 260.9 | 9.3 | 7.7 |
| the __ that | 1,710,000 | 635,000 | 101,000 |
| the __ who | 7,740,000 | 118,000 | 585 |
| the __ which | 63,900 | 210,000 | 82,900 |
| that/who ratio | 0.22 | 5.4 | 173 |
| that/which ratio | 26.8 | 3.0 | 1.2 |

Nouns like people, group and category can have personal as well as non-personal referents, and so occur in reasonable numbers with who as well as which and that, as the above table shows. But the that/who ratio is only slightly increased by the presence of only (by between 0 and 34% in these examples), while the that/which ratio is much more strongly affected (it is multiplied by a factor of roughly 3 to 10).

The table below summarizes the effects of only on the that/which ratio in five different cases:

| that/which ratio | thing(s) | place(s) | people | group | category |
|---|---|---|---|---|---|
| the only __ (that or which) | 24.6 | 29.4 | 260.9 | 9.3 | 7.7 |
| the __ (that or which) | 11.1 | 4.0 | 26.8 | 3.0 | 1.2 |
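
The comparison between who and which can be recomputed directly from the raw counts in the people/group/category table. The following is a minimal Python sketch; the helper name `only_effect` and the data layout are assumptions made for illustration. For each noun it reports how many times larger the that/who and that/which ratios are when only is present.

```python
# (that, who, which) counts copied from the people/group/category table.
with_only = {
    "people": (83_500, 381_000, 320),
    "group": (15_200, 2_590, 1_640),
    "category": (2_320, 10, 301),
}
without_only = {
    "people": (1_710_000, 7_740_000, 63_900),
    "group": (635_000, 118_000, 210_000),
    "category": (101_000, 585, 82_900),
}


def only_effect(noun):
    """Return (who_factor, which_factor): each ratio with 'only' divided by
    the same ratio without 'only'."""
    that1, who1, which1 = with_only[noun]
    that0, who0, which0 = without_only[noun]
    who_factor = (that1 / who1) / (that0 / who0)
    which_factor = (that1 / which1) / (that0 / which0)
    return who_factor, which_factor


effects = {noun: only_effect(noun) for noun in with_only}
for noun, (who_factor, which_factor) in effects.items():
    print(f"{noun:9s} that/who x{who_factor:.2f}   that/which x{which_factor:.2f}")
```

A factor near 1 means that adding only leaves the ratio essentially unchanged. Working from the counts rather than from the rounded ratios avoids compounding rounding error.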

As crude support for the idea that other sorts of quantification of the head have a similar effect, compare the following two tables. The first one looks at a variety of quantifiers with things as head and a definite article present, where the that/which ratios vary from 17.2 to 41.6:

 
| | that | which | that/which ratio |
|---|---|---|---|
| the only things | 82,500 | 3,280 | 25.2 |
| all of the things | 63,700 | 1,530 | 41.6 |
| all the things | 299,000 | 15,700 | 19.0 |
| some of the things | 217,000 | 7,960 | 27.3 |
| few of the things | 24,100 | 633 | 38.1 |
| the few things | 29,100 | 761 | 38.2 |
| the three things | 10,100 | 588 | 17.2 |

Now we look at "the things" (without additional quantification) as the NP in a variety of prepositional phrases, where the that/which ratios vary from 2.7 to 13.2:

 
| | that | which | that/which ratio |
|---|---|---|---|
| for the things | 53,500 | 9,400 | 5.7 |
| to the things | 42,800 | 7,990 | 5.4 |
| from the things | 18,800 | 6,860 | 2.7 |
| with the things | 24,800 | 1,880 | 13.2 |
| by the things | 21,400 | 6,930 | 3.1 |
| because of the things | 3,860 | 498 | 7.8 |
| without the things | 910 | 86 | 10.6 |
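
As a quick check on the two ranges quoted above, this minimal Python sketch recomputes the lowest and highest that/which ratio in each table and tests whether the ranges overlap. The variable and function names are illustrative assumptions.

```python
# (that, which) counts copied from the two tables above.
quantified = {
    "the only things": (82_500, 3_280),
    "all of the things": (63_700, 1_530),
    "all the things": (299_000, 15_700),
    "some of the things": (217_000, 7_960),
    "few of the things": (24_100, 633),
    "the few things": (29_100, 761),
    "the three things": (10_100, 588),
}
prepositional = {
    "for the things": (53_500, 9_400),
    "to the things": (42_800, 7_990),
    "from the things": (18_800, 6_860),
    "with the things": (24_800, 1_880),
    "by the things": (21_400, 6_930),
    "because of the things": (3_860, 498),
    "without the things": (910, 86),
}


def ratio_range(rows):
    """Return the (lowest, highest) that/which ratio among the rows."""
    ratios = [that / which for that, which in rows.values()]
    return min(ratios), max(ratios)


q_low, q_high = ratio_range(quantified)
p_low, p_high = ratio_range(prepositional)
print(f"quantified:    {q_low:.1f} to {q_high:.1f}")
print(f"prepositional: {p_low:.1f} to {p_high:.1f}")

# The ranges are disjoint when even the lowest quantified ratio
# exceeds the highest prepositional one.
no_overlap = q_low > p_high
print("ranges overlap:", not no_overlap)
```

Disjoint ranges over seven items each are suggestive rather than conclusive; this sketch makes no claim about statistical significance.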

Again, nearly all of the examples in both tables are integrated relative clauses. But I think it's fairly clear that quantification of the head tends to predispose the choice away from which and towards that. At a minimum, I'd submit that this is a "semantic difference" that influences the choice between the two words, in contexts where both are fully grammatical. I hypothesize (without any evidence) that the influence arises because of some kind of psychological gradient of integration, where the process of interpreting the quantifier somehow binds the relative clause more tightly to its head, at least in processing terms, and therefore biases the choice away from which and toward that.

 

Posted by Mark Liberman at September 23, 2004 11:36 PM