Language Log: Which vs. that: integration gradation

September 23, 2004

Which vs. that: integration gradation

A few days ago, I rashly took up a syntactic challenge issued by Geoff Pullum.

Here's the backstory.

First, Geoff took Sidney Goldberg to task for promulgating falsehoods about English grammar, and criticized the National Review for publishing his uninformed pontifications without any linguistic fact-checking. One of the three (out of three) wrong grammatical points in Goldberg's screed was an alleged distinction between which and that. Geoff demolished the notion that "integrated" relative clauses (also known as "restrictive" relatives) require that (and prohibit which) by observing that in six classic novels, the first integrated relative using which occurs on average about 3% of the way into the book.

Second, I drew Geoff's attention to a comment on a livejournal blog that said "okay, but it would be more fun to see stats on how often these Canonical Texts use each one in a ... restrictive way (and in what circumstances?), rather than flagging a single ... example from each text." Geoff responded with statistics from journalistic text, by Doug Biber and others, nailing his point beyond any reasonable doubt.

So far so good. But Geoff went on to argue that there's no point in "noting the thats and whiches and forming semantic hypotheses", because that "would amount to looking for a meaning difference that isn't there". Now, Geoff is a syntactician, and co-author of the monumental Cambridge Grammar of the English Language. I'm merely a phonetician who occasionally dabbles in practical text analysis. But my prejudice in such matters is that optional variants usually do have interestingly different distributions, and that "meaning" is usually part of the story, at least in a weak sense of the word.

There are two uncontroversial semantically-relevant distinctions between that and which in relative clauses in standard English. First, which can't be used with what CGEL calls "personal" referents -- "*the people which speak English" is not standard English. Second, that can't be used in "supplementary" (or "non-restrictive") relative clauses -- "her head, that was covered with a floppy straw hat" is unlikely if not impossible in contemporary standard English.

So I decided to look for what you might call ripples or echoes of those two distinctions, in contexts where that and which are both fully grammatical. I started by looking for evidence that the personal/non-personal distinction might have a non-trivial influence on the choice among that, which and who. I found several contexts where that is used much less often for "personal" referents than we would expect, based on the ratio of uses of who vs. which and similar considerations. This suggests, at least, that perhaps that has come to be tinged with a bit of "non-personal" meaning. I might venture (on no evidence whatsoever) to predict that this is an unstable situation, and that over time, we might find this tinge deepening and becoming categorical. At least, that's the sort of thing that sometimes happens in the history of syntax.

In this post, I'm going to take up the second idea, namely that perhaps which is tinged with a bit of "supplementarity", even in the context of integrated relatives. The idea here is to look at categories of "integrated" relative clauses that are in some sense more or less tightly "integrated", and see whether the difference in degree of integration affects the probability of using which (or who) vs. that.

Here's the idea that I started with. There are some kinds of relative clauses in which a quantifier or other operator binds the relative especially tightly to the interpretation of the syntactic head, e.g. "the only thing that trumps fear is greed". In contexts like this, which seems much less natural to me than that, though it still seems fully grammatical. Similar phrases without only seem somehow to bind the relative clause less tightly, and in consequence to be more amenable to which, e.g. "the thing that is really hard is giving up on being perfect."

Now, I can't offer any plausible logical analysis to cash in this intuitive impression of "binding more/less tightly". But it's easy enough to check the prediction about the relative probability of which and that in these contexts:

	thing	things	thing(s) total	place	places	place(s) total	grand total
the only __ that	944,000	82,800	1,026,800	61,100	5,890	66,990	1,093,790
the only __ which	38,500	3,280	41,780	1,980	295	2,275	44,055
that/which ratio	24.5	25.2	24.6	30.9	20.0	29.4	24.8
the __ that	658,000	2,300,000	2,958,000	210,000	120,000	330,000	3,288,000
the __ which	66,300	201,000	267,300	70,200	12,500	82,700	350,000
that/which ratio	9.9	11.4	11.1	3.0	9.6	4.0	9.4

The table above shows counts for the words thing(s) and place(s) in the contexts "the only __ that/which" and "the __ that/which". (Note that with very few exceptions, all of the relative clauses found would count as "integrated" by anyone's standard, so these results cannot be explained directly by the integrated/supplementary distinction.) Across these cases, the ratio of that to which is 24.8 when only is present, and 9.4 when it isn't. Q.E.D.
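
For readers who want to check the arithmetic, here is a minimal sketch that recomputes the pooled that/which ratios from the counts in the table. The dictionary layout and the names `counts` and `pooled_ratio` are my own illustrative choices, not part of any tool used to gather the counts.

```python
# Counts copied from the table above: (context, noun) -> (that, which).
# "only" marks contexts of the form "the only __"; "plain" marks "the __".
counts = {
    ("only", "thing"): (944_000, 38_500),
    ("only", "things"): (82_800, 3_280),
    ("only", "place"): (61_100, 1_980),
    ("only", "places"): (5_890, 295),
    ("plain", "thing"): (658_000, 66_300),
    ("plain", "things"): (2_300_000, 201_000),
    ("plain", "place"): (210_000, 70_200),
    ("plain", "places"): (120_000, 12_500),
}


def pooled_ratio(context):
    """Sum the that and which counts for one context, then divide."""
    that_total = sum(t for (ctx, _), (t, _w) in counts.items() if ctx == context)
    which_total = sum(w for (ctx, _), (_t, w) in counts.items() if ctx == context)
    return that_total / which_total


only_ratio = pooled_ratio("only")
plain_ratio = pooled_ratio("plain")

print(f"that/which with only:    {only_ratio:.1f}")
print(f"that/which without only: {plain_ratio:.1f}")
print(f"fold change due to only: {only_ratio / plain_ratio:.1f}")
```

Pooling the counts before dividing (rather than averaging the per-noun ratios) is what the "grand total" column does, so the sketch follows the same convention.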

This difference seems to be something particular about that vs. which. The personal relative pronoun, who, doesn't seem to be affected nearly as much:

	people	group	category
the only __ that	83,500	15,200	2,320
the only __ who	381,000	2,590	10
the only __ which	320	1,640	301
that/who ratio	0.22	5.9	232
that/which ratio	260.9	9.3	7.7
the __ that	1,710,000	635,000	101,000
the __ who	7,740,000	118,000	585
the __ which	63,900	210,000	82,900
that/who ratio	0.22	5.4	173
that/which ratio	26.8	3.0	1.2

Nouns like people, group and category can have personal as well as non-personal referents, and so occur in reasonable numbers with who as well as which and that, as the above table shows. But the presence of only raises the that/who ratio only slightly (by between 0 and 34% in these examples), while it raises the that/which ratio much more strongly (by a factor of between about 3 and about 10).

The table below summarizes the effects of only on the that/which ratio in five different cases:

	thing(s)	place(s)	people	group	category
the only __ (that|which) [that/which ratio]	24.6	29.4	260.9	9.3	7.7
the __ (that|which) [that/which ratio]	11.1	4.0	26.8	3.0	1.2

As crude support for the idea that other sorts of quantification of the head have a similar effect, compare the following two tables. The first one looks at a variety of quantifiers with things as head and a definite article present, where the that/which ratios vary from 17.2 to 41.6:

	that	which	that/which ratio
the only things	82,500	3,280	25.2
all of the things	63,700	1,530	41.6
all the things	299,000	15,700	19.0
some of the things	217,000	7,960	27.3
few of the things	24,100	633	38.1
the few things	29,100	761	38.2
the three things	10,100	588	17.2

Now we look at "the things" (without additional quantification) as the NP in a variety of prepositional phrases, where the that/which ratios vary from 2.7 to 13.2:

	that	which	that/which ratio
for the things	53,500	9,400	5.7
to the things	42,800	7,990	5.4
from the things	18,800	6,860	2.7
with the things	24,800	1,880	13.2
by the things	21,400	6,930	3.1
because of the things	3,860	498	7.8
without the things	910	86	10.6
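
The ranges quoted for the two tables above can be recomputed from the counts. The following is only a sketch for checking the arithmetic; the names `quantified`, `prepositional` and `ratio_range` are hypothetical labels I've chosen for illustration.

```python
# Counts copied from the two tables above: phrase -> (that, which).
quantified = {
    "the only things": (82_500, 3_280),
    "all of the things": (63_700, 1_530),
    "all the things": (299_000, 15_700),
    "some of the things": (217_000, 7_960),
    "few of the things": (24_100, 633),
    "the few things": (29_100, 761),
    "the three things": (10_100, 588),
}
prepositional = {
    "for the things": (53_500, 9_400),
    "to the things": (42_800, 7_990),
    "from the things": (18_800, 6_860),
    "with the things": (24_800, 1_880),
    "by the things": (21_400, 6_930),
    "because of the things": (3_860, 498),
    "without the things": (910, 86),
}


def ratio_range(table):
    """Return the smallest and largest that/which ratio in a table."""
    ratios = [that / which for that, which in table.values()]
    return min(ratios), max(ratios)


q_lo, q_hi = ratio_range(quantified)
p_lo, p_hi = ratio_range(prepositional)

print(f"quantified heads:     {q_lo:.1f} to {q_hi:.1f}")
print(f"prepositional phrase: {p_lo:.1f} to {p_hi:.1f}")
# True when every quantified ratio exceeds every prepositional ratio.
print("ranges do not overlap:", p_hi < q_lo)
```

On these counts the two ranges do not overlap at all: the lowest ratio among the quantified heads is still higher than the highest ratio among the plain prepositional phrases.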

Again, nearly all of the examples in both tables are integrated relative clauses. But I think it's fairly clear that quantification of the head tends to predispose the choice away from which and towards that. At a minimum, I'd submit that this is a "semantic difference" that influences the choice between the two words, in contexts where both are fully grammatical. I hypothesize (without any evidence) that the influence arises because of some kind of psychological gradient of integration, where the process of interpreting the quantifier somehow binds the relative clause more tightly to its head, at least in processing terms, and therefore biases the choice away from which and toward that.

Posted by Mark Liberman at September 23, 2004 11:36 PM