November 25, 2005

Going it together

In a post earlier today, Geoff Pullum observed that his grammatical intuitions don't countenance the forms goes, went or gone in the idiom "go it alone". I agree with Geoff about "went it alone", but my intuitions about goes and gone are different. Thus I don't notice any grammatical difficulties when I read that Tony Blair told the House of Commons on March 12, 2003

"What is at stake here is not whether the US goes it alone or not, but whether the international community is prepared to back up the instructions it gave to Saddam Hussein..."

or when I learn that George W. Bush told CNN on August 13, 2004 that

"I think to say we've gone it alone really does denigrate the contributions of other countries."

However, I know better than to trust my grammatical intuitions very far. If Geoff were nearby, I'd suggest that we try his proposal to measure whiskey-sipping rates as a proxy for grammaticality judgments. I can see all sorts of practical difficulties in experimental design, but over the course of a weekend, I have no doubt that we could resolve them. Since he isn't here, I'll have to fall back on the easiest grammatical proxy to explore from one's armchair: corpus frequency.

Rather than fight my way through the difficulties with using Google counts for this purpose, I decided to use the counts from a corpus of 2.84 billion words of news text available at LDC Online. My method is to compare the relative frequency of various forms of to go in the idiom "go it alone" to the frequency of the same forms in other frames, such as "go shopping", "go fishing" or "go home".

 
              go       going    goes     gone     went
__ it alone   3,154    693      112      34       45
__ shopping   1,547    306      129      86       676
__ fishing    888      275      99       246      314
__ home       26,222   6,030    1,409    1,718    6,226

Note that these numbers are consistent with the counts that Geoff got on a smaller news corpus: 48 for "go it alone", 8 for "going it alone", and zero for "goes it alone", "gone it alone" and "went it alone". However, with a corpus of several billion words rather than several million words, we have a more powerful grammatical telescope, so to speak.

Note also that the first row of the table provides little comfort for my intuitions -- "went it alone" (which feels ungrammatical to me) is actually 32% commoner than "gone it alone" (which feels grammatically OK to me). However, that's not the end of the story. Since the base rates of the frames are quite different, let's express each count as a fraction of the count for go in the same frame (so that "went it alone" is measured against "go it alone", "went home" against "go home", and so on). Plotting the result, we can see that "went it alone" is enormously less frequent than we would expect, given the relative frequency of went in other frames.
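The normalization itself is simple arithmetic. As a minimal sketch (the dictionary layout and the function name relative_to_go are illustrative choices, not part of any tool used for the counts), each count is divided by the count for go in the same frame:

```python
# Counts copied from the table above; the layout and names are illustrative.
FORMS = ["go", "going", "goes", "gone", "went"]
COUNTS = {
    "it alone": [3154, 693, 112, 34, 45],
    "shopping": [1547, 306, 129, 86, 676],
    "fishing": [888, 275, 99, 246, 314],
    "home": [26222, 6030, 1409, 1718, 6226],
}


def relative_to_go(counts):
    """Divide each form's count by the count for bare 'go' in the same frame."""
    return {frame: [n / row[0] for n in row] for frame, row in counts.items()}


if __name__ == "__main__":
    rel = relative_to_go(COUNTS)
    for frame, row in rel.items():
        # One line per frame, one ratio per verb form.
        cells = "  ".join(f"{form}={value:.3f}" for form, value in zip(FORMS, row))
        print(f"__ {frame:<9} {cells}")
```

To put the gap in concrete terms: if went were as common relative to go in "__ it alone" as it is in "__ home", the 3,154 instances of "go it alone" would lead us to expect roughly 750 instances of "went it alone", against the 45 actually found.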

In fact, the relative frequencies of go, going, goes and gone are similar across the four frames, except that "gone fishing" (which is a fixed expression in its own right) is commoner than the others.

Does this mean that my perceptions of relative frequency are closer to the truth than Geoff's? Not really: if we look more closely at the table of relative frequencies, we can see that goes and gone are quite a bit rarer in the frame __ it alone than before shopping, fishing and home. Perhaps Geoff is just setting his judgmental thresholds at a more discriminating level than I am:

 
              go    going   goes    gone    went
__ it alone   1     0.220   0.036   0.011   0.014
__ shopping   1     0.198   0.083   0.056   0.437
__ fishing    1     0.310   0.111   0.277   0.354
__ home       1     0.230   0.054   0.066   0.237

We can see this better in the graph if we use a log scale on the y axis:
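A rough sketch of what the log scale does to these ratios, assuming nothing more than plain Python and the relative frequencies copied from the table above (the names RELATIVE and log10_table are illustrative). On a linear axis the small values for goes and gone are all squashed against zero; their base-10 logarithms are clearly separated:

```python
import math

# Relative frequencies copied from the table above (the "go" column, always 1,
# is omitted because its logarithm is 0 in every frame).
FORMS = ["going", "goes", "gone", "went"]
RELATIVE = {
    "it alone": [0.220, 0.036, 0.011, 0.014],
    "shopping": [0.198, 0.083, 0.056, 0.437],
    "fishing": [0.310, 0.111, 0.277, 0.354],
    "home": [0.230, 0.054, 0.066, 0.237],
}


def log10_table(relative):
    """Return the base-10 logarithm of every ratio, frame by frame."""
    return {frame: [math.log10(v) for v in row] for frame, row in relative.items()}


if __name__ == "__main__":
    for frame, row in log10_table(RELATIVE).items():
        cells = "  ".join(f"{form}={value:+.2f}" for form, value in zip(FORMS, row))
        print(f"__ {frame:<9} {cells}")
```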

[You shouldn't trust this analysis too far -- we need to investigate the cross product of more frames and more verbs, and we should fit a model, not just look at some tables and graphs. And the results should definitely be cross-validated with those whiskey-sipping rates that Geoff mentioned.]
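As one possible starting point for such a model, and purely as an illustration rather than the analysis this post has in mind, we could ask what each cell would look like if frame and verb form were statistically independent. The sketch below (function names are hypothetical; the counts are those in the first table) computes the expected count for each cell from the row and column totals, and then the ratio of observed to expected:

```python
# Counts copied from the first table; the layout and names are illustrative.
FORMS = ["go", "going", "goes", "gone", "went"]
COUNTS = {
    "it alone": [3154, 693, 112, 34, 45],
    "shopping": [1547, 306, 129, 86, 676],
    "fishing": [888, 275, 99, 246, 314],
    "home": [26222, 6030, 1409, 1718, 6226],
}


def expected_under_independence(counts):
    """Expected cell counts if frame and verb form were independent."""
    row_totals = {frame: sum(row) for frame, row in counts.items()}
    col_totals = [sum(col) for col in zip(*counts.values())]
    grand_total = sum(row_totals.values())
    return {
        frame: [row_totals[frame] * c / grand_total for c in col_totals]
        for frame in counts
    }


def observed_over_expected(counts):
    """Ratio of each observed count to its expected count under independence."""
    expected = expected_under_independence(counts)
    return {
        frame: [o / e for o, e in zip(counts[frame], expected[frame])]
        for frame in counts
    }


if __name__ == "__main__":
    for frame, row in observed_over_expected(COUNTS).items():
        cells = "  ".join(f"{form}={value:.2f}" for form, value in zip(FORMS, row))
        print(f"__ {frame:<9} {cells}")
```

Ratios well below 1 mark cells that are rarer than independence would predict. This ignores sampling error entirely and treats four hand-picked frames as if they were the whole picture, so it is at best a first step toward real model fitting.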

Posted by Mark Liberman at November 25, 2005 08:14 PM