April 13, 2004


Computational linguists sometimes try to make up short sentences that would give a parser a lot of trouble by virtue of ambiguities and other seeds of confusion. TIME FLIES LIKE AN ARROW was a famous invented case. The San Francisco Chronicle headline today struck me as being just about as difficult a natural example as I'd ever seen, except for people who had exactly the right background information at the ready:


Junk bonds? Neck ties? Aprils, Mays, Junes? What's going on here? The first and second words could be either plural nouns or singular-inflected verbs. The third can be either a month name or a modal verb, but in neither capacity does it normally have an "S" on it... One can see how the parser might gag.

Somewhat to my surprise, I got the correct interpretation instantly, but then I live in the greater San Francisco Bay Area. I think some natural language processing systems and some non-Americans might have had a few CPU seconds of trouble with it.

For sports fans, the huge number "660" beside the headline told all. But for the benefit of those in England or Australasia, and the NLP systems who read Language Log, and those who possess even less knowledge about sport than I do, the key is that baseballer Barry Bonds has just hit his 660th home run, so he is now tied with his godfather, Willie Mays, at a number that only the legendary Babe Ruth and Hank Aaron have ever exceeded in the history of baseball. A lot of people had been waiting with bated breath for this to happen as Bonds lingered on the brink at 659 home runs, and for them (and for you, given the information I just supplied) pragmatics rides to the rescue. As it so often has to do.

Posted by Geoffrey K. Pullum at April 13, 2004 06:29 PM