March 02, 2004

Clairvoyance? No, just utterance processing

I hopped into the car for a ride to the UCSC campus just after 9 a.m. this morning, and as Barbara started the engine, the radio came up. She didn't want the radio on for her commute to San Jose, so (driver's privilege) she hit the button and popped it off immediately. There was only time to hear a familiar voice saying "vanities".

"It must be Tom Wolfe's birthday," I said. And indeed it is. And suddenly I thought, how on earth did I know that?

Well, the voice was Garrison Keillor's. At 9 a.m. every day he has a little 5-minute piece called ‘The Writer's Almanac’ on our local public radio station, KAZU, to which the car radio is usually tuned. The program begins by listing some famous authors whose birthdays fall on that day. Just before the one word I caught, there was a hint of something like "vth". It sounded as if I had just heard the end of Garrison Keillor saying "of the vanities".

Now, vanity is a non-count noun, only very rarely used in the plural. The only salient place anyone is likely to have heard it is in the title of the novel The Bonfire of the Vanities. The author of that novel was Tom Wolfe, who is amply famous enough to get a mention on ‘The Writer's Almanac’. What I had realized instantly, without any real processing time at all, was that Garrison must have included Wolfe in the list of birthday identifications for today.

Now that is how natural language works. Naive accounts talk about using sentences to express messages; naive teachers tell children to answer questions with full sentences; naive models of sentence processing assume we listen until the last word, use the grammar to verify grammaticality, and select one of the possible meanings as the most likely one to convey appropriate information in the context. But all I heard was Garrison Keillor's voice saying perhaps most of a preposition phrase, "of the vanities", and really only the last word of it. For me there was no sentence. From a single second's access to just part of a part of a sentence, I was able to identify the speaker from the voice quality, spot the word, reconstruct the phrase, and make a comment which relied on my having guessed the truth conditions of the rest of the sentence (probably "Today is the birthday of Tom Wolfe, author of the novel The Bonfire of the Vanities, which..."). Speaker recognition, phonetic identification, phonological analysis, lexical lookup, morphological analysis (spotting that "vanities" was in the plural), syntactic parsing, semantic interpretation, and pragmatic implications, all happening simultaneously and virtually instantly. Just a few seconds in the life of a speaker. Everyone is doing this sort of thing all the time.

If we could write a computer program to reliably model the syntactic analysis of complete, error-free sentences as presented in written form, perhaps with the literal meaning attached, but without any sensitivity to context, that would be a great achievement; but it would be nowhere near a computational account of what human beings are actually doing with their linguistic knowledge all the time, every day. As much psycholinguistic work by Ray Gibbs has shown, we bounce from partially heard fragments in complete inferential leaps to brand-new information on the basis of pragmatically conveyed propositions, as if extra reasoning over and above the grammar and the literal meaning took no time whatsoever. How we do that is something that, after fifty years of fairly intensive linguistic and psycholinguistic work and two or three decades of increasing attention to computational linguistics, we barely even have glimmerings of.

Posted by Geoffrey K. Pullum at March 2, 2004 02:51 PM