May 01, 2004

Speakers vs. hearers

It's an old idea that speech and language are a compromise between the need for a clear message and the desire to save effort and time. For example, word frequency and word length are inversely correlated -- common words tend to be short words -- so that a pronouncing dictionary is a rough sort of Huffman code.
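To make the Huffman-code analogy concrete, here's a minimal sketch in Python -- the word list and frequencies are invented for illustration. An optimal prefix code built over a toy lexicon gives the shortest codewords to the most frequent words, which is roughly what natural vocabularies do with word length:

    import heapq

    def huffman_code_lengths(freqs):
        """Build a Huffman code over symbols with the given frequencies
        and return the length (in bits) of each symbol's codeword."""
        # Each heap entry: (total weight, tiebreak id, {symbol: depth so far}).
        heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(freqs.items())]
        heapq.heapify(heap)
        counter = len(heap)
        while len(heap) > 1:
            w1, _, c1 = heapq.heappop(heap)
            w2, _, c2 = heapq.heappop(heap)
            # Merge the two lightest subtrees; every symbol inside them
            # ends up one level deeper, i.e. one bit longer.
            merged = {s: d + 1 for s, d in {**c1, **c2}.items()}
            heapq.heappush(heap, (w1 + w2, counter, merged))
            counter += 1
        return heap[0][2]

    # Toy "lexicon": frequent words should come out with short codes.
    word_freqs = {"the": 500, "of": 300, "speech": 20,
                  "ambiguity": 5, "psycholinguistics": 1}
    for word, bits in sorted(huffman_code_lengths(word_freqs).items(),
                             key=lambda kv: kv[1]):
        print(f"{word}: {bits}-bit codeword")

Running this prints "the" with a 1-bit codeword and "psycholinguistics" with a 4-bit one -- the same inverse frequency-length relation the dictionary shows.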

The evolution of word pronunciation is a large-scale process, emerging from millions of particular communicative transactions between individual speakers and hearers. The same must be true for all of the other norms of speech and language. But each individual utterance is still a sort of compromise, a complex optimization along many dimensions -- how much to assume and how much to explain? what words to choose and how to combine them? what order to put things in? how fast and loud to talk? how carefully to articulate?
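One way to picture that compromise -- a schematic of my own, not anything from the literature -- is as the speaker minimizing a weighted sum of costs over the available ways of expressing a message m:

    u^{*} = \arg\min_{u \in \mathrm{Options}(m)}
        \left[ \lambda_{\mathrm{eff}}\, C_{\mathrm{effort}}(u)
             + \lambda_{\mathrm{plan}}\, C_{\mathrm{planning}}(u)
             + \lambda_{\mathrm{lis}}\, C_{\mathrm{listener}}(u) \right]

The empirical question raised below is, in effect, how large the listener's weight in that sum really is.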

It's often assumed that speakers make these choices in a way that gives a lot of weight to the needs of their listeners. After all, the point is to be understood, right?

However, there's some evidence that this is often false. A recent contribution is "Avoiding Attachment Ambiguities: the role of Constituent Ordering", by Jennifer Arnold, Tom Wasow, Ash Asudeh and Peter Alrenga (to appear in Journal of Memory and Language). They studied the choice of structures and the order of constituents in potentially ambiguous English sentences with both a direct and an indirect object, like "John showed the letter to Mary to her mother" -- where a hearer who has gotten as far as "to Mary" can't yet tell whether it names the recipient or modifies "the letter".
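To see that temporary ambiguity concretely, here's a small sketch using NLTK's chart parser with a toy grammar I wrote for this one sentence (it is not anything from the paper). The prefix "John showed the letter to Mary" supports two analyses -- recipient versus noun-phrase modifier -- and only the final prepositional phrase settles the matter:

    import nltk

    # Toy grammar, invented for this single example sentence.
    grammar = nltk.CFG.fromstring("""
        S -> NP VP
        VP -> V NP PP | V NP
        NP -> Det N PP | Det N | PropN
        PP -> P NP
        Det -> 'the' | 'her'
        N -> 'letter' | 'mother'
        PropN -> 'John' | 'Mary'
        V -> 'showed'
        P -> 'to'
    """)
    parser = nltk.ChartParser(grammar)

    # The prefix is temporarily ambiguous: 'to Mary' may name the
    # recipient (VP -> V NP PP) or modify 'the letter' (NP -> Det N PP).
    prefix = "John showed the letter to Mary".split()
    print(len(list(parser.parse(prefix))), "parses of the prefix")  # prints: 2

    # The full sentence leaves only the noun-phrase-modifier reading.
    full = "John showed the letter to Mary to her mother".split()
    for tree in parser.parse(full):
        tree.pretty_print()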

They show that speakers don't make the choices that would make things easier for listeners -- choices that the speakers themselves prefer when put in the listener's role. Instead, speakers act in their own interests, making choices that reduce the cognitive load of sentence planning.

I believe that non-specialists will find the paper easy to follow, and outsiders to psycholinguistics may find it thought-provoking for two different sorts of reasons.

First, there's the issue of how to model language evolution. There's a sort of economic problem here -- given that the utilities of cooperative speakers and listeners are different (as Arnold et al. show), how does optimization of communication continue to exert an influence, at least at a large scale, on the ways that languages develop? I don't mean that it's hard to think of ways to make this work out -- on the contrary, there are lots of choices, and the point is to explore them theoretically and empirically.
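Here is one toy way to see how that could work -- my own illustration, with made-up numbers, not a model from Arnold et al. Suppose a self-interested speaker chooses between a cheaper ambiguous form and a costlier unambiguous one, but a misunderstood utterance triggers a repair exchange ("Wait, who got the letter?") whose cost the speaker shares. Then the speaker's own expected cost can favor clarity once misunderstanding becomes common or costly enough, and communicative pressure leaks back into purely selfish choices:

    # Hypothetical production costs, chosen only to illustrate the trade-off.
    EFFORT = {"ambiguous": 1.0, "clear": 1.4}

    def expected_speaker_cost(form, p_misparse, repair_cost):
        """Expected cost to a self-interested speaker of one utterance.
        The ambiguous form is cheaper to produce, but risks a repair
        exchange that the speaker pays for too."""
        cost = EFFORT[form]
        if form == "ambiguous":
            cost += p_misparse * repair_cost
        return cost

    for p in (0.05, 0.2, 0.5):
        for repair in (0.5, 2.0):
            amb = expected_speaker_cost("ambiguous", p, repair)
            clr = expected_speaker_cost("clear", p, repair)
            winner = "clear" if clr < amb else "ambiguous"
            print(f"P(misparse)={p:.2f}, repair cost={repair:.1f}: "
                  f"ambiguous={amb:.2f} vs. clear={clr:.2f} -> {winner} form wins")

Even in this crude setup the crossover is visible: when misparsing is rare or cheap to repair, the selfish speaker stays ambiguous; raise either number and clarity wins on selfish grounds alone.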

Second, there are obvious implications for teaching people how to communicate, in writing as well as in speaking. Arnold et al. point out that

Language production involves generating an utterance from a non-linguistic message. The message is never ambiguous to the speaker; the only way to identify the ambiguity is to consider how someone else would interpret the message in the current context. This would require passing the planned utterance through the comprehension system, while ignoring the known intended meaning. The production system would have to be sensitive to the degree of temporary parsing difficulty associated with an ambiguous prepositional phrase, and use that information to drive decisions about ordering and prosody. It is not clear that the language production system is built to handle this kind of task. Although language production clearly involves monitoring at some level..., the clearest application of these monitors is to the process of identifying and correcting errors. Ambiguities are not errors per se, and may require more sophisticated machinery for identifying them.
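For what it's worth, here is a hedged sketch of what such a monitor would have to do computationally: generate candidate orderings, run each back through the comprehension system while ignoring the intended meaning, and prefer the one with the least temporary ambiguity. It reuses the toy grammar from the earlier sketch (plus a double-object rule), and the prefix-counting measure is a crude proxy of my own for "temporary parsing difficulty":

    import nltk

    # Toy grammar from the earlier sketch, plus a double-object rule
    # (VP -> V NP NP) so both orderings of the message are expressible.
    grammar = nltk.CFG.fromstring("""
        S -> NP VP
        VP -> V NP PP | V NP | V NP NP
        NP -> Det N PP | Det N | PropN
        PP -> P NP
        Det -> 'the' | 'her'
        N -> 'letter' | 'mother'
        PropN -> 'John' | 'Mary'
        V -> 'showed'
        P -> 'to'
    """)
    parser = nltk.ChartParser(grammar)

    def temporary_ambiguity(sentence):
        """Crude proxy for temporary parsing difficulty: the largest number
        of complete analyses available at any prefix of the utterance."""
        words = sentence.split()
        return max(len(list(parser.parse(words[:i])))
                   for i in range(1, len(words) + 1))

    def monitor(candidates):
        """The hypothetical monitor: pick the ordering a comprehender
        would find least ambiguous along the way."""
        return min(candidates, key=temporary_ambiguity)

    # Two orderings of the same message (my invented candidates).
    print(monitor([
        "John showed the letter to Mary to her mother",  # peak prefix ambiguity: 2
        "John showed her mother the letter to Mary",     # peak prefix ambiguity: 1
    ]))

The sketch picks the shifted ordering, as a considerate speaker would -- which only underlines the paper's point that real-time production doesn't seem to run this loop.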

The same asymmetry between producer and consumer holds for other problems besides structural ambiguity -- reference resolution is another obvious example. Effective speakers and writers need to learn to overcome this asymmetry, or at least to compensate for it in some way.

Posted by Mark Liberman at May 1, 2004 10:50 AM