One of the most interesting things here at LKR2004 was a talk by Jerry Wright, Alicia Abella and Al Gorin, from AT&T Labs, on the topic of "Speech and Dialog Mining." Jerry (who gave the talk) surveyed a range of ways to analyze the logs of customer interactions in order to find and diagnose problems, with the goal of making future interactions work better. The thing that interested me most was what they called "dialog trajectory analysis." This is interesting because of the techniques they've developed and the enormous scale of the data they've used them to explore, but it's also interesting because it highlights the apparent complexity of the problems that we humans solve every time we communicate successfully with one another.
Wright et al. build on an old idea, now commonplace -- representing a dialog schema in terms of a finite automaton. By a "dialog schema" I mean an abstract characterization of a set of possible (human-machine) dialogs. It's conventional to represent this as a network of interconnected nodes (for machine actions such as reading a prompt or querying a database) and directed arcs (for responses from users or from database queries). Any particular interaction is a path through this network. Here's a simplified form of a schema for a sub-dialog to get a phone number, from their paper:
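To make the idea concrete, here's a toy rendering in code. This is entirely my own sketch, not anything from their paper, and the state and response names are invented: the schema is just a table mapping each machine action to the responses that can follow it and the actions those responses lead to, and a particular dialog is a path through that table.

```python
# Toy dialog schema as a finite automaton (my own sketch, not AT&T's
# representation): keys are machine actions (nodes), and each maps its
# possible responses (arc labels) to the next action.
SCHEMA = {
    "ask_number":     {"gave_number": "confirm_number", "silence": "reprompt"},
    "reprompt":       {"gave_number": "confirm_number", "silence": "operator"},
    "confirm_number": {"yes": "done", "no": "ask_number"},
    "operator":       {},   # terminal: hand off to a human
    "done":           {},   # terminal: success
}

def run_dialog(schema, start, responses):
    """Follow one sequence of responses through the schema; return the path of states."""
    state, path = start, [start]
    for r in responses:
        state = schema[state][r]   # each response selects one outgoing arc
        path.append(state)
    return path

# Any particular interaction is just a path through the network:
print(run_dialog(SCHEMA, "ask_number", ["silence", "gave_number", "yes"]))
# -> ['ask_number', 'reprompt', 'confirm_number', 'done']
```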
Most dialog schemata are much more complicated than this. Here's a picture of part (less than 20%) of the "what is the problem" sub-dialog from an AT&T "trouble ticket" application (again from their paper):
The arcs highlighted in red and blue have been picked out by the "dialog trajectory analysis" I mentioned. The algorithm analyzes data from millions of passages through the network, selects particular classes of undesired outcomes, and looks (statistically) for arcs that seem to have a "causally significant" relationship to those outcomes. Those arcs may be quite far away in the graph from the outcomes they help to produce. This kind of analysis sounds interesting -- it's too bad that no significant amount of data of this kind is generally available, outside of the companies whose systems generate it.
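Just to convey the flavor of what such an analysis involves, here's the simplest sort of thing one might compute over logs of this kind. This is a crude association score of my own devising, nothing like the causal analysis in their paper, and the arc and outcome names are invented:

```python
# A crude sketch of the flavor of "dialog trajectory analysis": given logs of
# many dialogs, each recorded as the set of arcs it traversed plus an outcome
# label, rank arcs by how much more often undesired outcomes occur in dialogs
# that cross them. This is bare association (a smoothed log relative risk),
# not the causal analysis described in the paper; all names here are invented.
from math import log

def rank_arcs(dialogs, bad_outcomes, smooth=0.5):
    """dialogs: list of (arcs, outcome) pairs; returns (arc, score) sorted by score."""
    arcs_seen = {a for arcs, _ in dialogs for a in arcs}
    scores = {}
    for a in arcs_seen:
        with_a  = [out in bad_outcomes for arcs, out in dialogs if a in arcs]
        without = [out in bad_outcomes for arcs, out in dialogs if a not in arcs]
        p_with    = (sum(with_a) + smooth) / (len(with_a) + 2 * smooth)
        p_without = (sum(without) + smooth) / (len(without) + 2 * smooth)
        scores[a] = log(p_with / p_without)   # smoothing keeps this finite on small counts
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

logs = [({"reprompt->operator"}, "hangup"),
        ({"confirm->done"}, "success"),
        ({"reprompt->operator"}, "hangup")]
print(rank_arcs(logs, bad_outcomes={"hangup"})[0])   # the arc most associated with hangups
```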
The more general point that I wanted to make is just how complicated even very simple, stereotyped dialog schemata quickly become. It's reasonable to suppose that human mechanisms for planning and managing communicative interaction are different -- but what are they, then, and how do they work? One natural idea, central to the "classical AI" explorations of this question, is that a kind of logical reasoning is involved, where the initial premises include one's own model of the world and communicative goals, and relevant aspects of a model of the mental state of one's interlocutor, all updated dynamically as the interaction goes forward. This has the advantage that the number of states in the corresponding transition graph (if one continues to look at things that way) can become astronomically large, or even infinite, depending on how things are parameterized, while the relevant knowledge can be succinctly represented, and seems to be independently needed anyhow. And as new things are learned, or the situation changes, the implicit "dialog schema" graph should shift in global ways to accommodate the new information. However, as I've mentioned here before, attempts to model dialog in these terms have failed to scale to non-trivial cases.
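To see why the explicit graph blows up while the underlying knowledge stays small, consider a toy example of my own (not drawn from the paper or from any actual classical-AI system): a handful of belief variables takes only a few lines to write down, but a schema that assigned a separate node to every combination of their values would already be unmanageably large.

```python
# Toy illustration (mine, not from the paper or any classical-AI system) of the
# state-explosion point: a compact description of what the system keeps track
# of implicitly defines an enormous explicit state graph, since a schema would
# need a separate node for every combination of belief values.
from math import prod
from itertools import product

# Hypothetical belief variables a dialog planner might maintain.
belief_vars = {
    "user_goal":          ["get_number", "report_problem", "unknown"],
    "number_confirmed":   [True, False],
    "user_is_frustrated": [True, False],
    "db_lookup_done":     [True, False],
    "turns_so_far":       list(range(10)),
}

# The description above is a few lines; the explicit state space is not.
n_states = prod(len(vals) for vals in belief_vars.values())
print(n_states)   # 3 * 2 * 2 * 2 * 10 = 240, and it multiplies with every new
                  # variable: 20 binary beliefs alone would give 2**20 states.

all_states = list(product(*belief_vars.values()))   # what a hand-built schema would enumerate
assert len(all_states) == n_states
```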
It feels to me as if something basic is missing from this discussion. Perhaps it's analogous to Descartes' discussions of consciousness: he wrote 350 years ago about what "clockwork" couldn't do, without any real understanding of what abstract information-processing mechanisms might really be like -- just as Chomsky wrote in 1957 about what statistical models couldn't do, without understanding such models at all. It seems silly to us today -- well, at least it seems silly to me -- to worry whether conscious intelligence can be modeled with the kind of clockwork mechanisms that 17th-century inventors built. (And yes, I know that you can build a clockwork computer, and Babbage designed one, but I think the point still stands). Will it seem just as silly to cognitive scientists of the future that in the early 21st century, we worried about whether human communication can be modeled with finite automata, or with dynamic logic, or with various other frameworks that don't really work, without having a clue about... what?
Posted by Mark Liberman at March 9, 2004 01:32 AM