December 07, 2003

Discourse: tangle or branch?

About a month ago, I posted a discussion of approaches to describing discourse structures. The occasion was the web availability of a paper by Florian Wolf and Ted Gibson, arguing that "trees do not seem adequate to represent discourse structures," and the forthcoming publication of a "treebank" (annotated corpus) embodying their own proposal for representing the network of relationships among phrases in a coherent text. Wolf and Gibson specifically argue against earlier work in Rhetorical Structure Theory (RST), including the RST Discourse Treebank by Lynn Carlson, Daniel Marcu and Mary Ellen Okurowski, which was published last year.

Now Daniel Marcu has written a thoughtful and thought-provoking response to the Wolf/Gibson paper. Anyone interested in these questions should read Marcu as well as Wolf and Gibson. And if you're interested in language, trust me, you should be interested in this stuff.

As I wrote last month, this whole situation is wonderful. Just a few years ago, although we had many interesting theories about structure and meaning above the sentence level, there was no model for discourse coherence that was defined in enough detail, and exemplified extensively enough, that someone like me could figure out how to apply it to new texts with reasonable confidence. Now we have two! And the authors of these different approaches are using their extensive descriptive work to try to address fundamental questions in an empirically responsible way, combining methods from linguistics, psychology and engineering as appropriate. They are also engaging one another's work seriously, respectfully and creatively. This is rational investigation of language as it should be done.

It's clear that neither of these approaches has all the answers, and it's quite likely that neither has even found all the questions yet. However, this is the kind of investigation that has a chance to solve the problems in the end, while bringing further enlightenment along the way.


Posted by Mark Liberman at December 7, 2003 10:03 PM