May 19, 2004

Google and WSD

Word sense disambiguation (WSD) is one of the most venerable topics in natural language processing, going back to the earliest days of computing. Back in 1947, writing to Norbert Wiener about automatic translation, Warren Weaver commented on "the semantic difficulties because of multiple meanings" (see Hutchins's very nice historical discussion of MT systems).

Many of us who work on word sense disambiguation (and plenty of people who don't) have been frustrated that, over the years, WSD algorithms have had relatively little impact on real natural language applications, either because the algorithms don't yet perform well enough or because word ambiguity is handled implicitly rather than explicitly. (See, e.g., the deservedly well-known paper by Krovetz and Croft on lexical ambiguity in information retrieval.)

We keep the faith, though, in part because new NLP applications like question answering seem to have a greater need for dividing the world up into semantic categories. A recent discussion of Google's GMail by a pilot user stirs the blood: Steve Bass writes, "So far, many of the ads I've seen have been wildly inaccurate: For example, promoting glass windows when I talk about Windows...".

WSD just has to make a difference, dammit, it just has to make a difference, it just has to...

Posted by Philip Resnik at May 19, 2004 01:55 PM