December 28, 2003

Names of smells

As a grown-up version of Bertie Bott's Every Flavor Beans, Demeter offers perfumes in fragrances like dirt, crust of bread, sawdust and laundromat, as well as tobacco, condensed milk, Earl Grey tea and cranberry, and more traditional things like patchouli, honeysuckle and sandalwood. I don't quite get it. Do people buy these as a joke gift, for the incongruity of a fancy bottle of perfume labelled mildew -- that actually smells like mildew? Or do they buy them because they really want to go around emitting wafts of turpentine or lobster? Or is it because they get a Proustian rush from privately uncorking their bottle of sticky toffee pudding or stable?

Anyhow, Demeter's list of currently available fragrances suggests a problem in computational linguistics: devise an algorithm that analyzes a very large text corpus to derive a comparable list of "names of things with evocative smells". (In fact one should be able to do better, since Demeter's list is not really very long, systematically omits highly offensive smells like cat piss and rotten eggs, and includes some odorless oddities like holy water...) This problem in itself is not important, but it's an instance of an interesting class: automatically deriving the list of terms that belong to some category from raw text. It would be nice, for instance, to be able to process biomedical text so as to derive a list of names of structural proteins, or diseases of domestic pets, or insects implicated as disease vectors, or whatever.
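To make the idea concrete, here is a very rough sketch of one way to start: scan the text for cue phrases like "smell of X" or "smelled like X" and count what turns up in the X slot. Everything in it is my own assumption for illustration, including the cue list, the little stopword sets, and the function name `smell_candidates`. It does no part-of-speech tagging, so on real text it will pick up plenty of junk (a following verb, for instance); a serious attempt would want a noun-phrase chunker and some statistical filtering over a very large corpus.

```python
import re
from collections import Counter

# Hypothetical cue phrases that tend to introduce the name of a smell.
CUE = re.compile(
    r"\b(?:(?:smell|scent|odor|odour|aroma|stench|whiff)s?\s+of"
    r"|(?:smell|smells|smelled|smelling)\s+(?:like|of)"
    r"|(?:reek|reeks|reeked|reeking)\s+of)\s+",
    re.IGNORECASE,
)

# Tiny hand-picked word lists (assumptions, not a real stoplist).
DETERMINERS = {"a", "an", "the", "some", "his", "her", "their",
               "its", "my", "our", "your"}
STOPWORDS = {"and", "or", "in", "on", "at", "that", "which", "from",
             "as", "but", "with", "of", "to", "was", "is"}

# A token is either a word or a single non-space, non-letter character.
TOKEN = re.compile(r"[a-z]+(?:'[a-z]+)?|[^\sa-z]", re.IGNORECASE)


def smell_candidates(text, max_words=3):
    """Count the short phrases that follow a smell cue in `text`."""
    counts = Counter()
    for cue in CUE.finditer(text):
        phrase = []
        # Read tokens starting right after the cue phrase.
        for match in TOKEN.finditer(text, cue.end()):
            tok = match.group().lower()
            if not phrase and tok in DETERMINERS:
                continue  # skip leading "the", "a", ...
            if (not tok[0].isalpha()
                    or tok in STOPWORDS
                    or tok in DETERMINERS):
                break  # punctuation or a function word ends the phrase
            phrase.append(tok)
            if len(phrase) == max_words:
                break
        if phrase:
            counts[" ".join(phrase)] += 1
    return counts


if __name__ == "__main__":
    sample = ("The smell of sawdust and the scent of fresh bread. "
              "The room smelled like turpentine, and the cellar "
              "reeked of mildew. Again the smell of sawdust.")
    for name, n in smell_candidates(sample).most_common():
        print(n, name)
```

On the toy sample, "sawdust" is counted twice and "fresh bread", "turpentine" and "mildew" once each. The same skeleton, with different cue phrases, is the obvious first thing to try for the biomedical lists as well, though I would not expect cue phrases alone to get very far.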

[link via join-the-dots]

Posted by Mark Liberman at December 28, 2003 09:35 AM