March 29, 2004

Perl dictionary hacking

There's an interesting-looking article at perl.com by Sean Burke on how to render a dictionary represented in Shoebox format. I think that Burke's introduction rather exaggerates the general cluelessness of field linguists, many of whom are capable programmers themselves, or have previously teamed up with programmers to do similar things; but the article (which I haven't had time to read carefully yet) looks like it offers a good tutorial on how to use HTML or RTF to render a simple dictionary database for printing or on-screen reading.

For a few of the many available examples of prior (and perhaps better) art, take a look at Bill Poser's lecture notes on extracting fields from Shoebox dictionaries using AWK (an approach that, unlike Burke's program, handles the case where tags are repeated within an entry), or his paper "Lexical Databases for Carrier", or his "Poor man's Web Dictionary". The last of these provides a working example of a simple pure-HTML lexicon (no CGI, no database) generated automatically from a Shoebox database, together with the code needed to generate it. Although simple, it includes audio and images.
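To make the repeated-tag point concrete, here is a minimal sketch in Python. It is neither Burke's Perl nor Poser's AWK, just an illustration under some assumptions: that each field sits on a line of the form `\tag value`, that `\lx` starts a new entry, and that `\ps` and `\ge` hold part of speech and gloss (a common Shoebox convention, but your database may use other markers). The sample data and the function names `parse_entries` and `render_html` are made up for the example.

```python
import html
from collections import defaultdict

# Hypothetical sample data; note the repeated \ge tag in the first entry.
SAMPLE = r"""
\lx chien
\ps n
\ge dog
\ge hound

\lx chat
\ps n
\ge cat
"""

def parse_entries(text, record_marker="lx"):
    """Split Shoebox-style text into entries, keeping every value of a repeated tag."""
    entries = []
    current = None
    last_tag = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("\\"):
            parts = line[1:].split(None, 1)
            if not parts:
                continue  # a bare backslash carries no tag
            tag = parts[0]
            value = parts[1].strip() if len(parts) > 1 else ""
            if tag == record_marker:
                current = defaultdict(list)
                entries.append(current)
            if current is None:
                continue  # ignore header fields before the first entry
            # Appending to a list (rather than overwriting) preserves repeats.
            current[tag].append(value)
            last_tag = tag
        elif current is not None and last_tag is not None:
            # A line without a marker continues the previous field.
            current[last_tag][-1] += " " + line
    return [dict(e) for e in entries]

def render_html(entries):
    """Render each entry as one plain HTML paragraph."""
    out = []
    for e in entries:
        head = html.escape(e["lx"][0])
        pos = html.escape(", ".join(e.get("ps", [])))
        glosses = html.escape("; ".join(e.get("ge", [])))
        out.append(f"<p><b>{head}</b> <i>{pos}</i> {glosses}</p>")
    return "\n".join(out)

if __name__ == "__main__":
    print(render_html(parse_entries(SAMPLE)))
```

The design choice that matters is storing each tag's values in a list: a parser that keeps one value per tag would silently drop all but one of the repeated glosses.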

Posted by Mark Liberman at March 29, 2004 12:46 PM