November 13, 2003

Borges on metadata

A couple of days ago, I sketched the reasons why I don't think that the semantic web and similar efforts will get rid of the need for automatic information extraction from text. In thinking through these questions, we should ponder (or at least enjoy) what Jorge Luis Borges had to say on the subject, some time around 1929. Even if you could care less about metadata and information extraction, you should treat yourself to Borges' essay. (Note: there is an English translation below the Spanish version.)

The end of the essay:

Leaving hopes and utopias apart, probably the most lucid ever written about language are the following words by Chesterton: "He knows that there are in the soul tints more bewildering, more numberless, and more nameless than the colours of an autumn forest... Yet he seriously believes that these things can every one of them, in all their tones and semitones, in all their blends and unions, be accurately represented by an arbitrary system of grunts and squeals. He believes that an ordinary civilized stockbroker can really produce out of his own inside noises which denote all the mysteries of memory and all the agonies of desire."

Borges' piece is entitled El Idioma Analítico de John Wilkins. When Borges wrote it, Wilkins and his "Essay towards a real character and a philosophical language" had largely been forgotten. Indeed, Borges starts by observing that the 14th edition of the Encyclopaedia Britannica (published in 1929) had abandoned the entry on Wilkins, which was only 20 lines long in the previous edition.

Wilkins is back in the limelight today, as he is an important character in Neal Stephenson's massive new historical novel Quicksilver, which has sparked a lot of interest in the intellectual history of the 17th and 18th centuries among digerati who might otherwise not have realized that they cared about anything before 1995.

I put up the Idioma Analítico page in the winter of 2000, when a group of people from around the world (mainly the U.S. and Europe) were working through the ideas that turned into OLAC, the Open Language Archives Community. The OLAC Metadata set is a modest set of extensions to the Dublin Core, useful for cataloguing language-related archives of various types. Early OLAC suffered the usual stresses caused by enthusiasts inspired by the vision of a Philosophical Language. I thought that a small dose of Borges might help us avoid biting off more ontology than the project implementation could plausibly chew and digest.

Posted by Mark Liberman at November 13, 2003 12:18 PM