March 28, 2004

Bad named entity algorithms at the Gray Lady?

In the online version of today's NYT, the first paragraph of David Carr's story "Casting Reality TV becomes a Science" reads like this:

In a suite high above Columbus Circle, Rob LaPlante is looking for next season's breakout television star. There is no agent hovering nearby, no technical crew, just Mr. LaPlante, his assistant and a digital video camera, auditioning Laura Fluor, a car saleswoman from Monmouth County, N.J.

The hyperlink on Laura's last name, "Fluor", leads to a page about the Fluor Corporation on the NYT business site. It gives us the standard NYT "Company Research" treatment: share price and price history information, a thumbnail description of the company's business ("The Group's principal activities are to provide professional services on a global basis in the fields of engineering, procurement, construction and maintenance..."), a list of the latest insider trades, and so on. A similar page is available for any company traded on the major stock exchanges.

There is absolutely nothing in the original Carr article to lead us to believe that Laura Fluor has anything at all to do with the Fluor Corporation. I can't imagine that the writer, an editor or even any human hyperlinker would think that this link was appropriate. So either someone is having a little joke, or the NYT's online site is running some company-name-recognition software that needs work. The state of the art for "entity tagging" is far from perfect, but it's better than this.
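We don't know what software the NYT site actually runs. Here is a minimal sketch, under assumptions, of the kind of context-free lookup that would produce exactly this error. It links any token that matches a list of company names, whatever surrounds it. A second function adds one crude context check: skip a match that immediately follows a given name or honorific. The gazetteer `COMPANIES`, the cue list `PERSON_CUES`, and both function names are hypothetical illustrations. Real entity taggers rely on much richer evidence than a single preceding word.

```python
import re

# Hypothetical gazetteer: company surface form -> ticker symbol.
COMPANIES = {"Fluor": "FLR", "Apple": "AAPL"}

# Hypothetical cues suggesting the next capitalized word is a surname.
PERSON_CUES = {"Laura", "Rob", "Mr.", "Ms.", "Mrs."}


def naive_links(text):
    """Return every word that exactly matches a gazetteer entry, ignoring context."""
    return [word for word in re.findall(r"\w+", text) if word in COMPANIES]


def context_links(text):
    """Same lookup, but skip matches immediately preceded by a person cue."""
    tokens = text.split()
    links = []
    for i, tok in enumerate(tokens):
        word = tok.strip(".,;:\"'")  # drop surrounding punctuation
        if word not in COMPANIES:
            continue
        prev = tokens[i - 1].rstrip(",;:") if i > 0 else ""
        if prev in PERSON_CUES:
            continue  # looks like a surname, as in "Laura Fluor"
        links.append(word)
    return links


if __name__ == "__main__":
    s = "auditioning Laura Fluor, a car saleswoman from Monmouth County, N.J."
    print(naive_links(s))    # -> ['Fluor']
    print(context_links(s))  # -> []
    print(context_links("Shares of Fluor rose today."))  # -> ['Fluor']
```

Even this one-word check is enough to avoid the Laura Fluor link while still catching an ordinary mention of the company. It would, of course, fail on surnames preceded by names it has never seen.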

Posted by Mark Liberman at March 28, 2004 04:21 PM