September 28, 2004

331 extra linguists are not enough

According to an article by Eric Lichtblau in today's NYT, the FBI still has some serious translation problems:

Three years after the Sept. 11 attacks, more than 120,000 hours of potentially valuable terrorism-related recordings have not yet been translated by linguists at the Federal Bureau of Investigation, and computer problems may have led the bureau to systematically erase some Qaeda recordings, according to a declassified summary of a Justice Department investigation that was released on Monday.

If you do the arithmetic, this doesn't seem very surprising. The article explains that "the number of linguists at the F.B.I. rose to 1,214 as of April 2004 from 883 in 2001, with sharp increases in the number of translators of Arabic, Farsi and other languages considered critical to counterterrorism investigations," meaning 331 extra translators overall. But if there was a backlog of 120,000 hours to transcribe (?) and translate, and if it takes ten translator-hours per hour of audio to do whatever they do with it, then that is 120,000 x 10 = 1,200,000 translator-hours of work, and it would take the 331 extra translators ((1,200,000/331)/40)/50 = about 1.8 years to catch up, working 40 hours a week, 50 weeks a year. I have no idea what the FBI's procedures are with respect to such material, but I would guess that 10 person-hours per hour of audio is a low estimate; and 40 hours/week, 50 weeks/year are surely high estimates for the amount of time translators can spend doing their core job, as opposed to attending planning meetings and training sessions and so on. And if the system was building up a backlog before, then some of those extra 331 translators must be helping to keep up with new stuff as well. And the overall backlog is apparently more than 500,000 hours, with the 120,000 hours being just the part in counter-terrorism operations.
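For anyone who wants to vary the guesses, the same back-of-the-envelope calculation can be sketched in a few lines of Python. The function and parameter names are hypothetical, chosen only for illustration. The default figures (10 translator-hours per audio hour, 40 hours/week, 50 weeks/year) are the guesses from the paragraph above, not FBI numbers, and the sketch assumes that all 331 extra linguists work on nothing but the existing backlog.

```python
def years_to_clear(backlog_audio_hours, translators,
                   work_per_audio_hour=10.0,
                   hours_per_week=40.0,
                   weeks_per_year=50.0):
    """Years needed for `translators` people to work through a backlog.

    All defaults are rough guesses from the post, not official figures.
    """
    # Total translator-hours of work implied by the backlog.
    total_work_hours = backlog_audio_hours * work_per_audio_hour
    # Share of that work falling on each translator.
    hours_per_translator = total_work_hours / translators
    # Convert hours to weeks, then weeks to years.
    return hours_per_translator / hours_per_week / weeks_per_year


# The post's own figures: 120,000 hours of audio, 331 extra linguists.
baseline = years_to_clear(120_000, 331)
print(f"baseline: {baseline:.1f} years")

# A purely hypothetical, less optimistic scenario: more work per audio
# hour and less time available for core translation work.
pessimistic = years_to_clear(120_000, 331,
                             work_per_audio_hour=15.0,
                             hours_per_week=30.0,
                             weeks_per_year=45.0)
print(f"pessimistic: {pessimistic:.1f} years")
```

The estimate scales linearly with the work needed per hour of audio and inversely with the time each translator can actually spend translating, so moving either guess in the less optimistic direction pushes the catch-up time well beyond 1.8 years.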

Of course, the FBI's translation people are also spending time dealing with this type of stuff, discussed in an article by Lichtblau back in July. It sounds like a hard row to hoe.

A note of caution: NYT stories dealing with things that I actually know something about are often misleading, sometimes to the point of being tantamount to falsehoods. For example, this 9/13 NYT story by Steve Lohr on IBM's open-sourcing of some speech-technology software led to this angry rebuke to IBM from Fernando Cassia at the Inquirer. But as far as I can tell, Cassia is mostly slamming IBM for things they never said, but which were implied by Lohr's misleading NYT article, which Cassia cites as his ("extremely vague") source for the claims he's complaining about. I was being polite when I called Lohr's article "somewhat misleading" in my 9/13 post -- really, looking over it again and looking at the trade-press reaction exemplified by Cassia, I would judge now that the article was incompetent, written by someone who apparently has very little understanding of the technical area that he was writing about.

I'm not singling the NYT out because I think they're especially bad -- I'd say that they're still the best single source for general news that's available to me. To express my feelings about that fact, I can only quote André Gide's famous response when asked to name the greatest French poet: "Victor Hugo, hélas."

As a result, I can't accept the facts and (especially) the implications of Lichtblau's article about FBI translation problems as necessarily being true. I don't know any particular reason to doubt what he wrote this morning, although I would have liked to have seen some information about what these recordings actually are (some are presumably interview or interrogation tapes, some are presumably wiretaps?), what it is that the FBI translators actually need to do to these recordings (presumably it varies, but what is the range of treatments needed?), some reactions from commercial translation professionals about the scope, nature and prospects of the FBI's problems, etc. It seems as if Lichtblau's main goal is to demonstrate that the FBI has serious problems, without really clarifying to any significant extent what those problems are and what would be needed to solve them. (In fairness to Lichtblau, his story was prompted by the release yesterday of an edited form of a Justice Department report on the problems, and what he presumably did was mainly to read the released report and summarize it, with some (predictably outraged) quotes from relevant politicians. On the other hand, he's been on this beat for some time now, and has written at least one earlier story -- the one back in July -- on translation at the FBI.)

In the case of something like IBM's speech technology software, there are lots of alternative sources of information, and someone who knows the field and cares to investigate can figure out what's really going on. But in the case of the FBI's translation (and other) problems, most of the crucial information is probably classified, and most of the rest is locked up behind bureaucratic walls. So all I can really do is to keep an open mind, and accept whatever information is available with a grain or two of salt. It would be nice if more journalists were less committed to advocating a chosen narrative, and more competent in evaluating the information available to them.


Posted by Mark Liberman at September 28, 2004 09:05 AM