This is a story with a moral. It shows that a simple, cheap, quantitative measure of quality -- even one that is obviously flawed -- and a commitment to improving performance on that measure -- even over a relatively short time -- leads to improvement. Real improvement, not just improvement in terms of the flawed metric.
About two years ago, Salim Roukos and others at IBM suggested a remarkably simple method for evaluating translation quality: just count the number of words and word sequences ("n-grams") in common between a translation to be tested and a set of reference translations. They named this metric "Bleu", and they showed that despite its obvious flaws, it correlates well with human evaluations, not only for (generally poor) automatic translations, but even for human translations of varying quality.
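To make the counting idea concrete, here is a minimal sketch of the n-gram overlap at the heart of the metric. It is a simplified illustration, not the IBM or NIST implementation: it computes only the clipped n-gram precision for a single n, and it leaves out the brevity penalty and the combination across n-gram orders that the full metric uses. The function names (`ngram_counts`, `clipped_precision`), the whitespace tokenization, and the lower-casing are assumptions made for the example.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every n-gram (as a tuple of tokens) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, references, n):
    """Fraction of the candidate's n-grams that also occur in a reference.

    Each candidate n-gram is credited at most as many times as it appears
    in the single reference where it is most frequent ("clipping"), so
    repeating a common word does not inflate the score.
    """
    cand = ngram_counts(candidate.lower().split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0

    # For each n-gram, the largest count seen in any one reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngram_counts(ref.lower().split(), n).items():
            max_ref[gram] = max(max_ref[gram], count)

    matched = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return matched / total


if __name__ == "__main__":
    refs = ["the cat sat"]
    print(clipped_precision("the the the cat", refs, 1))  # unigram overlap
    print(clipped_precision("the the the cat", refs, 2))  # bigram overlap
```

In the toy call, the candidate repeats "the" three times, but the reference contains it only once, so only one of those occurrences counts as a match. That clipping is what keeps a degenerate output of frequent words from scoring well.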
DARPA researchers quickly adopted (a version of) this metric for TIDES MT research, as described in this NIST report.
As predicted by those who believe in the value of quantitative evaluation for "language engineering", the result has been an extraordinary improvement in the quality of machine translation. In the 2002 TIDES MT evaluation, the best research system for Arabic-to-English translation scored at 51% of human translation performance as measured by the NIST metric, while the best commercial system scored 57%. In the recent 2003 evaluation, the best research system scored 89%, while the best commercial system was at 58%.
The improvement can be seen in qualitative terms by reading some samples:
2002 System:
insistent Wednesday may recurred her trips to Libya tomorrow for flying
Cairo 6-4 ( AFP ) - an official announced today in the Egyptian lines company for flying Tuesday is a company " insistent for flying " may resumed a consideration of a day Wednesday tomorrow her trips to Libya of Security Council decision trace international the imposed ban comment .
And said the official " the institution sent a speech to Ministry of Foreign Affairs of lifting on Libya air , a situation her receiving replying are so a trip will pull to Libya a morning Wednesday " .
2003 System:
Egyptair Has Tomorrow to Resume Its Flights to Libya
Cairo 4-6 (AFP) - said an official at the Egyptian Aviation Company today that the company egyptair may resume as of tomorrow, Wednesday its flights to Libya after the International Security Council resolution to the suspension of the embargo imposed on Libya.
" The official said that the company had sent a letter to the Ministry of Foreign Affairs, information on the lifting of the air embargo on Libya, where it had received a response, the first take off a trip to Libya on Wednesday morning ".
Human Translation:
Egypt Air May Resume its Flights to Libya Tomorrow
Cairo, April 6 (AFP) - An Egypt Air official announced, on Tuesday, that Egypt Air will resume its flights to Libya as of tomorrow, Wednesday, after the UN Security Council had announced the suspension of the embargo imposed on Libya.
The official said that, "the company sent a letter to the Ministry of Foreign Affairs to inquire about the lifting of the air embargo on Libya, and in the event that it receives a response, then the first flight to Libya, will take off, Wednesday morning."
Posted by Mark Liberman at July 30, 2003 08:51 AM