April 30, 2006

Arabic machine translation from Google Labs

Franz Och at Google Labs has announced interactive sites where you can try Arabic-English and English-Arabic machine translation.

I tried a random story on the Al-Hayat web site. Cutting and pasting from the story worked pretty well: the first two paragraphs came out as

In a step considered a retreat in a Tehran crisis of the nuclear file, Mohamed Saidi, deputy head of the Iranian Atomic Energy Agency, his country's readiness to "answer all questions of the International Atomic Energy Agency, including a return to the application of the Additional Protocol to the sudden inspection of nuclear facilities in Iran ".

This coincided with the Organization of the Islamic Conference "deep concern" developments in the Iranian nuclear file. and the adoption of its claim "dialogue and peaceful means in resolving the dispute". As called for the Russian Foreign Ministry Iran suspended active enrichment, and to ensure "full cooperation" with the IAEA.

I hadn't read the news since the story about Iran's conditional offer of concession came out, so I learned from reading this translation that something had happened, and could see roughly what the new development was.

For some reason, however, submitting the URL for that page didn't work for me, producing garbled text. But submitting the URL of a random BBC Arabic page worked well, producing fairly readable output. Among the remaining problems, I noted a tendency to mishandle some VSO (verb-subject-object) sentences:

He said the US delegate to the United Nations John Bolton that it is urgent to take firm action in this regard.

He warned Richard Brook former official in the Clinton Administration, The expected nomination for the office of the Ministry of Foreign Affairs if the Democrats win the presidency. to transform Iran into a nuclear power would mean exposing global stability at risk.

You can read some of the background of this work in these early Language Log posts:

The value of evaluation (7/30/2003)
NYT story on DARPA MT ... doesn't mention DARPA! (7/31/2003)

At that time, Franz Och was working with Kevin Knight at ISI, who is featured in the second post; and I believe (though I'm not certain) that the MT system whose output was quoted in the second post was one that Franz played a key role in creating.

In any case, you can see that there has been more progress since 2003.

[Full disclosure: Partha Pratim Talukdar, a Penn graduate student, was an intern at Google Labs last summer, working with Och's colleague Thorsten Brants, and I'm one of the authors of a paper resulting from that work: Partha Pratim Talukdar, Thorsten Brants, Mark Liberman and Fernando Pereira, "A Context Pattern Induction Method for Named Entity Extraction", CoNLL-X, June 8-9, NYC.]

[Update: more detail about the Verb-Subject-Object issue can be found here.]

Posted by Mark Liberman at April 30, 2006 02:03 AM