July 03, 2006

We are the Inge Borg. You will be Inge borrowing.

I hope that Google's excellent machine-translation researchers will soon apply to German-English MT the talents that have produced such rapid progress in Arabic-English translation. In composing a post on the award of the Ingeborg Bachmann Prize to Kathrin Passig, I considered linking to the Google Language Tools translation of the German-language Wikipedia entry "Ingeborg-Bachmann-Preis". Unfortunately, the results are pretty confusing, starting with the title, "Ingeborg-Bachmann-Preis", which should be "Ingeborg Bachmann Prize" but comes out as "Inge borrowing brook man price".

The translation system has decided that Ingeborg, rather than being just a plain old name, is actually a compound of Inge and some (I think morphologically unlikely) form of borgen "to borrow". Likewise, the name Bachmann is broken into Bach+mann and translated as "brook man". (This is yet another piece of evidence that MT systems would do well to make use of what computational linguists call "named entity tagging".) The decision to translate Preis as "price" rather than "prize" can be seen as a failure in another traditional dimension of computational linguistics, namely "sense disambiguation". For modern statistical MT systems, though, all dimensions of language are sometimes viewed through the same set of algorithmic spectacles.
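To make the failure mode concrete, here is a toy sketch in Python. It is not how Google's system works (we don't know its internals); the lexicon, the gazetteer of names, and the function names are all hypothetical. A naive compound splitter that knows "inge", "borg", "bach", "mann" and friends will happily decompose proper names, while a simple named-entity check run first keeps them whole.

```python
# Toy illustration only: a hypothetical word-by-word lexicon,
# NOT Google's actual data or method.
LEXICON = {
    "inge": "Inge",
    "borg": "borrowing",
    "bach": "brook",
    "mann": "man",
    "preis": "price",   # no sense disambiguation: always "price", never "prize"
    "klagen": "complaint",
    "furt": "ford",
}

# Hypothetical gazetteer standing in for a real named-entity tagger,
# which would use context rather than a fixed list.
NAMED_ENTITIES = {"ingeborg", "bachmann", "klagenfurt"}


def split_compound(word):
    """Split a word into lexicon entries, trying the longest prefix first
    and backtracking if the remainder can't be split. Returns None if no
    complete split exists."""
    w = word.lower()
    if not w:
        return []
    for end in range(len(w), 0, -1):
        prefix = w[:end]
        if prefix in LEXICON:
            rest = split_compound(w[end:])
            if rest is not None:
                return [prefix] + rest
    return None


def translate_token(word, use_ner):
    # With the named-entity check on, known names pass through untranslated.
    if use_ner and word.lower() in NAMED_ENTITIES:
        return word
    parts = split_compound(word)
    if parts is None:
        return word  # unknown word: leave it alone
    return " ".join(LEXICON[p] for p in parts)


def translate(phrase, use_ner=False):
    # Treat hyphens as token boundaries, as in "Ingeborg-Bachmann-Preis".
    return " ".join(translate_token(t, use_ner) for t in phrase.split("-"))


if __name__ == "__main__":
    print(translate("Ingeborg-Bachmann-Preis"))                # Inge borrowing brook man price
    print(translate("Ingeborg-Bachmann-Preis", use_ner=True))  # Ingeborg Bachmann price
    print(translate("Klagenfurt"))                             # complaint ford
```

Note that the named-entity check rescues the names but still leaves "price" for Preis: that is the separate sense-disambiguation problem, which a name list can't fix.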

The name-translation problems continue in the first sentence of the article, where the phrase "jährlich in Klagenfurt (Kärnten) in einer mehrtägigen Live-Veranstaltung ermittelt" (which should be something like "awarded yearly in a multiple-day live presentation held in Klagenfurt (Kärnten)") comes out as "annually in complaint ford (Kärnten) in a live-meeting of several days determined".

And things don't get better in the rest of the translated article, so I wound up linking to the corresponding English-language Wikipedia entry instead. "Complaint ford" is an appropriate venue for discussing current MT technology, alas.

I think (I hope!) this means that the German-English translation engine is still whatever commercial off-the-shelf ("COTS") system Google has licensed for this purpose. Come on, Franz, you folks can do better!

[Stefano Taschini writes:

Leaving aside the political and sociological issues of toponym translation, I think that Kärnten is usually referred to as Carinthia in English.

Indeed.]

Posted by Mark Liberman at July 3, 2006 08:11 AM