March 24, 2008

Why Austria is Ireland

There has been a lot of activity here today in the great research center at One Language Log Plaza. People are running up and down the corridors showing each other new examples of Google's purportedly eccentric translation behavior. The Google translation algorithms perform strange substitutions involving European country names and language names. Among these are replacements of Ireland for Austria, and also sometimes Canada for Austria. I am rather surprised that none of the excited people falling over themselves in the corridors have noticed the obvious generalization.

The Google translation engine is of course a brute-force statistical scheme based on massive amounts of compared bilingual text, and it is quite insensitive to actual meaning. Notice that in one case the algorithm produced a text asserting that the Parliament of Canada meets in Vienna, and in another the output text said that Vienna is in Ireland, but only if there were three question marks after the word Austria in the input. The translation algorithms clearly know nothing of politics, geography, or sober punctuation.

In my opinion, what is being statistically detected by the pseudo-translation algorithms is the blindingly obvious relation that holds between the relevant pairs. Think about it: In what respect is it that Ireland is to the UK (for British English speakers) as Austria is to Germany (for Germans), and also as Canada is to the USA (for American English speakers)?

The relevant relation is the one that country A bears to country B when (a) the two are adjacent, (b) A is somewhat looked down on by B, and (c) A uses the same language as B, but in what is regarded (by the citizens of B) as a recognizably different and inferior or risible form.

I would therefore predict Google translation errors involving other such pairs: A = Belgium, B = France; A = Belgium, B = Holland; and A = New Zealand, B = Australia.

Certain language-name substitutions have also been noted; among them is Spanish for German, produced in a German-to-English translation performed on English text. Notice that Google thought it was reading a German text (about Austria) and translating it for English speakers, though the alleged German was really English. Now, in what way is Spanish for English speakers like German for German speakers?

It should be obvious. The relation is the one that holds between a language L and a nationality N when L is the language people of nationality N are most likely to hear spoken around them when they are away on a foreign vacation.

These questions are simple enough if you just think logically.

Posted by Melvyn Quince at March 24, 2008 01:53 PM