September 27, 2005

Multilingual Google

Until recently I mostly used the Catalan version of Firefox as my browser, but when I installed a new version a few days ago I decided to try another language, so I'm now using the Japanese version. Which version you use is independent of the language preferences you set, though, so I'm experiencing a bit of linguistic dissonance: the menu labels and status bar messages are in Japanese, but most other things, including web pages where the choice is available, are still in Catalan.

One site that is available in many languages is Google. With my preferences set the way they are, my Google interface is in Catalan. Here's what it looks like:

The Catalan Google interface in the Japanese localization of Firefox

If you click on Eines d'idioma ("language tools"), you get to a page that lets you choose which languages you want to search pages in and which language you want for the interface.

The list of Google interface languages in Catalan

The list is pretty impressive - there are 116 entries. A few aren't real natural languages (Elmer Fudd, Hacker, and Pig Latin), one isn't exactly a human language (Klingon), and two are artificial languages (Esperanto and Interlingua), but that still leaves over 100 natural human languages, some of them not so well known, such as Kazakh and Tongan. There is even one Native American language, Guarani, the language spoken by most Paraguayans. On the Catalan page for some reason Guarani is called Tupi-Guarani, which is the name of the language family to which it belongs. I don't think I've ever read anything in Catalan about Native American languages, so I can't say for sure that Guarani isn't called Tupi-Guarani in Catalan, but I doubt it. The English, Spanish, and Kazakh pages just call it Guarani. This looks like a mistake to me.

Using Google in another language is a fun way to try out a language you don't know real well. If you get stuck, it's easy to switch to a language you do know well, and the interface isn't all that complicated.

I do have one small complaint (beyond the fact that they don't yet have all of my favorite languages), which is that they are evidently sorting the list of language names the same way no matter what language the names are in: in the order of their Unicode codepoints. This yields unexpected results.

For example, on the Catalan list Arabic comes last, after Zulu, because the Catalan word for Arabic is Àrab and the À, whose Unicode codepoint is 0x00C0, follows all of the ASCII letters (Z, for comparison, is 0x005A). If Google really wanted to do things right, they would sort the names using the appropriate collating rules for each language.
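To make the effect concrete, here is a small Python sketch. It is not Google's code, and the short list of Catalan names is just an illustrative sample I picked. The first function sorts by raw codepoint, which is Python's default string ordering; the second folds accents before comparing. The function names are made up for this example, and the accent folding is only a crude stand-in for real Catalan collation, which would need a library that implements proper locale-specific collating rules.

```python
import unicodedata

# "Àrab", spelled with an explicit escape so the precomposed U+00C0 is used.
ARAB = "\u00C0rab"

# Illustrative sample of Catalan language names (an assumption, not Google's list).
names = ["Zulu", ARAB, "Anglès", "Català", "Basc"]


def codepoint_sorted(words):
    """Sort by Unicode codepoint, which is how Python compares strings by default."""
    return sorted(words)


def fold_key(word):
    """Crude approximation of collation: strip accents and ignore case.

    The original word is kept as a tie-breaker so the order is deterministic.
    """
    decomposed = unicodedata.normalize("NFD", word)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), word)


def accent_folded_sorted(words):
    """Sort so that accented letters fall next to their unaccented base letters."""
    return sorted(words, key=fold_key)


print(hex(ord(ARAB[0])), hex(ord("Z")))  # 0xc0 0x5a
print(codepoint_sorted(names))           # Àrab ends up last, after Zulu
print(accent_folded_sorted(names))       # Àrab sorts among the other A names
```

With the codepoint sort, Àrab lands at the very end of the list, just as on the Catalan Google page; with the folded key it moves up next to Anglès.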

Posted by Bill Poser at September 27, 2005 11:20 PM