October 30, 2003

English, Portuguese, Polish, Farsi, French,...

Could it be true?

A note at Language Hat points to the NITLE blog census histogram of the languages used in the roughly 1.5M weblogs surveyed, and expresses surprise that Russian is so far down the list (in 18th place, between Danish and (?) Latin).

I'm more struck by the fact that Portuguese is in second place, and that Polish and Farsi are next, ahead of French, Spanish and German -- and with twice as many Farsi blogs as French ones! Tonnerre de Brest!

[Update: comments on the Language Hat site give details of the LID (language identification) algorithm used, and explain some of the oddities (e.g. "Latin" is otherwise empty blogs with "Lorem Ipsum" placeholder text, and "Breton" is usually misidentified French or Spanish). It still seems likely that Farsi is well ahead of French.]

[Update 11/03/2003: Boingboing cites Hossein Derakhshan on Cafe Blog in Teheran.]

Posted by Mark Liberman at October 30, 2003 09:32 PM