December 30, 2003

Neat stuff at NITLE

There's all kinds of interesting information at the NITLE Blog Census. The (top of the) surprising rank ordering of languages has not changed since it was discussed here back in October: English, Portuguese, Polish, Farsi, French. And onwards: Spanish, German, Italian, Chinese-big5, Catalan, Dutch, Icelandic, Indonesian. I'm still somewhat skeptical about what the TextCat language classification algorithm is doing: it's hard to believe that Russian is nowhere in the top 25, but Breton is... And that there are really almost 5 times more Icelandic bloggers than Japanese... However, the top of the list seems likely to be right.
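For anyone wondering what a classifier of this kind does: TextCat is usually described as an implementation of the character n-gram "out-of-place" method of Cavnar and Trenkle (1994). Below is a minimal sketch of that idea, not TextCat's actual code and not NITLE's setup. The function names, the one-sentence training samples, and the profile size are all made up for illustration.

```python
from collections import Counter

def ngram_profile(text, max_n=3, top_k=300):
    """Return the text's character n-grams (n = 1..max_n), most frequent first."""
    # Mark word boundaries with "_" so that word-initial and word-final
    # n-grams are distinguished from word-internal ones.
    padded = "_" + "_".join(text.lower().split()) + "_"
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]

def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from the language get a fixed penalty."""
    rank = {gram: i for i, gram in enumerate(lang_profile)}
    penalty = len(lang_profile)  # assumed penalty for an unseen n-gram
    total = 0
    for i, gram in enumerate(doc_profile):
        total += abs(i - rank[gram]) if gram in rank else penalty
    return total

def classify(text, lang_profiles):
    """Pick the language whose profile is closest to the text's profile."""
    doc = ngram_profile(text)
    return min(lang_profiles, key=lambda lang: out_of_place(doc, lang_profiles[lang]))

# Toy training data (hypothetical); a real profile is built from far more text.
SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog and then the dog sleeps in the sun",
    "portuguese": "o cachorro dorme ao sol e a raposa salta sobre o muro",
}
PROFILES = {lang: ngram_profile(sample) for lang, sample in SAMPLES.items()}

print(classify("the dog and the fox", PROFILES))
```

The classifier always returns the nearest profile, however poor the match. A very short or oddly encoded post could plausibly land closest to the wrong language, which might be one source of the stranger rankings.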

See blogcount for more blogospheric information. Blogdex currently lists a "Top Ten Words of 2003" item as the sixth "most contagious information currently spreading in the weblog community".

Posted by Mark Liberman at December 30, 2003 01:13 PM