April 12, 2007

Watch out for accents

This is a small correction, but an important one, so I'll make it in a separate post in addition to adding it to the end of the original post. On March 31, in "Liberté, égalité, néologie", I took a look at the French political blogosphere. I was inspired by an article in Le Monde that focused somewhat breathlessly on the blogs hosted by newspapers and magazines. "The most spectacular because the most massive and the most prestigious" was said to be the initiative of Le Nouvel Obs, where the most popular blogger is Michel Onfray, whose work was said (in the Le Monde article) to have averaged about 3,115 visitors a day during the month of March. This struck me as surprisingly few readers for a site that is advertised on every news kiosk in Paris; the larger American political weblogs, like Instapundit or Daily Kos or Talking Points Memo, are in the range of 100,000-500,000 visitors a day.

But I thought maybe Le Monde's focus on Onfray was due to Old Media self-involvement, so I went out looking for the "real" French political blogs. And I found some, but not a lot, and not at the level of popularity that I would expect given the hotly contested presidential election. In particular, I searched technorati for {presidentielle} in "all blogs" in "any language" with "a lot of authority", and got only 18 results. But I assumed without checking that technorati has a property that I knew to be true of Google and most other web search engines: ignoring accents. And it turns out that I was wrong.
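For readers curious about what "ignoring accents" involves: one common approach (an assumption here, not a description of what Google or technorati actually do) is to decompose each character with Unicode normalization and drop the combining accent marks, so that é and e compare equal. A minimal Python sketch, with hypothetical helper names:

```python
import unicodedata

def fold_accents(text: str) -> str:
    """Lowercase and strip combining diacritics, e.g. 'é' -> 'e' (hypothetical helper)."""
    # NFD splits a precomposed 'é' into 'e' followed by U+0301 (combining acute accent).
    decomposed = unicodedata.normalize("NFD", text.lower())
    # Drop the combining marks (Unicode category "Mn"), keeping the base letters.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

def accent_insensitive_match(query: str, word: str) -> bool:
    """True if the two strings are equal once accents and case are folded away."""
    return fold_accents(query) == fold_accents(word)

print(fold_accents("Présidentielle"))                                 # presidentielle
print(accent_insensitive_match("presidentielle", "présidentielle"))   # True
print(accent_insensitive_match("presidentielle", "présidentielles"))  # False
```

Note that folding accents alone does nothing about the plural "-s" variant; that would take some form of stemming as well.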

Jean Véronis wrote to me yesterday:

I just read your post-scriptum on your "Liberté, égalité, néologie" post (thanks for citing me, by the way).

Just a small remark about Technorati. You say:

if I go to technorati and search for {presidentielle} in "all blogs" in "any language" with "a lot of authority", I get only 18 results.

However, Technorati is (unfortunately) accent-sensitive. If you type "présidentielle", you get 1,015 results. In addition, some people tend to put an "s" at the end of "présidentielles", which returns 179 results. Altogether (if we can trust boolean queries ;-) "presidentielle OR presidentielles OR présidentielle OR présidentielles" returns 1,188 results with "a lot of authority" (and 102,607 with "any authority").

And he's right. Here's a link to the {présidentielle} search with "a lot of authority".

(By the way, technorati's idea of "authority" is the number of incoming links that it knows about, though I'm not sure what the numerical thresholds of the categories are.)
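Véronis's workaround for an accent-sensitive engine was to spell out the variants by hand. A sketch of generating that kind of query automatically, assuming the only variation is accents plus a trailing plural "-s"; the helper `or_query` is hypothetical, and the " OR " syntax is simply the one Véronis quotes:

```python
import unicodedata

def strip_accents(word: str) -> str:
    # Decompose accented letters, then drop the combining marks (category "Mn").
    return "".join(c for c in unicodedata.normalize("NFD", word)
                   if unicodedata.category(c) != "Mn")

def or_query(accented: str, plural_suffix: str = "s") -> str:
    """Build an OR query covering unaccented/accented and singular/plural forms."""
    terms = []
    for base in (strip_accents(accented), accented):
        for form in (base, base + plural_suffix):
            if form not in terms:  # skip duplicates when the word has no accents
                terms.append(form)
    return " OR ".join(terms)

print(or_query("présidentielle"))
# presidentielle OR presidentielles OR présidentielle OR présidentielles
```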

It was sloppy on my part not to have tested this.

This doesn't affect my original point, namely that the Le Monde writer was enthusing about the transformative popular importance of a political weblog with about 1/100 the readership of the large American political blogs (and less than half the readership of Language Log, for that matter). And there is still a difference in the size and density of the Anglophone and Francophone political blogospheres, even in proportion to population -- a crude indication of this is that a technorati search for {presidential} in blogs with "a lot of authority" returns 28,659 results, or about 25 times more.
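Both comparisons in the paragraph above are simple ratios. A quick Python check, using only the figures quoted in this post:

```python
# Figures quoted in the post.
onfray_daily_visitors = 3_115
us_blog_daily_low, us_blog_daily_high = 100_000, 500_000
presidentielle_hits = 1_188   # Véronis's four-way OR query, "a lot of authority"
presidential_hits = 28_659    # {presidential}, "a lot of authority"

# How many times larger the big U.S. blogs are than Onfray's blog.
low = us_blog_daily_low / onfray_daily_visitors
high = us_blog_daily_high / onfray_daily_visitors
print(f"U.S. blogs: {low:.0f}x to {high:.0f}x Onfray's readership")  # 32x to 161x

# How many times more English "presidential" hits than French-spelling hits.
print(f"Search ratio: {presidential_hits / presidentielle_hits:.1f}x")  # 24.1x
```

So "about 1/100" sits inside a range of roughly 1/32 to 1/161, and the search ratio comes out at roughly 24.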

Still, the French political blogosphere is clearly much broader and deeper than my careless search suggested. And Jean Véronis, in addition to his extensive investigations of search engines and other topics in computational linguistics (here is a recent example), is also right in the middle of "Les Politiques mis au Net".

[Anatol from Bremen writes:

I can't believe I'm defending French bloggers (when they have never done anything for me), but I think you're somewhat downplaying the relevance of French political blogs. The figures you mention have to be put into proportion. First, in the U.S., there are 50.5 million households with Internet access compared to only 3 million in France; a ratio of roughly 17 to 1. This means that a French blog with 3,000 readers per day corresponds to a U.S.-based blog with a bit more than 50,000 readers: not quite the 100,000 to 500,000 you mention, but much more in the same ballpark. Second, there are between 900 million and 1.5 billion L1 and L2 speakers of English worldwide, compared with only 250 million for French. If we take a mid-range estimate of 1.2 billion, this is a ratio of roughly 5 to 1. Assuming that many people read U.S. and French blogs even though they are neither American nor French (I, for example, read both), the visits per day have to be seen relative to the number of people who speak the respective language. Again, by itself this would mean that a French blog with 3,000 readers corresponds to an English blog with roughly 15,000 readers. Clearly, the two proportions would have to be combined such that the number of English speakers with Internet access is set in proportion to the number of French speakers with Internet access. I don't know where to find this piece of information, but if anyone does, that would be nice. Anyway, relatively speaking, French blogs seem to be much more important than you make them sound. Let's take U.S. blogs as a standard and define a unit "standard blog reader" (sbr), which corresponds to one reader of an American blog seen in proportion to the number of English speakers or the number of Americans with Internet access, or both.
We can then provisionally take the two figures as the lower and upper bound of a confidence interval and say that French blogs have between 15,000 and 50,000 sbr, roughly 1/10 of American blogs, not 1/100, as you suggest. Whether these blogs are anything to enthuse about in Le Monde is, of course, an entirely different matter...
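Anatol's two adjustments each scale the French readership by a single ratio. A Python sketch reproducing them with only the numbers he quotes (his figures, not independently checked):

```python
french_daily_readers = 3_000

# Ratio 1: households with Internet access, U.S. : France (Anatol's figures).
us_households, fr_households = 50.5e6, 3e6
internet_ratio = us_households / fr_households      # about 16.8, i.e. "roughly 17"

# Ratio 2: L1+L2 speakers, using his mid-range estimate for English.
english_speakers, french_speakers = 1.2e9, 250e6
speaker_ratio = english_speakers / french_speakers  # 4.8, i.e. "roughly 5"

for name, ratio in [("internet access", internet_ratio), ("speakers", speaker_ratio)]:
    equivalent = french_daily_readers * ratio
    print(f"{name}: ratio {ratio:.1f}, equivalent English-language readership {equivalent:,.0f}")
# internet access: ratio 16.8, equivalent English-language readership 50,500
# speakers: ratio 4.8, equivalent English-language readership 14,400
```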

Perhaps the main point here is that the proportion of people on the net in France seems to be 2 or 3 times lower than the proportion in the U.S.

[Update 4/15/2007 -- However, Rod Tye refers us to internetworldstats.com, which says that in 2006 in France, about 50.3% of the population, or 30.8 million people, had internet access, compared to 69.6% of the population, or 210.1 million people, in the U.S. That's 6.8 to 1, not 17 to 1.]
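The corrected ratio, and the total populations implied by the quoted percentages, can be checked in a few lines of Python (figures exactly as quoted from internetworldstats.com):

```python
# 2006 figures as quoted in the update.
fr_users, fr_share = 30.8e6, 0.503
us_users, us_share = 210.1e6, 0.696

# Ratio of U.S. to French internet users.
print(f"U.S.:France internet users = {us_users / fr_users:.1f} to 1")  # 6.8 to 1

# Total populations implied by users / share, as a consistency check.
print(f"Implied population, France: {fr_users / fr_share / 1e6:.1f} million")  # 61.2
print(f"Implied population, U.S.: {us_users / us_share / 1e6:.1f} million")    # 301.9
```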

I wonder, also, what fraction of the readership of U.S. politically-oriented blogs (e.g. Talking Points Memo) comes from outside the U.S.? I guess I'd be surprised if it were anything close to being in proportion to the number of L1+L2 English speakers worldwide.

However, Tako Schotanus testifies that in the Netherlands, at least, many people are interested in U.S. politics:

I just wanted to ask if you had thought about other possible factors that could affect the discrepancy that you have observed. One of the things that came to my mind, for example, is that the political systems of the US and France are quite different. I have always had the feeling that the US is very much focused on the actual person who is to become president, while this seems to be less the case in France (and I imagine it is even less so in countries where the person for the highest office is appointed instead of elected). So it could be that the word "présidentielle" is simply used less in political articles in France, couldn't it?

Also, you didn't restrict your search for the word "presidential" to the countries where English is the official language (not sure if that's even possible), so you are bound to pick up a lot more "noise" in your searches from all those non-native English bloggers; I doubt there are many non-native French bloggers in the world. The same holds true, of course, when counting the number of hits for a website: there might be many foreign hits on English websites, while I would expect this to be much less so for French websites. (Trivia: the last presidential elections in the US were followed almost as closely in The Netherlands as our own elections, probably because people saw the influence Bush was having on foreign relations all over the world. It's unlikely that the French elections will have the same effect.)

I wouldn't be surprised if there was actually a significant difference, but maybe the only way to be sure is to take some kind of sample of blog entries containing either "présidentielle" or "presidential" to see what they're actually about. Accounting for population differences and "noise", I think you would need to see many more than 10x as many English blog entries as French ones to be sure.

Ah well, no doubt you thought of all this yourself; I just wondered whether you might be drawing too many conclusions from a couple of simple searches.

I continue to think that French political blogging remains rather different from its counterpart in America, in nature as well as in scale. As for what the differences really are, where they come from, and how they will develop in the future, I'm not sure -- stay tuned for more as both countries work through their presidential elections and the consequences.]

Posted by Mark Liberman at April 12, 2007 07:13 AM