February 23, 2005

Blogs disgoogled?

The river of information flows onward, and Language Log is no longer #1 for stupid ideas. That's fine with me. Last April, as I explained, "Google lists 1,260,000 pages in response to a query about stupid ideas, and Geoff Nunberg's post about Samuel Huntington is the first of all of them". Now there are 3,940,000 hits for the same query, and Geoff Nunberg's post on Samuel Huntington is #16. Autres temps, autres bêtises.

But has there been a more serious and systematic downgrading of (some) blogs at Google? Mithras at Fables of the Reconstruction observed on 2/15/2005 that

In the past few hours, Google apparently updated its database and moved blogs way, way down the list of search results. First Atrios mentioned it and thought it strange, and then Sisyphus Shrugged found it had happened to her. That prompted me to google my own site, and it's true for me, too: this blog was the first search result for "Fables of the reconstruction" and the third (I recall) for "Mithras" until now. Now, it is the 53rd result for "Fables of the reconstruction" and does not appear at all in the first 200 results for "Mithras" (although, strangely, my profiles on other websites do show up in those results.)

Mithras' dire prediction: "No google hits, no blogs. It's that simple."

The consensus seems to be that this is a reaction to the problem of comment spam, involving either a general downgrading of links in blog comments, or the implementation of the rel="nofollow" attribute for hyperlinks, or something along those lines. We're still #1 for Language Log, #2 for Dan Brown, and #3 for "Marriage Vowels", so we have apparently not been much affected yet. Some people have suggested that blogs on hosted sites like blogspot and typepad are the main victims. If this is true, then I imagine that the balance will be restored before long.
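For readers who haven't met rel="nofollow", here is a rough sketch of what honoring it might look like on the link-counting side. This is only an illustration, written in Python with the standard library's html.parser; the class name LinkCollector, the sample markup, and the example.org/example.com addresses are all made up for the purpose, and nothing here is meant to describe what Google actually does.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: sorts a page's links into those a link-counting
    crawler would credit and those marked rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.followed = []  # links that would count toward the target
        self.ignored = []   # links carrying rel="nofollow"

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # rel may hold several space-separated tokens, e.g. "external nofollow"
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.ignored.append(href)
        else:
            self.followed.append(href)


# Made-up page: one link in the post body, one in a spammy comment.
sample_page = (
    '<p>See <a href="http://example.org/post">this post</a>.</p>'
    '<div class="comment">'
    '<a rel="nofollow" href="http://example.com/pills">cheap pills</a>'
    '</div>'
)

collector = LinkCollector()
collector.feed(sample_page)
print("counted:", collector.followed)
print("ignored:", collector.ignored)
```

The point of the attribute is visible in the two lists: the comment link is still there for human readers to click, but a crawler that honors the convention gives it no weight.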

Paul Goyette at Locussolus has also noticed the effect,

[...] In my case, this blog is no longer the first hit for "locussolus" or "paul goyette" -- enter the latter right now and you won't find this page in the top 100 results. [...]

and looking beyond this (probably temporary) disturbance in cyberspace, has some more general thoughts:

Search engines, and in particular Google, are really becoming the gatekeepers for the world's information. [...]

What's so magical about the folks at Google is that even when they were tweaking their algorithm back in the early '90s, they foresaw the potential for these deep issues of speech and access. So instead of relying exclusively on content analysis, they built their model to incorporate the implicit views of the internet's readers and writers: they counted links, and they used their count to estimate a given site's authority on a particular search term, or in general. This was a profound and elegant achievement. Yes, it made search more accurate. But more than that, it codified the web's already democratic ethos -- tying search results to the actions of writers demystified search and gave content creators more power in the form of links. And while Google's new algorithm was somewhat prone to manipulation -- that's what's precipitated this whole crisis -- the very reason it could be manipulated was that it was transparent.

Today we have the blog, a phenomenon that's emerged largely because of the authority given to links and Google's transparency with respect to that authority. It's a phenomenon that takes that democratic ethos to the next level by removing virtually all the costs (financial, but also in terms of the required technical knowledge) associated with self-publication. Mix in Google's method of ranking search results, and you have a situation where millions of people have been moved to new acts of speech and are engaged in a worldwide discourse. If you hold freedom of expression dear, this is a monumental achievement.

Of course, there are also the spammers, who take the same democratizing elegance of Google's system and turn it on its head: by flooding my comment section (and yours) with links, they're able to increase their (or their client's) page rank, which means more traffic and presumably more sales.
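To make the link-counting idea and the spammers' trick a bit more concrete, here is a toy sketch in Python. It uses a simplified textbook form of PageRank (power iteration with a damping factor), which is an assumption on my part and certainly not Google's production ranking; the function name pagerank, the damping value of 0.85, and the three-page web of "blog_a", "blog_b" and "spam" are all invented for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict of page -> score; the scores sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current score among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Two blogs that link to each other, and a spam site nobody links to.
before = pagerank({
    "blog_a": ["blog_b"],
    "blog_b": ["blog_a"],
    "spam": [],
})

# The same web after spam comments plant a link on each blog.
after = pagerank({
    "blog_a": ["blog_b", "spam"],
    "blog_b": ["blog_a", "spam"],
    "spam": [],
})

print("spam score before:", round(before["spam"], 3))
print("spam score after: ", round(after["spam"], 3))
```

In this toy web the spam page's score goes up once the comment links exist, and the blogs' scores go down by the same total amount, since the scores always sum to 1. That is the incentive in miniature.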

For more on the same topic, read Jay Rosen's 2/20/2005 commentary on the second life of content:

"Frankly, they bring a lot of competencies to us. They're the leaders in search-engine optimization."

That's from an interview with Martin Nisenholtz, Senior Vice President for Digital Operations at the New York Times, who spoke with Staci Kramer of Paid Content about his company's recent acquisition of About.com for $410 million. In a conference call with stock analysts, Nisenholtz again mentioned search. He talked about "some very useful synergies such as cross marketing and search optimization expertise."

Why is the New York Times Company interested in acquiring this expertise with search engines that About.com is said to have? Ordinarily, I leave the analysis of deals to those who know the market, but the logic of this portion of the transaction intrigued me. They know how to show up in search; we don't. Let's buy them. Then we'll know too. "We own you now. Tell us what you know."

Jay argues that the secret has been right under their nose all along:

You rarely find New York Times articles in the top ten results of any Google search. The reason is simple: Search works by counting the quantity and quality of links to a page. In most cases, links to the New York Times expire after a week, the url's (web addresses) change, and the content moves behind a pay wall. Bye-bye Google. [...]

The second life of content, made possible by search, is of critical importance to journalists whose work is on the Web. (That's almost all journalists.) The very phrase "on" the Web tells us that things may land on the surface of the network and not get woven into it. These stand a very poor chance of surviving and having a second life, where there are probably more readers available than in the first.

Because NYT content is "on" the web but not "in" the web,

their work is lost to Google, lost to online forums and conversation, lost to the long tail where value is built up-- and in many ways lost to cultural memory.

This is also the reason why Open Access to the scholarly and scientific literature is, in the end, an irresistible idea.


Posted by Mark Liberman at February 23, 2005 06:58 AM