April 19, 2005

Could language be more popular than porn?

I intend this question in a rather limited sense, as I'll explain below.

By now you must know that if you go to amaztype™, you can see the word of your choice spelled out in letters made up of thumbnails of the publications whose titles contain it. (You can also ask to collect the works by authors rather than titles, or use thumbnails from the covers of music CDs or videos/DVDs rather than books.) And now the amaztype™ zeitgeist lists the most popular requests for you.

The current lists (valid as of Apr. 19, 2005, 8:10:01 GMT) have some surprises. For example, the list for the TITLE in ALL MEDIA category is

1 sex 2529 hits
2 fuck 902 hits
3 harry potter 541 hits
4 porn 496 hits
5 flash 474 hits
6 boobs 382 hits
7 love 348 hits
8 php 303 hits
9 cat 270 hits
10 superman 172 hits

Looking at the frequency first, we see that this is one of the few phenomena in the natural or social world that doesn't show a power law distribution, as indicated in the plot on the right. Alert Per Bak! (Note: this is a feeble joke -- Per Bak is dead, and doesn't seem to have been very interested in contrary evidence while he was alive. So please don't send me lists of other examples, unless they're really interesting ones.)
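For readers who want to poke at the numbers themselves, here is a minimal sketch of one rough way to test for a power law: fit a straight line to log(hits) against log(rank) and look at the slope and R². The function name loglog_fit is made up for this illustration, and the method is an assumption about how one might check, not a reconstruction of the plot. With only ten data points, a fit like this is suggestive at best.

```python
import math

# Hit counts copied from the TITLE in ALL MEDIA list above, in rank order.
hits = [2529, 902, 541, 496, 474, 382, 348, 303, 270, 172]

def loglog_fit(counts):
    """Least-squares fit of log(count) = intercept + slope * log(rank).

    Hypothetical helper for illustration. A true power law would give
    points that fall on a straight line in log-log space (R^2 near 1).
    """
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared

slope, intercept, r_squared = loglog_fit(hits)
print(f"slope={slope:.2f} intercept={intercept:.2f} R^2={r_squared:.3f}")
```

A straight-line fit in log-log space is a crude diagnostic. It says nothing about how the list would continue past rank ten.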

The top-ten words themselves divide naturally into six groups: (1) sex, fuck, porn, boobs; (2) Harry Potter; (3) flash, php; (4) love; (5) cat; (6) Superman. The categories themselves are not surprising, but the choices within the groupings are not always what I would have guessed.

In category (1), where are all the bodily fluids, waste products and rude noises? Not many fans of Dave Barry here, apparently. There are some other features of this category that we'll pass over in silence.

In category (2), is Harry Potter really the only actual book title that users care enough about to spell out? (Dan Brown doesn't make the authors' list either...)

In category (3), what happened to python, perl, java, C++? Are the partisans of lisp too old to register with the zeitgeist anymore? What about OCaml? Is there no pocket of 300 rebel forthians, or hypercardites, still holding out on some far planet of the empire? I won't even ask about C#.

I'm happy to leave category (4) alone, and I guess that category (5) doesn't surprise me either -- dog would be next, but far behind these days, and hamster, ferret etc. are just not in the same class. But who would have guessed that Superman would make the cut, when Spiderman, Wonder Woman and even God are missing?

However, the most striking thing about this list is how small the current counts are. Look, people, more than 3,000 of you read this weblog every day*! If ten percent of you went to amaztype™ and asked for "language", it would rank higher in the amaztype™ zeitgeist than PHP does! If all of you did it, language would outrank sex...
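The arithmetic can be sketched in a few lines. This assumes that each participating reader contributes exactly one hit and that the other counts stay frozen at their Apr. 19 values. The name rank_if and the 3,000 figure (the lower bound quoted above) are illustrative choices. Note that ten percent of exactly 3,000 is 300, just short of the 303 hits for php, so the claim leans on the "more than".

```python
# Hit counts from the TITLE in ALL MEDIA list above (as of Apr. 19, 2005).
counts = {
    "sex": 2529, "fuck": 902, "harry potter": 541, "porn": 496,
    "flash": 474, "boobs": 382, "love": 348, "php": 303,
    "cat": 270, "superman": 172,
}

def rank_if(hits):
    """Rank a new word would get with the given number of hits (1 = top).

    Hypothetical helper: counts only the existing words with strictly
    more hits, and assumes those counts do not change.
    """
    return 1 + sum(1 for h in counts.values() if h > hits)

readers = 3000  # lower bound from the post; the real figure is "more than" this

for percent in (10, 11, 100):
    hits = readers * percent // 100  # assumes one hit per participating reader
    print(f"{percent}% of {readers} readers: {hits} hits, rank {rank_if(hits)}")
```

Under these assumptions, a bit over ten percent of the readership is enough to pass php, and full participation would put "language" at the top of the list.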

That would probably be inappropriate. But somewhere north of Harry Potter would be nice. So get busy! Tell your friends! Zeit early and zeit often!

[Update: as of 4/21/2005 17:30 GMT, "language" is in third place, with 515 hits, behind only "sex" (2596 hits) and "fuck" (1078 hits), and ahead of "flash", "harry potter", "porn", "love", "boobs", "cat" and "php".]

*At least, sitemeter registers more than 3,000 visitors on an average day. As I understand it, they count visitors in terms of distinct IP addresses within certain time windows. This is an imperfect measure, since some ways of accessing the internet may channel many users through the same apparent IP address, while in other cases, a single user may show up from different IP addresses at different times.

Posted by Mark Liberman at April 19, 2005 05:11 AM