February 13, 2007

Whatever happened to the millionth word?

The commemoration of Language Log's ten millionth page view reminds me of another decimalized milestone that was supposed to be forthcoming. Readers might recall the self-publicized claim of one Paul J.J. Payack, which was preposterous enough to earn him runner-up status in the first annual Becky Awards: Payack announced that the lexicon of the English language, as measured by the Global Language Monitor's super-sekrit algorithm, was rapidly approaching one million words!

We first got wind of this tomfoolery a little over a year ago when Payack informed a gullible New York Times reporter that "as of Jan. 26 [2006] at 10:59 a.m. Eastern time, the number of words in the English language was 986,120." When this "news" was dutifully picked up by the Times of London a week later, Payack was predicting that "the one millionth word is likely to be formed this summer." Well, summer rolled around, and we heard nothing about the millionth word. Instead, Payack pushed back the lexical schedule, telling a columnist for the Times of London in August that the million-word mark would come in late November. Then November came and went with nary a peep from the previously vociferous Mr. Payack.

So what the heck happened? Here at Language Log Plaza, the party hats I ordered a year ago are gathering dust down in Storage Room B.

Turns out the English language must have gotten a bit stalled last year. According to Payack's wondrous algorithm, the lexicon is still inching its way to the million mark, but the progress is looking increasingly asymptotic. It takes some sleuthing to chart the increments in the lexicon claimed on Payack's Global Language Monitor site, since that darned algorithm is as shrouded in mystery as the Big Mac special sauce. But with the help of the Internet Archive Wayback Machine, I was able to piece together these data points:

11/16/03: 816,167
11/28/04: 823,481
3/30/05: 856,435
5/19/05: 866,349
11/3/05: 895,479
1/16/06: 985,955
1/26/06: 986,120
3/21/06: 988,968
4/1/06: 989,614
1/31/07: 991,833

We haven't seen much movement since the addition of a whopping 90,000 words in about two months at the end of 2005 (just in time for Payack's media blitz). The latest figure of 991,833 was retrieved from the Global Language Monitor site's front page on Jan. 31, 2007, and two weeks later the number hasn't budged. Perhaps the algorithm is just building up steam, readying for another burst that will take us over the million-word barrier. Or perhaps we're witnessing the equivalent of what physicists call the quantum Zeno effect (a.k.a. "a watched pot never boils"): in a continuously observed quantum system, an unstable particle will never decay. Ever since Payack's unsupportable claims were exposed to observation in such venues as Slate, NPR's "Fresh Air," and right here on Language Log, the Payackian lexicon has grown at a snail's pace. I think the party hats are going to get mildewy soon.
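For anyone who wants to check the arithmetic, here is a minimal sketch that computes the gap between each pair of snapshots listed above. The function name `increments` and the words-per-day figure are my own illustrative choices, not anything Payack's site publishes. The dates are those of the archived snapshots, so the rates are only as precise as the snapshot spacing allows.

```python
from datetime import date

# Word counts claimed on the Global Language Monitor site,
# copied from the snapshot list above (dates are snapshot dates).
data = [
    (date(2003, 11, 16), 816_167),
    (date(2004, 11, 28), 823_481),
    (date(2005, 3, 30), 856_435),
    (date(2005, 5, 19), 866_349),
    (date(2005, 11, 3), 895_479),
    (date(2006, 1, 16), 985_955),
    (date(2006, 1, 26), 986_120),
    (date(2006, 3, 21), 988_968),
    (date(2006, 4, 1), 989_614),
    (date(2007, 1, 31), 991_833),
]


def increments(points):
    """Return (start, end, days, words_added, words_per_day) per interval."""
    rows = []
    for (d0, n0), (d1, n1) in zip(points, points[1:]):
        days = (d1 - d0).days
        added = n1 - n0
        rows.append((d0, d1, days, added, added / days))
    return rows


for d0, d1, days, added, rate in increments(data):
    print(f"{d0} -> {d1}: +{added:,} words in {days} days ({rate:,.1f}/day)")
```

By this reckoning, the jump between the 11/3/05 and 1/16/06 snapshots is 90,476 words over 74 days, while the ten months between 4/1/06 and 1/31/07 added only 2,219.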

Posted by Benjamin Zimmer at February 13, 2007 01:14 AM