December 23, 2003

Twenty thousand new words a year

I don't know whether the book by Don Watson that Mark Liberman recently mentioned contains anything at all to justify its hysterical claims that the English language "is being mangled by the globalising forces of obfuscation." But if it does, it is puzzling that nothing that could begin to justify such claims is quoted or mentioned in the article about it in Melbourne's newspaper The Age. We get a couple of noun phrases with hyphenated compound prenominal attributive modifiers like outcome-related, real-world, and whole-of-organisation, and that's just about it. The rest is all frothing and flaming about the noble English language being done to death, desecrated, doomed. The article makes it look like a more ridiculous and extreme demise-of-the-language polemic than any I've ever seen.

Only one thing caused a flicker of interest for me: an actual figure is given for the likely number of new words added to English in the course of a year. The figure cited is 20,000. I'm wondering what the source was.

Don Watson's book (which I have not seen) may not tell us. The article about him says he has no wish to keep the language static: "The genius of English is the way it updates itself every day, with 20,000 new words a year, Watson read somewhere." He read it somewhere? Thanks a lot, Don; that narrows it down a bit.

Watson, of course, like just about every non-linguist who ever writes about language, presupposes that a language is just a big bag of words. Barbara Scholz and I have attacked that idea (in Nature 413, 27 September 2001, p. 367), though we don't imagine anyone will listen or anything will change. Everybody thinks that the key thing about a language is which words it has, and above all, how many. Scholz and I think the answer to the how-many question is inherently and profoundly indeterminate, for a very deep reason: natural languages do not have closed lexicons at all. (This is an idea due to Paul Postal; there is a discussion of it in Chapter 14 of Arc Pair Grammar by Paul Postal and David Johnson, Princeton University Press, 1980.) Natural languages are much better thought of as systems of conditions on the structure of expressions (words, phrases, sentences). Some of the well-established conditions apply to word-sized units (it really is well established that dog denotes Canis familiaris, and that the is the only acceptable form for the definite article), but the constraints do not entail an upper bound on the number of words or prescribe which ones are genuinely in the language.

This makes neologisms (brand-new coinages of words) important. Closed-lexicon models of language would suggest that sentences containing new words are not part of the language and cannot possibly be understood, so the introduction of new words ought to be a rare and tricky business. Scholz and I (like Postal) are saying the opposite: there is absolutely nothing linguistically wrong with sentences containing novel words, which suggests they should occur often, perhaps every day. And that seems right: if you really take note of everything linguistic that happens to you today, the chances are very low that you will not come across a word you had never seen or heard before, and you may even encounter a word that no one had ever used before (though that's harder to check).

But even Scholz and I did not expect the evidence of lexical openness to be as bountiful as 20,000 words a year. That's really a lot. It's nearly 55 a day, which means more than two new words becoming established every hour, day and night. It could be true. We'd like to know whether it is, and if so, what definition of word is being used. (To make the question interesting, you have to avoid counting words in a silly way. For example, since we talk about RS232 ports and Intel 80486 chips and the year 2004 and the Boeing 767 and so on, you could count all digit strings as words, which immediately tells you there must be a countable infinity of them. But that can't be what we mean if we're talking about adding 20,000 new words each year.)
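The per-day and per-hour rates are easy to check. This is only a rough sketch: it takes the unsourced 20,000 figure at face value and assumes a 365-day year.

```python
# Convert the claimed annual rate of new English words into
# per-day and per-hour rates (assumes a 365-day year).
WORDS_PER_YEAR = 20_000  # the figure Watson "read somewhere"; unverified

per_day = WORDS_PER_YEAR / 365
per_hour = per_day / 24

print(f"{per_day:.1f} new words per day")
print(f"{per_hour:.2f} new words per hour")
```

That works out to about 54.8 words a day and about 2.28 an hour.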

Just about all we know right now is that Don Watson read it somewhere. Give us a source, Don. I mean something checkable. The closest I've got is that I've seen the 20K words claim attributed to the New York Times in a PowerPoint presentation from the University of Kentucky's journalism school that I found on the web, but I'm looking for something more specific than just the name of a newspaper. After all, the claim could be just another urban legend, like the 5 exabyte mistake about word tokens uttered in human history: much repeated, but known to be completely false.

Posted by Geoffrey K. Pullum at December 23, 2003 02:20 PM