Language Log: Words and other lexical entries

December 25, 2003

Words and other lexical entries

On the question of the number of new English words per year, Language Hat writes:

Liberman rightly (in my opinion) discounts the trademarks, but I think he's too quick to dismiss the scientific terms. As rebarbative as "GDP-L-fucose synthase" may be, I don't see any principled way to distinguish it from the long line of terms that have preceded it, from atmosphere through phlogiston and quark. The OED has from the beginning tried to include scientific terminology, and although it's probably impossible by now to keep up with the details of every specialty, if they're used in the normal course of events by the specialists concerned, they're bona fide English words and deserve to be counted. Whether it's possible to do an accurate count, of course, is another matter altogether.

There's some truth in this, but for the sake of clarity, let me argue the other side for a while.

First, I don't entirely discount the trademarks, any more than dictionary-makers do. The OED's most recent update includes Bluetooth, Nomex, Norplant, Noryl and Swiss Army knife, among other trademarked words, and they were quite right to include these. Margaret Marks lists a small sample from the International Trademark Association's list, and many of her examples are plausible candidates for inclusion, if they're not already there (as Grand Marnier and Grape-Nuts are).

It's just that most of the 100,000 new trademarks registered in the U.S. every year (and I assume in other places as well) are simply names (of businesses, products, etc.) that someone happens to have registered according to a certain legal procedure. This legal registration doesn't privilege them lexicographically over the tens or hundreds of millions of new names created in the Anglosphere every year that aren't trademarked (like Perl, which also made the OED's most recent update, and has not been registered as a trademark). All names are lexical entries, in the sense that they are morphophonological patterns with a conventional (if sometimes very local) meaning, which is not predictable from the meaning of their parts (if any). My brother's childhood imaginary friend was named "Clocktho" (rhymes with "block know"); our current cat is named "Tickle"; I'm co-director of an outfit whose acronym is "IRCS" (often pronounced "irks"); I often eat at the "Class of 1920 commons" (often abbreviated as "1920 commons" or just "1920"). These are all part of my mental lexicon, and I share each of them with some other people as well; but none of them are in any general dictionaries of the English language, nor should they be. The OED's most recent update includes Nipmuc, referring to "several Algonquian-speaking North American Indian peoples formerly inhabiting parts of central Massachusetts and adjacent Connecticut and Rhode Island," who gave their name to several landmarks of my childhood such as the Nipmuc Trail. The difference between Nipmuc (which was long overdue to be included) and Clocktho (which never will be) is not narrowly linguistic but rather historical, sociological, and quantitative.

Second, there is a difference worth noting between scientific terms like quark and those like trimethylamine-N-oxide reductase. The latter is a kind of phrase, composed according to a certain grammar or at least pattern, which lends itself to the construction of a very large number of additional strings that are not necessarily part of the scientific lexicon. In principle we could have dimethylamine or monobutylamine at the start, etc. The choice among instantiations of these linguistic patterns is then a matter of what chemical configurations are possible and which of them biology uses. Scientists need standard databases for what is known about these facts of chemistry and biology, and also for the associated linguistic choices, such as the acronyms, abbreviations and other nicknames for the chosen entities. The Enzyme Commission provides such a standard. But only a few of the names that it catalogues -- whether the full phrasal names or the nicknames -- belong in a dictionary.
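To make the point about productive patterns concrete, here is a toy sketch in Python. The fragment lists and the one-entry "catalogue" are invented for illustration; they are not a real nomenclature grammar, and they are not the Enzyme Commission's actual data. The sketch only shows that a simple pattern generates many well-formed strings, while the question of which ones name real entities is settled by a separate list.

```python
from itertools import product

# Hypothetical fragments, chosen only for illustration.
PREFIXES = ["mono", "di", "tri"]
ALKYLS = ["methyl", "ethyl", "butyl"]
SUFFIXES = ["amine", "amine-N-oxide reductase"]


def generate_names():
    """Return every string the toy pattern prefix + alkyl + suffix can produce."""
    return ["".join(parts) for parts in product(PREFIXES, ALKYLS, SUFFIXES)]


# Stand-in for a curated catalogue. Membership here reflects facts about
# chemistry and biology, not facts about the naming pattern itself.
CATALOGUED = {"trimethylamine-N-oxide reductase"}

names = generate_names()
attested = [name for name in names if name in CATALOGUED]

print(len(names), "well-formed strings;", len(attested), "in the toy catalogue")
```

Even with three choices per slot, the pattern yields more strings than anyone would want to list in a dictionary, and a realistic grammar would yield vastly more.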

This is not specific to scientific vocabulary -- in fact, it's a lot like the problem of street addresses. Ware College House, where I live, is now officially at 3650 Spruce Street. Three years ago, it was officially at 3700 Spruce Street; and then for a couple of years, it was officially 3615 Hamilton Walk. By "officially" I mean that the address was registered in those changing ways with the U.S. Postal Service (though the buildings have been in the same place since 1902). There are many similar strings -- e.g. "3615 Spruce Street" or "3650 Hamilton Walk" -- that are not valid addresses at all. These facts -- that 3615 Spruce Street isn't a valid address in Philadelphia, but 3650 Spruce Street is, and furthermore that as of 2003 it is the address of Ware College House -- are not facts about the English language, exactly. They're facts about (the U.S. Postal Service's official view of) the way we've decided to use the English language to talk about Philadelphia. You can look such facts up in an appropriate reference, but (except perhaps for a few like 221B Baker St.) the appropriate reference is not a dictionary.

Streets and buildings exist independently of how we choose to address them, but the question of which streets in which cities have which numbering schemes, and which institutions and buildings are officially designated with which street addresses, is to a large extent a question about linguistic convention (I understand that street numbers in some Japanese cities are assigned in the order of building construction!). However, the kind of linguistic convention involved is not one that we usually regard as being part of the responsibility of dictionary makers. The same thing can be said about the question of how to form complex chemical names, how to abbreviate these names or otherwise form shorter and more convenient versions, etc. It's a good thing that we have efforts like the Enzyme Commission to keep track of specific areas of scientific terminology, just as it's a good thing that the U.S. Postal Service keeps track of U.S. street addresses. Both are lexicographical enterprises, in some sense; but ...

My only real conclusion here is that the terms "new", "English" and "word" are too vague in ordinary use for the question "how many new English words are there each year" to have a well-defined answer. And in fact we've only scratched the surface of the kinds of vagueness that would have to be remedied in order to give a meaningful answer :-)...

Posted by Mark Liberman at December 25, 2003 03:18 PM