January 05, 2007

The specialness of English

Speakers of English writing about their language are likely to trumpet the specialness of the language, in particular its enormous vocabulary.  We've returned repeatedly to the Vocabulary Size trope, most recently in a posting by Geoff Pullum:

Despite the fact that we have virtually no idea of how to measure vocabulary size rigorously and fairly (which is one thing differentiating vocabulary size from penis length), nobody cares: people are prepared (it would seem) to accept imaginary facts about how many words are known by groups of people about whom they know nothing (or about themselves, as with the Payack claims concerning English) as a reliable assay of intelligence level, or even the sophistication level of a whole language or culture, and to accept any kind of raving nonsense anyone comes up with by way of vocabulary counting.

English is said to have a humongous vocabulary, as a result of several factors: the combination of Germanic and Romance sources; within the latter, layers of earlier borrowings and later ones, based more directly on Latin (and Greek); and the willingness of English speakers to take in loans from a great variety of languages.  All this is commonplace, though annoying.  Now it's taken to the next level, in Sol Steinmetz and Barbara Ann Kipfer's The Life of Language (2006), on English words.  After a discussion of doublets like legal/loyal, regal/royal, and tradition/treason, Steinmetz and Kipfer conclude:

This is partly why English is the only language that has books of synonyms like Roget's Thesaurus.

Whoa!  English must be REALLY special, with so many words that it needs a special resource to catalogue them.

Background comment: the specialness of English stands along with claims about the specialness of other languages, Japanese and French most famously, but also a number of others; I have had speakers of Persian go on at length about the marvels of their language, including the ease with which it can be learned and its special suitability for poetry.

Another background comment: Roget's is not just a synonym dictionary (though it can be used as one), it's a thesaurus, a conceptual taxonomy (of the furniture of the world and the organization of thought).  It is one of the monuments of such large-scale taxonomies: Bishop John Wilkins's Real Character in the 17th century, the French Encyclopedists in the 18th, Roget in the 19th, and Carl Darling Buck's Dictionary of Selected Synonyms in the 20th.  [Addendum: to which we can now add the developing WordNet, a combination of dictionary and thesaurus organized on a number of dimensions.]  All are organized conceptually, and the last two are designed to supply lists of words for each of the conceptual categories (for English alone in Roget's case, for the Indo-European languages as a group in Buck's case).

Now, on to the central claim: that English is the only language with a resource like Roget's Thesaurus.  You would have thought that someone making such a broad-brush claim would have at least tried to check it out.  It takes only a few moments to find thesauruses and synonym dictionaries for a variety of languages; here, for example, is a review of four such books for Japanese, all in Japanese only.

Meanwhile, for Chinese, Dan Jurafsky points to two popular modern Chinese thesauruses, organized as semantic taxonomies with synonym lists:

Mei Jia-Ju, Zhu Yi-Ming, Gao Yun-Qi and Yin Hong-Xiang.  1986.  TongYi Ci CiLin [Synonym Dictionary/Thesaurus].  Hong Kong: Commercial Press.

Dong Zhendong and Dong Qiang (Chinese Academy of Sciences, China).  HowNet and the Computation of Meaning (with CD-ROM).  ISBN 981-256-491-8.

and Mark Liberman notes that thesaurus-making has a very long history in Chinese, going back to the Erya, said to date from the 3rd century B.C. and organized mostly in terms of a semantic taxonomy.

No doubt there are many more examples to be found, especially given the many modern languages with several strata of vocabulary from different sources (Swahili, for example).  Even languages without such obvious stratification in their vocabulary have synonym dictionaries and/or thesauruses; there's a Duden synonyms dictionary for German, for instance.

My purpose here is not to start an inventory of thesauruses and synonym dictionaries -- please don't bombard me with further examples -- but just to show that it takes almost no work to discover that there are languages other than English with such resources, in some cases significantly antedating Roget's project.  Unfortunately, the idea of English as a special case was so powerfully attractive for Steinmetz and Kipfer that they didn't even make an effort.

Semantic taxonomies are very old indeed, usually organized in a kind of outline fashion, though in prose: things are animal, vegetable, or mineral; of the animals, there are the animals of the sky, the animals of the water, and the animals of the land; of the animals of the land, there are those that go on two feet, those that go on four, those that go on six, those that go on eight, and those that creep upon the land; etc.

It's a very natural idea to attach lists of words to the categories at the bottom level.  So it's not really a surprise that people were doing this a couple of millennia ago.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at January 5, 2007 12:47 PM