April 29, 2007

Arabs and camel words: go ahead, just make stuff up

‘Saudi tribe holds camel beauty pageant’ says the headline over a Reuters news story by Andrew Hammond (filed Friday, April 27, 9:23 AM ET). It begins thus:

GUWEI'IYYA, Saudi Arabia (Reuters) - The legs are long, the eyes are big, the bodies curvaceous.

Contestants in this Saudi-style beauty pageant have all the features you might expect anywhere else in the world, but with one crucial difference — the competitors are camels.

And so attuned am I to the ways of the journalistic world and its snowclones that when Marilyn Martin sent me this story I found that I could actually predict the drift of what would come up in the following paragraph before I even looked. Sure enough, there it was. I never doubted for one moment that it would be there (though I could not have guessed the number):

The camels are divided into four categories according to breed -- the black majaheem, white maghateer, dark brown shi'l and the sufur, which are beige with black shoulders. Arabic famously has over 40 terms for different types of camel.

Of course it does, of course it does. And I for my part have over 57 different words for lazy journalists who repeat snowclones about vocabulary size in languages they know absolutely nothing about and cite warrantless lexeme-count figures taken from sources they cannot name or even vaguely recall.

What gets up my nose is not so much that lexical traveler's tales of this sort are so often false. Some seem to be sort of true. At least for Somali (not too far away from where Arabs live) Mark Liberman actually listed 46 genuine camel words, earning only sarcasm from me, but winning the enormous gratitude of the blogosphere, which has joyously repeated the figure many times.

And it's not even that these myths about lexical counts mostly float around without backing, unsourced and undefended, because journalists know that no one (except me on Language Log) will call them on claims about languages, regardless of how ridiculous the claims are.

No, what gets me most about these lexeme-count claims is that they are presented as if they were profound and significant and clearly supportive of exoticizing claims about far-away nomadic peoples like Arabs and Eskimos, when in fact even if they were true they would be utterly unsurprising.

Think how many names for breeds of dogs you could list. Why? Because we (in the West) have been domesticating and breeding types of dog for thousands of years and they mean something to us. Think how many names of paint colors you've seen on paint shop color charts. Think how many makes and models of cars you could name. It is totally boring and obvious that one will have a variety of specialized terms for things that one's culture has taken a long-term interest in.

The difference is, your knowledge of 40 different words for automobile models is not passed around as a gem of wisdom about the English language and worldview. For the Arabs and their camel terms or the Eskimos and their snow words, things are very different: the lexical count becomes a putative nugget of insight into their mysterious nature as a people.

And the numbers are made up entirely at random; that's the other thing that drives me up the wall. Among those appearing on the web before such phrases as "words for camel in Arabic" (as you can easily verify) are: 9; 20; 40; 160 (this one is quite common); 400; 1,000; 3,000; 5,000; "a gajillion"; and of course assorted vague quantifiers like "several", "numerous", "many", and "a whole bunch".

One of the most strangely specific figures comes from P. L. Heath in Philosophical Quarterly, 1955 (in a review of Ernst Cassirer's The Philosophy of Symbolic Forms, Volume 1). Heath says (and he may be paraphrasing Cassirer): "Arabic, for instance, has 5744 words for different kinds of camel and none for camels in general."

Of course it does, of course it does. Exactly five thousand seven hundred and forty-four. Or perhaps nine; or forty; or four hundred; or a thousand; whatever. Don't stop to figure out a defensible number; just babble on about it as if the random number you picked was important and well backed up by linguistic research.

Maybe if you write some kinds of stuff for Reuters they will want to do fact-checking; maybe some of what you write for Philosophical Quarterly will be subjected to refereeing; but not if it's about the size of subsets of the nouns in a randomly chosen language spoken in an area of the world where they still have "tribes". On that topic you will never be queried; so go ahead, just make stuff up.

[Update: Lane Greene has pointed out to me that although one can imagine someone being unable to evaluate a claim he read somewhere about 5,744 words for camel in a language he could not read, it is easy to answer the question of whether there is a single general word for camel. Just pick up an etymological dictionary. German Kamel and English camel come from Arabic jamal and Hebrew gamal via Greek kamelos, meaning of course "camel". To encounter the 5,744 figure and swallow it may be regarded as a misfortune; to overlook the existence of jamal in Arabic looks like carelessness.

It could perhaps be argued that jamal only refers to male camels, so it isn't fully general. But in that case, what about ’ibil, which is general as regards sex (though it can only be used in the plural)? For a more informed discussion of the story about camel words in Arabic, see Lameen Souag's excellent blog post on the topic, which makes the point that we shouldn't equate "Arab" with "highly expert Arab camel breeder who knows all the relevant technical vocabulary associated with that trade". That's what I feel people are so often doing when they tell these many-words-for-X tales.]

Posted by Geoffrey K. Pullum at April 29, 2007 12:36 AM