Language Log: The Emperor's Clothes

April 18, 2006

A few days ago, in suggesting reliable sources of information on linguistic classification, I described Merritt Ruhlen's approach as "unreliable at higher-levels due to its reliance on an unsound and subjective approach to linguistic classification". Those who have some familiarity with the debate over what sort of evidence is necessary to establish linguistic relationships no doubt assumed that I was referring to the fact that Ruhlen is an advocate of an approach dubbed "mass lexical comparison", which is known to be unsound. That is indeed part of what I meant, but it isn't the whole story.

There are two aspects to classifying languages. The one that attracts most popular interest is showing that languages are related in the first place. The second, harder, task is deciding exactly how they are related, that is, working out the family tree. This is called "subgrouping".

There are a number of ways of doing this, and ongoing debates as to which is best. The classical approach requires reconstructing the proto-language, working out the sequence of changes by which each of the daughter languages is derived from it, and structuring the tree so as to minimize the number of changes that must have happened independently. In other words, this approach groups languages, and intermediate proto-languages, on the basis of shared innovations. It is essentially the same approach as that of biological classification. Other approaches, which fall generally under the heading of "lexicostatistics", are based entirely on lexical replacement. What all of these approaches have in common is that they provide an objective basis for claiming that language A is more closely related to language B than to language C. They also all depend on establishing phonological correspondences among the languages.
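The shared-innovation criterion can be illustrated with a toy calculation. What follows is a minimal Python sketch, not a procedure taken from any particular linguist: it assumes that the hard part, reconstruction, has already been done, so that each language comes with a set of innovations relative to the proto-language, and it considers only three languages and the three ways of grouping two of them together. The language names, the innovations, and the functions `tree_cost` and `best_subgrouping` are all hypothetical, invented for illustration.

```python
from itertools import combinations

# Hypothetical toy data: each language maps to the set of innovations
# (changes from the reconstructed proto-language) it is assumed to show.
INNOVATIONS = {
    "A": {"p>f", "t>th", "k>h", "final-vowel-loss"},
    "B": {"p>f", "t>th", "k>h", "umlaut"},
    "C": {"s>h", "nasal-vowels"},
}


def tree_cost(sisters, outgroup, innovations):
    """Count the changes needed on the rooted tree ((x, y), outgroup).

    Assumes the root is in the proto-language state. An innovation shared
    by both sisters is placed once, on the branch leading to their
    intermediate proto-language; every other occurrence of an innovation
    counts as a separate, independent change.
    """
    x, y = sisters
    total = sum(len(innovations[lang]) for lang in (x, y, outgroup))
    shared_by_sisters = innovations[x] & innovations[y]
    return total - len(shared_by_sisters)


def best_subgrouping(innovations):
    """Return (cost, sister_pair, outgroup) for the cheapest of the three trees.

    Ties are broken by ordinary tuple comparison, i.e. alphabetically.
    """
    langs = sorted(innovations)
    if len(langs) != 3:
        raise ValueError("this sketch handles exactly three languages")
    candidates = []
    for pair in combinations(langs, 2):
        outgroup = next(lang for lang in langs if lang not in pair)
        candidates.append((tree_cost(pair, outgroup, innovations), pair, outgroup))
    return min(candidates)


cost, pair, outgroup = best_subgrouping(INNOVATIONS)
print(f"(({pair[0]}, {pair[1]}), {outgroup}) with {cost} changes")
# → ((A, B), C) with 7 changes
```

An innovation found in all three languages adds the same amount to every candidate tree, and so does one found in a single language; only innovations shared by exactly two of the languages discriminate among the trees. That is the sense in which subgrouping depends on shared innovations, not merely on evidence that the languages are related.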

"Mass lexical comparison" consists of setting down lists of words that one considers to resemble each other in sound and meaning and declaring that such similarities could not be due to chance and so must reflect common descent. That's it. No phonological correspondences are established. No reconstruction is done. That's why I prefer to call it "superficial lexical juxtaposition" (SLJ). Even if you believe that this method works (which it doesn't), what basis does it provide for taking the second step and working out the family tree? You can't use any of the usual techniques, even lexicostatistics, since they all rely on phonological correspondences, so there must be some other technique that proponents of SLJ use for subgrouping.

You'd think so, but you'd be wrong. Nowhere in the writings of people like Merritt Ruhlen or Joseph Greenberg will you find an exposition of an objective technique for subgrouping. The closest thing you'll find is tables listing words from several languages, in which a few of the languages are closely related and obviously different from the others. In such easy cases the subgrouping may be obvious on inspection, but much of the time it isn't. The fact of the matter is, SLJ-ers have no technique for subgrouping.

The absence of a technique for subgrouping explains why SLJ-ers don't present evidence for their subgroupings. In a book like Ruhlen's Guide to the World's Languages, you don't expect to see the evidence, but if you look at primary works by SLJ-ers you'll find that there isn't any evidence there either. To take a really prominent example, consider Joseph Greenberg's Language in the Americas (LIA), which purports to demonstrate that all of the native languages of the Americas other than those belonging to the Eskimo-Aleut and Na-Dene families belong to a single language family dubbed "Amerind". LIA also claims that Amerind consists of 11 subgroups and gives a detailed family tree.

The controversy about LIA has been devoted almost entirely to whether Greenberg is justified in claiming that almost all the languages of the Americas are related, but at least he gives an argument of sorts for that, albeit one that is badly flawed. What is the evidence for the 11 subgroups of Amerind, and more generally for his family tree? There isn't any. Really.

NO EVIDENCE WHATEVER IS PRESENTED.

None. Nada. Zilch.

Nowhere in LIA, nor anywhere else in Greenberg's oeuvre, will you find any evidence or argument for his subgrouping of Amerind. For each of the 11 putative subgroups Greenberg gives a list of "etymologies" similar to the one he gives for Amerind as a whole, but at best what that shows is that the languages included in each of his subgroups are related to each other. It provides no evidence whatever that the languages in one of his subgroups are more closely related to each other than to languages in other subgroups.

The upshot is that even if you believe SLJ-ers' claims about remote linguistic relationships, you shouldn't believe their claims about subgrouping because they have no technique for carrying out subgrouping and offer no evidence in favour of the subgroupings that they present. It's all a scam.

That's what makes it so ironic that Luca Cavalli-Sforza claims that mainstream historical linguists are unable to classify languages because they don't recognize that there are degrees of relationship. As I pointed out previously, his claim is entirely unfounded: he actually admits that he has no evidence for it and is just repeating what people like Ruhlen tell him. In fact, it is the SLJ-ers who have no method for determining degree of relationship and hence are incapable of classifying languages. Cavalli-Sforza has got it backward.

This doesn't mean that Ruhlen's Guide is useless. If you ignore the claims about remote relationships, you can still use it, since it is largely a compendium of work that other people have done. The difficulty is that where he has to choose among competing classifications, you can't have much confidence in his choice because you have no idea on what basis he made it. Since he has no method for classifying languages, on what basis can he evaluate the proposals of others? That means that you have to rely on your own knowledge of the reliability of the people he cites or look up the references yourself.

Posted by Bill Poser at April 18, 2006 01:44 AM